
I would like to convert one or more ANSI-encoded (Windows-1252) text files to UTF-8 without a BOM, ideally via a command-line call. My use case: I export .tex files from Stata and want to compile them with LuaLaTeX. Stata apparently does not support UTF-8, while LuaLaTeX supports nothing else and therefore chokes on some non-ASCII characters. I can call shell commands from inside Stata, so it would be nice to do the conversion on the fly from within my Stata scripts.

Ideally I would be able to call a command such as convert2UTF.cmd file.tex. Another good option would be batch conversion of files in a folder (e.g. all files matching *stata.tex). It would also be great if the solution worked with default Windows tools (Windows 7 at minimum, ideally XP as well).

Similar questions have been asked here before. The Cygwin/GnuWin32 approach is problematic because I want to convert without installing extra software on the machine. The PowerShell approach looks promising, but apparently out-file -en utf8 saves the file with a BOM.

Another PowerShell approach, which seems to convert to UTF-8 without a BOM, is

foreach($i in ls -recurse -filter "*.*") {
    if (
        $i.Extension.ToLower() -eq ".tex"
    ) {
        $MyFile = Get-Content $i.fullname 
        [System.IO.File]::WriteAllLines($i.fullname, $MyFile)
    }
}

Unfortunately I cannot figure out how to run it. I saved it as a PowerShell script in the same folder as the .tex files, but when I run it, the files are left untouched, so apparently something is missing. Needless to say, my PowerShell knowledge is close to nil. I would also like to pass a filename as an argument when calling it from Stata.
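
One way to get both a runnable script and a filename argument is sketched below. It assumes Windows PowerShell 2.0 or later (included with Windows 7, a separate download on XP) and that the input really is Windows-1252. The function name Convert-AnsiToUtf8NoBom and the script name convert2UTF.ps1 used afterwards are made up for illustration.

```powershell
# Sketch: convert one Windows-1252 file to UTF-8 without BOM, in place.
# Convert-AnsiToUtf8NoBom is a hypothetical name, not a built-in cmdlet.
function Convert-AnsiToUtf8NoBom {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # .NET resolves relative paths against the process working directory,
    # not the PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Source encoding is assumed to be Windows-1252; target is UTF-8
    # with the BOM explicitly switched off.
    $cp1252 = [System.Text.Encoding]::GetEncoding(1252)
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)

    # Decode the whole file, then write it back re-encoded.
    $text = [System.IO.File]::ReadAllText($fullPath, $cp1252)
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8NoBom)
}
```

If the function is saved in convert2UTF.ps1 with a final line `Convert-AnsiToUtf8NoBom -Path $args[0]`, a call along the lines of `powershell -NoProfile -ExecutionPolicy Bypass -File convert2UTF.ps1 file.tex` should work from cmd or from a shell call inside Stata. Without `-ExecutionPolicy Bypass`, the default policy may refuse to run unsigned scripts, which could be one reason a saved script appears to do nothing.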

  • There may be a way to use the powershell approach and not write the BOM. See Using PowerShell to write a file in UTF-8 without the BOM.
    – martineau
    Commented Feb 27, 2013 at 13:04
  • @martineau: That is quite similar to the code I posted, right? But how do I run it? E.g. when I paste [System.IO.File]::WriteAllLines(out.tex, $MyFile) into PowerShell, I get a ParserError...
    – dpprdan
    Commented Feb 27, 2013 at 23:26
  • Yes, it's similar, however without seeing the exact ParserError you're getting it's hard to say exactly what's wrong -- my guess is you're passing the wrong arguments to WriteAllLines or passing them in the wrong order.
    – martineau
    Commented Feb 27, 2013 at 23:59
  • @martineau: OK, I enter $MyFile = Get-Content in.tex followed by [System.IO.File]::WriteAllLines(out.tex, $MyFile) and I get CategoryInfo: ParserError: (CloseParenToken:TokenId) [], ParentContainsErrorRecordException; FullyQualifiedErrorId: MissingEndParenthesisInMethodCall
    – dpprdan
    Commented Feb 28, 2013 at 0:16
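
The ParserError in the last comment is consistent with the unquoted out.tex: inside a .NET method call PowerShell expects expressions, so a file name has to be a quoted string or a variable. A minimal sketch of the same two steps follows, assuming Windows PowerShell and a Windows-1252 system code page; Copy-TexAsUtf8 and the paths in the usage line are hypothetical.

```powershell
# Sketch only; Copy-TexAsUtf8 is a hypothetical helper name.
function Copy-TexAsUtf8 {
    param(
        [string]$Source,
        [string]$Destination
    )

    # Windows PowerShell's Get-Content decodes BOM-less files with the
    # system ANSI code page (assumed here to be Windows-1252).
    $MyFile = Get-Content -LiteralPath $Source

    # Both arguments are expressions: a variable or a quoted string.
    # A bare out.tex is what triggers MissingEndParenthesisInMethodCall.
    # WriteAllLines without an encoding argument writes UTF-8 without BOM.
    [System.IO.File]::WriteAllLines($Destination, [string[]]$MyFile)
}

# Full paths avoid surprises: .NET resolves relative paths against the
# process working directory, which can differ from the PowerShell location.
# Copy-TexAsUtf8 "C:\work\in.tex" "C:\work\out.tex"   # hypothetical paths
```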

1 Answer


I think you could try a VBScript script that uses the ADODB.Stream object.

Google search: "vbs convert file ansi to utf-8"
