
(question re-written to be more useful)

I have a batch script that interacts with command-line programs, captures their output, and then makes decisions based on that output.

One of the programs I need to interact with is fairly old, so I am stuck with its quirks. When I redirect its output to a text file, that text file is encoded as UTF-16 LE.

Here's how I do that:

program -parameter > resultat.txt

Under Windows 7, this encoding seems to be troublesome for cmd/batch work, because you cannot read the contents of such a text file into a variable.

Here is an example (it reads only the first line of the text file):

set /p Var=<resultat.txt
echo %Var%
cmd /k

It echoes nothing; the output is just "ECHO is on.", which is what echo prints when %Var% is empty.

Also, if you use "type" to print the contents of the text file, the output has strange spacing, which suggests the file is not being decoded properly.
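
For context, a minimal Python sketch (Python is used only for illustration here, and the sample line is made up) of what the same text looks like in BOM-less UTF-16 LE versus UTF-8. Every ASCII character is followed by a 0x00 byte, which is a plausible reason why byte-oriented cmd features such as set /p, and type on a BOM-less file, give empty or oddly spaced results.

```python
# Illustration only: the same (hypothetical) output line in two encodings.
line = "Result: OK\r\n"

utf16le = line.encode("utf-16-le")  # the "utf-16-le" codec never writes a BOM
utf8 = line.encode("utf-8")         # plain UTF-8, also without a BOM

print(utf16le[:8])  # b'R\x00e\x00s\x00u\x00'
print(utf8[:4])     # b'Resu'

# Each ASCII character in UTF-16 LE is followed by a 0x00 byte, so the file
# is twice as long and every second byte is NUL.
nul_count = utf16le.count(0)
print(len(utf16le), len(utf8), nul_count)  # 24 12 12
```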

Attempted solution [1] - PowerShell

After some research, I found that PowerShell can convert text file encodings using the following method:

Get-Content -Path "path\file.txt" | Out-File -FilePath "path\new_file.txt" -Encoding <encoding>

Using Notepad++, I investigated which encoding I need to end up with.

UTF-8 without a BOM is the encoding I need (for plain ASCII text it is byte-for-byte identical to what Notepad calls "ANSI"). With this encoding, both loading the file into a variable and the "type" command work flawlessly. How do I know? If I open the redirected text file in Notepad and resave it with "ANSI" encoding, everything works.

-Encoding ascii

...is the option that should have worked, since it produces plain ASCII output (which is also valid UTF-8 without a BOM). However, it seems unable to handle the UTF-16 LE source and does not produce usable output. When I opened the resulting file in Notepad++, it identified it as UTF-16 LE "Unix", which was odd.

Funnily enough, if I resave the redirected text file as "Unicode" in Notepad, which produces UTF-16 LE with a BOM, the conversion above works and produces a perfect UTF-8 file. So I extended my research to the question "How can I add a BOM to a UTF-16 LE file?", since I could combine that with the PowerShell method. Spoiler alert: I did not find a decent answer.
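
The BOM observation fits the way many text readers guess an encoding: look for a byte order mark at the start of the file, and otherwise fall back to a legacy single-byte code page. The Python sketch below is my assumption of that logic, not the actual code of Notepad or PowerShell; the fallback code page cp1252 is only a stand-in for whatever the system ANSI code page is, and the sample text is hypothetical.

```python
import codecs

# Assumed BOM-sniffing logic; the fallback stands in for the ANSI code page.
def sniff_decode(data, fallback="cp1252"):
    """Return (encoding_used, text), preferring a BOM when one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")  # strips the UTF-8 BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16", data.decode("utf-16")  # the BOM selects the byte order
    # No BOM: one byte per character, so every 0x00 becomes a character.
    return fallback, data.decode(fallback, errors="replace")

text = "Result: OK\r\n"                     # hypothetical program output
bomless = text.encode("utf-16-le")          # what the old program writes
with_bom = codecs.BOM_UTF16_LE + bomless    # what resaving as "Unicode" yields

enc, decoded = sniff_decode(bomless)
print(enc, repr(decoded[:4]))   # cp1252 'R\x00e\x00'
print(sniff_decode(with_bom))   # ('utf-16', 'Result: OK\r\n')
```

With the BOM present, the reader picks UTF-16 and gets clean text; without it, the NUL bytes leak into the decoded string.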

-Encoding utf8

...is a similar option, but it produces UTF-8 with a BOM (the equivalent of saving as "UTF-8" in Notepad), and the output is corrupted.

So to sum up:

I am looking for a command-line tool or method (open or proprietary, first- or third-party) that can perform either of these conversions:

  1. UTF-16 LE - Windows (CR LF) straight to UTF-8 - Windows (CR LF)

  2. UTF-16 LE - Windows (CR LF) to UTF-16 LE with BOM - Windows (CR LF)
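
Both requested conversions are small at the byte level. A hedged Python sketch, operating on bytes already read from a file (the function names are hypothetical): conversion 1 decodes UTF-16 LE and re-encodes it as UTF-8 without a BOM, and conversion 2 prepends the UTF-16 LE BOM (FF FE). CR LF pairs are ordinary characters in both encodings, so Windows line endings are preserved.

```python
import codecs

BOM_LE = codecs.BOM_UTF16_LE  # b'\xff\xfe'

def utf16le_to_utf8(data):
    """Conversion 1: UTF-16 LE (with or without BOM) -> UTF-8 without BOM."""
    if data.startswith(BOM_LE):
        data = data[len(BOM_LE):]  # drop the BOM so it is not copied into UTF-8
    return data.decode("utf-16-le").encode("utf-8")

def add_utf16le_bom(data):
    """Conversion 2: UTF-16 LE without BOM -> UTF-16 LE with BOM."""
    if data.startswith(BOM_LE):
        return data  # already marked; never add a second BOM
    return BOM_LE + data

# Hypothetical sample: two lines with Windows (CR LF) line endings.
sample = "Line 1\r\nLine 2\r\n".encode("utf-16-le")
print(utf16le_to_utf8(sample))      # b'Line 1\r\nLine 2\r\n'
print(add_utf16le_bom(sample)[:4])  # b'\xff\xfeL\x00'
```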

  • Does Converting text file to UTF-8 on Windows command prompt - Super User answer your question?
    – DavidPostill
    Commented May 29, 2023 at 17:49
  • Are you using the chcp command? Try chcp 437 (United States) to see if with it the program generates an ANSI file.
    – harrymc
    Commented May 29, 2023 at 17:49
  • @DavidPostill that approach produces UTF-8 with a BOM, which is not displayed properly and gives garbled output in cmd. But thanks for the reply. "Set-Content" certainly looked different from the "Out-File" I demonstrated here, but it seems to do the same thing.
    – bfh47
    Commented May 29, 2023 at 18:01
  • @harrymc I did briefly come across some solutions that used "chcp", but I didn't have luck with them; from the sounds of it, maybe it deserves revisiting. Could you provide a working example, or link one? I will do some research later today when I have time. If I run "chcp", it tells me I am using code page 866.
    – bfh47
    Commented May 29, 2023 at 18:06
  • Code page 866 is "DOS Cyrillic Russian", so there is no reason that it will generate UTF16. However, try putting the line chcp 437 before the command.
    – harrymc
    Commented May 29, 2023 at 18:09

4 Answers


The type command will work if the UTF16 file does not contain a BOM:

type utf16.txt >ascii.txt

But as in your case the generated file does have a BOM, a sure-fire method for converting the file uses PowerShell:

powershell "Get-Content 'utf16.txt' | Out-File 'ascii.txt' -Encoding ascii"

Notice the use of two types of quotes to avoid the need to escape the inner quotes.

  • Hello! Do you happen to know if this solution differs from the PowerShell method I mentioned in my original post? On the surface it looks very similar. Just to be clear (I tried to make my post detailed, but I thought less about making that information easy to understand, so I apologise), the file created by piping recycle.exe output to a txt file is UTF-16 LE without a BOM. That is the format I need to convert from. When I ran PowerShell's Get-Content/Out-File, it was in a separate PS1 file; your suggestion is to do it within batch without a "middle file"?
    – bfh47
    Commented May 30, 2023 at 9:21
  • I'm only making suggestions, the decision is yours what to do.
    – harrymc
    Commented May 30, 2023 at 9:29
  • It's fair enough, no problem, I am just looking at this and it looks like something I've tried already, that's all. I am not at home currently so will only be able to try this later today. Just to clarify, you suggest running "powershell -command "command in quotes"" from cmd/batch - right? On an unrelated note - do you think it's worth me tidying up the question, and maybe opening it in stackoverflow? Is this "superuser" material or more advanced?
    – bfh47
    Commented May 30, 2023 at 9:40
  • Hey again - I found a solution: someone on stackoverflow had the exact same query many years ago, and they were recommended to use gnuwin32 ("iconv") program. I was able to successfully use this program via CMD to get the correct conversion. Would you be offended if I were to delete my thread, as I see it as absolutely redundant now...
    – bfh47
    Commented Jun 2, 2023 at 12:18
  • I won't be offended, but a thread on stackoverflow doesn't mean that your post is redundant. You should rather post here your own answer and mark it as accepted.
    – harrymc
    Commented Jun 2, 2023 at 12:29

Path of least resistance: use libiconv for Windows

After about a day of searching (back when the question was asked), I noticed that Stack Overflow has a tag called [utf16-le], and I decided it would be worth my time to go through all of the threads using it.

I found a solution that introduces a program called "iconv" and even gives the full command needed to carry out the conversion. Unlike the PowerShell method, you need to specify the input encoding as well as the output encoding explicitly; also unlike the PowerShell method, it produces a good result.

Here is the helpful thread:

https://stackoverflow.com/questions/17287713/using-iconv-to-convert-from-utf-16le-to-utf-8

iconv is not a native Windows utility, but it has been ported to Windows. Although the question linked above was tagged [linux], one of its answers contains an example that works unchanged on Windows:

iconv -f UTF-16LE -t UTF-8 infile > outfile

I downloaded the files from here:

https://sourceforge.net/projects/gnuwin32/files/libiconv/1.9.2-1/

I only needed the "bin" (binaries) and "dep" (dependencies) packages: extract the contents of both into the same folder, and you are good to go.
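
For reference, the iconv command above amounts to a streaming decode and re-encode. This Python sketch (the function name convert_file and the script name convert.py are hypothetical) does the same in fixed-size chunks; the incremental decoder keeps a UTF-16 code unit that is split across two chunks intact, and reading and writing in binary mode leaves CR LF untouched. It does not strip a leading BOM (one would come out as the UTF-8 bytes EF BB BF), which suits the BOM-less input described in the question.

```python
import codecs
import sys

def convert_file(src, dst, src_enc="utf-16-le", dst_enc="utf-8", chunk_size=64 * 1024):
    """Stream src (src_enc) into dst (dst_enc), in the spirit of
    `iconv -f UTF-16LE -t UTF-8 src > dst`."""
    decoder = codecs.getincrementaldecoder(src_enc)()  # buffers split code units
    encoder = codecs.getincrementalencoder(dst_enc)()
    with open(src, "rb") as fin, open(dst, "wb") as fout:  # binary: CR LF kept
        while True:
            chunk = fin.read(chunk_size)
            final = not chunk  # an empty read means end of file
            text = decoder.decode(chunk, final=final)  # raises on a dangling odd byte
            fout.write(encoder.encode(text, final=final))
            if final:
                break

if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```

Usage would be along the lines of: python convert.py resultat.txt resultat_utf8.txt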

find /v "" sourcefile > destinationFile

This reads the contents of sourcefile and prints every line that does NOT match "" (nothing), thereby printing the contents of the entire file.

The find command seems to parse UTF-16 fine for me, and it also happens to output plain ASCII, so your destination file will contain the same text as the source, but as ASCII.
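
"Plain ASCII" is exact only while the text itself is ASCII; for anything else, a single-byte output has to use some code page. The Python sketch below is an illustration of that consequence, not what find does internally: it re-encodes the same (hypothetical) UTF-16 LE output as ASCII and as code page 866, the one chcp reported in the question.

```python
def to_single_byte(data, codepage):
    """Decode BOM-less UTF-16 LE and re-encode it in a single-byte code page.
    Characters the code page cannot represent are replaced with b'?'."""
    return data.decode("utf-16-le").encode(codepage, errors="replace")

sample = "OK\r\nГотово\r\n".encode("utf-16-le")  # hypothetical program output
as_ascii = to_single_byte(sample, "ascii")
as_cp866 = to_single_byte(sample, "cp866")       # DOS Cyrillic, as reported by chcp

print(as_ascii)                                          # b'OK\r\n??????\r\n'
print(as_cp866.decode("cp866") == "OK\r\nГотово\r\n")    # True
```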


For the "add missing BOM" option: I don't have 7, but in 8.1 (or 10):

  • open Notepad, don't enter anything, and save as Unicode ("UTF-16 LE" in Windows 10); this creates a file containing only the little-endian BOM

  • copy /b bomfile+bomless_utf16le newfile (/b makes copy join the files byte-for-byte)

The result works for me with type and powershell get-content.
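
A note on the copy step, with a Python equivalent. Passing /b (copy /b bomfile+bomless_utf16le newfile) makes copy concatenate the files byte-for-byte. In ASCII (/a) mode, copy treats a 0x1A (Ctrl+Z) byte as end of file, and as far as I know /a is the default when combining files with +. That matters for Cyrillic text: capital К (U+041A) is stored in UTF-16 LE as 1A 04. The sketch below concatenates the two files in binary mode; the file names mirror the answer, and the sample text is made up.

```python
import codecs
import os
import tempfile

def concat_binary(sources, dst):
    """Byte-for-byte concatenation, the equivalent of `copy /b a+b dst`."""
    with open(dst, "wb") as fout:
        for path in sources:
            with open(path, "rb") as fin:
                fout.write(fin.read())

# Cyrillic capital KA (U+041A) contains a 0x1A (Ctrl+Z) byte in UTF-16 LE.
print("К".encode("utf-16-le"))  # b'\x1a\x04'

with tempfile.TemporaryDirectory() as folder:
    bomfile = os.path.join(folder, "bomfile")
    bomless = os.path.join(folder, "bomless_utf16le")
    newfile = os.path.join(folder, "newfile")
    with open(bomfile, "wb") as f:
        f.write(codecs.BOM_UTF16_LE)  # all the empty "Unicode" Notepad file holds
    with open(bomless, "wb") as f:
        f.write("Книга\r\n".encode("utf-16-le"))  # hypothetical content
    concat_binary([bomfile, bomless], newfile)
    with open(newfile, "rb") as f:
        result = f.read()

print(result.decode("utf-16"))  # the BOM lets the "utf-16" codec pick LE
```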

But it's not as devious as Charles' find /v ""!
