

Convert UTF-16 LE to UTF-8 in Windows via command line

(question rewritten to be more useful)

I have a batch script which will interact with command-line programs, take their output, and then make decisions based on that output.

One of the programs I need to interact with is a fairly old one, so I am stuck with its quirks. When I pipe its output to a text file, that text file ends up in the UTF-16 LE encoding.

Here's how I do that:

program -parameter > resultat.txt

Under Windows 7, this encoding seems to be troublesome for cmd/batch work, because you cannot read the contents of such a text file into a variable.

Here is an example (note that set /p only reads the first line of the text file):

set /p Var=<resultat.txt
echo %Var%
cmd /k

The echo line just prints "ECHO is on.", meaning no variable was stored. Also, if you use "type" to print the contents of the text file, there is weird spacing between the characters, suggesting it is not being processed properly.
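From what I understand, the root cause is that UTF-16 LE stores every ASCII-range character as two bytes: the character itself followed by a NUL byte. A quick illustration of those bytes (Python, purely to show what is in the file, not part of the batch workflow):

```python
# In UTF-16 LE, each ASCII-range character is stored as two bytes:
# the character itself followed by a NUL (0x00) byte.
text = "OK"

utf16_le = text.encode("utf-16-le")  # no BOM, like the piped file
utf8 = text.encode("utf-8")          # no BOM

print(utf16_le)  # b'O\x00K\x00'
print(utf8)      # b'OK'
```

Those interleaved NUL bytes would explain both symptoms: "type" renders them as the odd spacing, and set /p apparently gives up on them, so no variable is stored.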

Attempted solution [1] - PowerShell

After some research, I found that PowerShell can convert text-file encodings using the following method:

Get-Content -Path "path\file.txt" | Out-File -FilePath "path\new_file.txt" -Encoding <encoding>

Using Notepad++, I did some research: what encoding do I need to attain?

UTF-8 (no BOM), which Notepad calls "ANSI", is the encoding I need: loading text files into variables and the "type" command both work flawlessly when this encoding is used. How do I know? If I open the piped text file in Notepad and resave it with "ANSI" encoding, everything works flawlessly.
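If my understanding is right, the reason BOM-less UTF-8 works is that, for ASCII-range characters, it is byte-for-byte identical to a plain "ANSI" file, so cmd can treat it as ordinary text. A quick check of that assumption (Python, for illustration only):

```python
text = "resultat"

# For ASCII-range characters, UTF-8 (without a BOM) produces exactly
# one byte per character -- the same bytes an "ANSI"-saved file holds.
assert text.encode("utf-8") == text.encode("ascii")
print(text.encode("utf-8"))  # b'resultat'
```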

-Encoding ascii

...is the option that should have worked, as it produces a result in UTF-8 (no BOM). However, it seems unable to handle the UTF-16 LE source encoding and does not produce usable output. When I opened the resultant file in Notepad++, it was identified as UTF-16 LE "Unix", which was odd.

Funnily enough, if I resave the piped text file as "Unicode" in Notepad, this produces a UTF-16 LE BOM file, which the conversion parameter above turns into a perfect UTF-8 file. At this point, I extended my research to a second question, "How can I add a BOM to UTF-16 LE encoding?", so I could combine that with the PowerShell conversion. Spoiler alert: I was unsuccessful in finding a decent answer.
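For what it's worth, the UTF-16 LE BOM appears to be nothing more than the two bytes FF FE prepended to the file, so "adding a BOM" should amount to writing those two bytes in front of the existing content. A sketch of that idea (Python rather than batch, and the file names are just placeholders):

```python
import codecs

# Create a sample UTF-16 LE file without a BOM, standing in for
# the piped output (file names and content are placeholders).
with open("resultat.txt", "wb") as f:
    f.write("Hello\r\n".encode("utf-16-le"))

with open("resultat.txt", "rb") as f:
    data = f.read()

# The UTF-16 LE byte-order mark is simply b'\xff\xfe'.
if not data.startswith(codecs.BOM_UTF16_LE):
    data = codecs.BOM_UTF16_LE + data

with open("resultat_bom.txt", "wb") as f:
    f.write(data)
```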

-Encoding utf8

...is another, similar option, but it produces a UTF-8 BOM file (the equivalent of saving as "UTF-8" in Notepad), and that produces corrupted output.
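That corruption would make sense if cmd treats the BOM as ordinary bytes: the UTF-8 BOM is three extra bytes (EF BB BF) at the start of the file, printed as stray garbage before the otherwise-correct text. Illustrated in Python:

```python
import codecs

text = "Hello"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

# The UTF-8 BOM is the three bytes EF BB BF; a tool unaware of it
# prints them as stray characters before the otherwise-correct text.
print(with_bom)  # b'\xef\xbb\xbfHello'
```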

So to sum up:

I am looking for a command-line tool/method (open or proprietary, 1st or 3rd party) that can achieve a conversion as follows:

  1. UTF-16 LE - Windows(CR LF) straight to UTF-8 - Windows(CR LF)

  2. UTF-16 LE - Windows(CR LF) to UTF-16 LE BOM - Windows(CR LF)
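For reference, conversion (1) should boil down to "decode as UTF-16 LE, re-encode as UTF-8, write no BOM, leave the CR LF bytes alone". A sketch of that in Python (not batch, and the file names are placeholders), in case it helps pin down what the tool needs to do:

```python
import codecs

# Create a sample UTF-16 LE input file standing in for the piped
# output (file name and content are placeholders).
with open("resultat.txt", "wb") as f:
    f.write("line one\r\nline two\r\n".encode("utf-16-le"))

with open("resultat.txt", "rb") as f:
    raw = f.read()

# Tolerate a BOM if one is present, then decode as UTF-16 LE.
if raw.startswith(codecs.BOM_UTF16_LE):
    raw = raw[len(codecs.BOM_UTF16_LE):]
text = raw.decode("utf-16-le")

# Writing in binary mode avoids newline translation, so CR LF stays
# CR LF; encode() adds no BOM.
with open("resultat_utf8.txt", "wb") as f:
    f.write(text.encode("utf-8"))
```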

Is there any such way to do this, using a command-line or PowerShell tool? I don't mind too much if it's 3rd party. I did evaluate some other solutions, but I had to avoid anything that recommended a GUI application or wasn't specific to Windows.

...Or is there a chance I am going about this the wrong way? I tried to provide some context for what I am doing, so maybe there is a smarter solution.

Any help whatsoever or criticism is much appreciated.

Thanks in advance!