Inspired by this question, can I use the iconv command to generate UTF-16 output with a BOM and with specified endianness?
The iconv command converts text from one encoding to another.
For example:
echo hello | iconv -f ascii -t utf-16
generates a UTF-16 representation of "hello\n".
UTF-16 files often, but not always, start with a Byte Order Mark (BOM), which is a 2-byte encoding of the Unicode character U+FEFF. You can determine the endianness of a UTF-16 file with a BOM by checking whether the first two bytes are FE FF (big-endian) or FF FE (little-endian).
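As a rough sketch of that check (the function name bom_endianness is made up for illustration, and I'm assuming head -c is available, as it is on GNU and BSD systems):

```shell
# Hypothetical helper: report the byte order implied by a leading UTF-16 BOM.
# Reads the data on standard input and prints "big", "little", or "none".
bom_endianness() {
    # Render the first two bytes as lowercase hex with no spaces, e.g. "fffe".
    first_two=$(head -c 2 | od -An -tx1 | tr -d ' \n')
    case "$first_two" in
        feff) echo big ;;      # FE FF: big-endian BOM
        fffe) echo little ;;   # FF FE: little-endian BOM
        *)    echo none ;;     # no UTF-16 BOM found
    esac
}
```

It would be used as, for example, bom_endianness < somefile. It only looks at the first two bytes, so it cannot tell you anything about a file that has no BOM.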
The iconv command has several options for generating UTF-16 output:
$ iconv --list | grep -i utf-16
UTF-16//
UTF-16BE//
UTF-16LE//
This command:
echo hello | iconv -f ascii -t utf-16be
generates big-endian UTF-16 with no BOM; it seems to assume that if you specified the endianness, you don't need to indicate it in the output. Similarly, utf-16le generates little-endian UTF-16 with no BOM.
This:
echo hello | iconv -f ascii -t utf-16
generates (on my x86 Ubuntu system) little-endian UTF-16 with a BOM -- but I've seen a report of a similar command generating big-endian UTF-16 with a BOM, even on a little-endian system.
I can always use utf-16be or utf-16le and prepend the BOM manually, but I'm looking for a solution that just uses the iconv command.
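For reference, the manual approach I mean looks roughly like this (the function names are just placeholders; the octal escapes are used because printf '\xfe' is not portable across shells):

```shell
# Hypothetical helpers: write a BOM, then let iconv produce BOM-less UTF-16
# of the matching byte order. Both read ASCII text on standard input.
to_utf16be_bom() {
    printf '\376\377'             # U+FEFF in big-endian order: FE FF
    iconv -f ascii -t utf-16be
}

to_utf16le_bom() {
    printf '\377\376'             # U+FEFF in little-endian order: FF FE
    iconv -f ascii -t utf-16le
}
```

With these, echo hello | to_utf16be_bom | od -An -tx1 should show the bytes fe ff 00 68 00 65 00 6c 00 6c 00 6f 00 0a. This relies on utf-16be and utf-16le not emitting a BOM of their own, which matches what I see.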
Another workaround, if you know what endianness -t utf-16 generates, is:
echo hello | iconv -f ascii -t utf-16 | dd conv=swab 2>/dev/null
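If you don't know in advance which byte order -t utf-16 produces, a variant of this workaround can inspect the BOM first and swap only when needed. This is only a sketch: the name to_utf16_bom is invented, it assumes mktemp and head -c are available, and it assumes -t utf-16 always emits a BOM (it gives up otherwise). It goes through a temporary file so that dd reads full, even-sized blocks.

```shell
# Hypothetical helper: convert ASCII on stdin to UTF-16 with a BOM in the
# requested byte order. Usage: to_utf16_bom be|le
to_utf16_bom() {
    want=$1
    tmp=$(mktemp) || return 1
    iconv -f ascii -t utf-16 > "$tmp" || { rm -f "$tmp"; return 1; }
    # First two bytes as hex: "feff" (big-endian) or "fffe" (little-endian).
    bom=$(head -c 2 "$tmp" | od -An -tx1 | tr -d ' \n')
    case "$want:$bom" in
        be:feff|le:fffe) cat "$tmp" ;;                         # already right
        be:fffe|le:feff) dd conv=swab < "$tmp" 2>/dev/null ;;  # swap byte pairs
        *) echo "to_utf16_bom: unexpected request or BOM: $want $bom" >&2
           rm -f "$tmp"; return 1 ;;
    esac
    rm -f "$tmp"
}
```

For example, echo hello | to_utf16_bom be should give big-endian output with a BOM whichever byte order the local iconv prefers. It is still not the iconv-only solution I'm after.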
What I'd like to use is something like:
iconv -f ascii -t utf-16bebom # big-endian with BOM
iconv -f ascii -t utf-16lebom # little-endian with BOM
but iconv doesn't support that.
EDIT:
Can someone with access to an x86 Mac OS X system post a comment showing the (copy-and-pasted) output of the following command?
echo hello | iconv -f ascii -t utf-16 | od -x
iconv -- and wondering why -t utf-16 seems to leave the endianness unspecified.
iconv -f UTF-8 -t UTF-16, run on a little-endian system (MacOS), generating big-endian UTF-16 with a BOM, which seems very odd.