I've found a rather weird issue while investigating the twisted world of character encoding. On Windows, if I type 'tree', the command works as expected, but if I then type 'chcp 65001' (which switches the code page to UTF-8) and run 'tree' again, it breaks.

i.e.

> tree
> chcp 65001
> tree

This is on Windows 7, vanilla cmd, Spanish language. Also, when I redirect the output to a file, its contents are the same before and after the chcp (full of "ÀÄÄÄa").

Some research showed that the encoding is OEM-850.
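
To illustrate where the "ÀÄÄÄ" comes from, here is a minimal Python sketch. It assumes the redirected file holds OEM-850 bytes and that the viewer displayed them as Windows-1252 (the viewer is not stated above, so that part is a guess). The function name `recover_oem850` is made up for the example.

```python
def recover_oem850(garbled: str) -> str:
    """Undo a cp850-read-as-cp1252 mix-up (assumed scenario)."""
    # Get back the raw bytes the viewer misinterpreted as Windows-1252.
    raw = garbled.encode("cp1252")
    # Decode those bytes with the code page that tree actually used.
    return raw.decode("cp850")


print(recover_oem850("ÀÄÄÄ"))  # └───
```

In code page 850 the bytes 0xC0 and 0xC4 are the box-drawing characters that tree uses for branches, while in Windows-1252 the same bytes are À and Ä.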

I know this looks like a superfluous question, but I have the same problem when compiling programs (mostly with gcc).

The /A and /U switches for cmd didn't help either.

1 Answer

This problem with non-ASCII input is reproducible in the console for all Windows versions up to and including Windows 10. The console host process, i.e. conhost.exe, wasn't designed for UTF-8 (codepage 65001) and hasn't been updated to support it consistently.

In particular, non-ASCII input causes an empty read. An empty read is taken to be end-of-file, so reading of the input stops there, resulting in truncated output.
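
The effect of an empty read can be sketched with a small simulation. This is not the console's actual code; it only models a typical read loop, and `read_until_eof`, `make_reader`, and the sample chunks are hypothetical names and data chosen for illustration.

```python
from typing import Callable, Iterable


def read_until_eof(read: Callable[[], bytes]) -> bytes:
    """Typical read loop: a zero-length read is treated as end-of-file."""
    chunks = []
    while True:
        chunk = read()
        if not chunk:  # empty read, so the loop assumes EOF and stops
            break
        chunks.append(chunk)
    return b"".join(chunks)


def make_reader(results: Iterable[bytes]) -> Callable[[], bytes]:
    """Build a fake read() that returns the given results one by one."""
    it = iter(results)
    return lambda: next(it, b"")


# Simulated reads: the second one comes back empty, standing in for a
# line with non-ASCII input under codepage 65001 (an assumption here).
simulated = [b"first line\n", b"", b"third line\n"]
data = read_until_eof(make_reader(simulated))
print(data)  # b'first line\n'
```

Everything after the empty read is lost, even though more data was available, which matches the truncation described above.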

The /U switch of cmd.exe is also not useful here, as it affects only the output of cmd's internal commands when that output goes to a pipe or a file. You might get better results from some applications by redirecting command output to a file, but the file won't have a UTF-8 byte order mark (BOM).
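
If a tool that consumes the redirected file insists on a BOM, one can be added afterwards. Below is a minimal sketch, assuming the captured bytes really are valid UTF-8; `ensure_utf8_bom` is a hypothetical helper name, not part of any Windows tool.

```python
import codecs


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM prepended, unless one is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    # Sanity check: raises UnicodeDecodeError if the bytes are not UTF-8,
    # e.g. if the program actually wrote OEM-850 output.
    data.decode("utf-8")
    return codecs.BOM_UTF8 + data


sample = "árbol".encode("utf-8")
print(ensure_utf8_bom(sample))  # b'\xef\xbb\xbf\xc3\xa1rbol'
```

The decode step is there on purpose: output captured from a program that ignored the active code page would fail the check instead of being silently mislabeled as UTF-8.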

In short, don't expect much from chcp 65001 and you won't be disappointed. The only Unicode encoding that works well in Windows is 16-bit Unicode, i.e. UTF-16.
