I've just realized I have a file on my system; it lists normally:
$ ls -la TΕSТER.txt
-rw-r--r-- 1 user user 8 2013-04-11 18:07 TΕSТER.txt
$ cat TΕSТER.txt
testing
... yet it crashes a piece of software with a UTF-8/Unicode-related error. I was really puzzled, since I couldn't tell why such a file would be a problem; finally I remembered to check the output of ls with hexdump:
$ ls TΕSТER.txt
TΕSТER.txt
$ ls TΕSТER.txt | hexdump -C
00000000 54 ce 95 53 d0 a2 45 52 2e 74 78 74 0a |T..S..ER.txt.|
0000000d
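As a sanity check (a minimal Python 3 sketch of my own, not part of the session above), decoding those bytes as UTF-8 confirms that two of the "letters" are actually two-byte sequences:

```python
# The bytes ls printed, minus the trailing newline (0x0a).
data = bytes.fromhex("54 ce 95 53 d0 a2 45 52 2e 74 78 74")
text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
print(len(data), "bytes ->", len(text), "characters")  # 12 bytes -> 10 characters
```

So the name decodes cleanly as UTF-8, but it is 12 bytes long for only 10 characters.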
... Well, obviously there are extra bytes in place of some of the letters, so I guess it is a Unicode encoding problem. And I can try to echo the bytes back to see what is printed:
$ echo -e "\x54\xCE\x95\x53\xD0\xA2\x45\x52\x2E\x74\x78\x74"
TΕSТER.txt
... but I still cannot tell which - if any - Unicode characters these are.
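For the record, one ad-hoc workaround I could imagine (assuming Python 3 is available) is to let the stdlib unicodedata module name each code point:

```python
import unicodedata

# The decoded file name from the hexdump above.
name = b"\x54\xce\x95\x53\xd0\xa2\x45\x52\x2e\x74\x78\x74".decode("utf-8")
for ch in name:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

This would reveal, for instance, whether an apparent "E" is really U+0045 LATIN CAPITAL LETTER E or a look-alike from another script. But that is a script, not a tool.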
So: is there a command-line tool that I can use to inspect a string on the terminal and get Unicode information about its characters?