I just installed some Arch Linux packages, which dumped this file onto my disk:
/etc/ssl/certs/EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.pem
Note that the file name seems to contain Turkish characters. Here are several commands and their output:
> cd /etc/ssl/certs
> echo EBG*
EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.pem
> ls -al EBG*
lrwxrwxrwx 1 root root 86 Nov 3 22:27 EBG_Elektronik_Sertifika_Hizmet_Sa??lay??c??s??.pem -> /usr/share/ca-certificates/mozilla/EBG_Elektronik_Sertifika_Hizmet_Sa??lay??c??s??.crt
Q1: Why do echo and ls produce different output?
So it's a symlink. If I dereference it:
> ls -alL EBG*
-rw-r--r-- 1 root root 2106 Sep 24 22:52 EBG_Elektronik_Sertifika_Hizmet_Sa??lay??c??s??.pem
Let's look at the target:
> cd /usr/share/ca-certificates/mozilla
> echo EBG*
EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt
> ls -al EBG*
-rw-r--r-- 1 root root 2106 Sep 24 22:52 EBG_Elektronik_Sertifika_Hizmet_Sa??lay??c??s??.crt
Q2: What encoding is used for non-ASCII characters in file names on a Linux file system (here: ext4)? Am I correct that the encoding is not recorded anywhere, so that if I gave you some random hard drive without instructions, you would have to guess which encoding I used?
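In case it helps with Q1 and Q2, here is a rough sketch of how the raw bytes of such a name could be dumped, together with the locale variables of the current shell. The helper name_bytes is something I made up for illustration, and the scratch directory only stands in for /etc/ssl/certs. The octal escapes assume the name is stored as UTF-8 (ğ as C4 9F, ı as C4 B1); the doubled ?? per Turkish letter in the ls output would be consistent with that, but it is a guess on my part.

```shell
#!/bin/sh
# name_bytes is a hypothetical helper: print the bytes of its argument as hex.
name_bytes() {
    # od emits padded hex columns; the unquoted expansion collapses the
    # padding and line breaks into single spaces.
    set -- $(printf '%s' "$1" | od -An -v -tx1)
    echo "$*"
}

# Scratch directory standing in for /etc/ssl/certs (assumption for the demo).
tmp=$(mktemp -d)

# Build a name with the assumed UTF-8 bytes for "ğ" (\304\237) and "ı" (\304\261),
# written as octal escapes so the script itself stays pure ASCII.
name=$(printf 'Sa\304\237lay\304\261c\304\261s\304\261.pem')
: > "$tmp/$name"

# The shell expands the glob and hands the bytes through unchanged.
for f in "$tmp"/Sa*; do
    name_bytes "${f##*/}"
done

rm -rf "$tmp"

# Locale variables of the current shell; "unset" means the variable is not set.
echo "LANG=${LANG-unset} LC_ALL=${LC_ALL-unset} LC_CTYPE=${LC_CTYPE-unset}"
```

Running the same name_bytes call on the real entry (for f in /etc/ssl/certs/EBG*) should show whether the bytes on disk match the UTF-8 assumption.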
I noticed there was a problem because pacman (the Arch Linux package manager) seemed to get confused about whether or not it had installed that file:
Q3: How do I prevent pacman, or ls, or anything else from getting confused about files like that? What if next week some file name is Arabic or Hebrew instead of Turkish?