On my Linux machine I found some old files (dated 2004, if not older), so they may go back to the Win9x days. They may have come from an old FAT drive or from an old Samba share.

Umlauts are encoded very strangely. Examples:

$ ls -la
insgesamt 1862
drwx------ 10 user users      11 Feb 15  2006  .
drwx------ 11 user users      11 Dez  2  2004  ..
-rw-------  1 user users 1796429 Apr 13  2004 'Geb'$'\344''udeplan.jpg'
drwx------  2 user users      17 Feb 15  2006 'K'$'\374''che'

The names should be Gebäudeplan.jpg and Küche.

This does not seem to be ISO-8859-15, ANSI or similar. Indeed, the hex values seem to be greater than 256.

I have tried multiple options with convmv and detox but nothing seems to fit.

I would like to scan my entire hard disk for similar files and fix them (convert the names to UTF-8).
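A scan like this could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the function names (`find_non_utf8_names`, `plan_renames`, `apply_renames`) are hypothetical, every name that is not valid UTF-8 is treated as a candidate, and the source encoding is not known at this point, so the caller has to supply it. Paths are handled as bytes so that undecodable names survive untouched.

```python
import os


def find_non_utf8_names(root: bytes):
    """Yield (directory, name) byte pairs whose name is not valid UTF-8."""
    # topdown=False lists children before their parent directories, so a
    # directory is only renamed after everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield dirpath, name


def plan_renames(root: bytes, source_encoding: str):
    """Return a list of (old_path, new_path) byte pairs; nothing is renamed."""
    plan = []
    for dirpath, name in find_non_utf8_names(root):
        # Assumption: every broken name uses the same legacy encoding.
        fixed = name.decode(source_encoding).encode("utf-8")
        plan.append((os.path.join(dirpath, name), os.path.join(dirpath, fixed)))
    return plan


def apply_renames(plan):
    """Perform the renames, skipping any target that already exists."""
    for old, new in plan:
        if os.path.exists(new):
            print("skipping, target exists:", new)
            continue
        os.rename(old, new)
```

Calling `plan_renames(b"/some/dir", "iso8859-15")` first and reading the result acts as a dry run; `apply_renames` should only be run once the proposed names look right. The path and the encoding here are placeholders, not values confirmed by anything above.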

  • (You should be surprised if they were greater than or equal to 256.) The numbers are less than 256. In $'\344' the number is octal; \377 would be 255. Commented Dec 21, 2020 at 7:20
  • What exactly have you tried with convmv and detox? Please edit the question and be specific. Commented Dec 21, 2020 at 7:34
  • @KamilMaciorowski Good point, thanks! I think I forgot the -r flag. I used convmv -f iso-8859-15 -t utf8 . which only processed the current directory. So yes, it seems it is ISO-8859-15!
    – divB
    Commented Dec 21, 2020 at 7:42
  • I thought of -f cp1250 and tested it successfully. Anyway, convmv is the tool. Commented Dec 21, 2020 at 7:43
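The points made in the comments can be checked with a short Python sketch. It takes the two names from the `ls` output as raw bytes, shows that the octal escapes are ordinary single-byte values, and tries a few candidate encodings. The helper name `try_encodings` and the candidate list are illustrative choices, not something from the thread.

```python
# Names from the ls output, written as bytes with the same octal escapes.
RAW_NAMES = [b"Geb\344udeplan.jpg", b"K\374che"]

# Candidate single-byte encodings mentioned in the thread, plus cp1252.
CANDIDATES = ["iso8859-15", "cp1250", "cp1252"]


def try_encodings(raw: bytes, encodings):
    """Return {encoding: decoded name} for each candidate that decodes cleanly."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # this candidate cannot represent the byte sequence
    return results


for raw in RAW_NAMES:
    # Octal 344 is 228 (0xE4) and octal 374 is 252 (0xFC): both below 256.
    high_bytes = [b for b in raw if b >= 0x80]
    print(raw, [hex(b) for b in high_bytes], try_encodings(raw, CANDIDATES))
```

For these two names all three candidates give the same result, because ä and ü sit at 0xE4 and 0xFC in each of them. That would explain why both `-f iso-8859-15` and `-f cp1250` appeared to work. Note that convmv only prints what it would do unless `--notest` is given, and `-r` is needed to recurse into subdirectories.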
