1

I have numerous MP3 files (and perhaps files in other audio formats) whose metadata tags (ID3v1 and/or ID3v2 in the case of MP3) contain Hebrew characters in the CP1255 charset (or ISO-8859-8-I, essentially the same thing for our purposes). Some tags, however, are in UTF-8. I notice this when loading the files in, say, Amarok: some show up as gibberish (CP1255 bytes decoded as UTF-8), others display properly.

I would like to convert all tags at once to UTF-8, assuming they're in CP1255 (or ISO-8859-8-I). How can I do this?

I'm running Debian GNU/Linux (version: Stretch). Command-line solutions are perfectly fine as are GUI-based ones.

2
  • Are the tags ID3v1 or ID3v2? Commented Apr 9, 2016 at 11:52
  • @grawity: I'm not sure they're all of one type, see edit.
    – einpoklum
    Commented Apr 9, 2016 at 11:52

2 Answers

3

On Ubuntu 20.04, make sure these packages are installed:

    apt-get install easytag
    apt-get install python3-mutagen

Then run:

    cd /Music
    find . -name "*.mp3" -print0 | xargs -0 mid3iconv -e windows-1255 -d

I used windows-1255 for Hebrew.
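
For anyone curious what this kind of conversion amounts to, here is a minimal sketch of the underlying idea, assuming the tag text was stored as CP1255 bytes but read back as Latin-1 (the default for older ID3 tags). The function name `fix_cp1255_mojibake` is made up for illustration; it is not part of mutagen, and this is not necessarily how mid3iconv is implemented internally.

```python
def fix_cp1255_mojibake(text: str) -> str:
    """Re-decode text that was stored as CP1255 bytes but read as Latin-1.

    Hypothetical helper for illustration only; not a mutagen API.
    """
    try:
        # Latin-1 maps code points 0-255 straight back to the original bytes.
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        # Characters above U+00FF: the text already holds real Unicode
        # (e.g. a tag that was stored as UTF-8), so leave it untouched.
        return text
    try:
        # Interpret the recovered bytes with the encoding they were written in.
        return raw.decode("cp1255")
    except UnicodeDecodeError:
        # Some byte has no meaning in CP1255: the assumption was wrong.
        return text
```

Calling it on a garbled title returns the Hebrew text, while a tag that is already proper Unicode (or plain ASCII) comes back unchanged, so applying it twice does no harm.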

More info can be found here:

  1. https://mutagen.readthedocs.io/en/latest/man/mid3iconv.html
  2. https://help.ubuntu.com/community/ConvertingMP3Tags

Hope this helps the next guy

EDIT:

Running this command on a large music library crashed my system. Here is a script for breaking it down to smaller chunks.

    import os
    import subprocess

    path = "PATH/TO/MUSIC"
    # RUN WITH dry_run = True FIRST: tags are printed but no file is modified.
    dry_run = True

    # Handle one directory per command so no single run covers the whole library.
    for subdir, _dirs, files in os.walk(path):
        mp3s = [os.path.join(subdir, f) for f in files if f.lower().endswith(".mp3")]
        if not mp3s:
            continue

        # The encoding command
        command = ["mid3iconv", "-e", "windows-1255"]
        command += ["--dry-run", "-d"] if dry_run else ["-q"]

        # Execute. Passing a list bypasses the shell, so paths need no escaping.
        subprocess.run(command + mp3s)
2

Mutagen includes mid3iconv:

    mid3iconv --dry-run --encoding=iso8859-8 foo.mp3
    mid3iconv --dry-run --encoding=cp1255 bar.mp3

However, you'll probably have to individually specify which files to convert, as automatically detecting iso8859-* or cp125* in software is just guessing based on character frequencies.
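
To illustrate what such guessing looks like, here is a rough sketch of a frequency heuristic. It assumes the garbled tag has been read as Latin-1, and it relies on the fact that CP1255 stores the Hebrew letters alef through tav in bytes 0xE0 to 0xFA. The function name and the 0.5 threshold are arbitrary choices for illustration, and the check can misfire on real Latin-script text with many accented letters.

```python
def looks_like_cp1255_mojibake(text: str, threshold: float = 0.5) -> bool:
    """Guess whether text is Hebrew CP1255 bytes that were read as Latin-1.

    Hypothetical heuristic for illustration; the threshold is arbitrary.
    """
    # Anything above U+00FF means the text already holds real Unicode.
    if any(ord(ch) > 0xFF for ch in text):
        return False

    def in_hebrew_byte_range(ch: str) -> bool:
        # CP1255 places alef..tav in bytes 0xE0..0xFA.
        return 0xE0 <= ord(ch) <= 0xFA

    # Only letters (or characters in the suspicious range) take part in the vote.
    relevant = [ch for ch in text if ch.isalpha() or in_hebrew_byte_range(ch)]
    if not relevant:
        return False

    suspicious = sum(1 for ch in relevant if in_hebrew_byte_range(ch))
    return suspicious / len(relevant) >= threshold
```

A check like this could be used to build the list of files to pass to mid3iconv, but the result is still a guess and worth reviewing with `--dry-run` before writing anything.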
