On my Linux systems, I prefer the user interface to be in English. However, as a native speaker of German, I need spell checking to understand both English and German.

Yesterday I learned that you can use the LANGUAGE environment variable to specify "a priority list of languages" for GNU gettext, which then takes precedence over LC_ALL and LANG. The Signal desktop app uses it to determine which spell checking dictionaries to load.

I have set my environment variables to

LANG=en_US.UTF-8
LANGUAGE=en_US:en:de_DE:de

and indeed, this allows me to have English and German spellings simultaneously in Signal.

However, now most of my system utilities speak German to me!

$ cat ''
cat: '': Datei oder Verzeichnis nicht gefunden

I have tried several things to fix this:

  • for l in en{,_US}{,.utf8,.UTF-8}:de; do echo $l; LANGUAGE="$l" cat ''; done (i.e. different ways to specify the English locale) all speak German.
  • LANGUAGE=fi:de speaks Finnish, as do en:fi:de and :fi:de.
  • LANGUAGE=xy:en:de (invalid language first) speaks German.
  • LANGUAGE=C:fi:de uses English(!), while LANGUAGE=de:C:fi is German again.

Interestingly enough, the output of locale -a contains neither German nor Finnish:

$ locale -a
C
C.utf8
en_US.utf8
POSIX

LANG=C, as expected, disables LANGUAGE completely:

$ LANG=C cat ''
cat: '': No such file or directory

Now, I could simply write a wrapper that only sets LANGUAGE for Signal, and leave it unset in my normal environment, but this is more of a workaround and not an actual fix. What exactly am I doing wrong here? Why isn't en accepted? Is C really the way to go here?
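For reference, the wrapper workaround could look roughly like the sketch below. The helper name run_with_language is made up for illustration, and signal-desktop is only my assumption about what the binary is called on a given system.

```shell
# Run a command with LANGUAGE set for that one process only.
# run_with_language is a hypothetical helper name, not an existing tool.
run_with_language() {
    languages=$1
    shift
    # env passes the variable to the child without touching the calling shell
    env LANGUAGE="$languages" "$@"
}

# Assumed usage (the binary name may differ on your system):
#   run_with_language en_US:en:de_DE:de signal-desktop
```

The rest of the session keeps LANGUAGE unset, so gettext-based tools follow LANG as before.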

This is a Debian 12 (bookworm) system. The output of locale is:

LANG=en_US.UTF-8
LANGUAGE=en_US:en:de_DE:de
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

1 Answer

Apparently, most software using gettext has no English translation at all. Instead, the English text itself serves as the "message ID" (msgid), i.e. the lookup key, and if the user requests a language for which no translation exists, the raw message IDs are shown. Developers therefore don't ship an English catalog and rely on gettext falling back to the message IDs to provide the English text.
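One way to see this for yourself is to list the languages for which a message catalog is installed. The sketch below assumes the usual layout of <localedir>/<lang>/LC_MESSAGES/<domain>.mo with /usr/share/locale as the default directory; catalog_languages is a made-up helper name, and coreutils is assumed to be the text domain that cat belongs to.

```shell
# List the languages that ship a compiled catalog (.mo) for a text domain.
# catalog_languages is a hypothetical helper; the default directory is an
# assumption about where the distribution installs its catalogs.
catalog_languages() {
    domain=$1
    localedir=${2:-/usr/share/locale}
    for mo in "$localedir"/*/LC_MESSAGES/"$domain".mo; do
        [ -e "$mo" ] || continue        # glob matched nothing
        lang=${mo#"$localedir"/}        # strip the directory prefix
        printf '%s\n' "${lang%%/*}"     # keep only the language component
    done
}

# Assumed usage:
#   catalog_languages coreutils
```

If the explanation above is right, the list should contain de and fi but no plain en entry. These catalogs are separate from the locales printed by locale -a, which would explain why German messages can appear even though no German locale is listed there.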

Because there is no English translation, the en_US:en part of my LANGUAGE setting does not match anything, and gettext moves on to the next entry in the list, which is German. Adding C after the English entries makes it stop at "untranslated" (which always exists) before it reaches German. However, Signal (and probably other Electron apps) doesn't understand C as meaning "English": with a setting of C:de it would only provide German spell checking, so I still need to keep en in the list. I've settled on en_US:en:C:de_DE:de, which now does the right thing both in gettext and in Signal.
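To make the fallback order concrete, here is a toy model of the lookup in shell. It is not gettext's implementation: pick_language and its arguments are made-up names, the list of available catalogs is passed in by hand, and details such as territory or codeset fallbacks are ignored.

```shell
# Toy model: walk the LANGUAGE list from left to right and stop at the first
# entry that has a catalog, or at C (which means "show the msgids as-is").
pick_language() {
    priority_list=$1
    available=" $2 "      # space-separated catalog names, padded for matching
    result=C              # nothing matched: untranslated msgids
    old_ifs=$IFS
    IFS=:
    for lang in $priority_list; do
        [ -n "$lang" ] || continue
        if [ "$lang" = C ]; then
            result=C
            break
        fi
        case $available in
            *" $lang "*) result=$lang; break ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

pick_language en_US:en:de_DE:de   "de fi"   # prints: de
pick_language en_US:en:C:de_DE:de "de fi"   # prints: C
pick_language fi:de               "de fi"   # prints: fi
```

With only de and fi catalogs available, the model reproduces what the question observed: the English entries are skipped, and C only helps if it comes before the German ones.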

Technically, though, the C message IDs could be in a language other than English, and apparently there is no way to tell gettext which language they are in. There is already a GNU gettext bug report from 2018 about this that has more details.
