
When I open one of my text files in Visual Studio Code, the text contains a lot of question marks where I expected to see Swedish letters, such as å, ä and ö:

My text file is riddled with question marks.

At the bottom right (in the status bar of VS Code), I've noticed that it says UTF-8.
Is this somehow related to the problems I'm facing?


How can I make all these letters appear correctly?


As a side note, when I open the same file in plain old Windows Notepad, the text displays correctly:

My text file displays correctly in Notepad.

In this case, instead of UTF-8, the status bar says ANSI at the bottom right.

But in VS Code, even if I click on UTF-8, and then on Reopen with Encoding, I cannot find any encoding by the name ANSI.

VS Code: UTF-8 > Reopen with Encoding > there is no ANSI!

In case you want to reproduce the behavior with the exact file I've been using, here it is.

1 Answer

How can I make all these letters appear correctly?

I can think of two options:

  1. Convert the file to UTF-8 (this is what I recommend).
  2. Configure VS Code to auto-detect the most likely encoding.

The second option is preferable if you never want to change the encoding of any files.

Option 1. Convert the file to UTF-8

The acronym ANSI stands for American National Standards Institute.

The problem with "ANSI" encoding is that, although the name suggests a standard, what it actually means depends on the natural language the text is written in (in practice, on the Windows language and region settings).
In the case of Western European (Latin) languages, "ANSI" encoding means the code page Windows-1252. 1
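
To see where the question marks come from, here is a minimal Python sketch. It assumes the file was saved as Windows-1252 (Python's `cp1252` codec) and uses a made-up sample string rather than the actual file. Decoding those bytes as UTF-8 fails for each Swedish letter, so a decoder that substitutes on error produces the replacement character U+FFFD, which editors usually draw as a question mark.

```python
# Sample text encoded the way Notepad would store it under "ANSI"
# (assumed here to be Windows-1252).
raw = "å ä ö".encode("cp1252")

# Reading the bytes as UTF-8: each Swedish letter is an invalid sequence,
# so errors="replace" substitutes U+FFFD (the replacement character).
as_utf8 = raw.decode("utf-8", errors="replace")

# Reading the same bytes as Windows-1252 recovers the original text.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_utf8))
print(ascii(as_cp1252))
```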

1a. Make VS Code use the correct encoding

In VS Code, instead of looking for ANSI encoding, look for Windows-1252.
I clicked UTF-8 > Reopen with Encoding, and VS Code displayed
"Western (Windows 1252) Guessed from content" as its top suggestion.

VS Code correctly guesses the encoding is Windows 1252.

VS Code correctly guessed that the encoding is Windows 1252.
If you don't want to change the file's encoding, you're done.

Otherwise, what remains is to convert the file to UTF-8 encoding.

1b. Convert to UTF-8

The status bar now displays Windows 1252 instead of UTF-8.
Click on Windows 1252 and then on Save with Encoding :

Click Windows 1252 > Save with Encoding.

Now click on "UTF-8 utf8" :

Click on UTF-8.

This re-encodes the file as UTF-8. Only the non-ASCII characters (such as å, ä, ö) get a new byte representation; plain ASCII characters are stored the same way in both encodings.
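
If you have many files to convert, the same step can be scripted outside VS Code. The following is only a sketch, not what VS Code does internally: `convert_to_utf8` is a hypothetical helper name, and it assumes the source file really is Windows-1252. It overwrites the file in place, so try it on a copy first.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Re-encode a text file in place from source_encoding to UTF-8.

    Working on raw bytes keeps line endings (CRLF or LF) untouched.
    Raises UnicodeDecodeError if the file is not valid source_encoding.
    """
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("notes.txt")` would rewrite a file named `notes.txt` (a placeholder name) as UTF-8.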

Option 2. Configure VS Code to auto-detect the encoding

If you don't want to convert to UTF-8, and if you experience this problem every time you open another file – you may prefer to set VS Code to always auto-guess the encoding.

To achieve this, you need to enable the Auto Guess Encoding feature of VS Code.
Press Ctrl+, (Ctrl and comma) 2 to open the settings, and paste or type autoGuessEncoding. Check the box where it says:
"When enabled, the editor will attempt to guess the character set encoding when opening files. This setting can also be configured per language. Note, this setting is not respected by text search. Only Files: Encoding is respected." 3

Check the box to make VS Code auto-guess the encoding.
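
To give a rough idea of what "guessing" an encoding can mean, here is a deliberately simple Python sketch. It is not VS Code's detection logic, which this answer does not describe; `guess_decode` is a hypothetical name, and the fallback to Windows-1252 is an assumption that suits Western European text. The idea is that text containing non-ASCII letters in a legacy code page is rarely valid UTF-8 by accident, so a strict UTF-8 attempt is a reasonable first test.

```python
def guess_decode(data: bytes, fallback: str = "cp1252"):
    """Return (text, encoding_used) for raw file bytes.

    Try strict UTF-8 first; if that fails, decode with the fallback
    code page, replacing any byte the code page does not define.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace"), fallback
```

A real detector has to choose between many code pages, which is why guesses can be wrong for short files.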

3. The confusion about what "ANSI" encoding means

Searching the internet to find out what "ANSI" means in the context of encoding may cause confusion.
You might read that ANSI is "a misnomer", which is true but not of much practical help.

What clears up the confusion is to realize that when Microsoft writes "ANSI" in the status bar of notepad.exe, it typically means Windows-1252. For natural languages other than Western European, see the table below.
Other well-known text editors, such as Notepad++, have adopted this convention and also write "ANSI" in the status bar.

Windows-1252 is sometimes called code page 1252 or CP-1252. Likewise for the other code pages.

| "ANSI" encoding | Language/Alphabet |
| --- | --- |
| Windows-1250 | Slavic languages – Latin alphabet (e.g. Polish) |
| Windows-1251 | Slavic languages – Cyrillic alphabet (e.g. Ukrainian) |
| Windows-1252 | Western European languages (French, German, Scandinavian, Spanish, Swahili …) |
| Windows-1253 | Greek |
| Windows-1254 | Turkish, Latin Azeri, and Latin Uzbek |
| Windows-1255 | Hebrew |
| Windows-1256 | Arabic, Farsi, Urdu |
| Windows-1257 | Baltic languages: Estonian, Latvian, Lithuanian |
| Windows-1258 | Vietnamese |
| Windows-1270 | Sami languages |
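
The practical consequence of the table is that one and the same byte stands for different letters depending on which code page is assumed. A small Python illustration, using three of the code pages above (the codec names `cp1252`, `cp1251` and `cp1253` are Python's names for them):

```python
import unicodedata

# A single byte, 0xE5, as it might appear in an "ANSI" text file.
raw = bytes([0xE5])

# Decode the same byte under three different Windows code pages.
decoded = {cp: raw.decode(cp) for cp in ("cp1252", "cp1251", "cp1253")}

for cp, ch in decoded.items():
    # Print the code point and its Unicode name for each interpretation.
    print(f"{cp}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under Windows-1252 the byte is å, but under Windows-1251 it is a Cyrillic letter and under Windows-1253 a Greek one, which is why "ANSI" alone does not tell an editor how to read a file.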

References


1 For a list of what "ANSI" could mean, see the table in Section 3.

2 On macOS, press ⌘ (Command) instead of Ctrl. For Linux users, "ANSI" typically means Windows-1252 – just as on Windows.
For macOS users, check what VS Code suggests as Guessed from content.
Otherwise, have a look at the Macintosh emulation code pages on Wikipedia.

3 See the default settings in VS Code.
