
I use VS Code to make a German site. I use a German special character in a style.css file. After I restart VS Code, the file's encoding changes from UTF-8 to Windows-1252, and I get what is shown in the image below.
My Auto Guess Encoding is unchecked and the default encoding is UTF-8.
How can I stop the encoding from changing automatically? My VS Code version is 1.32.3, and I use Windows 10.

[Screenshot showing the character and the encoding]

  • Did you check that the file is actually using UTF-8 encoding?
    – Seth
    Commented Apr 1, 2019 at 9:29
  • Yes, I checked, and files.autoGuessEncoding is false. If I change the encoding to UTF-8, or open a new UTF-8 file, write German text and restart VS Code, the encoding changes from UTF-8 to Windows-1252. If I don't use any German special characters, it doesn't change. @Seth Commented Apr 3, 2019 at 6:55



How can I stop the auto-change encoding?

According to your own comment, Auto Guess Encoding is already off.
So the fact that VS Code opens your file as Windows-1252 (code page 1252, or CP1252)
calls for some other explanation.

By assuming that you have a VS Code setting that specifically decodes your CSS files
as Windows-1252, I've been able to reproduce your situation very accurately. 1

1. Reproducing the whole scenario

I use a simplified version of your style.css, containing just a single line :

/* Ü */

To make VS Code open the file with encoding Windows-1252 (with Auto Guess Encoding off),
I assume that the VS Code settings.json contains the following code/line : 2

"[css]": {"files.encoding": "windows1252"},

Such a setting will make VS Code read and write all .css files as Windows-1252. 3
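To make the precedence concrete, here is a simplified Python model of how a language-specific block overrides the global files.encoding value. The function effective_encoding and the dict layout are hypothetical illustrations, not VS Code's actual code; the fallback "utf8" is VS Code's default value for files.encoding.

```python
# Hypothetical, simplified model of VS Code's setting precedence; this is
# not VS Code's implementation, only an illustration of the idea.
def effective_encoding(settings: dict, language_id: str) -> str:
    # A language-specific block such as "[css]" wins if it sets files.encoding
    override = settings.get(f"[{language_id}]", {})
    if "files.encoding" in override:
        return override["files.encoding"]
    # Otherwise the global files.encoding applies ("utf8" is VS Code's default)
    return settings.get("files.encoding", "utf8")


settings = {
    "files.autoGuessEncoding": False,
    "files.encoding": "utf8",
    "[css]": {"files.encoding": "windows1252"},
}

print(effective_encoding(settings, "css"))   # windows1252
print(effective_encoding(settings, "html"))  # utf8
```

Removing the "[css]" entry from the dict makes effective_encoding(settings, "css") fall back to "utf8", even though Auto Guess Encoding stays off the whole time.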

If you download style.css, then right-click it and Open with Code, expect to see :

With encoding Windows-1252, Ü is shown as Ãœ.


The reason you see two Windows-1252 characters, Ãœ, instead of the single UTF-8 character Ü, is that Windows-1252 reads each byte as a single character: the non-ASCII characters Ã and œ.
UTF-8, on the other hand, uses two bytes to encode a single non-ASCII character like Ü. 4
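The byte-level explanation above can be checked with a few lines of Python. This sketch assumes Python 3 and uses Python's built-in codec names "utf-8" and "cp1252" (Python's name for Windows-1252).

```python
raw = "Ü".encode("utf-8")               # the two bytes stored on disk
print([f"0x{b:02X}" for b in raw])      # ['0xC3', '0x9C']

# Windows-1252 turns every single byte into one character
for b in raw:
    print(f"0x{b:02X} -> {bytes([b]).decode('cp1252')}")
# 0xC3 -> Ã
# 0x9C -> œ

print(raw.decode("utf-8"))    # Ü   (2 bytes, 1 character)
print(raw.decode("cp1252"))   # Ãœ  (2 bytes, 2 characters)
```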

1. a. How to display Ü correctly

To make the German letter Ü appear correctly, you need to click :
Reopen with Encoding > UTF-8 Guessed from content.

'Reopen with Encoding' changes how the file is decoded.

Choosing Reopen with Encoding doesn't change the file itself.
It changes how the file is displayed in VS Code – how it is decoded.
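As a rough model of this, Reopen with Encoding behaves like decoding the same bytes with a different codec. The helper reopen_with_encoding below is a hypothetical name used only for illustration; it is not a VS Code API.

```python
# Hypothetical helper modelling "Reopen with Encoding": the bytes on disk are
# only decoded for display; nothing is written back to the file.
def reopen_with_encoding(raw: bytes, encoding: str) -> str:
    return raw.decode(encoding)


# style.css as saved in UTF-8: "/* Ü */" where Ü is the two bytes C3 9C
on_disk = b"/* \xc3\x9c */"

print(reopen_with_encoding(on_disk, "cp1252"))  # /* Ãœ */
print(reopen_with_encoding(on_disk, "utf-8"))   # /* Ü */

# The bytes object was never modified, whichever decoding was chosen
print(on_disk == b"/* \xc3\x9c */")             # True
```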

1. b. What you should not do

You'll get a problem if you instead click :
Save with Encoding > UTF-8 Guessed from content.

Save with Encoding > UTF-8 changes the file itself.

This does change the file: every non-ASCII character currently displayed (here Ã and œ) is written as its corresponding UTF-8 byte sequence. The file is saved with these changes.
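Save with Encoding can be modelled as the opposite step: the text as currently displayed is encoded again, and those bytes replace the file contents. The helper save_with_encoding is a hypothetical name for illustration only. Starting from the wrongly decoded "Ãœ", the two original bytes become four:

```python
# Hypothetical helper modelling "Save with Encoding": the text as currently
# displayed is encoded again and those new bytes replace the file contents.
def save_with_encoding(displayed_text: str, encoding: str) -> bytes:
    return displayed_text.encode(encoding)


on_disk = b"\xc3\x9c"                    # Ü stored as UTF-8
displayed = on_disk.decode("cp1252")     # what the editor shows: "Ãœ"

new_bytes = save_with_encoding(displayed, "utf-8")
print([f"0x{b:02X}" for b in new_bytes])  # ['0xC3', '0x83', '0xC5', '0x93']
```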

When you now close and reopen style.css, it will again be decoded as Windows-1252.
(Why? Because that's exactly what the line "[css]": {"files.encoding": "windows1252"}, in settings.json is telling VS Code!)

Here is what you'll see.

After Save with Encoding > UTF-8 and reopening as Windows-1252, Ü is shown as ÃƒÅ“.

Note how ÃƒÅ“ are the same characters as those displayed in the screenshot of your question.

The reason you now see four characters instead of two is the same as before.
The single UTF-8 character Ã (2 bytes) is displayed as the two Windows-1252 characters Ãƒ (still 2 bytes).
And the single UTF-8 character œ is displayed as the two Windows-1252 characters Å“.
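Decoding those four bytes both ways in Python (codec names "cp1252" and "utf-8", assuming Python 3) reproduces both views:

```python
# The four bytes written by "Save with Encoding > UTF-8"
corrupted = b"\xc3\x83\xc5\x93"

# Decoded as Windows-1252: one character per byte
print(corrupted.decode("cp1252"))   # ÃƒÅ“

# Decoded as UTF-8: two bytes per character
print(corrupted.decode("utf-8"))    # Ãœ
```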

This completes my reproduction of your scenario.

2. How to repair the corrupted file

Given that you want to display Ü and not the corrupted Ãœ, you need to :

  1. convert the file back,
  2. encode with UTF-8,
  3. close and reopen the file.

1. Convert the file back

Here is how to convert the corrupted style.css back to its original state.
Starting from the previous screenshot, in the status bar, click Windows 1252,
then Reopen with Encoding, and finally UTF-8.

Windows 1252 > Reopen with Encoding > UTF-8.

Expect to see Ãœ. The file is still corrupted, so now convert it to Windows-1252 by clicking :
UTF-8 > Save with Encoding > Windows 1252.

UTF-8 > Save with Encoding > Windows 1252.

The file has now been converted back to its original state.
What remains is to decode it correctly (with UTF-8).
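The two clicks in this repair amount to a decode followed by an encode. The function name repair is a hypothetical label for this sketch, which assumes Python 3 and its "utf-8" and "cp1252" codecs.

```python
def repair(corrupted: bytes) -> bytes:
    # Step 1, "Reopen with Encoding > UTF-8": decode the four bytes as UTF-8,
    # which yields the two characters "Ãœ"
    text = corrupted.decode("utf-8")
    # Step 2, "Save with Encoding > Windows 1252": each of those characters
    # maps back to the single Windows-1252 byte it came from
    return text.encode("cp1252")


restored = repair(b"/* \xc3\x83\xc5\x93 */")
print(restored)                   # b'/* \xc3\x9c */'
print(restored.decode("utf-8"))   # /* Ü */
```

This round trip only works because every character produced in step 1 exists in Windows-1252; otherwise encode would raise UnicodeEncodeError.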

2. Encode with UTF-8

In settings.json, delete "[css]": {"files.encoding": "windows1252"},.

3. Close and reopen the file

Close and reopen style.css. Check that you see UTF-8 in the status bar. Expect to see :

The corrupted file has been restored.

Yay! Mission accomplished.

3. Encoding vs converting in Notepad++

To better understand the difference between decoding/encoding and converting a file, it might help to see how this is done in another versatile text editor: Notepad++.
This helpful answer explains the difference in an instructive picture :

The difference between Encoding and Converting in Notepad++.

Encoding in Notepad++ corresponds to Reopen with Encoding in VS Code, whereas
Converting in Notepad++ corresponds to Save with Encoding in VS Code.
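The same distinction in Python terms, as a sketch: "Encoding" only changes the codec used to read the bytes, while "Converting" decodes with the old codec and re-encodes with the new one. The function name convert is hypothetical; the example file here is a genuine Windows-1252 file in which Ü is the single byte 0xDC.

```python
def convert(raw: bytes, from_enc: str, to_enc: str) -> bytes:
    # "Converting": decode with the old encoding, then re-encode with the new
    # one, so the bytes on disk change but the characters stay the same
    return raw.decode(from_enc).encode(to_enc)


ansi_file = b"/* \xdc */"          # Ü stored as Windows-1252: the single byte DC

# "Encoding" (Reopen with Encoding): same bytes, interpreted differently
print(ansi_file.decode("cp1252"))  # /* Ü */

# "Converting" (Save with Encoding): new bytes, same characters
utf8_file = convert(ansi_file, "cp1252", "utf-8")
print(utf8_file)                   # b'/* \xc3\x9c */'
print(utf8_file.decode("utf-8"))   # /* Ü */
```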

4. ASCII, ANSI, and UTF-8

A few facts may help the understanding of what ASCII, ANSI, and UTF-8 are.

  • An ASCII character uses just a single byte.
    Or if you will, it uses seven of the eight bits of a byte – the most significant bit is always zero.
    This corresponds to 0-127 in decimal numbers, 0x00-0x7F in hex numbers,
    and 0000 0000 - 0111 1111 in bits.

  • Both ANSI/Windows-1252 and UTF-8 encode an ASCII character as the ASCII character itself.
    For example, the character (letter) k is a pure ASCII character. It is one byte (eight bits) with the decimal number 107, the hex number 0x6B, and the bits 0110 1011.
    As a consequence, it's wrong to say that the ASCII character k is not an ANSI character, or that it's not a UTF-8 character. It's both!
    If a text file contains only ASCII characters, then the ANSI and UTF-8 encodings coincide.
    You cannot tell one apart from the other. Such a file is both ANSI and UTF-8 encoded. 5
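These facts can be verified in Python (assuming Python 3; "ascii", "cp1252" and "utf-8" are Python's codec names). The helper is_pure_ascii is a hypothetical name that simply checks that every byte is below 0x80.

```python
# The letter k is encoded identically by all three encodings
for enc in ("ascii", "cp1252", "utf-8"):
    print(enc, "k".encode(enc))    # b'k' every time

code = "k".encode("ascii")[0]
print(code, hex(code), f"{code:08b}")  # 107 0x6b 01101011


def is_pure_ascii(raw: bytes) -> bool:
    # Every ASCII byte has its most significant bit set to zero (0-127)
    return all(byte < 0x80 for byte in raw)


css = b"body { color: red; }"
print(is_pure_ascii(css))                           # True
print(css.decode("cp1252") == css.decode("utf-8"))  # True
```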

The windows-1252 (CP-1252) encoding table.


The upper half of the Windows-1252 table above corresponds to numbers 0-127, and the lower half to numbers 128-255. The latter are the non-ASCII ANSI characters of Windows-1252.
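As a quick check of the lower half of the table, this Python sketch decodes every byte value from 128 to 255 with Python's cp1252 codec. One detail worth knowing: five of those values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no character assigned in Windows-1252, and Python's codec raises an error for them.

```python
defined = {}
for n in range(128, 256):
    try:
        defined[n] = bytes([n]).decode("cp1252")
    except UnicodeDecodeError:
        pass  # byte value left unassigned by Python's cp1252 codec

missing = [n for n in range(128, 256) if n not in defined]
print(len(defined))    # 123
print(missing)         # [129, 141, 143, 144, 157]
print(defined[0xDC])   # Ü
```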


The picture below is taken from UTF-8 and ASCII character charts,
and displays all those Windows-1252 characters once more, numbered 128-255.

The windows-1252 (CP-1252) non-ASCII characters.


If you want to know how many bytes (and what bytes) a UTF-8 character uses, try this online tool.
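If you prefer to check locally, Python (assuming Python 3) can show the byte count and the bytes themselves for any character:

```python
samples = ["k", "Ü", "€", "\U0001F600"]   # the last one is the emoji 😀

for ch in samples:
    data = ch.encode("utf-8")
    print(ch, len(data), " ".join(f"{b:02x}" for b in data))
# k 1 6b
# Ü 2 c3 9c
# € 3 e2 82 ac
# 😀 4 f0 9f 98 80
```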

References


1 I think the scenario I present plausibly describes what might have happened.
Of course, I cannot know for sure what caused your situation.

2 To open settings.json, press Ctrl + , (comma), and then click the Open Settings icon in the top right corner :

Open Settings (JSON)

On macOS, use ⌘ (Cmd) instead of Ctrl.

3 The term “ANSI” as used to signify Windows code pages is a historical reference […].
Microsoft still uses ANSI for Western Europe interchangeably with Windows-1252, for example in their notepad.exe text editor, typically located at C:\WINDOWS\System32. This is the convention I follow as well. See also this answer.

4 To be more precise, each non-ASCII UTF-8 character uses at least two (up to four) bytes.

5 Suppose you have a text file containing only pure ASCII characters. If you open that file in some text editor, and the status bar says ANSI, that doesn't mean the file is not UTF-8 encoded. It just means that this text editor uses ANSI as its default encoding. If the default encoding were UTF-8, the editor would display UTF-8 in the status bar for the same file.


