I use VS Code to make a German site.
I use a German special character in a style.css file.
After I restart VS Code, the file encoding changes from UTF-8 to
Windows-1252, and I get what is shown in the image below.
My Auto Guess Encoding is unchecked and the default encoding is UTF-8.
How can I stop the auto-change encoding?
My VS Code version is 1.32.3, and I use Windows 10.
Did you check that the file is actually using UTF-8 encoding? (Seth, commented Apr 1, 2019 at 9:29)
@Seth Yes, I checked, and files.autoGuessEncoding is false. If I change the encoding to UTF-8, or open a new UTF-8 file, write German text, and then restart VS Code, the encoding changes from UTF-8 to Windows-1252. If I don't use any German special characters, it doesn't change. (Md. Mehedi Hassan, commented Apr 3, 2019 at 6:55)
1 Answer
How can I stop the auto-change encoding?
According to your own comment, Auto Guess Encoding is already off.
The fact that VS Code opens your file as Windows-1252
(code page 1252, or CP1252) therefore calls for some other explanation.
If I assume that you have a VS Code setting that specifically decodes
your CSS files as Windows-1252, I can reproduce your situation very
accurately.[1]
1. Reproducing the whole scenario
I use a simplified version of your style.css, containing just a single line:
/* Ü */
To make VS Code open the file with encoding Windows-1252
(with Auto Guess Encoding off), I assume that the VS Code settings.json
contains the following line:[2]
"[css]": {"files.encoding": "windows1252"},
Such a setting makes VS Code use Windows-1252 for all .css files,
both when opening (decoding) and when saving (encoding) them.[3]
If you download style.css, right-click it, and choose Open with Code,
expect to see /* Ãœ */ instead of /* Ü */, with Windows 1252 in the status bar.
The reason you see the two Windows-1252 characters Ãœ instead of the
single UTF-8 character Ü is that Windows-1252 reads each byte as a
separate character, here the non-ASCII characters Ã and œ.
UTF-8, on the other hand, uses two bytes to encode a single
non-ASCII character like Ü.[4]
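To see this at the byte level, here is a minimal Python sketch (Python is used only for illustration; it is not what VS Code runs internally). It encodes Ü as UTF-8 and then decodes the same two bytes with Python's built-in `cp1252` and `utf-8` codecs:

```python
# The German letter Ü (U+00DC) as stored in a UTF-8 file: two bytes.
utf8_bytes = "Ü".encode("utf-8")        # b'\xc3\x9c'

# Windows-1252 maps each byte to its own character: 0xC3 -> Ã, 0x9C -> œ.
as_cp1252 = utf8_bytes.decode("cp1252") # 'Ãœ'

# UTF-8 reads the same two bytes back as one character.
as_utf8 = utf8_bytes.decode("utf-8")    # 'Ü'
```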
1. a. How to display Ü correctly
To make the German letter Ü appear correctly, you need to click:
Reopen with Encoding > UTF-8 Guessed from content.
Choosing Reopen with Encoding doesn't change the file itself.
It changes how the file is displayed in VS Code – how it is
decoded.
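As a rough model, Reopen with Encoding behaves like reading the same bytes from disk and decoding them with a different codec. The Python sketch below assumes that model; `reopen_with_encoding` is a hypothetical helper, not a VS Code API:

```python
import tempfile
from pathlib import Path


def reopen_with_encoding(path: Path, encoding: str) -> str:
    """Hypothetical helper: decode the bytes on disk without writing anything."""
    return path.read_bytes().decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    css = Path(tmp) / "style.css"
    css.write_bytes("/* Ü */".encode("utf-8"))  # saved as UTF-8: 2F 2A 20 C3 9C 20 2A 2F
    before = css.read_bytes()

    shown_cp1252 = reopen_with_encoding(css, "cp1252")  # '/* Ãœ */'
    shown_utf8 = reopen_with_encoding(css, "utf-8")     # '/* Ü */'

    # Reopening changed only what is displayed, never the file itself.
    unchanged = css.read_bytes() == before              # True
```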
1. b. What you should not do
You'll get a problem if you instead click:
Save with Encoding > UTF-8 Guessed from content.
This does change the file: every non-ASCII character currently displayed (here Ã and œ) is converted to its UTF-8 byte sequence, and the file is saved with these changes.
When you now close and reopen style.css, it will again be decoded as
Windows-1252.
(Why? Because that's exactly what the line
"[css]": {"files.encoding": "windows1252"},
in settings.json tells VS Code to do!)
Here is what you'll see: /* ÃƒÅ“ */.
Note how ÃƒÅ“ are the same characters as those displayed in
the screenshot of your question.
The reason you now see four characters instead of two is the same
as before.
The single UTF-8 character Ã (2 bytes) is displayed as the
two characters Ãƒ (still 2 bytes) when decoded with Windows-1252.
And the single UTF-8 character œ is displayed as the two
Windows-1252 characters Å“.
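The corruption can be modelled in a few lines of Python, assuming that Save with Encoding amounts to decoding the bytes with the current encoding and re-encoding the resulting text with the target one. `save_with_encoding` is a hypothetical helper for illustration only:

```python
def save_with_encoding(data: bytes, current: str, target: str) -> bytes:
    """Hypothetical model of Save with Encoding: the text as currently
    decoded is re-encoded with the target encoding, and these new bytes
    are what ends up on disk."""
    return data.decode(current).encode(target)


original = "/* Ü */".encode("utf-8")   # b'/* \xc3\x9c */'

# Shown as Windows-1252 ('/* Ãœ */'), then saved as UTF-8:
corrupted = save_with_encoding(original, "cp1252", "utf-8")
# -> b'/* \xc3\x83\xc5\x93 */'  (Ã -> C3 83, œ -> C5 93)

# The [css] setting makes VS Code decode it as Windows-1252 again:
shown = corrupted.decode("cp1252")     # '/* ÃƒÅ“ */'

# Each UTF-8 character becomes a pair of Windows-1252 characters.
pairs = [ch.encode("utf-8").decode("cp1252") for ch in "Ãœ"]   # ['Ãƒ', 'Å“']
```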
This completes my reproduction of your scenario.
2. How to repair the corrupted file
Given that you want to display Ü and not the corrupted ÃƒÅ“,
you need to:
- convert the file back,
- encode with UTF-8,
- close and reopen the file.
2. a. Convert the file back
Here is how to convert the corrupted style.css
back to its original
state.
Starting from the state shown in the previous screenshot, click
Windows 1252 in the status bar, then Reopen with Encoding, and finally UTF-8.
Expect to see Ãœ.
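The file is still corrupted, so now convert it to Windows-1252
by clicking:
UTF-8 > Save with Encoding > Windows 1252.
Since Ã and œ are single bytes in Windows-1252, this writes back exactly the two bytes of the original UTF-8 Ü.
The file has now been converted back to its original state.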
What remains is to decode it correctly (with UTF-8).
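The repair can be sketched the same way in Python, again assuming that Reopen with Encoding only re-decodes the bytes and Save with Encoding writes the displayed text in the chosen encoding. The last line stands in for steps 2 and 3 below:

```python
# Starting point: the bytes left on disk by the wrong Save with Encoding,
# rebuilt here from the original UTF-8 file so the sketch is self-contained.
corrupted = "/* Ü */".encode("utf-8").decode("cp1252").encode("utf-8")

# Reopen with Encoding > UTF-8: only the display changes.
shown = corrupted.decode("utf-8")        # '/* Ãœ */', still corrupted

# Save with Encoding > Windows 1252: Ã and œ are one byte each in Windows-1252.
repaired = shown.encode("cp1252")        # b'/* \xc3\x9c */'

# With the [css] setting deleted, the default UTF-8 decoding applies again.
final = repaired.decode("utf-8")         # '/* Ü */'
```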
2. b. Encode with UTF-8
In settings.json, delete the following line:
"[css]": {"files.encoding": "windows1252"},
2. c. Close and reopen the file
Close and reopen style.css.
Check that you see UTF-8 in the status bar.
Expect to see the original line /* Ü */.
Yay! Mission accomplished.
3. Encoding vs converting in Notepad++
To better understand the difference between decoding/encoding and
converting a file, it might help to see how this is done in
another versatile text editor: Notepad++.
This helpful answer explains the difference in an
instructive picture.
Encoding in Notepad++ corresponds to Reopen with Encoding
in VS Code, whereas
Converting in Notepad++ corresponds to
Save with Encoding in VS Code.
4. ASCII, ANSI, and UTF-8
A few facts may help in understanding what ASCII, ANSI, and UTF-8 are.
An ASCII character uses just a single byte.
Or, if you will, it uses seven of the eight bits of a byte: the most significant bit is always zero.
This corresponds to 0-127 in decimal, 0x00-0x7F in hex,
and 0000 0000 - 0111 1111 in binary.
Both ANSI/Windows-1252 and UTF-8 encode an ASCII character as that same single byte.
For example, the letter k is a pure ASCII character. It is one byte (eight bits) with the decimal value 107, the hex value 0x6B, and the bits 0110 1011.
As a consequence, it's wrong to say that the ASCII character k
is not an ANSI character, or that it's not a UTF-8 character. It's both!
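A quick Python check of the numbers for k, and of the claim that Windows-1252 ("ANSI") and UTF-8 store it as the same byte:

```python
ch = "k"
code = ord(ch)                 # 107
hex_form = f"0x{code:02X}"     # '0x6B'
bits = f"{code:08b}"           # '01101011'

# Windows-1252 and UTF-8 both store k as this one byte.
cp1252_bytes = ch.encode("cp1252")   # b'k'
utf8_bytes = ch.encode("utf-8")      # b'k'
both_same = cp1252_bytes == utf8_bytes == bytes([code])   # True
```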
If a text file contains only ASCII characters, then its ANSI and UTF-8 encodings coincide byte for byte.
You cannot tell one apart from the other: such a file is both ANSI and UTF-8 encoded.[5]
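The same point for a whole file, as a minimal Python sketch; the stylesheet text below is a hypothetical example, not taken from the question:

```python
# All 128 ASCII code points, 0x00-0x7F.
ascii_text = "".join(chr(i) for i in range(128))
ascii_same = ascii_text.encode("cp1252") == ascii_text.encode("utf-8")   # True

# A hypothetical ASCII-only stylesheet: its bytes cannot reveal whether
# it was saved as ANSI or as UTF-8.
css = "body { color: black; }\n"
css_same = css.encode("cp1252") == css.encode("utf-8")                   # True

# A single non-ASCII character is enough to make the two encodings differ.
umlaut_cp1252 = "Ü".encode("cp1252")   # b'\xdc'
umlaut_utf8 = "Ü".encode("utf-8")      # b'\xc3\x9c'
```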
[Image: the Windows-1252 (CP-1252) character table, numbers 0-255]
The upper half of the Windows-1252 table above corresponds to numbers 0-127, and the lower half to numbers 128-255. The latter are the non-ASCII ANSI characters of Windows-1252.
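To check the lower half programmatically, this Python sketch decodes every byte from 128 to 255 with Python's `cp1252` codec and records how many bytes each resulting character takes in UTF-8. Python's codec treats a few bytes in this range (0x81, 0x8D, 0x8F, 0x90, 0x9D) as undefined, so the sketch skips them:

```python
utf8_lengths = {}   # byte value -> length of the decoded character in UTF-8
undefined = []      # bytes that Python's cp1252 codec does not map

for value in range(128, 256):
    try:
        ch = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(hex(value))
        continue
    utf8_lengths[value] = len(ch.encode("utf-8"))

# undefined == ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
# set(utf8_lengths.values()) == {2, 3}
# utf8_lengths[0xDC] == 2 (Ü), utf8_lengths[0x80] == 3 (€)
```

Every defined byte in this half becomes a 2-byte or 3-byte UTF-8 sequence; characters such as € need three.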
The picture below is taken from
UTF-8 and ASCII character charts,
and displays all those Windows-1252 characters once more,
numbered 128-255.
If you want to know how many bytes (and what bytes) a UTF-8 character uses, try this online tool.
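If you prefer to check locally, a few lines of Python give the same information as such a converter; `utf8_bytes_hex` is a hypothetical helper name:

```python
def utf8_bytes_hex(ch: str) -> str:
    """Hypothetical helper: the UTF-8 bytes of one character as hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


for ch in ["k", "\u00dc", "\u20ac", "\U0001F600"]:   # k, Ü, €, grinning-face emoji
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), utf8_bytes_hex(ch))

# U+006B 1 6B
# U+00DC 2 C3 9C
# U+20AC 3 E2 82 AC
# U+1F600 4 F0 9F 98 80
```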
References
- style.css | containing only /* Ü */
- Post citing Cathy Wissink, Microsoft
- Each non-ASCII UTF-8 character uses at least two (up to four) bytes
- American Standard Code for Information Interchange table
- Answer to what ANSI is | table in Section 3
- Unicode Transformation Format - 8 bits explained
- The Windows-1252 (CP-1252) encoding table
- Notepad++ | download page
- How to convert ANSI to UTF-8 in Notepad++
- UTF-8 and ASCII character charts
- Converter, UTF-8 to bytes (hexadecimal)
[1] I think the scenario I present plausibly describes what might
have happened.
Of course, I cannot know for sure what caused your situation.
[2] To open settings.json, press Ctrl + , (comma),
and then click the Open Settings icon in the top right corner.
On macOS, use ⌘ instead of Ctrl.
[3] The term “ANSI” as used to signify Windows code pages is a historical
reference […].
On Western European systems, Microsoft still uses the term ANSI
interchangeably with Windows-1252, for example in its notepad.exe text
editor, typically located in C:\WINDOWS\System32.
This is the convention I follow as well.
See also this answer.
[4] To be more precise, each non-ASCII UTF-8 character uses at least two (and at most four) bytes.
[5] Suppose you have a text file containing only pure ASCII characters. If you open that file in some text editor, and the status bar says ANSI, that doesn't mean the file is not UTF-8 encoded. It just means that this text editor uses ANSI as its default encoding. If the default encoding were UTF-8, the editor would display UTF-8 in the status bar for the same file.