How can I stop the auto-change encoding?
According to your own comment, Auto Guess Encoding is already off.
The fact that VS Code still decodes your file as Windows-1252
(code page 1252 or CP1252) calls for some other explanation.
By assuming that you have a VS Code setting that specifically decodes
your CSS files as being Windows-1252,
I've been able to reproduce your situation very accurately, and therefore
think the scenario I present below plausibly describes what might
have happened.
1
I will reproduce the essence of all your findings.
In doing so, I use a simplified version of your style.css,
containing just a single line :
/* Ü */
To make VS Code open the file with encoding Windows-1252
(with Auto Guess Encoding off),
I will assume that the VS Code settings.json
contains the following line :
2
Such a setting will make VS Code treat all .css files as
Windows-1252
(also known as CP-1252, the most common type of ANSI encoding).
3
The file SuperUser-Q1419830.zip contains the file style.css
(containing /* Ü */ encoded as UTF-8).
If you download and unzip it, then right-click style.css and choose
Open with Code, expect to see /* Ãœ */ instead of /* Ü */.
The reason you see two Windows-1252 characters (Ãœ) instead
of the single UTF-8 character Ü is that Windows-1252
reads each byte as a single character, here the non-ASCII characters
Ã and œ.
UTF-8, on the other hand, uses two bytes to encode a single
non-ASCII character like Ü.
4
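The effect described above can be reproduced outside VS Code. The following is a minimal Python sketch, not what VS Code does internally; it assumes that Python's cp1252 codec is a fair stand-in for VS Code's Windows-1252 decoding.

```python
# Sketch: the two bytes that UTF-8 uses for Ü, decoded in two different ways.
raw = "Ü".encode("utf-8")

hex_bytes = " ".join(f"{b:02X}" for b in raw)
as_utf8 = raw.decode("utf-8")      # both bytes together form one character
as_cp1252 = raw.decode("cp1252")   # each byte becomes its own character

print(hex_bytes)   # C3 9C
print(as_utf8)     # Ü
print(as_cp1252)   # Ãœ
```

The bytes are identical in both cases; only the interpretation differs.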
a. How to display Ü correctly
To make the German letter Ü appear correctly, you need to click :
Reopen with Encoding > UTF-8 Guessed from content.
Choosing Reopen with Encoding doesn't change the file itself.
It changes how the file is displayed in VS Code, that is, how it is
decoded.
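To illustrate that reopening with another encoding only changes the interpretation, here is a small Python sketch. It models Reopen with Encoding as reading the same file twice with different codecs; the temporary style.css is a stand-in for your file, and the mapping to VS Code's behaviour is my assumption.

```python
import os
import tempfile

# Stand-in for style.css: the line /* Ü */ stored as UTF-8.
raw = "/* Ü */\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "style.css")
    with open(path, "wb") as f:
        f.write(raw)

    # "Reopen with Encoding": same file, different decoder.
    with open(path, encoding="cp1252", newline="") as f:
        as_cp1252 = f.read()
    with open(path, encoding="utf-8", newline="") as f:
        as_utf8 = f.read()

    # Reading never touched the bytes on disk.
    with open(path, "rb") as f:
        unchanged = f.read() == raw

print(as_cp1252.strip())  # /* Ãœ */
print(as_utf8.strip())    # /* Ü */
print(unchanged)          # True
```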
b. What you should not do
You'll get a problem if you instead click :
Save with Encoding > UTF-8 Guessed from content.
This does change the file: all non-ASCII characters get converted to their corresponding UTF-8 characters. If you save the file, it is saved with these changes.
The reason you now see four characters instead of two is the same
as before.
The single UTF-8 character Ã (2 bytes) is displayed as the
two characters Ãƒ (still 2 bytes) when decoded with
Windows-1252.
And the single UTF-8 character œ is displayed as the two
Windows-1252 characters Å“.
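The doubling can be traced byte by byte. In this Python sketch I model Save with Encoding > UTF-8 as re-encoding the already mis-decoded text; that model is an assumption, chosen because it reproduces the four characters described above.

```python
# Sketch of the accidental double encoding.
original = "Ü".encode("utf-8")        # 2 bytes: C3 9C
shown = original.decode("cp1252")     # displayed as Ãœ (2 characters)

# Saving the mis-decoded text as UTF-8 encodes Ã and œ separately.
resaved = shown.encode("utf-8")       # 4 bytes: C3 83 C5 93
reshown = resaved.decode("cp1252")    # displayed as ÃƒÅ“ (4 characters)

print(" ".join(f"{b:02X}" for b in resaved))  # C3 83 C5 93
print(len(reshown))                           # 4
```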
Given that you want to display Ü and not the corrupted characters,
you need to :
The file has now been converted back to its original state.
What remains is to decode it correctly (with UTF-8).
Yay! Mission accomplished.
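In terms of bytes, converting back and then decoding correctly can be sketched as follows. This is only an illustration in Python under the assumption that the damage is exactly one extra Windows-1252-to-UTF-8 round; it is not a description of the VS Code menu clicks.

```python
# Sketch: undo one accidental "decode as Windows-1252, save as UTF-8" round.
damaged = b"\xc3\x83\xc5\x93"   # the four bytes left by the unwanted save

# Convert back: read the bytes as UTF-8, then write them as Windows-1252.
repaired = damaged.decode("utf-8").encode("cp1252")

# What remains is to decode the repaired bytes correctly.
text = repaired.decode("utf-8")

print(" ".join(f"{b:02X}" for b in repaired))  # C3 9C
print(text)                                    # Ü
```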
To better understand the difference between decoding/encoding and
converting a file, it might help to see how this is done in
another versatile text editor: Notepad++.
This helpful answer explains the difference in an
instructive picture :
An ASCII character uses just a single byte.
Or if you will, it uses seven of the eight bits of a byte; the most significant bit is always zero.
This corresponds to 0-127 in decimal numbers, 0x00-0x7F in hex numbers,
and 0000 0000 - 0111 1111 in bits.
Both ANSI/Windows-1252 and UTF-8 encode an ASCII character as the ASCII character itself.
For example, the character (letter) k
is a pure ASCII character. This is one byte (eight bits) which has the decimal number 107, the hex number 0x6B, and the bits 0110 1011.
As a consequence, it's wrong to say that the ASCII character k
is not an ANSI character, or that it's not a UTF-8 character. It's both!
If a text file contains only ASCII characters, then the ANSI and UTF-8 encodings coincide.
You cannot tell one apart from the other. Such a file is both ANSI and UTF-8 encoded. 5
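A quick way to check this claim is to encode the same ASCII text with all three codecs and compare the bytes. The sketch below uses Python, with cp1252 standing in for ANSI as in the rest of this answer; the sample CSS line is made up for the illustration.

```python
# Sketch: for ASCII-only text, ASCII, Windows-1252 and UTF-8 give the same bytes.
ch = "k"
encoded = {name: ch.encode(name) for name in ("ascii", "cp1252", "utf-8")}
all_equal = len(set(encoded.values())) == 1

code = encoded["utf-8"][0]
print(all_equal)                              # True
print(code, hex(code), format(code, "08b"))   # 107 0x6b 01101011

# The same holds for a whole ASCII-only line (hypothetical sample).
line = "body { color: red; }"
same_line = line.encode("cp1252") == line.encode("utf-8")
print(same_line)                              # True
```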
If you ever want to know how many bytes (and what bytes) a UTF-8
character uses,
you can try this online tool.
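If you prefer to check this locally, a few lines of Python do the same job. The helper name utf8_bytes is my own invention for this sketch.

```python
def utf8_bytes(ch):
    """Return the UTF-8 bytes of ch as a list of hex strings."""
    return [f"0x{b:02X}" for b in ch.encode("utf-8")]


print(utf8_bytes("k"))   # ['0x6B']
print(utf8_bytes("Ü"))   # ['0xC3', '0x9C']
print(utf8_bytes("€"))   # ['0xE2', '0x82', '0xAC']
```

The length of the returned list is the number of bytes the character uses.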
- SuperUser-Q1419830.zip | style.css, containing only /* Ü */
- Post citing Cathy Wissink, Microsoft
- Each non-ASCII UTF-8 character uses at least two (up to four) bytes
- American Standard Code for Information Interchange table
- Answer to what ANSI is | table in Section 3
- Unicode Transformation Format - 8 bits explained
- The Windows-1252 (CP-1252) encoding table
- Notepad++ | download page for Notepad++
- How to convert ANSI to UTF-8 in Notepad++
- UTF-8 and ASCII character charts
- Converter, UTF-8 to bytes (hexadecimal)
1
I think the scenario I present plausibly describes what might
have happened.
Of course, I cannot know for sure what caused your situation.
2
To open settings.json, press Ctrl + , (comma),
and then click the Open Settings icon in the top right corner.
On macOS, use ⌘ instead of Ctrl.
3
The term “ANSI” as used to signify Windows code pages is a historical
reference […].
Microsoft still uses ANSI for Western Europe interchangeably with
Windows-1252,
for example in their notepad.exe text editor,
typically located at C:\WINDOWS\System32.
This is the convention I follow as well.
See also this answer.
4 To be more precise, each non-ASCII UTF-8 character uses at least two (up to four) bytes.
5 Suppose you have a text file containing only pure ASCII characters. If you open that file in some text editor, and the status bar says ANSI, that doesn't mean the file is not UTF-8 encoded. It just means that this text editor uses ANSI as its default encoding. If the default encoding were UTF-8, the editor would display UTF-8 in the status bar for the same file.