How does notepad.exe determine character encoding?

Question

I have a .txt file saved in UTF-8 format without a BOM. It contains an 'é' character.

How does notepad.exe determine that it is UTF-8 encoded?

Other .txt files that contain only characters below 0x80 are opened as "ANSI".

Community · Accepted Answer · 2020-06-12 13:48:39Z


According to Raymond Chen:

Some files come up strange in Notepad

[...] When faced with a file that lacks a special prefix, Notepad is forced to guess which of those two encodings the file actually uses. The function that does this work is IsTextUnicode, which studies a chunk of bytes and does some statistical analysis to come up with a guess.

And as the documentation notes, “Absolute certainty is not guaranteed.” Short strings are most likely to be misdetected.

(Related follow-up blog post.)
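To make the guessing concrete, here is a minimal Python sketch of one common heuristic for BOM-less files. It is not Notepad's actual code and does not call IsTextUnicode; the function name `guess_encoding` and the returned labels are made up for illustration. The assumption is: check for a byte-order mark first, treat pure-ASCII data as "ansi" (every byte below 0x80 decodes identically either way), and otherwise call the data UTF-8 only if it decodes strictly as UTF-8.

```python
import codecs

# Known byte-order marks, checked before any guessing takes place.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of a text file's raw bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # Only bytes below 0x80: ASCII, which looks the same in UTF-8 and in
    # an "ANSI" code page, so there is nothing to tell them apart.
    if all(b < 0x80 for b in data):
        return "ansi"

    # Non-ASCII bytes present: accept UTF-8 only if the whole buffer is
    # well-formed UTF-8, otherwise fall back to the legacy code page.
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        return "ansi"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))   # 'é' as bytes C3 A9
    print(guess_encoding(b"plain ascii text"))
    print(guess_encoding("café".encode("cp1252")))  # 'é' as the single byte E9
```

Under this sketch, a file containing 'é' encoded as UTF-8 is reported as UTF-8 because the two-byte sequence is well-formed, while the same character saved in Windows-1252 is a lone 0xE9 byte, which is not valid UTF-8. As with any such guess, short inputs can be misdetected.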

edited Jun 12, 2020 at 13:48


answered Apr 24, 2019 at 18:20

grawity_u1686


Thanks for the Raymond Chen links, @user1686. I have looked into this over the years but never came across his articles. I wonder whether Excel itself uses the IsTextUnicode() function when opening a TXT file?
– David Carr
Commented May 9, 2023 at 3:34

