I have a .txt file saved in UTF-8 format without a BOM. It contains an 'é' character.

How does notepad.exe determine that it is UTF-8 encoded?

Other .txt files that contain only characters below 0x80 are opened as "ANSI".

1 Answer

According to Raymond Chen:

Some files come up strange in Notepad

[...] When faced with a file that lacks a special prefix, Notepad is forced to guess which of those two encodings the file actually uses. The function that does this work is IsTextUnicode, which studies a chunk of bytes and does some statistical analysis to come up with a guess.

And as the documentation notes, “Absolute certainty is not guaranteed.” Short strings are most likely to be misdetected.

(Related follow-up blog post.)
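
The quoted passage does not spell out how a BOM-less UTF-8 file is told apart from an ANSI one. As a rough illustration only (this is an assumption about one plausible heuristic, not Notepad's actual code, and `guess_encoding` is a hypothetical helper), a detector can check whether the bytes form valid UTF-8 and contain at least one byte of 0x80 or above. It also shows why files with only < 0x80 bytes fall back to "ANSI": those bytes are identical in both encodings, so there is nothing to detect.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical guess for a file without a BOM: 'ANSI' or 'UTF-8'.

    Illustrative sketch only; not how Notepad or IsTextUnicode is implemented.
    """
    # Bytes below 0x80 look the same in ASCII, in ANSI code pages and in UTF-8,
    # so a pure 7-bit file carries no evidence either way. Report it as ANSI.
    if all(b < 0x80 for b in data):
        return "ANSI"
    try:
        # Strict decoding rejects stray lead or continuation bytes.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "ANSI"
    return "UTF-8"


print(guess_encoding("café".encode("utf-8")))   # 'é' stored as C3 A9
print(guess_encoding(b"cafe"))                  # 7-bit only
print(guess_encoding("café".encode("cp1252")))  # 'é' stored as a lone E9
```

A guess like this can still be wrong: the Windows-1252 text "Ã©" is the byte pair C3 A9, exactly the UTF-8 encoding of 'é', which fits the warning that certainty is not guaranteed and that short strings are the most likely to be misdetected.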

  • Thanks for the Raymond Chen links, @user1686. I have looked into this in years past but never came across his articles. I wonder whether Excel itself uses the IsTextUnicode() function when opening a TXT file?
    – David Carr
    Commented May 9, 2023 at 3:34
