I have a .txt file saved in UTF-8 format without a BOM. It contains an 'é' character.
How does notepad.exe determine that it is UTF-8 encoded?
Other .txt files that contain only bytes below 0x80 (plain ASCII) are opened with the "ANSI" encoding.
According to Raymond Chen, in "Some files come up strange in Notepad":
[...] When faced with a file that lacks a special prefix, Notepad is forced to guess which of those two encodings the file actually uses. The function that does this work is IsTextUnicode, which studies a chunk of bytes and does some statistical analysis to come up with a guess.
And as the documentation notes, “Absolute certainty is not guaranteed.” Short strings are most likely to be misdetected.
(Related follow-up blog post.)
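To make the guessing idea concrete, here is a minimal sketch of one way an editor could classify a file that has no BOM. It is an assumption for illustration only, not Notepad's actual algorithm and not what IsTextUnicode does internally; the function name `guess_encoding` and the labels it returns are hypothetical. It is at least consistent with the behavior described in the question: a file whose bytes are all below 0x80 is reported as "ANSI", while a file containing a valid multi-byte sequence such as the two bytes of 'é' is reported as UTF-8.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic; NOT Notepad's real implementation."""
    # An explicit prefix (BOM) settles the question without guessing.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16"

    # Pure ASCII is byte-identical in UTF-8 and in ANSI code pages,
    # so nothing in the file distinguishes them. Assumption: report ANSI.
    if all(b < 0x80 for b in data):
        return "ANSI"

    # High bytes are present. If they form well-formed UTF-8 sequences,
    # guess UTF-8; otherwise fall back to the ANSI code page.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "ANSI"
    return "UTF-8"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))   # 'é' stored as 0xC3 0xA9
    print(guess_encoding(b"plain ascii"))           # no byte >= 0x80
    print(guess_encoding("café".encode("cp1252")))  # 'é' stored as lone 0xE9
```

A strict decode is a strong signal because text in a single-byte code page rarely happens to form valid UTF-8 sequences, but as the quoted documentation warns, a guess on a short input can still be wrong.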