0

I edited the following batch file in Notepad. The bottom-right corner of Notepad showed "UTF8". I then saved the file in ANSI format.

After saving, the bottom-right corner of Notepad showed "ANSI". I closed the file and re-opened it, and Notepad showed "UTF8" at the bottom-right corner again. I repeated this process several times and got the same result each time.

Is it an ANSI file or a UTF-8 file?

Or does the encoding shown at the bottom-right corner of Notepad not mean anything?

This is on Windows 11 Pro 23H2, build 22631.3296, Windows Feature Experience Pack 1000.22687.1000.0, with Windows Notepad 11.2401.26.0.

[Sorry! Forgot to add the file]

date /t >C:\health.txt
time /t >>c:\health.txt
sfc /scannow >>c:\health.txt
time /t >>c:\health.txt
sfc /scannow >>c:\health.txt
time /t >>c:\health.txt
1
  • I suspect that Notepad treats it as UTF-8 when opening it. I've seen this happen in Windows 10 when Notepad did not recognize the encoding. To be sure, open it with Notepad++, which will tell you what encoding it detects for the file.
    – LPChip
    Commented Mar 22 at 14:20

3 Answers

1

Is it an ANSI file or UTF8 file?

Both

If it only contains ASCII characters then it is both ANSI and UTF-8.

It is also valid in most other character sets and encodings. This is because most encodings include the ASCII set at the ASCII code points (numeric values).

The exceptions would be character encodings such as IBM's EBCDIC - which was once very common.
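As a minimal sketch of the point above, the following Python snippet encodes two lines of the batch file from the question in several encodings and compares the resulting bytes. The codec names (`cp1252` for what Windows calls ANSI on Western systems, `cp037` as one EBCDIC code page) are my choice for illustration, not something Notepad exposes.

```python
# Two lines from the batch file in the question (ASCII characters only).
text = "date /t >C:\\health.txt\r\ntime /t >>c:\\health.txt\r\n"

ascii_bytes = text.encode("ascii")
ansi_bytes = text.encode("cp1252")   # "ANSI" on Western-European Windows (assumption)
utf8_bytes = text.encode("utf-8")    # UTF-8 without a BOM
ebcdic_bytes = text.encode("cp037")  # one EBCDIC code page, chosen for illustration

# For pure-ASCII text the three ASCII-compatible encodings give identical bytes.
same_in_ascii_family = ascii_bytes == ansi_bytes == utf8_bytes

# EBCDIC assigns different numeric values to the same characters.
differs_in_ebcdic = ebcdic_bytes != ascii_bytes

print(same_in_ascii_family)
print(differs_in_ebcdic)
```

Since the bytes on disk are identical, nothing in the file itself can tell an editor whether it was "saved as ANSI" or "saved as UTF-8" (without a BOM).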


As an aside, Microsoft historically used the term ANSI to refer to a character set that they were expecting the American National Standards Institute (ANSI) to publish as one of their many standards. ANSI did not do so. A more accurate or useful name would be Code Page 1252. Saying you wrote a file in ANSI is a bit like saying you painted your kitchen in the colour Pantone or RAL.

Microsoft applications have traditionally written UTF-8 files with a Byte Order Mark (BOM), which helps their applications recognise various Unicode encodings such as UTF-16LE, UTF-16BE and UTF-8. Note that a BOM in a UTF-8 file only serves to identify the encoding of the file content; it cannot indicate byte order, since byte order isn't applicable to UTF-8. Having a BOM in a text file can cause problems, for example preventing Linux shell scripts from working because the BOM displaces the script executable signature #!.
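To make the BOM concrete, here is a small Python sketch. It relies on Python's `utf-8-sig` codec, which is Python's name for "UTF-8 with BOM"; the two-line shell script is a made-up example.

```python
import codecs

# A made-up shell script, used only to illustrate the "#!" problem.
script = "#!/bin/sh\necho hello\n"

without_bom = script.encode("utf-8")
with_bom = script.encode("utf-8-sig")  # prepends the UTF-8 BOM

# The UTF-8 BOM is the three bytes EF BB BF.
print(codecs.BOM_UTF8.hex())

# Without a BOM the file starts with "#!"; with a BOM it no longer does.
print(without_bom.startswith(b"#!"))
print(with_bom.startswith(b"#!"))
```

Apart from those three leading bytes, the two byte strings are the same.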

Microsoft applications use library functions to guess a file's encoding from the file's contents. This is notoriously unreliable, although it has improved over time.
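The following Python sketch shows the general idea of such guessing. It is not Microsoft's algorithm; `guess_encoding` is a hypothetical helper, and the fallback to `cp1252` is an assumption that only makes sense on a Western-European system. It also ignores UTF-32 and UTF-16 without a BOM.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess a text encoding from raw bytes (illustrative heuristic only)."""
    # 1. A BOM, if present, is the only explicit marker.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # 2. Only bytes below 0x80: the file is ASCII, so ANSI and UTF-8
    #    are equally correct labels and the choice is arbitrary.
    if all(b < 0x80 for b in data):
        return "ascii"
    # 3. Non-ASCII bytes that form valid UTF-8 are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Otherwise fall back to a legacy code page (assumed cp1252).
        return "cp1252"


print(guess_encoding(b"sfc /scannow >>c:\\health.txt\r\n"))
```

For the batch file in the question, step 2 applies: an editor that reaches this case has to pick a label, which would explain why one that prefers UTF-8 keeps showing "UTF8" after a save as ANSI.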


1

I suspect it doesn't matter. A file that contains only English text is often ASCII, and then there's just no difference between (unmarked) UTF-8 and ASCII/ANSI.

If you want to force the file to be recognised as UTF-8, you need to save it as UTF-8 with BOM. If there's no BOM ("Byte Order Mark", a special marker at the beginning of the file), the editor has to guess, and when there are no special characters in the file (e.g. non-English letters with diacritics such as ä, ö or ê) it just doesn't matter, as the first 128 characters of all common character sets are identical.
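As a short illustration of when the difference does start to matter, this Python sketch encodes one of the characters mentioned above. `cp1252` is assumed here as the "ANSI" code page.

```python
ch = "\u00e4"  # the letter ä

ansi = ch.encode("cp1252")  # one byte in the assumed ANSI code page
utf8 = ch.encode("utf-8")   # two bytes in UTF-8

print(ansi.hex())  # e4
print(utf8.hex())  # c3a4

# Reading the UTF-8 bytes as if they were cp1252 gives the familiar garbage.
misread = utf8.decode("cp1252")
print(misread)  # Ã¤
```

Only once a file contains a character like this do the ANSI and UTF-8 versions differ on disk, and only then can a wrong guess by the editor show up as garbled text.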

0

This Notepad indication of UTF-8 is bogus. I saved a text file in both ANSI and UTF-8, and the two files were completely identical.

It seems that Notepad's UTF-8 implementation is seriously lacking in consistency. Saving in UTF-8 format should have added a byte-order mark (BOM) to the beginning of the file, which it doesn't do.

To correctly handle the difference between ANSI and UTF-8 (with or without a BOM), you need a more capable text editor, for example Notepad++.

1
  • Even Notepad++ cannot do anything about it if there's no BOM and the file contains ASCII characters only. And the default Notepad does support saving files with a BOM.
    – PMF
    Commented Mar 22 at 16:17
