Revisions to Is it an ANSI or UTF8 file?

added 149 characters in body

edited Mar 24 at 0:03

Is it an ANSI file or UTF8 file?

Both

If it only contains ASCII characters then it is both ANSI and UTF-8.

It is also valid in most other character sets and encodings. This is because most encodings include the ASCII set using the ASCII code-points (numeric values).

The exceptions would be character encodings such as IBM's EBCDIC - which was once very common.
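
As a rough illustration (a minimal Python sketch; the sample text and the particular list of codecs are my own assumptions, not anything from the question), the same ASCII-only bytes decode to the same text under UTF-8, Code Page 1252 and several other encodings, while an EBCDIC code page such as cp037 reads them as something else:

```python
# Hypothetical sample: any text limited to printable ASCII behaves the same way.
text = "Hello, world!"
data = text.encode("ascii")

# A few encodings that place ASCII characters at the ASCII code points.
ASCII_COMPATIBLE = ["utf-8", "cp1252", "latin-1", "cp437", "mac_roman"]


def decodes_identically(raw: bytes, encodings) -> bool:
    """Return True if every listed encoding decodes raw to the same string."""
    return len({raw.decode(enc) for enc in encodings}) == 1


print(decodes_identically(data, ASCII_COMPATIBLE))

# EBCDIC (here IBM code page 037) uses different byte values for the same
# characters, so the ASCII bytes do not round-trip.
print(data.decode("cp037") == text)
print(text.encode("cp037") == data)
```

So for an ASCII-only file, asking "ANSI or UTF-8?" has no single answer: the bytes on disk are identical either way.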

As an aside, Microsoft historically used the term ANSI to refer to a character set that they were expecting the American National Standards Institute (ANSI) to publish as one of their many standards. ANSI did not do so. A more accurate or useful name would be Code Page 1252. Saying you wrote a file in ANSI is a bit like saying you painted your kitchen in the colour Pantone or RAL.

Microsoft applications generally write UTF-8 files with a Byte Order Mark (BOM), which helps their applications recognise Unicode encodings such as UTF-16LE, UTF-16BE and UTF-8. Note that a BOM in a UTF-8 file serves only to identify the encoding of the file's content; it cannot indicate byte order, since byte order does not apply to UTF-8. A BOM in a text file can cause problems: for example, it can prevent Linux shell scripts from working, because the BOM displaces the #! signature that must be the first two bytes of an executable script.
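
To make the BOM problem concrete, here is a small Python sketch. The script contents and the helper name `strip_utf8_bom` are made up for illustration; `utf-8-sig` is Python's standard codec name for "UTF-8 with a BOM".

```python
import codecs

# Hypothetical two-line shell script.
script = "#!/bin/sh\necho hello\n"

with_bom = script.encode("utf-8-sig")   # UTF-8 preceded by the bytes EF BB BF
without_bom = script.encode("utf-8")    # plain UTF-8, no BOM

# The BOM is three extra bytes at the very start of the file...
print(with_bom[:3] == codecs.BOM_UTF8)
# ...so the file no longer begins with the "#!" signature.
print(with_bom.startswith(b"#!"))
print(without_bom.startswith(b"#!"))


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 BOM, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

Decoding with `utf-8-sig` accepts input with or without the BOM, which is usually the safer choice when reading files that may have come from Windows tools.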

Microsoft applications use library functions to guess a file's encoding from the file's contents. This is notoriously unreliable, although it has improved over time.
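
To show why guessing is fragile, here is a deliberately simple heuristic in Python. This is my own sketch, not the algorithm Microsoft's libraries use: the function name `guess_encoding`, the fallback to Code Page 1252 and the sample strings are all assumptions, and UTF-32 BOMs are ignored for brevity.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess an encoding: BOM first, then strict UTF-8, else fall back to cp1252."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"          # Python's utf-16 codec reads the BOM itself
    try:
        raw.decode("utf-8")      # strict decoding rejects invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"          # assumed "ANSI" fallback


# Usually this works: a lone 0xE9 byte is not valid UTF-8.
print(guess_encoding("caf\u00e9".encode("cp1252")))

# But some cp1252 text is also valid UTF-8. The two characters U+00C3 U+00A9
# encode in cp1252 to the bytes C3 A9, which is exactly the UTF-8 form of U+00E9.
ambiguous = "\u00c3\u00a9".encode("cp1252")
print(guess_encoding(ambiguous))
```

Without a BOM or some out-of-band declaration, the bytes alone cannot always tell you which encoding the author intended.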

Windows 7 UTF-8 and Unicode

added 21 characters in body

edited Mar 22 at 16:59

RedGrittyBrick

Is it an ANSI file or UTF8 file?

Both

If it only contains ASCII characters then it is both ANSI and UTF-8.

It is also valid in most other character sets and encodings. This is because most encodings include the ASCII set using the ASCII code-points (numeric values).

The exceptions would be character encodings such as IBM's EBCDIC - which was once very common.

As an aside, Microsoft historically used the term ANSI to refer to a character set that they were expecting the American National Standards Institute (ANSI) to publish as one of their many standards. ANSI did not do so. A more accurate or useful name would be Code Page 1252. Saying you wrote a file in ANSI is a bit like saying you painted your kitchen in the colour Pantone or RAL.

Microsoft applications generally write UTF-8 files with a Byte Order Mark (BOM) that helps their applications recognise various Unicode encodings such as UTF-16LE, UTF-16BE and UTF-8. However, this goes against Unicode Consortium guidance, which neither requires nor recommends a Byte Order Mark for UTF-8 (since there is only one possible byte order for UTF-8).

Microsoft applications use library functions to guess a file's encoding from the file's contents. This is notoriously unreliable, although it has improved over time.

Windows 7 UTF-8 and Unicode

created Mar 22 at 16:51

RedGrittyBrick

Is it an ANSI file or UTF8 file?

Both

If it only contains ASCII characters then it is both ANSI and UTF-8.

It is also valid in most other character sets and encodings, since most include the ASCII set using the ASCII code-points (numeric values).

The exceptions would be character encodings such as IBM's EBCDIC - which was once very common.

As an aside, Microsoft historically used the term ANSI to refer to a character set that they were expecting the American National Standards Institute (ANSI) to publish as one of their many standards. ANSI did not do so. A more accurate or useful name would be Code Page 1252. Saying you wrote a file in ANSI is a bit like saying you painted your kitchen in the colour Pantone or RAL.

Microsoft applications generally write UTF-8 files with a Byte Order Mark (BOM) that helps their applications recognise various Unicode encodings such as UTF-16LE, UTF-16BE and UTF-8. However, this goes against Unicode Consortium guidance, which neither requires nor recommends a Byte Order Mark for UTF-8 (since there is only one possible byte order for UTF-8).

Microsoft applications use library functions to guess a file's encoding from the file's contents. This is notoriously unreliable, although it has improved over time.

Windows 7 UTF-8 and Unicode
