RedGrittyBrick

Is it an ANSI file or UTF8 file?

Both

If it only contains ASCII characters then it is both ANSI and UTF-8.

It is also valid in most other character sets and encodings. This is because most encodings include the ASCII characters at the same code-points (numeric values) that ASCII assigns them.

The exceptions would be character encodings such as IBM's EBCDIC - which was once very common.
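A minimal sketch of that point in Python, assuming the standard codec names `cp1252` (the Windows "ANSI" code page on Western European systems) and `cp037` (one EBCDIC variant that ships with Python). The sample strings are only illustrative.

```python
# Pure-ASCII text encodes to identical bytes in any ASCII-compatible encoding.
text = "Hello, world!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")  # "ANSI" on Western European Windows
ebcdic_bytes = text.encode("cp037")   # an EBCDIC code page, not ASCII-compatible

same_everywhere = ascii_bytes == utf8_bytes == cp1252_bytes
differs_in_ebcdic = ebcdic_bytes != ascii_bytes

# As soon as a non-ASCII character appears, the encodings disagree.
euro_utf8 = "€".encode("utf-8")      # three bytes: E2 82 AC
euro_cp1252 = "€".encode("cp1252")   # one byte: 80

print(same_everywhere, differs_in_ebcdic, euro_utf8, euro_cp1252)
```

So a file can only be called "both ANSI and UTF-8" while every byte in it is below 0x80.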


As an aside, Microsoft historically used the term ANSI to refer to a character set that they were expecting the American National Standards Institute (ANSI) to publish as one of their many standards. ANSI did not do so. A more accurate or useful name would be Code Page 1252. Saying you wrote a file in ANSI is a bit like saying you painted your kitchen in the colour Pantone or RAL.

Microsoft applications generally write UTF-8 files with a Byte Order Mark (BOM) that helps their applications recognise various Unicode encodings such as UTF-16LE, UTF-16BE and UTF-8. Note that a BOM in a UTF-8 file only serves to identify the file content encoding; it cannot indicate byte order, since that isn't applicable to UTF-8. Having a BOM in a text file can cause problems, for example preventing Linux shell scripts from working because the BOM displaces the script executable signature #!.
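To illustrate the shell script problem, here is a small Python sketch. It relies on the standard `utf-8-sig` codec, which prepends the three-byte UTF-8 BOM (EF BB BF) when encoding. The helper `strip_utf8_bom` is a hypothetical name introduced for this example, not part of any library.

```python
import codecs

script = "#!/bin/sh\necho hello\n"

with_bom = script.encode("utf-8-sig")  # BOM bytes EF BB BF come first
without_bom = script.encode("utf-8")   # file starts directly with "#!"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(with_bom[:5])                  # the BOM pushes "#!" away from offset 0
print(strip_utf8_bom(with_bom)[:2])  # "#!" is back at the start
```

With the BOM in place the first two bytes of the file are no longer `#!`, which is what the kernel looks for when executing a script.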

Microsoft applications use library functions to guess a file's encoding from the file's contents. This is notoriously unreliable, although it has improved over time.
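To give a feel for why guessing is unreliable, here is a deliberately naive sketch in Python. It is not how any Microsoft library works; `guess_encoding` is a hypothetical function and its order of checks (BOM, then ASCII, then strict UTF-8, then a Code Page 1252 fallback) is an assumption chosen for illustration. UTF-32 is ignored.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Naive guess at a text encoding from raw bytes (illustrative only)."""
    # A BOM, when present, is the strongest hint available.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec reads the BOM to pick byte order
    # Bytes that are all below 0x80 are valid in every ASCII-compatible encoding.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Arbitrary 8-bit text is rarely valid UTF-8 by accident, but it can be.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # fallback; any other 8-bit code page is just as plausible


print(guess_encoding("é".encode("cp1252")))   # lone byte E9 is invalid UTF-8
print(guess_encoding("Ã©".encode("cp1252")))  # bytes C3 A9 also spell "é" in UTF-8
```

The second call shows the weakness: a Code Page 1252 file containing the two characters "Ã©" is byte-for-byte identical to a UTF-8 file containing "é", so no heuristic can tell them apart from the contents alone.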
