Questions tagged [character-sets]

Questions about character-sets, either from a standardization point of view, or in terms of the repertoire implemented by particular equipment.

29 votes
4 answers
10k views

What actual purpose do accent characters in ISO-8859-1 and Windows 1252 serve?

ISO-8859-1 and Win1252 have a couple of characters that are normally associated with accented letters of the Roman alphabet: 0xA8, ¨ (umlaut); 0xB4, ´ (acute accent); 0xB8, ¸ (cedilla) ... which ...
Bitbang3r
  • 443
10 votes
1 answer
558 views

How to decode mojibake in old Macintosh text files?

I hope this is an OK place to ask this question. The Internet Archive has a Macintosh floppy image containing presets for an old E-mu synthesizer module. The page is here: Proteus Preset Libraries ...
aMike
  • 251
10 votes
0 answers
294 views

What used EBCDIC code pages 1 thru 5?

For US English, the most commonly used EBCDIC code page is 37, which is one of the CECP code pages (Country Extended Code Page). The old IBM globalisation database has a 1986 copyright date for code ...
Simon Kissane
9 votes
1 answer
1k views

Why do BK computers have unusual representations of $ and ^

While programming in BASIC and FOCAL on my BK-0010-01, I wonder why both the keyboard and the character set have unusual representations of ASCII 36 and ASCII 94? ASCII 36: Standard: $; BK version: ¤ ...
harlandski
  • 2,963
8 votes
2 answers
262 views

Did any system ever use the Privacy Message (PM) C1 control?

ECMA-48 (Fifth Edition, 1991) section 8.3.94 (page 53, PDF page 67) defines "PM - PRIVACY MESSAGE" as: PM is used as the opening delimiter of a control string for privacy message use. The ...
Simon Kissane
8 votes
1 answer
325 views

Was ∆ used in APL as a substitute for space because ECMA-17/ISO 2047 specified △ as graphical representation for space?

Wikipedia on naming conventions in programming states (without source): In APL dialects, the delta (Δ) is used between words, e.g. PERFΔSQUARE (…) This is an unusual choice, but I notice that ECMA-...
Adám
  • 668
10 votes
4 answers
2k views

Is there a common convention to describe the encoding of a legacy text file?

For the purpose of this question, a legacy text file contains characters in the range 0x20 through 0x7e, with each line terminated by an OS-specific combination of 0x0d and/or 0x0a; it might be ...
Mark Morgan Lloyd
9 votes
3 answers
793 views

Why does CP1252 have these unused codepoints?

The CP-1252 (sometimes called Windows-1252 or many more stupid names) encoding has five unused codepoints, 81h, 8Dh, 8Fh, 90h, 9Dh. The placement of these is not immediately obvious to me. Are they ...
Omar and Lorraine
24 votes
1 answer
2k views

Why was PETSCII based on an obsolete version of ASCII?

PETSCII (sometimes PETASCII) is the character set developed by Commodore for use in its microcomputers. The first of these, the PET, started to be developed in early 1976. Why, then, did Commodore ...
Psychonaut
  • 7,681
20 votes
2 answers
7k views

Why does the default base64 encoding use forward slash /? [closed]

As anyone who has been bitten by using base64 instead of base64url is quite well aware, the "original" base64 alphabet uses alphanumeric, +, = (both perfectly cromulent URL characters), and ...
DeusXMachina
12 votes
1 answer
784 views

Can you read the character definitions (font) in an Apple II using PEEK in Applesoft BASIC?

Can you read the character definitions (font) from ROM in an Apple II using PEEK in Applesoft BASIC? You can do this on some other computers e.g. Sinclair ZX81, Commodore 64, and Amstrad PC1512, but ...
mobluse
  • 505
32 votes
1 answer
3k views

How did the various Soviet ZX Spectrum clones support Cyrillic text?

There may be no one single answer to this question, since the various clones might have done this all in different ways. And of course, some clones do not have the Cyrillic text support at all. I'm ...
Omar and Lorraine
13 votes
2 answers
922 views

Was `wchar_t` ever widely adopted by the Unix culture in actual practice?

My very rough understanding of character encoding history as it relates to the Unix family of platforms/languages is that: They started using single-byte (7/8/9-ish bit) character sets like ASCII/...
natevw
  • 2,947
10 votes
6 answers
3k views

How prevalent is the CR (classic MacOS) line ending today? [closed]

In a parser library I am maintaining, I stopped recognizing singular Carriage Return characters as line endings to reduce complexity in the tokenizer's position tracking code, a perennial source of ...
Theodore Tsirpanis
-5 votes
4 answers
695 views

Is UTF-8 responsible for a lot of the cpu-needed bloat in the last ten to fifteen years? [closed]

Some say UTF-8 was the best solution. The price you pay is that it basically makes all parsing optimizations that rely on a fixed relationship of byte offset to character position unusable. Compilers, ...
rackandboneman
