Questions tagged [character-sets]
Questions about character sets, either from a standardization point of view or in terms of the repertoire implemented by particular equipment.
27 questions
29 votes, 4 answers, 10k views
What actual purpose do accent characters in ISO-8859-1 and Windows 1252 serve?
ISO-8859-1 and Win1252 have a couple of characters that are normally associated with accented letters of the Roman alphabet:
0xA8, ¨ (umlaut)
0xB4, ´ (acute accent)
0xB8, ¸ (cedilla)
... which ...
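For readers who want to check what the three byte values listed above decode to, here is a minimal sketch using Python's standard codecs. The helper name `describe` is hypothetical and not part of the question; the sketch only reports what the Unicode database says about each character.

```python
import unicodedata

# Byte values quoted in the question excerpt.
ACCENT_BYTES = (0xA8, 0xB4, 0xB8)

def describe(byte, encoding="latin-1"):
    """Decode one byte and return (char, code point, Unicode name, combining class)."""
    ch = bytes([byte]).decode(encoding)
    return ch, f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch)

for b in ACCENT_BYTES:
    # Both ISO-8859-1 ("latin-1") and Windows-1252 ("cp1252") agree on these bytes.
    print(f"0x{b:02X}", describe(b, "latin-1"), describe(b, "cp1252"))
```

A combining class of 0 indicates that Unicode treats each of these as a standalone spacing character rather than a mark that attaches to the preceding letter.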
10 votes, 1 answer, 558 views
How to decode mojibake in old Macintosh text files?
I hope this is an OK place to ask this question.
The Internet Archive has a Macintosh floppy image containing presets for an old E-mu synthesizer module. The page is here: Proteus Preset Libraries
...
10 votes, 0 answers, 294 views
What used EBCDIC code pages 1 thru 5?
For US English, the most commonly used EBCDIC code page is 37, which is one of the CECP code pages (Country Extended Code Page). The old IBM globalisation database has a 1986 copyright date for code ...
9 votes, 1 answer, 1k views
Why do BK computers have unusual representations of $ and ^?
While programming in BASIC and FOCAL on my BK-0010-01, I wonder why both the keyboard and the character set have unusual representations of ASCII 36 and ASCII 94?
ASCII 36: Standard: $; BK version: ¤
...
8 votes, 2 answers, 262 views
Did any system ever use the Privacy Message (PM) C1 control?
ECMA-48 (Fifth Edition, 1991) section 8.3.94 (page 53, PDF page 67) defines "PM - PRIVACY MESSAGE" as:
PM is used as the opening delimiter of a control string for privacy message use. The ...
8 votes, 1 answer, 325 views
Was ∆ used in APL as a substitute for space because ECMA-17/ISO 2047 specified △ as graphical representation for space?
Wikipedia on naming conventions in programming states (without source):
In APL dialects, the delta (Δ) is used between words, e.g. PERFΔSQUARE (…)
This is an unusual choice, but I notice that ECMA-...
10 votes, 4 answers, 2k views
Is there a common convention to describe the encoding of a legacy text file?
For the purpose of this question, a legacy text file contains characters in the range 0x20 through 0x7e, with each line terminated by an OS-specific combination of 0x0d and/or 0x0a; it might be ...
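The byte-level definition in this excerpt can be sketched as a small check. This is only an illustration under the assumption that the visible part of the definition is complete (the excerpt is truncated); the function names `is_legacy_text` and `line_ending_style` are hypothetical.

```python
def is_legacy_text(data: bytes) -> bool:
    """True if every byte is printable ASCII (0x20-0x7E) or a CR/LF line terminator."""
    return all(0x20 <= b <= 0x7E or b in (0x0D, 0x0A) for b in data)

def line_ending_style(data: bytes) -> str:
    """Report which combination of 0x0D and 0x0A terminates lines in the data."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # bare CR only
    lf = data.count(b"\n") - crlf   # bare LF only
    present = [name for name, n in (("CRLF", crlf), ("CR", cr), ("LF", lf)) if n]
    if not present:
        return "none"
    return present[0] if len(present) == 1 else "mixed"

print(is_legacy_text(b"10 PRINT \"HELLO\"\r\n"), line_ending_style(b"10 PRINT \"HELLO\"\r\n"))
```

A check like this says nothing about which legacy encoding a file with bytes outside that range uses; it only separates the plain 7-bit case from everything else.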
9 votes, 3 answers, 793 views
Why does CP1252 have these unused codepoints?
The CP-1252 (sometimes called Windows-1252 or many more stupid names) encoding has five unused codepoints, 81h, 8Dh, 8Fh, 90h, 9Dh. The placement of these is not immediately obvious to me.
Are they ...
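The five code points named in the question can be observed with Python's built-in `cp1252` codec, which leaves them undefined. This is a sketch of one implementation's mapping table, not a statement about every implementation; other decoders may map these bytes differently. The function name `undefined_bytes` is hypothetical.

```python
def undefined_bytes(encoding="cp1252"):
    """Return the single-byte values that the given codec refuses to decode."""
    missing = []
    for b in range(256):
        try:
            bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(b)
    return missing

# Print the gaps in the same hexadecimal style the question uses.
print([f"{b:02X}h" for b in undefined_bytes()])
```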
24 votes, 1 answer, 2k views
Why was PETSCII based on an obsolete version of ASCII?
PETSCII (sometimes PETASCII) is the character set developed by Commodore for use in its microcomputers. The first of these, the PET, started to be developed in early 1976. Why, then, did Commodore ...
20 votes, 2 answers, 7k views
Why does the default base64 encoding use forward slash /? [closed]
As anyone who has been bitten by using base64 instead of base64url is quite well aware, the "original" base64 alphabet uses alphanumeric, +, = (both perfectly cromulent URL characters), and ...
12 votes, 1 answer, 784 views
Can you read the character definitions (font) in an Apple II using PEEK in Applesoft BASIC?
Can you read the character definitions (font) from ROM in an Apple II using PEEK in Applesoft BASIC? You can do this on some other computers e.g. Sinclair ZX81, Commodore 64, and Amstrad PC1512, but ...
32 votes, 1 answer, 3k views
How did the various Soviet ZX Spectrum clones support Cyrillic text?
There may be no single answer to this question, since the various clones might have done this in different ways. And of course, some clones do not have Cyrillic text support at all. I'm ...
13 votes, 2 answers, 922 views
Was `wchar_t` ever widely adopted by the Unix culture in actual practice?
My very rough understanding of character encoding history as it relates to the Unix family of platforms/languages is that:
They started using single-byte (7/8/9-ish bit) character sets like ASCII/...
10 votes, 6 answers, 3k views
How prevalent is the CR (classic MacOS) line ending today? [closed]
In a parser library I am maintaining, I stopped recognizing singular Carriage Return characters as line endings to reduce complexity in the tokenizer's position tracking code, a perennial source of ...
-5 votes, 4 answers, 695 views
Is UTF-8 responsible for a lot of the CPU-needed bloat in the last ten to fifteen years? [closed]
Some say UTF-8 was the best solution.
The price you pay is that it basically makes all parsing optimizations that rely on a fixed relationship of byte offset to character position unusable.
Compilers, ...