I'm trying to understand the full story of how text gets onto the screen. To keep things simple, I'll stick with single-byte encodings (no Unicode).
On my disk there is a sequence of bytes, each with a value between 0 and 255. I can then tell my computer programs which character encoding to use when displaying these bytes. I could use ISO-8859-1, where, for example, the byte with value 0xA4 is a circle with four dots (¤). Or I could switch to ISO-8859-15, where my byte with value 0xA4 is defined to be the Euro symbol (€).
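This part is easy to demonstrate. Here is a minimal sketch using POSIX iconv (assuming a platform that provides it and a UTF-8 terminal; error checks omitted) that decodes the same byte under both encodings:

    #include <stdio.h>
    #include <iconv.h>

    int main(void) {
        const char *encodings[] = { "ISO-8859-1", "ISO-8859-15" };

        for (int i = 0; i < 2; i++) {
            /* Convert the single byte 0xA4 from encodings[i] to UTF-8. */
            iconv_t cd = iconv_open("UTF-8", encodings[i]);
            char in[1] = { (char)0xA4 }, out[8] = { 0 };
            char *inp = in, *outp = out;
            size_t inleft = sizeof in, outleft = sizeof out;

            iconv(cd, &inp, &inleft, &outp, &outleft);
            printf("0xA4 in %-11s is %s\n", encodings[i], out);
            iconv_close(cd);
        }
        return 0;
    }

This should print ¤ for ISO-8859-1 and € for ISO-8859-15: the same byte, two different symbols.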
This is all still simple to understand. But in parallel to changing the character encoding, I can also change the font, which defines the exact shape of each symbol. Now, a single font is meant to work with many character encodings, so a font should contain both symbols: ¤ and €.
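If I understand FreeType correctly, a sketch like the following should let me ask a font file whether it really contains glyphs for both characters (the font path is just an example from my system, and I'm assuming the library's default charmap; error handling omitted):

    #include <stdio.h>
    #include <ft2build.h>
    #include FT_FREETYPE_H

    int main(void) {
        FT_Library lib;
        FT_Face face;

        FT_Init_FreeType(&lib);
        FT_New_Face(lib, "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 0, &face);

        /* A glyph index of 0 would mean "no glyph for this character". */
        printf("glyph index of U+00A4 (¤): %u\n", FT_Get_Char_Index(face, 0x00A4));
        printf("glyph index of U+20AC (€): %u\n", FT_Get_Char_Index(face, 0x20AC));

        FT_Done_Face(face);
        FT_Done_FreeType(lib);
        return 0;
    }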
So, the steps to get text onto my screen are apparently:
1. Read the byte sequence serially.
2. Use the numeric value of the current byte to look up a character in the character encoding table.
3. Use [something] to look up the exact shape of the character from step 2 in the font file.
4. Draw the symbol as defined in the font file.
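In code, I imagine this pipeline looking roughly like the sketch below; encoding_table_lookup, font_lookup_glyph, and draw_glyph are hypothetical placeholders I made up just to mark the steps, not real APIs:

    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for whatever the system really does. */
    static unsigned int encoding_table_lookup(unsigned char b) {
        return (b == 0xA4) ? 0x20AC : b;  /* pretend ISO-8859-15: 0xA4 is € */
    }
    static int font_lookup_glyph(unsigned int ch) {
        return (int)ch;                   /* step 3: the part I don't understand */
    }
    static void draw_glyph(int glyph) {
        printf("drawing glyph %d\n", glyph);
    }

    void render(const unsigned char *bytes, size_t n) {
        for (size_t i = 0; i < n; i++) {                       /* step 1 */
            unsigned int ch = encoding_table_lookup(bytes[i]); /* step 2 */
            int glyph = font_lookup_glyph(ch);                 /* step 3 */
            draw_glyph(glyph);                                 /* step 4 */
        }
    }

    int main(void) {
        unsigned char text[] = { 'H', 'i', 0xA4 };
        render(text, sizeof text);
        return 0;
    }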
In step 3, what is this "something" that is used to map from the character encoding to the font? Do font files depend on the character encoding? In other words, does a font have some built-in "double switch" mechanism that works something like this (sketched in C):
    #include <string.h>

    /* My imagined lookup: byte value plus encoding name gives the symbol. */
    const char *get_symbol(unsigned char code, const char *encoding) {
        switch (code) {
        case 0xA4:
            if (strcmp(encoding, "ISO-8859-1") == 0)  return "¤";
            if (strcmp(encoding, "ISO-8859-15") == 0) return "€";
            break;
        /* ... one case per byte value ... */
        }
        return NULL; /* byte not mapped in this encoding */
    }
?
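(In this imagined scheme, get_symbol(0xA4, "ISO-8859-15") would return "€", while get_symbol(0xA4, "ISO-8859-1") would return "¤".)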
What are the details of how one gets from a given byte sequence and a given character encoding to the actual symbol in the font? How is this mapping done so that the correct symbol is always displayed?