27

I know about "Chinese typewriters"; the question here is how Chinese was represented in earlier information systems.

For example, Japanese telegrams were always written in Katakana, and many early video games, up to the early 1990s, relied almost completely on Kana for technical reasons. This suggests that early Chinese systems may also have relied on Pinyin or Bopomofo (in Taiwan). Can somebody confirm this?


(The following is not part of the question, but facts I discovered and want to share here although they don't amount to answers themselves.)

Edit 1

For what it's worth, this page https://classictech.wordpress.com/computer-companies/acer-groupmultitech-electronics-inc-sunnyvale-calif/ claims that the MPF-II-C from 1982 was the first microcomputer to be capable of handling Chinese characters.

Edit 2

A commenter linked to http://www.njstar.com/cms/chinese-commercial-telegraph-code-lookup, which contains the following valuable information regarding the CCC:

There are total of 9297 Chinese Commercial Codes defined, of which 2593 codes represent different Chinese characters in Mainland and in Taiwan/Hong Kong. Codes below 7902 are caused by the fact of Mainland Chinese character simplification. Such as ccc:0948 --> 国 U+56FD (CN) / 國 U+570B (TW/HK). Due to un-coordinated extension of original codes, codes above 7902 can represent totally different characters in Mainland and in Taiwan/Hong Kong. Such as ccc:9154 --> 舺 U+823A (CN) / 螵 U+87B5 (TW/HK)
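To make the regional divergence concrete, here is a minimal Python sketch of a region-aware lookup. It contains only the two sample codes from the passage quoted above; the table name `CCC_BY_REGION` and the function `decode_ccc` are made up for illustration and are not part of any real tool.

```python
# Sample entries only, copied from the njstar.com passage quoted above.
# A real code book has thousands of entries; this table is illustrative.
CCC_BY_REGION = {
    "0948": {"CN": "国", "TW/HK": "國"},  # simplified vs. traditional form
    "9154": {"CN": "舺", "TW/HK": "螵"},  # two unrelated characters
}


def decode_ccc(code: str, region: str) -> str:
    """Return the character a 4-digit code stands for in the given region."""
    return CCC_BY_REGION[code][region]


for code in CCC_BY_REGION:
    cn = decode_ccc(code, "CN")
    tw = decode_ccc(code, "TW/HK")
    print(f"ccc:{code} -> {cn} U+{ord(cn):04X} (CN) / {tw} U+{ord(tw):04X} (TW/HK)")
```

The point of the sketch is only that the same four digits cannot be decoded without knowing which regional code book the sender used.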


5 Answers

28

For telegrams, the Chinese Commercial Code (中文電碼) was used.

https://en.wikipedia.org/wiki/Chinese_telegraph_code

Roughly, to send a telegram, you had to go to the post office and write the message, together with the sender's and receiver's details, by hand on a form.

Then, the worker would "translate" each Chinese character into 4 digits, according to the Chinese Commercial Code book, something like this:

(image: Three Way CCC book)

Then, the message was sent by Morse code to the post office nearest the receiver. There, the Morse code was "translated" back into 4-digit numbers, which were looked up again in the Chinese Commercial Code book. Finally, a post office worker delivered the message to the receiver by hand.
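The two lookups (character to digits at the sending office, digits back to character at the receiving office) can be sketched in a few lines of Python. This is only an illustration: the code book here holds just two entries that appear elsewhere on this page (人 = 0086 and, for the Mainland book, 国 = 0948), and the names `CODE_BOOK`, `encode_message` and `decode_message` are invented for the example.

```python
# Tiny stand-in for the printed code book (two entries taken from this page).
CODE_BOOK = {"人": "0086", "国": "0948"}

# The receiving office needs the reverse direction: digits -> character.
REVERSE_BOOK = {code: char for char, code in CODE_BOOK.items()}


def encode_message(text: str) -> list[str]:
    """Sending office: replace every character by its 4-digit code."""
    return [CODE_BOOK[char] for char in text]


def decode_message(codes: list[str]) -> str:
    """Receiving office: look every 4-digit code up again."""
    return "".join(REVERSE_BOOK[code] for code in codes)


codes = encode_message("国人")
print(codes)                  # ['0948', '0086']
print(decode_message(codes))  # 国人
```

A character missing from the book raises a `KeyError` here; how a real office handled characters outside the book is not covered by this sketch.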

The CCC is still used today in Hong Kong: the residents' identity card shows their name in Chinese, English & CCC.

(image: Page from the CCC)

Another service, the telex, which was widely used in the past, could not handle Chinese characters. As I remember, we only sent Latin letters (A-Z) and digits (0-9).

It was in the pre-1980 timeframe.

The first computer (OK, personal computer) system that could handle Chinese was the ETen Chinese System (倚天中文系統); it dates from about the mid-1980s.

https://en.wikipedia.org/wiki/ETen_Chinese_System

Those were the days... :)

  • Just to be sure, was the CCC a HK only affair, or also used on the Mainland/Taiwan? Did you write in pinyin or just English when using the telex?
    – T Nierath
    Commented Jul 18, 2018 at 13:15
  • 1
    It's used in Taiwan, Hong Kong, the Mainland and by diasporas worldwide. We typed in English. The terminal I used in the 1980s consisted of a keyboard, a monitor and a leased line. If you know someone in aviation or marine services: ships and planes still use it nowadays. Commented Jul 18, 2018 at 13:23
  • Out of curiosity, is 人 encoded as 0086?
    – Frenzy Li
    Commented Jul 18, 2018 at 16:56
  • 2
    @FrenzyLi looking it up on njstar.com/cms/chinese-commercial-telegraph-code-lookup, 人 is indeed 0086
    – muru
    Commented Jul 18, 2018 at 17:06
  • You mention Morse code. Are you sure it was Morse code? I think that by 1900 or so, most countries had replaced Morse code by the Baudot code or something similar for sending telegrams. Commented Jul 19, 2018 at 12:00
5

No. We use characters directly. At least in mainland China.

However, characters are not mapped into Morse code directly. Each character is first mapped to a four-digit decimal number. The numbers are then sent by telegraph using Morse code.

People have to look up a table (a book, actually) to map the characters into the numbers. Some experienced workers can even recite the whole book, which contains thousands of characters.
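As a rough sketch of the second step, here is how a four-digit code could be keyed as Morse digits in Python. It assumes the standard International Morse digit patterns; whether operators actually used these full-length forms (or some abbreviated numerals) is an assumption on my part, and the function names are made up.

```python
# Standard International Morse patterns for the ten digits.
MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
DIGIT_FROM_MORSE = {pattern: digit for digit, pattern in MORSE_DIGITS.items()}


def code_to_morse(code: str) -> str:
    """Turn a 4-digit code into Morse patterns separated by spaces."""
    return " ".join(MORSE_DIGITS[digit] for digit in code)


def morse_to_code(signal: str) -> str:
    """Turn space-separated Morse patterns back into a digit string."""
    return "".join(DIGIT_FROM_MORSE[pattern] for pattern in signal.split())


print(code_to_morse("0086"))  # ----- ----- ---.. -....
```

So a single character such as 人 (0086, as confirmed in the comments on this page) costs four Morse symbols of five elements each, which is one reason telegrams were written so tersely.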

On early computers, things were a little different. The first Chinese encoding I remember is GB2312, which was standardized in 1980. It maps each Chinese character to a two-byte sequence. English letters use one byte and have the same values as in ASCII. The current standard encoding, GB18030, is compatible with GB2312 but has a much larger character set, and some rare characters take more than two bytes.
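These properties can be checked with the codecs that ship with Python (`gb2312` and `gb18030`). The sketch below is only a demonstration on a modern machine, not how software of the time worked; the helper `hex_bytes` is my own.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in data)


# ASCII letter: one byte. Chinese character: two bytes.
gb2312_bytes = "A国".encode("gb2312")
print(hex_bytes(gb2312_bytes))  # 41 b9 fa

# GB18030 produces the same bytes for anything GB2312 can encode ...
assert "A国".encode("gb18030") == gb2312_bytes

# ... but also covers characters GB2312 lacks, some of them in four bytes.
rare = "\U00020000"  # a CJK Extension B character, outside GB2312
print(len(rare.encode("gb18030")))  # 4

try:
    rare.encode("gb2312")
except UnicodeEncodeError:
    print("not representable in GB2312")
```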

The encoding used in the computer is different from the code used in the telegram.

To input characters into a computer, pinyin has been used from the early days until now. There are also shape-based input methods, such as Wubi (五笔), which was very popular in the early days because pinyin has a lot of duplicates (many different characters share the same pinyin). Nowadays we type whole words, phrases and even sentences in pinyin, so the duplicates are not much of a problem: the characters can be determined from their context.
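A toy Python sketch of why typing whole words helps. The two dictionaries are tiny hand-made samples, not taken from any real input method, and `candidates` is a hypothetical helper; a real input method uses far larger dictionaries and ranks candidates by frequency and context.

```python
# Hand-picked sample data: a few of the many characters read "shi" or "jie".
SYLLABLE_CANDIDATES = {
    "shi": ["是", "十", "时", "事", "市", "世"],
    "jie": ["街", "节", "姐", "界"],
}

# Sample word-level dictionary: the two syllables typed together.
WORD_CANDIDATES = {
    "shijie": ["世界"],
}


def candidates(pinyin: str) -> list[str]:
    """Prefer a whole-word match; fall back to single-syllable candidates."""
    if pinyin in WORD_CANDIDATES:
        return WORD_CANDIDATES[pinyin]
    return SYLLABLE_CANDIDATES.get(pinyin, [])


print(candidates("shi"))     # many characters to pick from
print(candidates("shijie"))  # the word narrows it down
```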

In other Chinese-speaking regions things may be different; I do not know what the practice is there.

  • Was a standardized character set used and was it related to the later GB(K)? Do you also have knowledge of early computers?
    – T Nierath
    Commented Jul 18, 2018 at 10:47
  • I don't know if the character set used in telegram and GB are related. A quick search didn't give me the answer. But they should both contain the most common characters in Chinese, so I think they must overlap a lot.
    – fefe
    Commented Jul 18, 2018 at 11:16
  • The telegram code has 10,000 code points (from 0000 to 9999), most of them are characters. GB2312 contains 6763 Chinese characters.
    – fefe
    Commented Jul 18, 2018 at 11:17
  • Thanks again, I'm holding out for answers regarding pre-1980 computers. After all, not only encoding but also display of Chinese Characters was a highly demanding task for early machines.
    – T Nierath
    Commented Jul 18, 2018 at 11:21
  • 1
    In theory, in reality it was also a memory problem in a world where 64K "ought to be enough". The Japanese solved Chinese Character display and input in the early 1980s, but only for high end machines, initially via custom hardware.
    – T Nierath
    Commented Jul 18, 2018 at 11:37
2

Historically, 4-digit numeric codes were used for telegram transmission in countries that do not use Latin characters; one just needs the translation "dictionary" to find the meaning of each code. Errors can be prevented by adding padding digits to separate the codes. This has been done since the era of the telegram.

The following is a telegram code "dictionary" used by the Japanese military, which had already broken the telegram code of Chiang Kai-shek's KMT army during World War II. Even the US military knew about this "open code", but the US never notified Chiang Kai-shek that military secrets were leaking, because the US itself wanted to know the KMT army's movements. (image: historical)

A similar mapping concept was used by the first computers that displayed Chinese characters. Early systems mapped multiple bytes to a single character. In fact, the modern Unicode mapping for international characters is quite similar: every character is assigned a code point, and the different Unicode encoding forms (UTF-8, UTF-16, UTF-32, etc.) are all defined as mappings of those code points, so text can be converted between them without loss.

In theory, Unicode solves the multilingual conversion and mapping problem. The current Unicode code space holds about 1.1 million code points, which is supposed to cover all the languages in the world.
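For illustration, here is a short Python sketch showing one code point in three Unicode encoding forms, plus the size of the code space. It uses only built-in codecs; the helper name `encoding_forms` is my own.

```python
import sys


def encoding_forms(char: str) -> dict[str, bytes]:
    """Encode one character in three Unicode encoding forms (big-endian)."""
    return {name: char.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


char = "国"
print(f"U+{ord(char):04X}")  # U+56FD

for name, data in encoding_forms(char).items():
    # Every form decodes back to the same character: the mapping is lossless.
    assert data.decode(name) == char
    print(name, data.hex())

# Code points run from U+0000 to U+10FFFF.
print(sys.maxunicode + 1)  # 1114112
```

The byte sequences differ (three, two and four bytes here), but they all stand for the same number, much as the telegraph code stands for a character regardless of how the digits are transmitted.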

1

The following link is a very interesting and informative article (in Chinese) by 马伯庸 (a well-known contemporary writer) on the history and features of Chinese telegram coding and sending: 惜墨如金 中文电报的奥秘 (roughly, "Sparing Ink Like Gold: The Secrets of Chinese Telegrams").

0

To add to what others have said, telegrams were already widely used in Mainland China at least as early as the 1910s. Politicians at that time often sent a nationwide telegram (通电全国) when they made an announcement.

From this picture, you can clearly see how people translated the code into Chinese characters and wrote down each character alongside the corresponding code. (source; the bottom right corner is likely a watermark added by the author of the article)
