Showing posts with label CJK. Show all posts

Tuesday, September 12, 2023

Announcing The Unicode® Standard, Version 15.1


Version 15.1 of the Unicode Standard is now available. This minor version update includes updated code charts, data files and annexes. The core specification is unchanged from Unicode Version 15.0.

This version adds 627 characters, bringing the total number of characters to 149,813. The additions include 622 CJK unified ideographs in a new block, CJK Unified Ideographs Extension I. These new ideographs are urgently needed in China for use in public service databases, and are expected to be included in a forthcoming amendment to China’s GB 18030-2022 standard. The other new characters are five ideographic description characters that enhance the ability to describe rare or not-yet-encoded CJK ideographs.
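The counts in the paragraph above can be cross-checked with a little arithmetic. The following Python sketch is illustrative only: the code point range for the assigned Extension I characters (U+2EBF0 through U+2EE5D) is an assumption taken from the Unicode 15.1 code charts rather than from this post, and the helper name `in_extension_i` is hypothetical.

```python
# Range of assigned characters in CJK Unified Ideographs Extension I.
# Assumption: taken from the Unicode 15.1 code charts, not from this post.
EXT_I_FIRST = 0x2EBF0
EXT_I_LAST = 0x2EE5D

NEW_IDCS = 5           # new ideographic description characters (from the post)
TOTAL_15_1 = 149_813   # total characters in Version 15.1 (from the post)


def in_extension_i(ch: str) -> bool:
    """Return True if the single character ch falls in the Extension I range."""
    return EXT_I_FIRST <= ord(ch) <= EXT_I_LAST


ext_i_count = EXT_I_LAST - EXT_I_FIRST + 1   # inclusive range
new_in_15_1 = ext_i_count + NEW_IDCS         # all additions in this version
total_15_0 = TOTAL_15_1 - new_in_15_1        # implied total for Version 15.0

print(ext_i_count, new_in_15_1, total_15_0)
```

A range test like `in_extension_i` is only a rough check; software that needs authoritative block membership would normally read it from the Unicode Character Database.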

There are six completely new emoji, including phoenix, lime, and (finally) an edible mushroom. For 108 people emoji, you can now switch the direction they are facing (for example, person walking facing right versus facing left).

Security-related updates have been made to UAX #9, Unicode Bidirectional Algorithm and UAX #31, Unicode Identifiers and Syntax along with updates to UTS #39, Unicode Security Mechanisms. These updates complement the release of a new Unicode Technical Standard, UTS #55, Unicode Source Code Handling.

The new characters are limited to three blocks, and the code charts for several other blocks have changed. The most significant change to charts is for the CJK Unified Ideographs, CJK Unified Ideographs Extension A and CJK Unified Ideographs Extension B blocks with the addition of representative glyphs and source references for over 24,000 KP-source (North Korea) ideographs. There are also many other glyph corrections and improvements—see the 15.1 delta code charts for details.

Significant updates have been made to UAX #14, Unicode Line Breaking Algorithm and UAX #29, Unicode Text Segmentation adding better support for scripts of South and Southeast Asia, including grapheme cluster support for aksaras and consonant conjuncts, and line breaking at orthographic syllable boundaries.

For complete details on Unicode Version 15.1, see https://www.unicode.org/versions/Unicode15.1.0/.



Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)(3) organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.


Wednesday, March 22, 2023

Remembering John H. Jenkins (井作恆)

The Unicode community is greatly saddened and affected by the recent and sudden loss of John H. Jenkins, a long-time colleague and friend. John was most recently the Vice-Chair of the Unicode CJK & Unihan Group. The vast majority of characters in the Unicode Standard are Chinese, Japanese, and Korean (aka Han) ideographs, which are historically used with a broader range of languages. These have been challenging characters to deal with in script encoding, because of significant regional drift over hundreds of years. As an expert in Han ideographs, John contributed a non-trivial amount of work and effort, sometimes needing to make difficult character encoding decisions for the benefit of the large user community.

Many people have worked with John and appreciated his substantial contributions. Here are some reflections from two people who worked with him most closely.

From Lee Collins:

I met John when he joined our team at Apple in 1991. He came from an internship in Apple's Advanced Technology Group (ATG), having graduated in math and ancient Greek at UC Berkeley. In addition to his technical skills, he could read, write and speak Cantonese. All in all, he was a perfect addition to the team, since one of our main tasks was completion of the first version of the Unicode standard, in particular the Unified Han character set. A key component was the database we had built to track all the different Han character encodings, beginning with Xerox, later adding Mac OS versions of JIS, GB, Big5, and KSC, then the unified simplified and traditional mappings provided by Mr Zhang Zhoucai of China. The database was a HyperCard stack that ran on a version of Mac OS I cobbled together to allow Chinese, Japanese and Korean text to be edited and displayed simultaneously. John took over management of that system and database and began to learn the arcane art of Chinese character encoding. He also found time to write a Risk-like game based on the classical world. I don't remember the name of that game, but it was a nice diversion from work.

I had been the primary Unicode representative at the first meetings of international experts to refine what became the ISO 10646 Unified Repertoire and Ordering / Unicode V1.0. The group, initially known as the CJK-JRG (Chinese, Japanese, Korean Joint Research Group), later became the current IRG. Hoping he would take over my work, I invited John to join one of the early meetings in Hong Kong, November 1991, and he later became the primary representative. John continued to contribute to the IRG and the Unihan database for the rest of his career.

We both joined the ill-fated Taligent effort, where we developed the internationalization classes that later became the foundation for ICU. Those designs were probably one of the few things of value that came out of Taligent. I left Taligent and went back to Apple. John came back sometime later after IBM took it over completely. I was manager of the team charged with developing Apple's first Unicode-based text library, which we called ATSUI (Apple Type Services for Unicode Imaging). It was largely based on the model of text layout developed for QuickDraw GX. John was the engineer charged with developing the library. That role was not a good fit for John's talents, so he moved to the Typography group where he was responsible for the font tools Apple used to develop our TrueType fonts. My team also developed support for complex scripts like Hindi and Thai, so I often used John's tools to create fonts with the required layout tables.

I moved on to other areas of Apple, ceased to work directly with John, and eventually left Apple. But, since 2015 or so, I again became involved in the IRG as the representative for Vietnam. That allowed me to work with John once more in his various capacities on the Unicode Technical Committee, especially his responsibility for the Unihan database and participation in the IRG. I enjoyed being able to work with him again. Knowing the size and complexity of the work he did for Unicode, he will not be easily replaced.

While we had our differences on technical and work issues at times, he was always a kind and thoughtful person. The world is a lesser place without him.

John was much more familiar with Cantonese than Mandarin due to his missionary work in Hong Kong. I think John’s characters, 井作恆, satisfied two criteria: they are close to his name phonetically (zeng2 zok3 hang4) and look like an actual Chinese name. Purely phonetic transcriptions often use a limited set of characters that look obviously foreign. These don't.

From Ken Lunde:

Nothing brought more joy to John than attending IRG (Ideographic Research Group) meetings, particularly when they took place in Chinese-speaking regions, especially Hong Kong, which held a special place in John’s heart. For those who are unaware, the IRG is responsible for reviewing and preparing the thousands of characters in the growing number of CJK Unified Ideographs blocks, which comprise approximately two-thirds of the total number of characters in the Unicode Standard.

Fun fact: John and I had an unwritten and informal agreement that he would attend these one-week IRG meetings when they took place in Chinese-speaking regions, and I would attend those hosted elsewhere, in a quasi yin and yang relationship. This would completely explain why I have never attended an IRG meeting in a Chinese-speaking region. This relationship was also evident in John’s focus on all things Chinese and my focus on all things Japanese, though both of us performed sufficiently dangerous dabbling in the other language.

John and I began working much more closely together as a result of COVID-19, which necessitated the formation of the Unicode CJK & Unihan Group, with me serving as the Chair, and John serving as the Vice-Chair. This group, which was formed in early 2020, pre-digests proposals and public feedback, interacts with the IRG, and provides its recommendations to the UTC.

[Photo of Ken Lunde and John Jenkins, October 2022]
Please visit John’s obituary to read more about his extraordinary life, or to express condolences to John’s family:

https://www.larkinmortuary.com/obituary/view/john-howard-jenkins/

Thursday, March 2, 2017

PRI #349: Registration of additional sequences in the Adobe-Japan1 collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #349: A submission for the "Registration of additional sequences in the Adobe-Japan1 collection" has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-06-02. Please see the submission page for details and instructions on how to review this issue and provide comments: http://www.unicode.org/ivd/pri/pri349/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for CJK Unified Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.
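To make the idea of a variation sequence concrete, here is a minimal Python sketch, not an official implementation. It assumes that an ideographic variation sequence is a base ideograph followed by one variation selector from the range VS17 to VS256 (U+E0100 to U+E01EF); the function names `make_ivs` and `split_ivs` are hypothetical. The sketch does not consult the IVD, so whether a particular pair is registered in a collection such as Adobe-Japan1 would still need to be checked against the published IVD data.

```python
VS17 = 0xE0100    # VARIATION SELECTOR-17, first selector used for IVSes
VS256 = 0xE01EF   # VARIATION SELECTOR-256, last selector in the range


def make_ivs(base: str, selector_number: int) -> str:
    """Build a sequence from a base ideograph and a selector number (17-256)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("selector_number must be between 17 and 256")
    return base + chr(VS17 + selector_number - 17)


def split_ivs(text: str):
    """Split text into (character, selector number or None) pairs."""
    result = []
    for ch in text:
        is_selector = VS17 <= ord(ch) <= VS256
        if is_selector and result and result[-1][1] is None:
            # Attach the selector to the character immediately before it.
            base, _ = result[-1]
            result[-1] = (base, ord(ch) - VS17 + 17)
        else:
            result.append((ch, None))
    return result


# Illustrative only: U+845B followed by VS17, then a plain U+4E00.
sample = make_ivs("\u845B", 17) + "\u4E00"
print(split_ivs(sample))
```

Because the selector is an ordinary code point in plain text, software that does not recognize a sequence can ignore the selector and fall back to the base ideograph.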

For further information on Public Review Issues, please see: http://www.unicode.org/review/

Wednesday, January 11, 2017

New Unicode Character Property EquivalentUnifiedIdeograph

A new character property EquivalentUnifiedIdeograph is proposed for addition to Unicode 10.0. This property associates (where possible) the 365 characters in the CJK Radicals Supplement, Kangxi Radicals, and CJK Strokes blocks with an appropriate CJK Unified Ideograph.

For details of the proposal, a link to the proposed data, and information about how to provide feedback, please see Public Review Issue #344.
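As a rough illustration of how such a property might be used, the following Python sketch maps a radical or stroke character to a unified ideograph. It is a sketch under stated assumptions, not the proposed data: the two sample mappings are illustrative values, the block ranges come from the Unicode block definitions rather than from this post, and the names `SAMPLE_EQUIVALENTS` and `equivalent_unified_ideograph` are hypothetical. A real implementation would load the full property data file referenced from the Public Review Issue.

```python
# Blocks covered by the proposed property (ranges are block boundaries).
COVERED_BLOCKS = {
    "CJK Radicals Supplement": (0x2E80, 0x2EFF),
    "Kangxi Radicals": (0x2F00, 0x2FDF),
    "CJK Strokes": (0x31C0, 0x31EF),
}

# Illustrative sample only, NOT the proposed data file.
SAMPLE_EQUIVALENTS = {
    0x2F00: 0x4E00,  # KANGXI RADICAL ONE -> U+4E00
    0x2F08: 0x4EBA,  # KANGXI RADICAL MAN -> U+4EBA
}


def in_covered_block(ch: str) -> bool:
    """Return True if ch lies in one of the three covered blocks."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in COVERED_BLOCKS.values())


def equivalent_unified_ideograph(ch: str):
    """Return the mapped unified ideograph, or None if the sample has no value."""
    if not in_covered_block(ch):
        return None
    target = SAMPLE_EQUIVALENTS.get(ord(ch))
    return chr(target) if target is not None else None


print(equivalent_unified_ideograph("\u2F00") == "\u4E00")
```

Returning `None` for unmapped characters mirrors the "where possible" wording above: not every radical or stroke is expected to have an equivalent.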