Thursday, March 30, 2023

The Unicode CLDR v43 Beta is now available for integration testing

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU), and the Specification changes, which are new since the Alpha.

We appreciate feedback from both ICU and non-ICU consumers of CLDR data. (The Beta has already been integrated into the development version of ICU.) Feedback can be filed at CLDR Tickets. Please file any tickets as soon as possible, because the target release date is 2023 Apr 12, Wed.

CLDR 43 is a limited-submission release, focusing on just a few areas:
  1. Formatting Person Names
    • Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
  2. Locales
    • Adding substantially to the LikelySubtags data, which is used to find the likely writing system and country for a given language, and which is used in normalizing locale identifiers and in inheritance. The data has been contributed by SIL.
    • Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region.
  3. Other data updates
    • Alternate names for Turkey / Türkiye
    • Name for the new timezone Ciudad Juárez
  4. Structure
    • Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
  5. Collation & Searching
    • Treating various quote marks, including Geresh and Gershayim, as equivalent at Primary strength.
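
To make the LikelySubtags item concrete, here is a minimal sketch of how such data can fill in a missing script and region for a locale identifier. The four-entry LIKELY table is a hypothetical excerpt for illustration, not the CLDR data file, and the lookup (try "lang-REGION", then "lang") is a simplified version of the "add likely subtags" procedure described in UTS #35; numeric regions and variants are ignored.

```python
# Hypothetical excerpt of likely-subtags data (NOT the real CLDR file).
# Keys are "lang" or "lang-REGION"; values are full "lang-Script-REGION" tags.
LIKELY = {
    "en": "en-Latn-US",
    "sr": "sr-Cyrl-RS",
    "zh": "zh-Hans-CN",
    "zh-TW": "zh-Hant-TW",
}


def add_likely_subtags(tag: str) -> str:
    """Fill in a missing script and/or region (simplified sketch)."""
    parts = tag.replace("_", "-").split("-")
    lang = parts[0].lower()
    # A 4-letter subtag is treated as a script, a 2-letter subtag as a region.
    script = next((p.title() for p in parts[1:] if len(p) == 4 and p.isalpha()), None)
    region = next((p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()), None)
    # Try the most specific key first, then fall back to the bare language.
    candidates = ([f"{lang}-{region}"] if region else []) + [lang]
    for key in candidates:
        if key in LIKELY:
            _, likely_script, likely_region = LIKELY[key].split("-")
            return f"{lang}-{script or likely_script}-{region or likely_region}"
    return tag  # no data for this language: leave the tag unchanged
```

For example, "sr_latn" keeps its explicit Latin script but gains the likely region, giving "sr-Latn-RS", while "zh-TW" matches the region-specific entry and becomes "zh-Hant-TW" rather than inheriting the Simplified Han default.
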
To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.



Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

Wednesday, March 22, 2023

Remembering John H. Jenkins (井作恆)

The Unicode community is greatly saddened and affected by the recent and sudden loss of John H. Jenkins, a long-time colleague and friend. John was most recently the Vice-Chair of the Unicode CJK & Unihan Group. The vast majority of characters in the Unicode Standard are Chinese, Japanese, and Korean (aka Han) ideographs, which are historically used with a broader range of languages. These have been challenging characters to deal with in script encoding, because of significant regional drift over hundreds of years. As an expert in Han ideographs, John contributed a non-trivial amount of work and effort, sometimes needing to make difficult character encoding decisions for the benefit of the large user community.

Many people have worked with John and appreciated his substantial contributions. Here are some reflections from two people who worked with him most closely.

From Lee Collins:

I met John when he joined our team at Apple in 1991. He came from an internship in Apple's Advanced Technology Group (ATG), having graduated in math and ancient Greek at UC Berkeley. In addition to his technical skills, he could read, write and speak Cantonese. All in all, he was a perfect addition to the team, since one of our main tasks was completion of the first version of the Unicode standard, in particular the Unified Han character set. A key component was the database we had built to track all the different Han character encodings, beginning with Xerox, later adding the Mac OS versions of JIS, GB, Big5, and KSC, then the unified simplified and traditional mappings provided by Mr Zhang Zhoucai of China. The database was a HyperCard stack that ran on a version of Mac OS I cobbled together to allow Chinese, Japanese and Korean text to be edited and displayed simultaneously. John took over management of that system and database and began to learn the arcane art of Chinese character encoding. He also found time to write a Risk-like game based on the classical world. I don't remember the name of that game, but it was a nice diversion from work.

I had been the primary Unicode representative at the first meetings of international experts to refine what became the ISO 10646 Unified Repertoire and Ordering / Unicode V1.0. The group, initially known as the CJK-JRG (Chinese, Japanese, Korean Joint Research Group), later became the current IRG. Hoping he would take over my work, I invited John to join one of the early meetings in Hong Kong, November 1991, and he later became the primary representative. John continued to contribute to the IRG and the Unihan database for the rest of his career.

We both joined the ill-fated Taligent effort, where we developed the internationalization classes that later became the foundation for ICU. Those designs were probably one of the few things of value that came out of Taligent. I left Taligent and went back to Apple. John returned to Apple some time later, after IBM had taken over Taligent completely. I was manager of the team charged with developing Apple's first Unicode-based text library, which we called ATSUI (Apple Type Services for Unicode Imaging). It was largely based on the model of text layout developed for QuickDraw GX. John was the engineer charged with developing the library. That role was not a good fit for John's talents, so he moved to the Typography group where he was responsible for the font tools Apple used to develop our TrueType fonts. My team also developed support for complex scripts like Hindi and Thai, so I often used John's tools to create fonts with the required layout tables.

I moved on to other areas of Apple, ceased to work directly with John, and eventually left Apple. But around 2015, I again became involved in the IRG as the representative for Vietnam. That allowed me to work with John once more in his various capacities on the Unicode Technical Committee, especially his responsibility for the Unihan database and participation in the IRG. I enjoyed being able to work with him again. Knowing the size and complexity of the work he did for Unicode, he will not be easily replaced.

While we had our differences on technical and work issues at times, he was always a kind and thoughtful person. The world is a lesser place without him.

John was much more familiar with Cantonese than Mandarin due to his missionary work in Hong Kong. I think John’s characters, 井作恆, satisfied two criteria: they are close to his name phonetically (zeng2 zok3 hang4) and look like an actual Chinese name. Purely phonetic transcriptions often use a limited set of characters that look obviously foreign. These don't.

From Ken Lunde:

Nothing brought more joy to John than attending IRG (Ideographic Research Group) meetings, particularly when they took place in Chinese-speaking regions, especially Hong Kong, which held a special place in John’s heart. For those who are unaware, the IRG is responsible for reviewing and preparing the thousands of characters in the growing number of CJK Unified Ideographs blocks, which comprise approximately one-third of the total number of characters in the Unicode Standard.

Fun fact: John and I had an unwritten and informal agreement that he would attend these one-week IRG meetings when they took place in Chinese-speaking regions, and I would attend those hosted elsewhere, in a quasi yin and yang relationship. This would completely explain why I have never attended an IRG meeting in a Chinese-speaking region. This relationship was also evident in John’s focus on all things Chinese and my focus on all things Japanese, though both of us performed sufficiently dangerous dabbling in the other language.

John and I began working much more closely together as a result of COVID-19, which necessitated the formation of the Unicode CJK & Unihan Group, with me serving as the Chair, and John serving as the Vice-Chair. This group, which was formed in early 2020, pre-digests proposals and public feedback, interacts with the IRG, and provides its recommendations to the UTC.

[Photo of Ken Lunde and John Jenkins, October 2022]
Please visit John’s obituary to read more about his extraordinary life, or to express condolences to John’s family:

https://www.larkinmortuary.com/obituary/view/john-howard-jenkins/

Thursday, February 23, 2023

The Unicode CLDR v43 Alpha is now available for integration testing

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

The Alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.

Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Data may change if release-blocking bugs are found. The planned schedule is:
  • 2023 Mar 15, Wed — public Beta (data)
  • 2023 Mar 29, Wed — public Beta2 (data & spec)
  • 2023 Apr 12, Wed — Release
CLDR 43 is a limited-submission release, focusing on just a few areas:
  1. Formatting Person Names
    • Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
  2. Adding substantially to the LikelySubtags data
    • This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance.
    • The data has been contributed by SIL.
  3. Other data updates
    • Alternate names for Turkey / Türkiye
    • Name for the new timezone Ciudad Juárez
  4. Structure
    • Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
    • Cleanup of the inheritance structure in CLDR
  5. Collation & Searching
    • Treating various quote marks, including Geresh and Gershayim, as equivalent at Primary strength.
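
Primary strength means that only base-letter differences count, so two strings that differ only in which quote mark they use should compare as equal. The sketch below imitates that effect for searching by folding quote-like characters to a representative before comparing. It is a rough stand-in, not ICU's collation: the character lists are our assumption of "various quote marks", we assume single-quote-like marks (including Geresh, U+05F3) stay distinct from double-quote-like marks (including Gershayim, U+05F4), and accents are not ignored as real primary strength would.

```python
# Assumed grouping of quote-like marks (illustrative, not the CLDR rules).
SINGLE_QUOTES = "'\u2018\u2019\u05F3"  # apostrophe, left/right single quotes, Hebrew geresh
DOUBLE_QUOTES = '"\u201C\u201D\u05F4'  # quotation mark, left/right double quotes, Hebrew gershayim

# Map every quote-like mark to one representative per group.
FOLD = str.maketrans(
    SINGLE_QUOTES + DOUBLE_QUOTES,
    "'" * len(SINGLE_QUOTES) + '"' * len(DOUBLE_QUOTES),
)


def primary_key(s: str) -> str:
    """Very rough stand-in for a primary-strength key: fold quotes and case."""
    return s.translate(FOLD).casefold()


def primary_equal(a: str, b: str) -> bool:
    return primary_key(a) == primary_key(b)
```

With this folding, "don't" typed with an ASCII apostrophe matches "don’t" typed with U+2019, and a Hebrew abbreviation written with a real Geresh matches one written with an apostrophe.
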

To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.


Tuesday, February 7, 2023

Unicode 15.1 Alpha Review Opens for Feedback

The repertoire for Unicode 15.1 is now open for early review and comment. As a reminder, during alpha review the repertoire is reasonably mature and stable, but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be entertained.

This early review is provided so that reviewers may consider the character repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2023). Once beta review begins, the repertoire, code points, and character names will all be locked down and will no longer be subject to change.

Notable Changes

Unicode 15.1 adds exactly five characters, for a total of 149,191 characters. The five new characters are Ideographic Description Characters that are used in Ideographic Description Sequences, which represent a mechanism to visually describe the structure of ideographs.
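
An Ideographic Description Sequence is written in prefix (Polish) notation: an Ideographic Description Character is followed by its components, each of which may itself be a nested sequence. For example, 林 can be described as ⿰木木 ("left to right: 木, 木"). The sketch below checks whether a string is well formed under that rule. It only knows the twelve operators U+2FF0..U+2FFB that predate 15.1 (U+2FF2 and U+2FF3 take three components, the rest take two); the five characters added in 15.1 would need their own entries, which we leave out rather than guess.

```python
# Arity of the pre-15.1 Ideographic Description Characters U+2FF0..U+2FFB.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = ARITY["\u2FF3"] = 3  # left-middle-right, above-middle-below


def _parse(ids: str, i: int) -> int:
    """Consume one component starting at index i; return the index after it."""
    if i >= len(ids):
        raise ValueError("sequence ended before all operands were supplied")
    ch = ids[i]
    i += 1
    # Ordinary ideographs (and other components) take no operands.
    for _ in range(ARITY.get(ch, 0)):
        i = _parse(ids, i)
    return i


def is_well_formed_ids(ids: str) -> bool:
    """True if ids is exactly one complete description sequence."""
    try:
        return _parse(ids, 0) == len(ids)
    except ValueError:
        return False
```

So "⿱艹⿰木木" (grass radical above a left-to-right pair of 木) is accepted, while "⿰木" (missing an operand) and "⿰木木木" (one component too many) are rejected.
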

In addition, the code charts for the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension B blocks now include representative glyphs and source references for nearly 24,000 KP-source ideographs. Furthermore, the format of the code charts for the CJK Unified Ideographs block was updated to accommodate KP-source ideographs through the addition of a seventh column.

Version 15.1 does not add new emoji characters; however, 118 new RGI emoji ZWJ sequences will be defined.

Feedback for the alpha review should be reported under PRI #473 using the Unicode contact form by April 4, 2023.



Monday, February 6, 2023

Announcing New Unicode Adopt-a-Character Site

The Adopt-a-Character program was launched in 2015. Since that time, AAC funds have supported Unicode's mission to ensure everyone can communicate in their own language. This includes preserving historical scripts such as Egyptian hieroglyphics and providing better language support for digitally disadvantaged and under-resourced languages such as Hanifi Rohingya used in Myanmar and Bangladesh.

Now you can more easily adopt a character and show off your hobby or business, favorite sport, or love – while also supporting a good cause. You can also give the gift of a letter to someone in your life. The possibilities are endless – and each adoption helps Unicode’s goal to support the world’s languages.

All character adoptions are permanent. Adoption of a specific character at the limited gold and silver levels is on a first-come-first-served basis. All sponsors receive a digital badge and are recognized on Unicode’s website, Twitter feed, and Friends of Unicode Facebook page.

To start your adoption, visit our new page!

Unicode, Inc. is a non-profit, 501(c)3 organization and contributions may be eligible for a tax deduction. Please consult with a tax expert for details.



Monday, January 23, 2023

New Unicode Consortium CEO

— Mark Davis, President & Unicode Cofounder


In January 1991 I became the first president of the Unicode Consortium, and in that position have presided over the board of directors since then. I’ve had the honor of occupying those roles for just over a gigasecond now, and it's time for a change.

Over time, it became apparent to me, the Consortium’s other officers, and the Board of Directors that our management model was no longer sufficient for what the organization had become and what it needed to be in the future. So, we began to explore a new, more sustainable governance and management model. An important part of that was succession planning.

Among the first major steps in implementing this model was the hiring of Toral Cowieson as our first Executive Director and COO in 2021. Since then, Toral has helped professionalize the management of the Consortium. Working with the Board and the other Officers, Toral has also contributed to strengthening the Consortium’s governance.

The Board and I have also recognized that, as President, I have effectively occupied two distinct roles — CEO and CTO — and that these two different roles require the full attention of two different people. Accordingly, the Board has decided to split these two roles, formally creating the positions of CEO and CTO, while retiring the title of President.

And as its next step — I am delighted to announce — the Board has elected Toral Cowieson as CEO to replace me.

Toral has brought a wealth of experience in leadership across non-profits, corporations, and board service to Unicode. As executive director, she has connected with the people in the organization, provided thoughtful leadership, and instituted and guided changes in our operations and governance.

I’m not stepping off the stage completely. The Board has re-elected me as Chair of the Board, and elected me to the new position of CTO. I’ll also be continuing as chair of the CLDR technical committee as well as contributing to ICU and the UTC in focused areas.

The Unicode Consortium is the forum for companies, countries and other groups to work together on interoperable standards, code, and data — to support internationalizing software around the world. As a simple example, whenever you glance at the date on your cell phone, the text you see consists of Unicode characters, is formatted for your language according to CLDR language data (including for English), and uses ICU code libraries to make that all work.

As CTO, my main goal this year will be to work with the board, technical groups, and invited experts to continue maintaining and extending that foundation for so much of the world’s software, while formulating a strategy for meeting upcoming requirements and taking advantage of new technologies.

I am also pleased to announce some additional changes. I’ve worked extensively with each of these people, and have the fullest confidence that they will do great work in these new roles.

  • Peter Constable is a Technical Vice President and the Chair of the UTC. Since 2003, Peter has worked for Microsoft on various projects related to Unicode, internationalization, text display and fonts. He became a Unicode technical director in 2008 and later served as Treasurer.
  • Addison Phillips is the new Chair of the Message Formatting Working Group. Addison is also the chair of the W3C Internationalization Working Group and an active participant in the creation of internationalization standards such as Unicode. He and I are co-authors of IETF BCP 47, which is the standard for language and locale identifiers.
  • Elango Cheran is the Vice-Chair of the recently formed Community Engagement team and an internationalization engineer at Google. He actively contributes to the ICU and ICU4X projects, and to the MessageFormat Working Group.
Additional information is available here:
Unicode Executive Officers
Unicode Fellows, Staff and Support
Unicode Technical Committee Chairs
Unicode Organization Chart

Photo by Michael Dziedzic on Unsplash


Tuesday, January 17, 2023

What’s New in Emoji 15.1?

Doing more, with less

By: Jennifer Daniel, Chair of the Emoji Subcommittee

This past Fall, the Unicode Technical Committee announced the delay of Unicode 16.0. This wasn’t without precedent — COVID slowed down the release of Unicode 14.0 in 2020 and the world seemed to survive 😉. Subcommittees were well prepared and adjusted accordingly, discussing what this meant for their respective areas of expertise.

For the Emoji Subcommittee (ESC), the group responsible for defining the rules, algorithms, and properties necessary to achieve interoperability between different platforms for those smiley faces that appear on your keyboard (Shout out 😁🥰🥹🤔🫣🫡😵‍💫!), this delay presented an opportunity. Sure, we were so close to exhaling a sigh of relief (the intake period for Emoji 16.0 proposals had just closed). But upon learning we couldn’t ship any new codepoints until 2024, we turned our energy towards recommending new emoji based on existing ones. (These are called emoji ZWJ sequences, after the zero width joiner character that links them. That's when a combination of multiple emoji displays as a single emoji … like 🧑 + 🏽 + 🏭 = 🧑🏽‍🏭.)
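
Under the hood, the 🧑🏽‍🏭 example is a short string of existing code points: the skin-tone modifier attaches directly to the person it colors, and a ZERO WIDTH JOINER (U+200D) links that person to the factory. The sketch below builds the sequence from its parts; zwj_join is a hypothetical helper name, and whether the result displays as one glyph depends on the platform's font support.

```python
ZWJ = "\u200D"              # ZERO WIDTH JOINER
PERSON = "\U0001F9D1"       # 🧑 PERSON
MEDIUM_SKIN = "\U0001F3FD"  # 🏽 EMOJI MODIFIER FITZPATRICK TYPE-4
FACTORY = "\U0001F3ED"      # 🏭 FACTORY


def zwj_join(*parts: str) -> str:
    """Hypothetical helper: link emoji components with ZWJ characters."""
    return ZWJ.join(parts)


# The modifier follows its base directly; only the components are ZWJ-joined.
factory_worker = zwj_join(PERSON + MEDIUM_SKIN, FACTORY)
print(" ".join(f"U+{ord(c):04X}" for c in factory_worker))
```

A platform without a dedicated glyph for the sequence simply shows the components side by side, which is why ZWJ sequences degrade gracefully.
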

When Less is More

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can "do it all". And yet, as the emoji ecosystem has matured over time, our keyboards have ballooned, and emoji categories are nearing, or have already reached, saturation. Upon reflecting on how emoji are used, the ESC has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard. Instead, the ESC approves fewer and fewer emoji proposals every year.

But our work is not done. Not by a long shot. Language is fluid and doesn’t stand still. There is more to do! This “off-cycle” gives us a chance to address some major, long-standing pain points with emoji. The first one that came to mind: skin tone.

What is a family?

The encoding of multi-person multi-tone support has matured over the years; however, the implementation can seem random to the average person. While it’s true that all people emoji have toned options (with the exception of characters where you can’t see skin, like 🤺), there are … misfits. Some two-person emoji offer tone support ( 🧑🏻‍❤️‍🧑🏿), while others do not ( 👯). A few non-RGI emoji render with tone but with no affordance to change one of the two characters (for example, 🤼🏾‍♂).

And then … There is the suite of family emoji (👨‍👦👨‍👦‍👦👨‍👧👨‍👧‍👦👨‍👧‍👧👩‍👦👩‍👦‍👦👩‍👧👩‍👧‍👦👩‍👧‍👧 👨‍👨‍👦👨‍👨‍👦‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👧‍👧👩‍👩‍👦👩‍👩‍👦‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👧‍👧👨‍👩‍👦👨‍👩‍👦‍👦👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👧‍👧👪). These characters include two people, three people, sometimes four and none of them have any tone support (!). We seem to have a lot of family emoji and yet simultaneously not enough.

The 26 “family” emoji can be broken down into four groups:

[image families]

Despite the Unicode Standard containing 26 “family” emoji, each one of these glyphs is overly prescriptive in how it visually represents a family. The inclusion of many permutations of families was well intentioned. But we can’t list them all, and listing some of the combinations calls attention to the ones that are excluded.

What even is a family? For some, family is the people you were raised with. Others have embraced friends as their chosen family. Some families have children, other families have pets. There are multi-generational families, multi-racial families, and of course many families are any combination of all of these characteristics and more.

Fortunately, we don’t need to add 7000 variants to your keyboards (even this would fall short of capturing the breadth of "family" as a concept). Instead we can juxtapose individual emoji together to capture a concept with some reasonable level of specificity — not too unlike arranging letters together to create words to convey concepts 😉

[image toned families]

For emoji keyboards to advance toward more intuitive and personalized experiences, the Emoji Subcommittee is recommending a visual deprecation of the family emoji. This small set of emoji will be redesigned as part of a multi-phase effort to “complete the set” of toned variants for the remaining multi-person emoji. This of course raises the question: when there are as many families as there are people in the world, is there an effective way of conveying the concept of “family” without being overly prescriptive in defining what is and is not a family? Well, thankfully icons can do a lot of heavy lifting without requiring very much detail.
[image before-after]

When is an emoji running for the police or getting chased by them?

Another area the ESC is actively exploring is how the semantics of emoji sequences can differ when the writing direction changes. Some emoji have semantics that encode an implicit direction, so when the string is mirrored, their meaning may be unintentionally lost or changed.

[image rightwards]
Left to Right Emoji Sequence
Quickly running towards an “exciting” police chase


[image leftwards]
Right to Left Emoji Sequence
Running away from the coppers


What, if anything, can we do to help ensure that messages are meaningfully translated, whether they are made of tiny pictures or tiny letters? As part of 15.1 we’re proposing variants of a small set of emoji with strong directionality (with an initial focus on people) that face the opposite direction. Soon you too can run towards or away from … excitement.
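
A sketch of how such a direction-flipped emoji can be spelled with existing characters, assuming the form that was eventually adopted in Emoji 15.1: the base person emoji, a ZWJ, BLACK RIGHTWARDS ARROW (U+27A1), and VARIATION SELECTOR-16 to request emoji presentation. The helper name facing_right is hypothetical, and the note that the running person is typically drawn facing left reflects common platform designs rather than a rule.

```python
ZWJ = "\u200D"                 # ZERO WIDTH JOINER
VS16 = "\uFE0F"                # VARIATION SELECTOR-16 (emoji presentation)
RIGHT_ARROW = "\u27A1"         # ➡ BLACK RIGHTWARDS ARROW
PERSON_RUNNING = "\U0001F3C3"  # 🏃 PERSON RUNNING (typically drawn facing left)


def facing_right(base: str) -> str:
    """Hypothetical helper: append ZWJ + ➡️ to request a right-facing variant."""
    return base + ZWJ + RIGHT_ARROW + VS16


runner_right = facing_right(PERSON_RUNNING)
print(" ".join(f"U+{ord(c):04X}" for c in runner_right))
```

Because the direction is spelled out explicitly in the sequence, a right-to-left message can pick whichever variant preserves its meaning instead of relying on how a platform happens to draw the base emoji.
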

Emoji 15.1

Given that the intake cycle of emoji proposals for Unicode 16.0 ended last July, the Emoji Subcommittee has also decided to temporarily delay the intake of Unicode Version 17.0 proposals until April 2024. Fortunately, you won’t have to wait until then to get new emoji. The list of recommendations includes 578 characters (most of them the directional candidates described above). It also includes a few humble additions: a broken chain, a lime, a non-poisonous mushroom, a nodding face, a shaking face, and a phoenix bird. Each one of these is a valid ZWJ sequence of existing emoji, so while they look like atomic characters made of a single codepoint, they are composed of two or more codepoints.

[image candidates]

Broken chain is the result of 🔗💥, with a variety of meanings, such as freedom, breaking a cycle, or perhaps a broken URL ;-). Like the bidirectional emoji touched on above, nodding face and shaking face are the result of 🙂↕️ and 🙂↔️ respectively. Oh, and of course there is a phoenix rising from the ashes (🐦🔥), a perfect metaphor to capture where we are today.
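
To see that these candidates really are multi-codepoint strings, the sketch below lists several of them and prints their code points. The sequences shown are our understanding of the forms published in Emoji 15.1 (the candidates above were still pending UTC review when this was written); the nodding and shaking faces use the vertical (U+2195) and horizontal (U+2194) arrows followed by VARIATION SELECTOR-16.

```python
# Assumed code point sequences for some Emoji 15.1 candidates.
SEQUENCES = {
    "phoenix": "\U0001F426\u200D\U0001F525",                      # 🐦 + ZWJ + 🔥
    "lime": "\U0001F34B\u200D\U0001F7E9",                         # 🍋 + ZWJ + 🟩
    "brown mushroom": "\U0001F344\u200D\U0001F7EB",               # 🍄 + ZWJ + 🟫
    "head shaking vertically": "\U0001F642\u200D\u2195\uFE0F",    # nodding: 🙂 + ZWJ + ↕️
    "head shaking horizontally": "\U0001F642\u200D\u2194\uFE0F",  # shaking: 🙂 + ZWJ + ↔️
}


def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for name, seq in SEQUENCES.items():
    print(f"{name}: {code_points(seq)} ({len(seq)} code points)")
```
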

The Unicode Technical Committee (UTC) will review the required documents at its first meeting of 2023 in January, and if these candidates move forward, you can expect an update from the UTC later this spring and summer.


Wednesday, December 21, 2022

Unicode in 2022

Hello Everyone!

As we go into the New Year, the Unicode team thought we’d share some highlights from this past year. From source-code spoofing to preserving indigenous languages, the Unicode team has had another full year, including expanding the number of characters that appear on billions of devices around the world.


Nearly 150,000 characters!

On the character side, we reached a total of just shy of 150,000 characters (149,186 to be exact). Of the 4,489 characters added in the 15.0 release, the biggest set was 4,192 ideographs for use in Chinese, Japanese, and Korean. There are also two new scripts, Nag Mundari and Kawi. Nag Mundari is a script used to write the Mundari language of India, a language with 1.1 million speakers. Kawi is an important historic script of insular Southeast Asia, found in inscriptions and on artifacts in several languages dating from the 8th to the 16th centuries — and is undergoing a revival today amongst enthusiasts.

And we can’t forget the 20 new emoji characters — we’re looking forward to seeing which are the most popular: shaking face? Goose? Maracas? Pink heart? If you’re involved in implementing emoji, you’ll also want to look at the latest changes in UTS #51 Unicode Emoji.

See the Unicode 15.0.0 page for more details. We’re also changing how we do releases — for more, see 2023 Release Planning.

The Launch of ICU4X

ICU is used in every major device and operating system; it’s how you see a date or number on your phone, for example. A newer project, ICU4X, was created to meet the needs of clients who want to provide client-side internationalization for their products in resource-constrained environments and across many programming languages. After 2½ years of work by Google, Mozilla, Amazon, and community partners, the Unicode Consortium has published ICU4X 1.0, its first stable release. Built from the ground up to be lightweight, portable, and secure, ICU4X learns from decades of experience to bring localized date formatting, number formatting, collation, text segmentation, and more to devices that, until now, did not have a suitable solution. For details, see Announcing ICU4X 1.0.

When does i ≠ і?

Can you tell the difference between i and і? Yeah, most people can’t (the first is Latin, the second is Cyrillic). The first set of changes to help counter source-code spoofing was included in the 15.0 versions of the UAX #9 Unicode Bidirectional Algorithm, UAX #31 Unicode Identifier and Pattern Syntax, and UTS #39 Unicode Security Mechanisms.

For 2023, there is a new draft UTS #55 Unicode Source Code Handling, providing guidance for programming language designers and tooling developers, and specifying mechanisms to avoid usability and security issues arising from improper handling of Unicode. More changes are on their way for UAX #9, UAX #31, and UTS #39 as well.
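
As a rough illustration of why i versus і matters in source code, the sketch below flags identifiers whose letters come from more than one script. It is a crude approximation, not the UTS #39 algorithm: Python's standard library does not expose the Unicode Script property, so we assume the first word of each character's Unicode name ("LATIN", "CYRILLIC", ...) is a good enough proxy. Real tooling should use the Script and Script_Extensions properties and the UTS #39 confusables data.

```python
import unicodedata


def letter_scripts(s: str) -> set:
    """Approximate the scripts of the letters in s from their character names.

    Simplification: uses the first word of the Unicode name (e.g. "LATIN",
    "CYRILLIC") as a stand-in for the real Script property.
    """
    return {unicodedata.name(c).split()[0] for c in s if c.isalpha()}


def looks_mixed_script(identifier: str) -> bool:
    """True if the identifier's letters appear to come from several scripts."""
    return len(letter_scripts(identifier)) > 1


# "print" spelled with CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I (U+0456).
spoofed = "pr\u0456nt"
print(looks_mixed_script("print"), looks_mixed_script(spoofed))
```

Note that legitimate identifiers can mix scripts too (for example Japanese mixing Han and Hiragana), which is one reason UTS #39 defines more nuanced restriction levels than a simple "more than one script" test.
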

Åge Møller, Πέτρος Νικόλαος Καρατζής, ராஜேந்திர சோழன்

We’re making great progress on internationalized formatting of people’s names. What does that mean? Software needs to be able to format people's names, such as John Smith or 宮崎駿. The formatting can be surprisingly complicated: for example, people may have a different number of names, depending on their culture — they might have only one name (“Zendaya”), only two (“Albert Einstein”), or three or more. So the software needs to handle missing or extra name fields gracefully.

There are many more complexities — for more details, see Formatting people’s names.
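
A small sketch of the "missing or extra fields" problem from the names example above: the formatter simply skips empty fields, and the caller supplies the field order and separator. The PersonName class, its two fields, and the surname_first and separator parameters are hypothetical simplifications; in CLDR's person-name formatting, order, separators, and which fields to show come from locale data and a much richer field model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    # Hypothetical two-field model; real name data has many more fields.
    given: Optional[str] = None
    surname: Optional[str] = None


def format_name(name: PersonName, surname_first: bool = False, separator: str = " ") -> str:
    """Join the present fields in the requested order, skipping missing ones."""
    fields = [name.surname, name.given] if surname_first else [name.given, name.surname]
    # Dropping empty fields avoids stray separators for single-name people.
    return separator.join(f for f in fields if f)
```

For instance, a mononym like "Zendaya" comes out without a dangling space, and 宮崎駿 needs both surname-first order and an empty separator, which is exactly the kind of per-locale detail the CLDR data is meant to supply.
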

You have 2 unread messages.

Or, you have 3 items in your cart. Whenever a computer needs to construct a sentence using “placeholders” such as 3, it is formatting a message. The current industry standard is ICU’s message formatting. A project started about 3 years ago with the goal of improving on it by building a more robust and extensible mechanism. There is now a Tech Preview in ICU; we’d urge developers to try it out!

See message-format-wg for details on the syntax and message2/package-summary.html for the API (note that ICU’s convention is to mark tech previews as Deprecated), and the test code in MessageFormat2Test.java for examples of usage.
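
The cart example shows why message formatting is more than string substitution: the words around the placeholder depend on the number. The sketch below is a deliberately tiny, English-only illustration of selecting a pattern by plural category before filling in the placeholder. It is not MessageFormat 2 syntax or the ICU API; format_cart_message is a hypothetical function, and its two-category rule ("one" for exactly 1, "other" otherwise) is an assumption that only fits English. Real implementations take plural categories from CLDR plural rules, where other languages use categories such as few and many.

```python
def format_cart_message(count: int) -> str:
    """Hypothetical English-only message: choose a pattern, then fill it in."""
    # Simplified English plural rule: "one" for exactly 1, "other" otherwise.
    category = "one" if count == 1 else "other"
    patterns = {
        "one": "You have {count} item in your cart.",
        "other": "You have {count} items in your cart.",
    }
    return patterns[category].format(count=count)


for n in (0, 1, 3):
    print(format_cart_message(n))
```
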

(There are of course other fixes, upgrades and new features in ICU: see ICU 72 and ICU 71 for more details.)

Māori, ‎Wolof, тоҷикӣ, ‎‎کٲشُر, ‎ትግርኛ, कॉशुर‎, ‎মৈতৈলোন্, ‎ᱥᱟᱱᱛᱟᱲᱤ

In CLDR, we now have 95 languages at the Modern level (suitable for full UI internationalization), 6 at the Moderate level (suitable for “document content” internationalization), and 29 at the Basic level (suitable for locale selection). We added a tech preview of formatting for person names, plus additions for Unicode 15.0 (emoji names and search keywords), names for new scripts, new CJK collation, and so on. For more information, see CLDR v42.

Revitalization and Preservation of Indigenous Languages

The Nattilik language community was unable to use their language reliably for even simple, everyday digital text exchanges such as email or text messaging. The Typotheque Syllabics Project, an initiative based out of Toronto and The Hague, Netherlands, undertook research with language keepers across various Syllabics-using Indigenous communities in Canada. By collaborating with Nattilik language keepers and elders, the project identified key issues faced by the Nattilik community of Western Nunavut and discovered that 12 syllabic characters were missing from the Unicode Standard. The Consortium worked with the Typotheque Syllabics Project to add 16 characters to the script to support Nattilik and other languages in Unicode version 14.0, and improved the glyphs in Unicode version 15.0. See this blog post from June.

The Past and Future of Flag Emoji

Despite being the largest emoji category and strongly tied to identity, flags are by far the least used. Flag emoji have always been subject to special criteria due to their open-ended nature, infrequent use, and burden on implementations. The addition of other flags and thousands of valid sequences to the Unicode Standard has not resulted in wider adoption. Flags don’t stand still; they are constantly evolving, and due to their open-ended nature, the addition of one creates exclusivity at the expense of others. Curious to learn more? Read more about the Past and Future of Flag Emoji.

Available Now! New YouTube Playlist and Technical Quick Start Guide

On September 28th, Unicode held a webinar on the “Overview of Internationalization and Unicode Projects” for Unicode enthusiasts. Unicode technical leadership and other experts shared background on our core projects with participants from more than 30 countries. If you missed the webinar, no worries! The recorded sessions are available on this YouTube playlist. And if you are new to Unicode and internationalization or simply want a refresher, you can also check out our Technical Quick Start Guide. This handy guide explains what Unicode is, including answering the question, “What is Internationalization and Why it Matters.” There are also useful links to more detailed information and to ways you can get involved. Read more here.

Support Unicode 💞💕💌💯✨🌟🤠🛟🎁

Finally, if you are already a contributor to or member of Unicode (or your company or organization is!), thank you, Danke, Děkuju, धन्यवाद, merci, 谢谢你, grazie, நன்றி, and gracias! What we have accomplished is only possible because of supporters like you.

And if you want to support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode is a US-based non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.