Showing posts with label Unicode. Show all posts

Monday, May 20, 2024

Unicode CLDR Version 46 Submission Open

The Unicode CLDR Survey Tool is open for submission for version 46. CLDR provides key building blocks for software to support the world’s languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 46 is focusing on:
  • Unicode 16 additions: new emoji, script names, collation data (Chinese & Japanese), …
  • Emoji search keywords: Expanding keyword coverage to make it easier for users to find the right emoji
  • New Languages targeting Basic:
    • Ewe (ee)
    • Ga (gaa)
    • Kinyarwanda (rw)
    • Northern Sotho (nso)
    • Oromo (om)
    • Sesotho (st)
    • Setswana (tn)
  • Up-leveling: Akan (ak)
Submission of new data opened recently, and is slated to finish on June 11. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on July 1. A public alpha makes the draft data available around August 28, and the final release targets October 16.

Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle.

Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name for that language is also added for translation for all languages at Modern coverage.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.


Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Friday, March 8, 2024

Breaking the Cycle 🔗💥

by Jennifer Daniel

(This article was originally published on Jennifer’s Substack, January 17, 2023. Republished here with minor revision.)

Phoenix image
In the fall of 2022, the Unicode Technical Committee announced that the 2023 release of the Unicode Standard would be a “dot” release with limited character additions, with the next major release in 2024. This wasn’t without precedent: COVID delayed the release of Unicode 14.0 until September 2021, and the world seemed to survive 😉. Subcommittees were well prepared and adjusted accordingly, discussing what this meant for their respective areas of expertise.

For the Emoji Subcommittee (ESC), the group responsible for defining the rules, algorithms, and properties necessary to achieve interoperability between different platforms for those smiley faces that appear on your keyboard (Shout out 😁🥰🥹🤔🫣🫡😵‍💫!), this delay presented an opportunity. Sure, we were so close to exhaling a sigh of relief (the intake period for Emoji 16.0 proposals had just completed). But upon learning we couldn’t ship any new code points until 2024, we turned our energy towards recommending new emoji based on existing ones. (These are called emoji ZWJ sequences, after the zero width joiner character that links them. That's when a combination of multiple emoji displays as a single emoji … like 🧑 + 🏽 + 🏭 = 🧑🏽‍🏭.)
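The mechanism fits in a few lines. Below is a minimal Python sketch, assuming only the standard library: the component code points are joined with U+200D ZERO WIDTH JOINER, and a platform whose fonts support the sequence renders it as one emoji (others show the pieces side by side). The helper names `zwj_sequence` and `code_points` are hypothetical, chosen for illustration.

```python
# A ZWJ sequence is ordinary code points glued together by
# U+200D ZERO WIDTH JOINER; supporting platforms draw one glyph.
ZWJ = "\u200d"


def zwj_sequence(*parts: str) -> str:
    """Join emoji parts with ZWJ (illustrative helper, not a library API)."""
    return ZWJ.join(parts)


def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


# A skin tone modifier attaches directly to its base, without a ZWJ.
person_medium = "\U0001F9D1\U0001F3FD"  # PERSON + medium skin tone modifier
factory = "\U0001F3ED"                   # FACTORY

factory_worker = zwj_sequence(person_medium, factory)
print(factory_worker)               # one glyph on platforms that support it
print(code_points(factory_worker))  # ['U+1F9D1', 'U+1F3FD', 'U+200D', 'U+1F3ED']
```

Note that the result is four code points even though a reader sees a single emoji.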

When Less is More

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can "do it all". And yet, as the emoji ecosystem has matured over time our keyboards have ballooned and emoji categories are about to hit or have hit a level of saturation. Upon reflecting on how emoji are used, the ESC has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard. Instead, the ESC approves fewer and fewer emoji proposals every year.

But our work is not done. Not by a longshot. Language is fluid and doesn’t stand still. There is more to do! This “off-cycle” gives us a chance to address some long-standing major pain points using emoji. The first one that came to mind: skin-tone.

What is a family?

The encoding of multi-person, multi-tone support has matured over the years; however, the implementation can seem random to the average person. While it’s true that all people emoji have toned options (with the exception of characters where you can’t see skin, like 🤺), there are … misfits. Some two-person emoji offer tone support (🧑🏻‍❤️‍🧑🏿); others do not (👯). A few non-RGI (not Recommended for General Interchange) emoji render with tone but with no affordance to change one of the two characters (for example, 🤼🏾‍♂ renders with skin tone on Android but as gold on iOS. WHY. This is why we standardize these things, people).
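Per-person tone support works because each person in such a sequence is a separate code point that can take its own skin tone modifier (U+1F3FB through U+1F3FF). Here is a minimal Python sketch of the 🧑🏻‍❤️‍🧑🏿 example; the helper names are hypothetical, and whether a particular combination renders as one glyph still depends on the platform.

```python
# The five emoji skin tone modifiers, U+1F3FB..U+1F3FF.
SKIN_TONES = {
    "light": "\U0001F3FB",
    "medium-light": "\U0001F3FC",
    "medium": "\U0001F3FD",
    "medium-dark": "\U0001F3FE",
    "dark": "\U0001F3FF",
}
ZWJ = "\u200d"
HEART = "\u2764\ufe0f"  # HEAVY BLACK HEART + VS16 (emoji presentation)
PERSON = "\U0001F9D1"   # PERSON


def toned(base: str, tone: str) -> str:
    """Attach a skin tone modifier directly after a base emoji (no ZWJ)."""
    return base + SKIN_TONES[tone]


def couple_with_heart(tone_a: str, tone_b: str) -> str:
    """PERSON ZWJ HEART ZWJ PERSON, with each person toned independently."""
    return ZWJ.join([toned(PERSON, tone_a), HEART, toned(PERSON, tone_b)])


seq = couple_with_heart("light", "dark")
print(seq)
print([f"U+{ord(c):04X}" for c in seq])
```

Because the tones live on separate code points, the two people can differ; a single-code-point emoji offers no such slots.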

And then ... There is the suite of family emoji (👨‍👦👨‍👦‍👦👨‍👧👨‍👧‍👦👨‍👧‍👧👩‍👦👩‍👦‍👦👩‍👧👩‍👧‍👦👩‍👧‍👧 👨‍👨‍👦👨‍👨‍👦‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👧‍👧👩‍👩‍👦👩‍👩‍👦‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👧‍👧👨‍👩‍👦👨‍👩‍👦‍👦👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👧‍👧👪). These characters include two people, three people, sometimes four and none of them have any tone support (!). We seem to have a lot of family emoji and yet simultaneously not enough.
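The difference between the two kinds of family emoji shows up at the code point level. In this sketch (the `members` helper is hypothetical), 👨‍👩‍👧‍👦 splits back into four people joined by ZWJ, none of which carries a tone modifier, while 👪 is a single code point.

```python
ZWJ = "\u200d"

# MAN, WOMAN, GIRL, BOY joined by ZWJ
family_mwgb = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"])
# The single-code-point FAMILY emoji
family_symbol = "\U0001F46A"


def members(seq: str) -> list[str]:
    """Split a ZWJ sequence back into its component emoji."""
    return seq.split(ZWJ)


print(len(members(family_mwgb)))  # 4 people, none with a skin tone modifier
print(len(family_symbol))         # 1 code point
```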

The 26 “family” emoji can be broken down into four groups:

Families image

Despite the Unicode Standard containing 26 “family” emoji, each one of these glyphs is overly prescriptive with regard to delivering on a visual representation of a family. The inclusion of many permutations of families was well intentioned. But we can’t list them all, and by listing some of the combinations, it calls attention to the ones that are excluded.

What even is a family? For some, family is the people you were raised with. Others have embraced friends as their chosen family. Some families have children, other families have pets. There are multi-generational families, multiracial families and of course many families are any combination of all of these characteristics and more.

Fortunately, we don’t need to add 7000 variants to your keyboards (even this would fall short of capturing the breadth of "family" as a concept). Instead we can juxtapose individual emoji together to capture a concept with some reasonable level of specificity — not too unlike arranging letters together to create words to convey concepts 😉

Different families image
For emoji keyboards to advance in creating more intuitive and personalized experiences, the Emoji Subcommittee is recommending a visual deprecation of the family emoji. This small set of emoji will be redesigned as part of a multi-phase effort to “complete the set” of toned variants for the remaining multi-person emoji. This of course raises the question: when there are as many families as there are people in the world, is there an effective way of conveying the concept of “family” without being overly prescriptive in defining what is and is not a family? Well, thankfully icons can do a lot of heavy lifting without requiring very much detail.

Family symbol image

When is an emoji running for the police or getting chased by them?

Another area the ESC is actively exploring is how the semantics of emoji sequences can differ when writing directionality changes. Some emoji characters have semantics that encode implicit directionality, and when the string is mirrored, their meaning may be unintentionally lost or changed.

Left to Right emoji image
Left to Right Emoji Sequence: Quickly running towards an “exciting” police chase

Right to Left emoji image
Right to Left Emoji Sequence: Running away from the coppers

What, if anything, can we do to help ensure that messages are meaningfully translated, be they tiny pictures or tiny letters? As part of 15.1 we’re proposing that a small set of emoji with strong directionality, with an initial focus on people, be able to face the opposite direction. Soon you too can run towards or away from ... excitement.
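The approach that shipped in Emoji 15.1 appends ZWJ plus ➡️ (U+27A1 with the emoji variation selector U+FE0F) to the base person, after any skin tone modifier, to mark the "facing right" form. A minimal sketch, with a hypothetical helper name:

```python
ZWJ = "\u200d"
RIGHT_ARROW = "\u27a1\ufe0f"  # BLACK RIGHTWARDS ARROW + VS16


def facing_right(person: str) -> str:
    """Append ZWJ + right arrow to request the right-facing form."""
    return person + ZWJ + RIGHT_ARROW


walking = "\U0001F6B6"                  # PERSON WALKING
walking_dark = walking + "\U0001F3FF"   # with dark skin tone modifier

print([f"U+{ord(c):04X}" for c in facing_right(walking)])
print([f"U+{ord(c):04X}" for c in facing_right(walking_dark)])
```

The skin tone modifier stays attached to the person, so tone and direction combine independently.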

Emoji 15.1

Given that the intake cycle of emoji proposals for Unicode 16.0 ended last July, the Emoji Subcommittee has also decided to temporarily delay the intake of Unicode Version 17.0 proposals until April 2024. Fortunately, you won’t have to wait until then to get new emoji. (Note: I know it sounds like I’m talking about the past and future simultaneously ... the emoji lifecycle is looooong and as a result overlaps with multiple releases. Expect a future blog post about the Emoji 15.0 candidates landing early this year (Shout out goose, pink heart, and pushing hands). I’ve been holding off writing about this set until you can actually see them on your phones but given that we’re already talking about 2024 maybe it’s time I dust that blog post off).

Emoji 2023 timeline image

Anyway, the list of Emoji 15.1 recommendations for 2024 includes 578 characters (most of them the candidates described above to support directionality). The list also includes a few humble additions: a broken chain, a lime, a non-poisonous mushroom, nodding and shaking faces, and a phoenix bird. Each one of these is a valid ZWJ sequence of existing emoji, so while they look like atomic characters made of a single code point, they are composed of two or more code points.

Broken chain and other emoji image

Broken chain is a ZWJ sequence of ⛓️ (chains) and 💥 (collision) and carries a variety of meanings, such as freedom, breaking a cycle, or perhaps a broken URL ;-). Nodding face and shaking face are composed of arrows to imply movement in a still image (🙂‍↕️ and 🙂‍↔️). Oh, and of course there is a phoenix rising from the ashes (🐦‍🔥), an ancient metaphor that captures the zeitgeist of today.
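Because these additions are ZWJ sequences, string APIs that count code points see several characters where a reader sees one emoji. The sketch below builds the six sequences named above from their components (the names follow the CLDR short names, e.g. "head shaking vertically" for the nodding face); the dictionary is only an illustration, not a Unicode data file.

```python
ZWJ = "\u200d"

# Components of the Emoji 15.1 ZWJ sequences discussed above.
NEW_SEQUENCES = {
    "broken chain": ["\u26d3\ufe0f", "\U0001F4A5"],               # chains + collision
    "lime": ["\U0001F34B", "\U0001F7E9"],                         # lemon + green square
    "brown mushroom": ["\U0001F344", "\U0001F7EB"],               # mushroom + brown square
    "head shaking vertically": ["\U0001F642", "\u2195\ufe0f"],    # nodding
    "head shaking horizontally": ["\U0001F642", "\u2194\ufe0f"],  # shaking
    "phoenix": ["\U0001F426", "\U0001F525"],                      # bird + fire
}

for name, parts in NEW_SEQUENCES.items():
    seq = ZWJ.join(parts)
    # len() counts code points, not what the user perceives as one emoji
    print(f"{name}: {seq} -> {len(seq)} code points")
```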

The Unicode Technical Committee (UTC) will review the required documents at its first meeting of 2023 in January – and if these candidates move forward, you can expect an update from the UTC later this Spring and Summer.


Wednesday, November 15, 2023

Looking to give back differently for this #GivingTuesday?

[image of 3 badges]
Adopt a Character or Emoji to give it the attention it deserves!

Now you can adopt a character and show off your hobby or business, favorite sport, or love. For that special someone who seems to have everything, you can also give a unique gift.

Allergies? 🤧 Traveling? ✈️ No worries, the cat emoji 😺 has no fur and requires no feeding! The dog emoji 🐶? No need to go out for a 3 am walk! Looking to be a Scrabble champ? The strong and fast letter Z is right for you!

Your good friend is studying to be a doctor. How about the stethoscope emoji 🩺 as a gift? Or even an emoji to support your favorite college football team this season! 🏈

With nearly 150,000 characters there's something for everyone! The possibilities are endless! It's also a tax-deductible donation in the United States, to the extent allowed by law. Your company may also provide matching funds.

☯🏏 🏈 ⚽ 🔥🎁💍爱戀🥳 🙌 🎂💗💟₨ ₪ € ₭ ₱🥰 😍♕Ωπ

About Adopt-a-Character

The Adopt-a-Character program was launched in 2015 to support Unicode's mission to ensure everyone can communicate in their own languages. Adopt-a-Character funds have supported work on historic scripts, including Old Uyghur, Old Sogdian, Sogdian, Seal Script (China), Mayan Hieroglyphs, and Egyptian Hieroglyphs. Additional support has been provided to encode the modern scripts Hanifi Rohingya, Tolong Siki, and Sunuwar, among others.

Characters can be adopted at three levels:

Gold - $5,000
For any particular character there can only be one Gold adoption! Be the only one!

Silver - $1,000
For any particular character there can only be five Silver adoptions! Be one of the five to adopt your favorite characters as a Silver adopter!

Bronze - $100
For any character, there are an unlimited number of Bronze-level adoptions! Also a wonderful option!

Each adoption is recognized with a digital badge that you (or your recipient!) can proudly share via your social channels and via websites. Adoptions also come with a digital certificate that you can print to display or email to your giftee!

About the Unicode Consortium

The Unicode Consortium is the premier 501(c)(3) non-profit, open source, open standards body for the internationalization of software and services. The Unicode Standard is arguably the most widely deployed software in the world, available across 20 billion devices and counting! At its core, Unicode enables people around the world to communicate in any language.

And - if you want to simply make a donation to support Unicode’s work, you can do that, too!

This Giving Tuesday, let's come together to continue to celebrate and preserve linguistic diversity. Adopt a character and make a difference!

Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)(3) organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

[badge]

Wednesday, November 1, 2023

What do a leafless tree, a fingerprint, and a harp have in common?

This is not a set up to a riddle. This is Emoji 16.0.

By Jennifer Daniel, Chair of the ESC


This week, the Unicode Technical Committee gathered for our last meeting of 2023 to discuss the encoding, data files, and list of characters related to digitizing the world’s languages. Amongst the topics discussed were emoji and as a result seven new characters are on their way for inclusion into the Unicode Standard, into your keyboards, and into your hearts ;-)

emoji table image
The final recommendations culminated in seven emoji: one emoji per major category.

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can “do it all”. And yet, as the emoji ecosystem has matured over time our keyboards have ballooned and emoji categories are about to hit — or have hit — a level of saturation. Upon reflecting on how emoji are used, the Unicode Emoji Subcommittee (ESC) has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard, but to consider how the ones added provide the most linguistic flexibility. As a result, the ESC approves fewer and fewer emoji proposals every year.

The few that are added this year have demonstrated their adaptability in different contexts; take, for example, fingerprint. It is commonly used to represent multiple concepts. Fingerprints are a symbol of identity (as unique as you), security (as a passkey), and forensics (what crime show logo is complete without a fingerprint?). While we think of fingerprints as a relatively modern phenomenon, according to Forensics Digest the earliest use of fingerprints dates back to 1000 B.C.

In fact, all of this year’s emoji candidates have deep roots in history. Harps have been known since antiquity in Asia, Africa, and Europe, dating back at least as early as 3000 BCE. Today they carry political, sporting, corporate, and religious symbolism 👼. Leafless trees have been around as long as ... well, trees (and poetry!), I suppose. Leafless trees literally represent droughts or winter and metaphorically indicate a state of barrenness and death.

Shovel isn’t just another noun. Sure, yes, it’s a tool commonly found in your shed, but in our keyboards it’s also a verb. Digging yourself out of a hole, digging yourself into a hole, shoveling 💩, it does it all. But wait, there’s more. Splatter is one of those stealth emoji that, when you look at it, might have you thinking, “really, another sex emoji?” (To be honest, show me someone who doesn’t think an emoji is a sex emoji and I’ll show you someone who lacks imagination.) Splatter is a spill. Splatter is expressive. Splatter is soft: a perfect counterpoint to collision 💥, the bouba to 💥’s kiki.

When can you get these new emoji?

A simple question that deserves a simple answer. Alas, you’re dealing with Unicode, so the answer is complex. Did you know it can take up to two years to encode an emoji? It’s true. If we want the symbols we digitize to truly “just work” across the entirety of not just the Internet but all digital surfaces … it takes time. So, don’t expect to see these characters anytime soon. In fact, despite the previous batch of emoji (phoenix, lime, broken chain, etc.) getting approved last year, they still haven’t landed on your device of choice, but they are well on their way to popping up in the first half of 2024.

emoji at a glance
Emoji 16.0 has a long road ahead and will appear on most devices in May-June 2025.



Tuesday, September 12, 2023

Announcing The Unicode® Standard, Version 15.1


Version 15.1 of the Unicode Standard is now available. This minor version update includes updated code charts, data files and annexes. The core specification is unchanged from Unicode Version 15.0.

This version adds 627 characters, bringing the total number of characters to 149,813. The additions include 622 CJK unified ideographs in a new block, CJK Unified Ideographs Extension I. These new ideographs are urgently needed in China for use in public service databases, and are expected to be included in a forthcoming amendment to China’s GB 18030-2022 standard. The other new characters are five ideographic description characters that enhance the ability to describe rare or not-yet-encoded CJK ideographs.

There are six completely new emoji, including phoenix, lime, and (finally) an edible mushroom. For 108 people emoji, you can now switch the direction that they are facing (for example, person walking facing right versus facing left).

Security-related updates have been made to UAX #9, Unicode Bidirectional Algorithm and UAX #31, Unicode Identifiers and Syntax along with updates to UTS #39, Unicode Security Mechanisms. These updates complement the release of a new Unicode Technical Standard, UTS #55, Unicode Source Code Handling.

The new characters are limited to three blocks, and the code charts for several other blocks have changed. The most significant change to charts is for the CJK Unified Ideographs, CJK Unified Ideographs Extension A and CJK Unified Ideographs Extension B blocks with the addition of representative glyphs and source references for over 24,000 KP-source (North Korea) ideographs. There are also many other glyph corrections and improvements—see the 15.1 delta code charts for details.

Significant updates have been made to UAX #14, Unicode Line Breaking Algorithm and UAX #29, Unicode Text Segmentation adding better support for scripts of South and Southeast Asia, including grapheme cluster support for aksaras and consonant conjuncts, and line breaking at orthographic syllable boundaries.

For complete details on Unicode Version 15.1, see https://www.unicode.org/versions/Unicode15.1.0/.



Tuesday, May 23, 2023

Unicode 15.1 Beta Review Open

The beta review period for Unicode 15.1 has started, and is open until July 4, 2023. The beta is intended primarily for review of character property data and changes to algorithm specifications (Unicode Standard Annexes).

Normally at this phase of a release, the character repertoire is considered stable and very unlikely to change. Also, the plan for Unicode 15.1 had been for a minor release with only a very limited set of new characters.

Recent developments have led to a tentative change in those plans, however.

China has a very urgent need for encoding of certain CJK ideographs used in public services databases. To accommodate this urgent need, the Unicode Technical Committee (UTC) decided at its April 2023 meeting to encode 603 new characters in Unicode 15.1 as CJK Unified Ideographs Extension I. This new block is included in the delta charts for the Unicode 15.1 beta. However, inclusion of these characters in Unicode 15.1 is contingent on support for this addition from China, and on support for this addition in the corresponding ISO/IEC 10646 standard from ISO/IEC JTC 1/SC 2 at their upcoming meeting in June. While support for the new block is anticipated, there is a small chance that minor changes to this repertoire will be made after the beta, or that UTC will pull this block entirely from the 15.1 release.

Several of the Unicode Standard Annexes have significant modifications and associated data changes for version 15.1. For example, UAX #14, Unicode Line Breaking Algorithm has significant enhancements to support line breaking at orthographic syllable boundaries in several South and Southeast Asian scripts. Also, in conjunction with the parallel development of a new standard, UTS #55, Unicode Source Code Handling (see Public Review Issue #474), there are significant revisions to UAX #31, Unicode Identifiers and Syntax that will provide better specifications and guidance related to security, and also improved guidance for applications that define identifier systems using Unicode.

While draft content for the beta has been published as of May 23rd, the work groups preparing updates to the content could continue to make changes to data or specs during the Beta review period. Any substantive changes for the beta will be frozen by June 5th.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 4, 2023. The review period will only be for six weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-15.1.0.html for more information about testing and providing feedback on the 15.1.0 beta.

See https://www.unicode.org/versions/Unicode15.1.0/ for the current draft summary of Unicode Version 15.1.0.



Monday, November 14, 2022

The Unicode® Standard – 2023 Release Planning

By Peter Constable, Chair of the Unicode Technical Committee

At the Q4 Unicode Technical Committee (UTC) meeting held from November 1-3, our member representatives unanimously agreed to a release plan for 2023 and tentative plan for 2024. Along with some tooling updates, our plans aim to ensure that we are more agile to meet the evolving internationalization landscape and better able to meet the needs of Unicode members and other consumers of the Standard.

More information can be found in the Release Management Group’s Recommendations for 2023-2024.

BACKGROUND

For several years now, the UTC has worked on an annual cycle for new versions of The Unicode Standard and related specifications. New versions used to be released in March of each year, but in 2021, due to COVID-19, the release was delayed until September. 

MOVING FORWARD

Going forward, our plan is to continue with a new release each year in September. That annual, predictable cycle works well for Unicode's other major projects—CLDR and ICU—and helps implementers in their planning. 

In 2023, we will keep up that cadence with a September release, but we also need to take some time to evaluate and update our processes for developing each new version of the Standard.

Therefore, the 2023 release will be a “dot” release: Unicode 15.1. It will include important updates to Unicode Standard Annexes and to the Unicode Character Database, and have a limited set of new characters — but new scripts and most other character additions will be held until the 2024 release. A major new area is the planned release of a Unicode Technical Standard for avoiding source-code spoofing, along with associated changes in other specifications.

Regarding emoji, if there are any new emoji in the 15.1 release, they would be built from existing code points (as was done for the 13.1 release) rather than added as entirely new characters.

2024 AND BEYOND

For 2024, we anticipate returning to our regular cadence, with a major release in September 2024. Unicode 16.0 will include additional new scripts, emoji and other characters, as well as other updates.



Learn more about how you can support the Unicode Consortium and our mission, including information on our Adopt-a-Character program, here!

Tuesday, September 13, 2022

Announcing The Unicode® Standard, Version 15.0

[Nag Mundari image] Version 15.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs. The new scripts and characters in Version 15.0 add support for modern language groups including:
  • Nag Mundari, a modern script used to write Mundari, a language spoken in India
  • A Kannada character used to write Konkani, Awadhi, and Havyaka Kannada in India
  • Kaktovik numerals, devised by speakers of Iñupiaq in Kaktovik, Alaska for the counting systems of the Inuit and Yupik languages
Among the popular symbol additions are 20 new emoji, including hair pick, maracas, jellyfish, khanda, and pink heart. For the full list of new emoji characters, see emoji additions for Unicode 15.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

[Image credit Noto Emoji]

Other symbol and notational additions include:
Support for other languages and scholarly work includes:
  • Kawi, a historical script found in Southeast Asia, used to write Old Javanese and other languages
  • Three additional characters for the Arabic script to support Quranic marks used in Turkey
  • Three Khojki characters found in handwritten and printed documents
  • Ten Devanagari characters used to represent auspicious signs found in inscriptions and manuscripts
  • Six Latin letters used in Malayalam transliteration
  • Sixty-three Cyrillic modifier letters used in phonetic transcription
Important chart font updates include:
  • A set of updated glyphs for Egyptian hieroglyphs, in addition to standardized variation sequences to support rotated glyphs found in texts
  • Improved glyphs for Unified Canadian Aboriginal Syllabics, which provide better support for Carrier and other languages
  • A new Wancho font, with improved and simplified shapes
Updates to the CJK blocks add:
  • 4,192 ideographs in the new CJK Unified Ideographs Extension H block
  • One ideograph in the CJK Unified Ideographs Extension C block
Unicode properties and specifications determine the behavior of text on computers and phones. The following six Unicode Standard Annexes and Technical Standards have noteworthy updates for Version 15.0:
  • UAX #9, Unicode Bidirectional Algorithm, amends the note in UAX9-C2 to emphasize the use of higher-level protocols to mitigate potential source code spoofing attacks.
  • UAX #31, Unicode Identifier and Pattern Syntax, provides more guidance on profiles for default identifiers, clarifies the use of default ignorable code points in identifiers, and discusses the relationship between Pattern_White_Space and bidirectional ordering issues in programming languages.
  • UAX #38, Unicode Han Database, adds the kAlternateTotalStrokes property. The kCihaiT property’s category was changed to Dictionary Indices, the kKangXi property was expanded, and Sections 3.0, 3.10, and 4.5 were added.
  • UTS #39, Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default.
  • UAX #45, U-Source Ideographs, has records for new ideographs in its data file, “ExtH” was added as a new status, the status identifiers for the existing CJK Unified Ideographs blocks were improved, and Section 2.5 was added.
  • UTS #46, Unicode IDNA Compatibility Processing, clarifies the edge case of the empty label in ToASCII and adds documentation regarding the new IDNA derived property data files.
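As a rough illustration of what the UTS #39 change means for an identifier checker, the sketch below flags only ZWJ and ZWNJ. It is a simplification: a real check against the General Security Profile would read the full IdentifierStatus.txt data file rather than hard-code two characters, and the function name is hypothetical.

```python
# The two joiners that UTS #39 (Unicode 15.0) moved to Identifier_Status=Restricted.
RESTRICTED_JOINERS = {
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
}


def find_restricted_joiners(identifier: str) -> list[tuple[int, str]]:
    """Return (index, name) for each ZWJ/ZWNJ found in the identifier.

    Simplification: only these two characters are checked, not the
    complete Identifier_Status data.
    """
    return [
        (i, RESTRICTED_JOINERS[ch])
        for i, ch in enumerate(identifier)
        if ch in RESTRICTED_JOINERS
    ]


print(find_restricted_joiners("user_name"))       # []
print(find_restricted_joiners("user\u200dname"))  # [(4, 'ZERO WIDTH JOINER')]
```

A visually identical identifier that hides a joiner is exactly the kind of spoofing risk the restriction targets.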

About the Unicode Standard

The Unicode Standard provides the basis for processing, storage and seamless data interchange of text data in any language in all modern software and information technology protocols. It provides a uniform, universal architecture and encoding for all languages of the world, with over 140,000 characters currently encoded.

Unicode is required by modern standards such as XML, Java, C#, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is a fundamental component of all modern software.

For additional information on the Unicode Standard, please visit https://home.unicode.org/.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. For a complete member list go to https://home.unicode.org/membership/members/.
For more information, please contact the Unicode Consortium https://home.unicode.org/connect/contact-unicode/.


Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages


Wednesday, June 8, 2022

Unicode CLDR Version 42 Submission Open

[ballot box image] The Unicode CLDR Survey Tool is open for submission for version 42. CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 42 is focusing on:
  • Additional Coverage
    • Unicode 15.0 additions: new emoji, script names, collation data (Chinese & Japanese), …
    • New Languages: Adding Haryanvi, Bhojpuri, Rajasthani at a Basic level.
    • Up-leveling: Xhosa, Hinglish (Hindi-Latin), Nigerian Pidgin, Hausa, Igbo, Yoruba, and Norwegian Nynorsk.
  • Person Name Formatting: for handling the wide variety in the way that people’s names work in different languages.
    • People may have a different number of names, depending on their culture: they might have only one name (“Zendaya”), two (“Albert Einstein”), or three or more.
    • People may have multiple words in a particular name field, e.g., “Mary Beth” as a given name, or “van Berg” as a surname.
    • Some languages, such as Spanish, have two surnames (where each can be composed of multiple words).
    • The ordering of name fields can be different across languages, as well as the spacing (or lack thereof) and punctuation.
    • Name formatting needs to adapt to different circumstances: names may need to be presented in a shorter or longer form, in a formal or informal context, when talking about someone or talking to someone, or as a monogram (JFK).
Submission of new data opened recently, and is slated to finish on June 22. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on July 6. A public alpha makes the draft data available around August 17, and the final release targets October 19.

Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle. In version 41, the following levels were reached:

Modern (89 languages, 361 locales*): Suitable for full UI internationalization.
Afrikaans‎, ‎… Čeština‎, ‎… Dansk‎, ‎… Eesti‎, ‎… Filipino‎, ‎… Gaeilge‎, ‎… Hrvatski‎, ‎Indonesia‎, ‎… Jawa‎, ‎Kiswahili‎, ‎Latviešu‎, ‎… Magyar‎, ‎… Nederlands‎, ‎… O‘zbek‎, ‎Polski‎, ‎… Română‎, ‎Slovenčina‎, ‎… Tiếng Việt‎, ‎… Ελληνικά‎, ‎Беларуская‎, ‎… ‎ᏣᎳᎩ‎, ‎ Ქართული‎, ‎Հայերեն‎, ‎עברית‎, ‎اردو‎, … አማርኛ‎, ‎नेपाली‎, … ‎অসমীয়া‎, ‎বাংলা‎, ‎ਪੰਜਾਬੀ‎, ‎ગુજરાતી‎, ‎ଓଡ଼ିଆ‎, ‎தமிழ்‎, ‎తెలుగు‎, ‎ಕನ್ನಡ‎, ‎മലയാളം‎, ‎සිංහල‎, ‎ไทย‎, ‎ລາວ‎, ‎မြန်မာ‎, ‎ខ្មែរ‎, ‎한국어‎, ‎… 日本語‎, ‎…
Moderate (13 languages, 32 locales*): Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Binisaya, … ‎Èdè Yorùbá, ‎Føroyskt, ‎Igbo, ‎IsiZulu, ‎Kanhgág, ‎Nheẽgatu, ‎Runasimi, ‎Sardu, ‎Shqip, ‎سنڌي, …
Basic (22 languages, 21 locales*): Suitable for locale selection, such as choice of language in mobile phone settings.
Asturianu, ‎Basa Sunda, ‎Interlingua, ‎Kabuverdianu, ‎Lea Fakatonga, ‎Rumantsch, ‎Te reo Māori, ‎Wolof, ‎Босански (Ћирилица), ‎Татар, ‎Тоҷикӣ, ‎Ўзбекча (Кирил), ‎کٲشُر, ‎कॉशुर (देवनागरी), ‎…, ‎মৈতৈলোন্, ‎ᱥᱟᱱᱛᱟᱲᱤ, ‎粤语 (简体)‎
* Locales are variants for different countries or scripts.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.




Wednesday, May 4, 2022

Out of this World: New Astronomy Symbols Approved for the Unicode Standard

Five Trans-Neptunian Objects to Join Character Set

By Deborah Anderson, Chair of Unicode Script Ad Hoc Committee

In January 2022, the Unicode Technical Committee approved five new symbols to be published in Unicode 15.0, which is projected for release in September 2022. The symbols represent recently discovered trans-Neptunian objects (TNOs) in the Solar System, found through research efforts such as those led by astronomer and professor Dr. Michael Brown at the California Institute of Technology (Caltech).

These five objects orbit the Sun at distances far greater than those of the major planets. They are currently believed to be large enough to be round, planetary worlds, in a category of objects called “dwarf planets” that also includes Ceres, Pluto, Eris and probably Sedna. The most famous trans-Neptunian object is Pluto, which historically had been considered to be the ninth planet from the Sun, but was reclassified as a dwarf planet in 2006 by the International Astronomical Union (IAU).[1]

[Pluto image]

How did this happen?

Individuals or organizations who want to propose new characters have to check existing characters to avoid duplicates, find out whether equivalent forms already exist, and, most critically, establish the need for digital interchange of the characters (for example, symbols used by NASA and other agencies). The proposal authors must then submit a proposal that explains how their request meets the criteria.

Once a proposal is submitted, the Unicode Technical Committee determines whether to review the proposal and accept or decline it. This process can take a couple of years or more. In the case of these five characters, the proposers demonstrated the need, clearing the path for approval. 

Tell me more about these new characters. What are their names?

The International Astronomical Union (IAU) has standard conventions for naming objects both within and outside of the solar system. Objects orbiting the Sun outside the orbit of Neptune are named after mythological figures, particularly those associated with creation. But the subset that orbits in a two-to-three resonance with Neptune (the so-called “plutinos”, such as Pluto and Orcus) is named after figures associated with the underworld. In this case, the five TNOs, ordered by distance from the Sun, are named:
  • Orcus: the Etruscan and Roman god of the underworld.
  • Haumea: the Hawaiian goddess of fertility; the telescope used to discover this object is located on Hawaiʻi.
  • Quaoar: an important mythological figure of the Tongva, the indigenous people who originally occupied the land where Caltech is located.
  • Makemake: the creator god of the Rapanui of Easter Island.
  • Gonggong: a destructive Chinese water god.
What information is there on the actual symbols that will be available?

All five symbols were designed by Denis Moskowitz, a software engineer in Massachusetts who had previously designed the Unicode symbol for Sedna. He drew inspiration from existing symbols and the “native name or culture” of the objects’ namesakes [2] to create the characters.

[TNO glyphs image]

Denis explains his inspiration for each symbol below:
  • Orcus: The symbol for Orcus is a combination of the Latin letters “O” and “R”, stylized to resemble a skull and an orca’s grin.
  • Haumea: The symbol created for Haumea was a combination and simplification of Hawaiian petroglyphs for “childbirth” and “woman”.
  • Quaoar: The symbol is the Latin letter “Q” with the tail fashioned into the shape of a canoe. The angular shape is intended to reflect Tongva rock art.
  • Makemake: The Makemake symbol is a traditional petroglyph of the face of the creator god Makemake, stylized to suggest an “M”. The design was a collaboration with John T. Whelan.
  • Gonggong: Gonggong’s symbol was based on the first Chinese character in the god’s name, 共 gòng, with a snaky tail replacing the lower section.
What else should we know?

The five symbols supplement a set of other characters for planetary objects that were published in 2018 (Unicode 11.0) and earlier. Two of the newly approved characters appear in a NASA poster. Other people have used the symbols in various media, including tattoos and art. Ultimately, these five new characters will join the 149,180 other characters in the Unicode Standard Version 15.0 and be accessible to anyone, anywhere in the world, who is using a computer or mobile device.

Where can I learn more?
Acknowledgments

Special thanks to Sarah Rivera and Kirk Miller for their contributions to this blog.



Monday, March 28, 2022

The Past and Future of Flag Emoji

Emoji Flags are dead, long live Emoji Flags 🏁 🏁 🏁

By Jennifer Daniel, Unicode Emoji Subcommittee Chair

With Emoji 16.0 submissions open from April 4, 2022 through July 31, 2022, the Unicode Emoji Subcommittee members stand with open arms for your future hair pick, khanda, and pink heart emoji proposals (BTW, if you were planning to prepare proposals for those concepts, we have some good news for you: they are already Emoji 15.0 draft candidates!).

That being said, there is one particular type of emoji for which the Unicode Consortium will no longer accept proposals: flag emoji of any category.

Flag emoji have always been subject to special criteria due to their open-ended nature, infrequent use, and burden on implementations. Today, nine of the ten original flag emoji are among the twenty most frequently shared flags. (The only outlier is Russia.) The addition of other flags and thousands of valid sequences to the Unicode Standard has not resulted in wider adoption. Flags don’t stand still; they are constantly evolving, and because of their open-ended nature, adding one flag creates exclusivity at the expense of others.

Why do flag emoji exist in the first place?

Well, the shorter, more technical answer is: The country flags use a generative mechanism, and were encoded early on for compatibility reasons.

The longer answer requires a flashback to the 1990s. KDDI and SoftBank, two Japanese mobile phone carriers, had early emoji sets which included 10 country flags: 🇨🇳 🇩🇪 🇪🇸 🇫🇷 🇬🇧 🇮🇹 🇯🇵 🇰🇷 🇷🇺 🇺🇸¹. A possibly apocryphal explanation is that they were used to denote what to grab for dinner: "American 🇺🇸 or Italian 🇮🇹?" (Such an innocent time in emoji history, pre-hamburger 🍔 emoji.) Alas, as Unicode stepped in to create meaningful interoperability between these carrier-specific encodings, it was presented with a problem: why should these 10 countries have flag emoji when others do not?

The original emoji set included ten flags (shown above).
¹ Interestingly, Windows has never supported flag emoji 🔮. So, if you are reading this on a Windows device and flags aren't displaying, simply refer to the image above of the ten original flag emoji.

Various ideas were considered. The Unicode Consortium isn’t in the business of determining what is a country and what isn’t. That’s when the Consortium chose ISO 3166-1 alpha-2 as the source for valid country designations. ISO 3166 is a widely accepted standard, and this particular mechanism represents each country with 2 letters, such as “US” (for United States), “FR” (France), or “CN” (China).

It wasn’t a perfect solution, but by allowing the 10 flag emoji (and the rest of the country flags) to be accurately interchanged between DoCoMo, KDDI, SoftBank, Google, Apple, and others, it worked just fine.
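The “generative mechanism” behind country flags can be sketched in a few lines. A country flag emoji is not a single character: it is a pair of Regional Indicator Symbols (U+1F1E6 through U+1F1FF, one for each letter A to Z) that spells out the two-letter region code. The minimal Python sketch below assumes only that mapping; `country_flag` and `flag_to_code` are hypothetical helper names, and the sketch does not check whether a code is actually assigned, which is what makes a sequence valid.

```python
# Minimal sketch: country flag emoji as pairs of Regional Indicator Symbols.
# Helper names are hypothetical; no check is made that a code is assigned.

REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A
LETTER_COUNT = 26               # U+1F1E6..U+1F1FF cover the letters A..Z


def country_flag(code: str) -> str:
    """Turn a two-letter region code such as "FR" into its flag emoji sequence."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    # Each letter maps to the regional indicator at the same offset from "A".
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


def flag_to_code(flag: str) -> str:
    """Recover the two-letter region code from a regional indicator pair."""
    if len(flag) != 2:
        raise ValueError("a country flag is exactly two regional indicators")
    letters = []
    for ch in flag:
        offset = ord(ch) - REGIONAL_INDICATOR_A
        if not 0 <= offset < LETTER_COUNT:
            raise ValueError(f"not a regional indicator: U+{ord(ch):04X}")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


if __name__ == "__main__":
    # The ten flags from the early Japanese carrier emoji sets.
    for code in ["CN", "DE", "ES", "FR", "GB", "IT", "JP", "KR", "RU", "US"]:
        flag = country_flag(code)
        print(code, flag, " ".join(f"U+{ord(ch):X}" for ch in flag))
```

Because the mapping is mechanical, a newly assigned region code produces a well-formed sequence without any new characters being encoded; whether a given device actually draws a flag for it still depends on its fonts.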

Why this flag emoji but not that one?

Today, the largest emoji category is flags (out of only about 3,600 emoji, there are over 200 flags!). But did you know that there are over 5,000 geographically recognized regions that are also “valid”? These are known as subdivision regions and are based on ISO 3166-2. (These include states in the US, regions in Italy, provinces in Argentina, and so on.)

First, what does “valid” mean to the Unicode Standard? Well, think of it this way. Today, anyone could make a font of 5,000 emoji flags using these sequences. They are valid sequences. They are legit sequences. They won’t break. Any platform, application, or font can implement them. The significant difference here is that valid doesn’t mean they are recommended for implementation.

Back to ISO. ISO groups countries in a more formal way than, say, FIFA or the Olympics. For example, the four regions of the UK are regularly used in sport but not recognized in ISO 3166-1. In 2016, the Unicode Consortium started looking into solutions to support their inclusion (with the technical feasibility of adding more if needed in the future). This was the impetus for adding a general mechanism to make all ISO 3166-2 codes valid for flags. However, only three of the 5,000 ISO 3166-2 codes have widely adopted emoji: England, Scotland, and Wales. (Northern Ireland remains in limbo until an “official flag” is formalized.)

Flags for England, Scotland, and Wales were included in Emoji 5.0
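Subdivision flags such as those for England, Scotland, and Wales are built differently from country flags, as an emoji tag sequence: U+1F3F4 WAVING BLACK FLAG, then invisible tag characters (U+E0020 through U+E007E, which mirror printable ASCII) spelling a lowercase subdivision code such as `gbeng`, `gbsct`, or `gbwls`, then U+E007F CANCEL TAG. The Python sketch below builds and parses such sequences; the helper names are hypothetical, and the validation is simplified to "lowercase ASCII letters and digits", not a lookup in CLDR’s list of valid subdivisions.

```python
# Minimal sketch: subdivision flags as emoji tag sequences.
# Helper names are hypothetical; validation is deliberately simplified.

BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG, base of the sequence
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG, terminates the tag run
TAG_OFFSET = 0xE0000       # tag character = ASCII code point + 0xE0000


def subdivision_flag(subdivision_id: str) -> str:
    """Build the tag sequence for a subdivision code such as "gbsct"."""
    sid = subdivision_id.lower()
    if not sid or not all(c.isascii() and c.isalnum() for c in sid):
        raise ValueError(f"expected ASCII letters and digits, got {subdivision_id!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in sid)
    return BLACK_FLAG + tags + CANCEL_TAG


def parse_subdivision_flag(flag: str) -> str:
    """Recover the subdivision code from a black-flag tag sequence."""
    if not (flag.startswith(BLACK_FLAG) and flag.endswith(CANCEL_TAG)):
        raise ValueError("not a black-flag tag sequence")
    body = flag[len(BLACK_FLAG):-len(CANCEL_TAG)]
    chars = []
    for ch in body:
        cp = ord(ch) - TAG_OFFSET
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"not a tag character: U+{ord(ch):04X}")
        chars.append(chr(cp))
    return "".join(chars)


if __name__ == "__main__":
    for sid in ["gbeng", "gbsct", "gbwls"]:
        flag = subdivision_flag(sid)
        print(sid, flag, " ".join(f"U+{ord(ch):X}" for ch in flag))
```

The sketch will build a sequence for any well-formed code (for example `iqkr`); whether that sequence is valid depends on CLDR’s subdivision data, and whether it shows up as a flag depends on each platform.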

So, with so many “valid sequences” why hasn’t anyone taken advantage of this sweet sweet rich flag opportunity?

At the time, in 2016, adding a few flags seemed reasonable, but in retrospect it was short-sighted. If the Emoji Subcommittee recommends the addition of a Catalonia flag emoji, then it looks like favoritism unless all the other subdivisions of Spain are added. And if those are added, what about the subdivisions of Japan or Namibia, or the municipalities of Liechtenstein? The inclusion of new flags will always continue to emphasize the exclusion of others. And there isn’t much room for the fluid nature of politics: countries change, but Unicode additions are forever. Once a character is added, it can never be removed. (That being said, font designers can always update the designs as regimes change.)

How are flag emoji used?

Flags are very specific in what they mean, and they don’t represent concepts used multiple times a day or even multiple times a year. You could say flag emoji have transcended the messaging experience and are primarily found in more auto-biographical contexts. (Like your TikTok bio. Or, maybe you add a flag to your username on Twitter.) But, even then flags are not as commonly found in biographical spaces as you may expect. (The top five emoji found in Twitter bios? ❤️✨💙💜💛.)

Despite being the largest emoji category with a strong association tied to identity, flags are by far the least used. (There are exceptions: usage of the rainbow flag is above median!) That raises the question, “So, why not encode more identity flags?” Well, we have seen the same results for flags as we have seen for other emoji: a very long tail of rarely used options. They also tend to change over time! In the six years since a Pride Flag was added to the Unicode Standard (2016), it has already been redesigned. Many times. Identities are fluid and ever-changing, which makes them a poor fit for a formal, unchanging universal character set.

Why does usage matter in selecting emoji?

Any emoji additions have to take into consideration usage frequency, trade-offs with other choices, font file size, and the burden on developers (and users!) to make it easier to send and receive emoji. That’s why the Emoji Subcommittee set out to reduce the number of emoji we encode in any given year. Flags are also super hard to discern at emoji sizes: it’s quite easy to send a different flag than you intended (and with each additional flag the problem gets worse). The simple truth is that if more people used flags, there would be more of an argument to encode them. Encoding more flags in the Unicode Standard is just not a viable solution here for implementers or users. Fortunately, there are seemingly infinite other ways to exchange images of flags that are more flexible and decentralized, such as stickers, gifs, and image attachments.

What is Unicode doing about it?

We realize closing this door may come as a disappointment — after all, flags often serve as a rallying cry to be seen, heard, recognized, and understood.

The Internet is a different place now than it was in the 1990s, and the distribution of imagery online is unstoppable! Given how flags are commonly used, this is a reasonable path forward: if you care to denote your affiliation with a region, be it geographic, political, or identity (or all three), you can add a flag to your avatar image, share videos, or send a gif or sticker to razz your friend during a sports game (and of course there is always ⚽ ⚽ ⚽ ⚽ ⚽).


The more emoji can operate as building blocks, the more versatile, fluid, and useful they become! Rather than relying on Unicode to add new emoji for every concept under the Sun (this is simply not attainable), the citizens of the world have proven to be infinitely creative and fluid, often using existing emoji like the colored hearts (❤️ 🧡 💛 💚 💙 💜 🤎 🖤 🤍) to express themselves. Hearts are among the most frequently used types of emoji, and the nine colored hearts are often juxtaposed next to each other to denote markers of emotion (“I’m sorry 💙” or “love you ❤️”) and identity or affiliation that are not represented with atomic emoji in the Unicode Standard (e.g., “Pan African pride ❤️💚🖤”, “Hi I’m bi 💖💙💜”, and yes, even sports teams: “Go Mets! 💙🧡”).

With this in mind, the Emoji Subcommittee has put forth a strategy to add a pink heart, a light blue heart, and a gray heart to the Unicode Standard. These are colors commonly found in gender flags (gender fluid pride flag), sexuality flags (bisexual pride flag), sports team colors (Go Spurs!), and even some regional flags (Brussels). As of this year, these three heart emoji have advanced as draft candidates, and you can expect them to land on your device of choice sometime next year.

In some ways we have returned to where we first started: Adding three new emoji to support a seemingly infinite number of concepts. This time if it fails, at least we’ll be left with lots of heart emoji that have multiple uses. ❤️🧡💛💚💙💜🤎🖤🤍



In light of this change, we’d like to answer a few additional frequently asked questions about emoji flags.

Wait, if a country gains independence and is recognised by ISO, does that mean no flag emoji for them?
Flags for countries with Unicode region codes are automatically recommended, with no proposals necessary! First their codes and translated names are added to Unicode’s Common Locale Data Repository [CLDR], and then the emoji become valid in the next version of Unicode. These emoji are also automatically recommended for general interchange and wide deployment.

What about flags that change designs for geopolitical reasons?
Unicode does not specify the appearance of flag emoji. It is the responsibility of font designers to update their fonts as politics change. For example, no Unicode changes were required when Mauritania changed its flag: https://emojipedia.org/flag-mauritania/

My region was assigned an ISO 3166-2 code. Do we have to submit a proposal?
No, the Emoji Subcommittee is no longer taking in any proposals for flags of any kind.

As a recent example, Kurdistan (a subdivision of Iraq) became an official subdivision in ISO 3166-2 (IQ-KR) on May 3, 2021. The corresponding Unicode subdivision code (iqkr) is slated for release in CLDR v41 on Apr 6, 2022. At that point the flag for Kurdistan will officially be valid — any platform, app, or font could support it. But that doesn’t mean it automatically gets in the queue for everyone’s phone. Only countries with ISO 3166-1 region codes are automatically recommended and require no proposal to move forward.

So what warrants an ISO 3166-1 assignment vs ISO 3166-2?
ISO 3166-1 is for countries (largely those recognized by the United Nations) and certain territories, while ISO 3166-2 is for parts of countries.

Why is Antarctica part of ISO 3166-1 but Africa isn’t? There seems to be no rational explanation with regard to why islands with no inhabitants have a flag while regions with millions of people have no emoji flag.
It’s true that the list has some historical quirks. Antarctica has an ISO 3166-1 alpha-2 code: AQ. But why does it have an ISO 3166-1 code? Because ISO 3166 decided (ages ago) to include it, probably since the whole continent is “shared.”

For historical reasons, you may see other exceptions, such as codes that ISO 3166 has “exceptionally reserved”: 🇦🇨 AC Ascension Island, 🇨🇵 CP Clipperton Island, or 🇩🇬 DG Diego Garcia.

Why don’t we have asexual, bisexual, pansexual, and non-binary pride flags? And if 🏴󠁧󠁢󠁷󠁬󠁳󠁿 and 🏴󠁧󠁢󠁳󠁣󠁴󠁿 get Unicode flags, surely there’s room for the Aboriginal and Torres Strait Islander flags?
Before diving into the facts of why these flags are not part of the universal character set, we want to first take a moment to consider what people mean when they ask these questions and what Unicode means when it declines these flag proposals. This question is not one we take lightly. In the course of world history, groups have used flags as a rallying cry to be seen, heard, recognized, and understood. In the Unicode Consortium’s mission to digitize the world’s languages, improve communication online, and achieve meaningful interoperability between platforms, the requests for flags have become a lightning rod for these rallying cries.

When people ask for a new flag emoji, we recognize that the underlying request is about more than simply a new emoji. And when we say, “We aren’t adding more flags,” we are only saying that changing the Unicode Standard is not an effective mechanism for this recognition.

What if I submit a proposal for a flag despite this policy?
Your proposal will not be processed.

Relevant docs/Further Reading
https://www.unicode.org/L2/L2021/21128-esc-recs.pdf
https://www.unicode.org/L2/L2021/21167.htm
https://www.unicode.org/L2/L2021/21172-esc-recs.pdf
https://www.unicode.org/emoji/proposals.html#Flags
http://www.unicode.org/L2/L2019/19084-trans-flag.pdf

Wednesday, November 10, 2021

ICU4X 0.4 Released

[ICU logo image] Unicode® ICU4X 0.4 has just been released. This revision brings an implementation of Unicode Properties, major performance and memory improvements for DateTimeFormat, and extends the data provider’s data loading models with BlobDataProvider.

ICU4X 0.4 also adds initial time zone support in DateTimeFormat, week of month/year, iteration APIs in Segmenter and experimental ListFormatter.

The ICU4X team is shifting to work on the 0.5 release in accordance with the roadmap and a product requirements document setting sights on a stable 1.0 release in Q2 2022.

ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments, portable across programming languages.

Multiple early adopters use ICU4X in pre-release software in Rust, C, C++, and WebAssembly. The team is ready to onboard additional early adopters to refine the APIs, build processes, and feature sets before the 1.0 release. The team is also looking for contributors to write code generation for additional target programming languages. For more information, please open a discussion on the ICU4X GitHub.

For details, please see the changelog.



Tuesday, July 20, 2021

The Unicode Consortium Welcomes Toral Cowieson as Executive Director & COO

[keyboard photo] Since its founding, the Unicode Consortium has grown and expanded its charter and scope. We’re embarking on a new chapter in the evolution of the Consortium and are pleased to announce the appointment of Toral Cowieson in the newly-created position of Executive Director & COO.

“We are thrilled to have Toral joining the team,” said Mark Davis, President and cofounder of the Consortium. “She brings a wealth of experience in leadership across non-profits, corporations, and board service. Her recent time at the Internet Society, including as head of Strategy and Impact Measurement, puts Unicode in good stead for this next stage of growth.”

In this senior executive position reporting to the Chair of the Board of Directors, Ms. Cowieson will collaborate with the Board, officers, and team to extend the technical mission and impact, set the future agenda and program priorities, and ensure the long-term health and sustainability of the organization.

“Unicode standards are at the heart of how users seamlessly receive and share information across the nearly 22 billion devices around the world. I’m honored and excited to be joining the Consortium at this juncture, and look forward to working with the Board, staff, and the extended Unicode community to advance the mission and have an even greater impact in the years to come,” commented Ms. Cowieson.

In addition to Ms. Cowieson joining as Executive Director, the Consortium is also pleased to announce the following changes:

Board and Other Leadership Updates

Iris Orriss, who joined the Unicode Board in 2019, has been elected as the Treasurer of the Consortium. She is VP of Internationalization, Product Quality, and Product Experience Analytics at Facebook. Ms. Orriss is also Chair of the Board’s Finance and Funding Committee.

Greg Welch, member of the Board since 2013, has been elected as the Secretary of the Consortium and carries forward the excellent work done in this office by Michel Suignard for more than a decade. Mr. Welch is also Chair of the Board’s Governance & Nominating Committee.

Markus Scherer, the Chair of the ICU Technical Committee, has been appointed a Vice President. He is a member of the Google software internationalization team, focusing on the effective use of Unicode and on the development and deployment of cross-product internationalization libraries.

Announcing Unicode Fellows

The Consortium has recently created a new category for distinguished contributors, whose deep, long-term knowledge of internationalization and dedication to work on standards has greatly benefited the Consortium for many years. The Consortium is pleased to announce its two inaugural Unicode Fellows.

Peter Edberg has been named a Unicode Fellow. He has worked on internationalization, text and language support at Apple since 1988. He has been Apple’s representative to the Consortium for many years, and has been actively involved since 2008 with the Unicode CLDR and ICU projects.

Michel Suignard has been named a Unicode Fellow after serving as Secretary for the Unicode Consortium from 2007 to 2020. He worked for more than twenty-five years at Microsoft, where he held various positions in the development and sales divisions, many involving the development of the Unicode Standard. He is currently an independent consultant working on character encoding related matters, such as Internationalized Domain Names (IDN) and typography. Michel is the code chart editor for the Unicode Standard and is also the project editor of ISO/IEC 10646, which is the ISO standard aligned with the Unicode Standard.




Thursday, May 6, 2021

ICU4X 0.2 Released

[ICU logo image] Unicode® ICU4X 0.2 has just been released. This revision improves the completeness of the components in ICU4X 0.1 and introduces a number of lower-level utilities.

ICU4X 0.2 adds minimal decimal formatting, time zone formatting, datetime skeleton resolution, and locale canonicalization.

This release comes with new low-level utilities for fixed decimal operations, ICU patterns, and foundational components allowing use of ICU4X from other ecosystems via Foreign Function Interfaces.

Additionally, the ICU4X team released a roadmap and a product requirements document setting sights on a stable 1.0 release.

ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments.

For details, please see the changelog.



Friday, October 23, 2020

Announcing ICU4X 0.1

[ICU logo image] We are thrilled to announce the first pre-release version of the ICU4X internationalization components. ICU4X aims to provide high quality internationalization components with a focus on:
  • Modularity
  • Flexible data management
  • Performance, memory, safety and size
  • Universal access from programming languages and ecosystems (FFI)
ICU4X draws from the experience of projects such as ICU4C, ICU4J, ECMA-402, CLDR, and Unicode.

Target

ICU4X is initially focusing on a subset of internationalization APIs standardized in ECMA-402 in order to cover the needs of client-side ecosystems and thin clients.

ICU4X targets a wide range of programming languages and environments, aiming to expose its APIs to languages such as JavaScript, WebAssembly, Dart, C++, Python, PHP, and others.

With our focus on client-side ecosystems, a lot of effort will be placed on minimizing size, memory, and CPU utilization, and on allowing for asynchronous data management.

More information on the design can be found in the project’s Announcement article.

Status

This first pre-release 0.1 version is written in Rust and introduces a small subset of APIs and scaffolding for flexible data management.

We would like to invite everyone to try it out. Take a look at the documentation and provide feedback on the API design. We’re also looking for feedback on the algorithms and data structures we use, especially from contributors with experience in Rust and ICU algorithms.

More information on the release can be found in the Release Notes.

Roadmap

The next version, 0.2, will focus on validating the ability to expose ICU4X APIs to other programming environments and extending the data management system to be asynchronous.

The project is fully open source and invites all interested parties to join the effort of designing and developing a modular internationalization components system in Rust.

To learn more on how to contribute to the project, visit the CONTRIBUTE document in the project’s repository.



Wednesday, April 8, 2020

Unicode 14.0 Delayed for 6 Months

Due to COVID-19, the Unicode Consortium has decided to postpone the release of version 14.0 of the Unicode Standard by 6 months, from March to September of 2021. This delay will also impact related specifications and data, such as new emoji characters.

The Unicode Consortium relies heavily on the efforts of volunteers. “Under the current circumstances we’ve heard that our contributors have a lot on their plates at the moment and decided it was in the best interests of our volunteers and the organizations that depend on the standard to push out our release date,” said Mark Davis, President of the Consortium. “This year we simply can’t commit to the same schedule we’ve adhered to in the past.”

ICU and CLDR to stay on schedule

The two other main Unicode projects, ICU and CLDR, are maintaining their 6-month cycles for releases in the spring and fall, although the feature sets this year may be lighter. The CLDR project supplies language- and locale-specific data and specifications, while the ICU project supplies internationalization code libraries that allow operating systems and applications to use Unicode and CLDR data and specifications. These projects are impacted less by current conditions since they have always operated via virtual meetings and are more compartmentalized, meaning that it is easier to withhold a particular feature if it falls behind schedule without jeopardizing the whole release. Sub-projects of CLDR and ICU, such as the CLDR Message Formatting project, will also be little affected.

Emoji

This announcement does not affect the new emoji included in Unicode Standard version 13.0 announced on March 10, 2020.

Because of the lead time for developers to incorporate emoji into mobile phones, emoji that are finalized in January don’t appear on phones until the following September or so. For example, the emoji that were included in Release 13.0 in March 2020 won’t generally be on phones until the fall of 2020. With the delay of the release of Unicode 14.0, the deadline for submission of new emoji character proposals for Emoji 14.0 is also being postponed until September 2020.

The Consortium is considering whether it is feasible to release emoji sequences in an Emoji 13.1 release. These sequences make use of existing characters. An example from Emoji 13.0 is the black cat, which is internally a combination of the cat emoji and black large square emoji. Since sequences rely only on combinations of existing characters in the Unicode Standard, they can be implemented on a separate schedule, and don’t require a new version of Unicode or the encoding of new characters. Such an Emoji 13.1 release would be in time for release on mobile phones in 2021.
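To see why a sequence like black cat needs no new code points, it helps to look inside one. In the Unicode emoji data, black cat is the cat emoji (U+1F408) joined to U+2B1B BLACK LARGE SQUARE by U+200D ZERO WIDTH JOINER. The short Python sketch below assembles that sequence and lists its code points using the standard `unicodedata` module; `zwj_sequence` is a hypothetical helper name used only for illustration.

```python
import unicodedata

ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER glues existing characters together


def zwj_sequence(*components: str) -> str:
    """Join existing characters with ZWJ to form an emoji ZWJ sequence."""
    return ZWJ.join(components)


CAT = "\U0001F408"             # U+1F408 CAT
BLACK_LARGE_SQUARE = "\u2B1B"  # U+2B1B BLACK LARGE SQUARE

black_cat = zwj_sequence(CAT, BLACK_LARGE_SQUARE)

if __name__ == "__main__":
    # List each code point in the sequence with its character name.
    for ch in black_cat:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Running it prints three lines: U+1F408 CAT, U+200D ZERO WIDTH JOINER, and U+2B1B BLACK LARGE SQUARE. On a platform that does not support the sequence, it typically displays as its separate components: a cat followed by a black square.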

The Emoji Subcommittee will be accepting new emoji character proposals for Emoji 14.0 from June 15, 2020 until September 1, 2020. Any new emoji characters incorporated into Emoji 14.0 would appear on phones and other devices in 2022.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages


Wednesday, July 17, 2019

The Unicode Consortium Launches New Website in Celebration of World Emoji Day

The New Unicode.org Also Offers Emoji Enthusiasts the Chance to “Adopt a Character”

The Unicode Consortium, a nonprofit that maintains text standards to support all the world’s written languages across every device, today debuted a new look for unicode.org. The redesigned website will make information about the emoji proposal process more easily accessible while encouraging public participation and engagement in all Unicode initiatives.

“Unicode is a global technology standard that is one of the core building blocks of the internet,” said Unicode board member Greg Welch. “Unicode has helped facilitate the work of programmers and linguists from around the world since the 1990s. But with the rise of mobile devices and public enthusiasm for emoji, we knew it was time to redesign the Unicode website to make information more easily accessible, and increase community involvement.”

Emoji were adopted into the Unicode Standard in 2010 in a move that made the characters available everywhere. Today, emoji have been used by 92% of the world’s online population. And while emoji encoding and standardization make up just one small part of the Consortium’s text standards work, the growing popularity and demand for emoji have put the organization in the international spotlight.

“We’ve been working with the Unicode Consortium for several years to open up the emoji proposals process by making it more accessible and understandable,” said Jennifer 8. Lee, co-founder of Emojination. “While I personally found the late-90s aesthetic of the developer-centric Unicode.org site very retro and nerd charming, the new site redesign is a reflection of Unicode’s deep desire to engage the public in its work.”

In addition to offering a clearer picture of the emoji submission and standardization process, the new Unicode.org website offers information about the Consortium and its mission to enable people everywhere in the world to use any language on any device.

“Emoji are just one element of our broader mission,” said Mark Davis, president and co-founder of the Unicode Consortium. “The Consortium is a team of largely volunteers who are dedicated to ensuring that people all over the world can use their language of choice in digital communication across any computer, phone or other device. From English and Chinese to Cherokee, Hindi and Rohingya, the Consortium is committed to preserving every language for the digital era.”

A team of designers from Adobe provided design and branding support, as well as free access to leading design tools, to bring Unicode’s new website to life.

“The Unicode Consortium’s work to keep digitally disadvantaged languages alive is incredibly important,” said Adobe Design Program Manager Lisa Pedee. “We collaborated closely with the Consortium to develop a unique visual brand and streamlined web interface that makes everything from contributing language data to proposing an emoji more accessible, inclusive and user-friendly.”

The Consortium’s recent language work includes adding language data for Cherokee, encoding the Hanifi Rohingya script, and ongoing work on encoding Mayan hieroglyphs.

The Consortium invites emoji and language enthusiasts to celebrate World Emoji Day on July 17 and “Adopt a Character” to support its ongoing efforts. More than 136,000 characters are up for adoption, including new Emoji 12.0 additions such as the sloth, the sea otter, the waffle and Saturn.


Those who choose to adopt will receive a custom digital badge they can display to publicly show their support, whether on their website or social media. The Unicode Consortium is a 501(c)(3) charitable organization and “adoption fees” are tax-deductible in the U.S. Additionally, some companies may provide matching funds. Learn more and adopt a character on the Unicode website.

About the Unicode Consortium
The Unicode Consortium is a nonprofit on a mission to enable anyone to use any language across every device, globally. The Consortium develops, extends and promotes the use of the Unicode Standard, freely available specifications and data that form the foundation for software internationalization in all major operating systems, search applications and the web.

The Unicode Consortium is open to all and comprises individuals, companies, academic institutions and governments. Members include Adobe, Apple, Emojipedia, Facebook, Google, IBM, Microsoft, Netflix, Oracle and SAP, among others. For more information, please visit http://www.unicode.org.