The Unicode Blog: 2024

Thursday, July 11, 2024

Unicode Technology Workshop 2 — Call for Submissions Now Open!

Event Dates: October 22-23, 2024

Where: San Francisco Bay Area (hosted at Google’s Sunnyvale, California campus). For planning purposes, the closest airports are San Francisco International Airport (SFO) and San Jose Mineta International Airport (SJC). The recommended public transportation option is VTA Light Rail to the Lockheed Martin Station.

The Second Annual Unicode Technology Workshop (UTW 2) builds upon the success of last year’s event, which brought together more than 80 internationalization enthusiasts for two days of connecting, learning, and envisioning the possibilities.

UTW 2 is designed to be accessible to GILT professionals and students, while providing enough information and depth to be relevant to established internationalization experts. The primary goals of the workshop are to strengthen the internationalization community and further the adoption of Unicode standards and technology. For this year’s workshop, Unicode will also be adding case studies and speed networking to provide an even more engaging experience for all attendees.

Call for Submissions Now Open!

Unicode is pleased to announce that session proposals for UTW 2 are now being accepted!

We are seeking proposals for workshops, seminars, free-form discussions, and lightning talks that center around Unicode i18n libraries, locale data frameworks, globalization tooling, localization pipelines, input methods, and text rendering. Come connect with other Unicode users, share your knowledge and experience, and help us envision the future of Unicode technology. You will come away with deeper knowledge on how to solve tough problems in the i18n and l10n space and how to engineer products that work better for global users. Note: To encourage maximum collaboration amongst the attendees, this is an in-person-only event.

If you have an idea for a session that you would like to lead, you can register your interest in contributing by using the following link: Submissions. We are interested in proposals that represent a variety of perspectives, so i18n and l10n technical specialists, working GILT professionals, and university students are all encouraged to propose ideas. The deadline for submissions is August 12, 2024, at 5:00 PM PT. Proposals will be reviewed in late August, and session hosts will be notified in mid-September.

Sponsorship Opportunities

Sponsorship opportunities are available at various levels. Sponsorship benefits include complimentary registrations, opportunities to lead a session or workshop, recognition on the event website, program and event materials, visibility on social media, and much more. Specific offerings vary by sponsorship level.

If you want to demonstrate your industry leadership, enhance your brand, share your knowledge, promote your products and services, and foster community building, contact events@unicode.org today to learn more. Sponsorship discounts are available to Unicode Full and Supporting Members.

About the Unicode Consortium

The Unicode Consortium is the premier non-profit open source, open standards body for the internationalization of all software and services.

For more than 30 years, the Unicode Consortium has coordinated the efforts of a world-wide team of volunteer programmers and linguists to standardize, evolve, and maintain a global software foundation that allows virtually every computer system and service to help people connect using their native language.

For additional information, visit home.unicode.org.

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

Tuesday, June 11, 2024

Announcing Updated Technical Group Procedures at Unicode

We are pleased to announce that the Unicode Consortium has updated its Technical Group Procedures. The procedures were last revised in January 2022; the new version is more closely aligned with our current practices and the evolving needs of our technical governance. This change is the result of collaborative efforts by our technical leadership and the Board Governance Committee. Additional input was solicited from the Technical Committee delegates of Unicode’s membership and the Unicode Board.

Aligning Practices with Needs

Over time, it has become evident that our former procedures were not fully reflective of what our technical groups needed. As a non-profit open source organization at the forefront of developing standards, code, and data that enable people around the world to use computers in any language, our technical procedures must facilitate effective and efficient governance.

Clarity and Efficiency in Governance

The newly revised procedures are designed to provide clear and comprehensive guidelines that will support our technical leadership in managing and directing our technical work more effectively. They also rationalize the structures and terminology. For example, there are now two kinds of technical groups:

The Technical Committees (TCs), responsible for all of the decisions
Their Work Groups (WGs) that carry out TC actions, investigate and research issues, and make recommendations to their TCs

The updates to procedures are crucial for ensuring that our decision-making processes remain transparent and aligned with our organizational goals.

Thanks to Our Community

We extend our deepest thanks to everyone involved in the drafting of these updated procedures. The collaboration and dedication of our community members ensure that the Unicode Consortium continues to be a leader in the internationalization of software and services. We also appreciate the Unicode Board for its unanimous consent to adopt these new procedures.

Wednesday, June 5, 2024

Interview with Unicode Volunteer - Addison Phillips

Throughout the year, we engage in in-depth discussions with Unicode contributors to spotlight their vital contributions and share their stories. These conversations are a key part of our initiative to recognize the often unseen efforts of our volunteers, offering a more personal glimpse into the lives of those who drive our mission forward.

In our latest feature, meet Addison Phillips, a dedicated volunteer who brings a wealth of enthusiasm, expertise, and passion to the Unicode community.

Q: What do you do now and what’s your role with Unicode?

A: I am currently Chair of the Unicode Message Format Working Group. I retired having spent the last 14 years at Amazon as well as a variety of other organizations including Yahoo, Web Methods (part of Software AG), and AT&T/Lucent Technologies. I’m primarily an “internationalization architect”, but I’ve worked in the localization, tools, and consulting space.

Q: How long have you been a volunteer at Unicode? Is there an area of focus currently?

A: A long time. I have volunteered on different levels with Unicode since the early 2000s. I had some less consistent involvement in the late 1990s.

Most recently, I’ve been the Chair of the Message Format Working Group, which is part of the CLDR project. We just released our Technical Preview a couple of months ago, as part of LDML 45.

A lot of locale data is focused on individual APIs–how do you format a number? How do you format a time? How do you call “January” or “Tuesday” or “Morocco” in a given language? But Message Format, to me, is a much better starting point for the people building the software–and the people localizing it. It’s a format that lets developers make easy-to-translate, grammatical messages and saves all these people from having to learn all the low-level formatting minutiae. Building an open, shared, consistent standard for formatting will unlock so much.

Q: How did you first become involved with Unicode?

A: Initially, I attended Unicode conferences, as I was working in localization and the i18n consulting space. I was lobbying for Unicode support, for example from browser makers such as Netscape. I started presenting at Unicode conferences, including the Introduction to Internationalization tutorial. In the early 2000s, I joined the conference review committee and the editorial committee and also engaged my employers to become members of Unicode. In March 2003, I attended the Unicode conference in Prague, where Mark Davis (Unicode’s cofounder) and I cooked up a plan to address issues with locale identifiers–then a hot topic–which resulted in BCP47. That work is a cornerstone of the locale data work that, today, is CLDR. On-going, I had steady but what I would consider “lower tier involvement” with Unicode including lots of communication about needed fixes.

Q: What do you enjoy about contributing to Unicode?

A: The camaraderie. These are the people who “speak my language”: they share the same concerns, and face the same problems. Unicode as an organization has been really effective at delivering impactful things, both as a consumer and promoter of these technologies. It’s been a powerful way to effect change. Really early in my career, I was working on an overseas mainframe project at AT&T. It was scary: I needed to find a system-specific encoding map. There was one guy who was rumored to have the mapping. I had to call the Adobe switchboard and hope they would connect me to this “Ken Lunde” person (luckily he took the call!). It was a tricky world to live in, with every company having its own operating system and each operating system having its own set of character encodings per language. Everything was bespoke. Because of Unicode, this issue no longer exists. Unicode changed how computing works and how it’s thought of; having CLDR data and ICU as an implementation of that, it has made life so much easier.

Q: Do you have a favorite Unicode project you’ve worked on? Why?

A: I have really liked a lot of the projects. I am most excited by the growth of the community engagement area. Education and awareness is the biggest problem we have in the internationalization space. The encoding of text and the support of different languages and cultures is now widely available, but nobody is aware of it. No one learns it in computer science courses. Engineers are busy and they generate this kind of “disinformation bubble” of quick hacks that localization teams in particular have to overcome.

The principle behind my roles with previous employers and my consulting, and the reason I did the tutorials at Unicode conferences, was that before we can actually move forward, everyone needs to be on common ground, with common understanding and a common vocabulary. I couldn’t be happier than to see Unicode reaching the community with information and providing standard information so everyone, no matter what environment they come from, can learn this stuff–the right way.

Q: What contribution(s) to the Unicode community are you most proud of?

A: The locale identifier work (BCP47) was pretty impactful. The personal things and making people aware that Unicode is there and a reliable source of information. Promoting Unicode has been an impactful thing. Over the years, I’ve taught the internationalization tutorial to thousands of people which I believe has had a long-term impact.

Q: How did you become involved in computer science?

A: I had a job in the 1980s at a company that built shopping centers, and, among other things, operated a bookstore. They had developed a retail system running on minicomputers that they sold to other independent bookstores, and I worked for the owner developing that system: that was my first professional job and it laid the groundwork for everything. Later, I had a job with the localization/internationalization group at AT&T: once you’ve shipped “not English” there’s no going back. I followed that to internationalization consulting, working with Bill Hall.

Q: What is your favorite book?

A: I am an avid reader so it’s hard to pick just one book. My preferred genres are fantasy and science fiction. (Fun fact: I went to Amazon to work on Kindle!)

Q: Where did you grow up?

A: Born in France, but my parents are Americans. When I was young, I lived in France and Germany. I went to high school and spent my formative years in Carmel, CA.

Q: Beach or Mountains?

A: Beach. I mean, it pretty much has to be “beach”, since I live in a town called “Dillon’s Beach”.

Q: Any advice for anyone interested in volunteering at Unicode?

A: Two things. First, jump in, the water’s fine. The Unicode space can be heavy with jargon and seem full of insider knowledge, but don’t be put off by that. Ask questions, because people are always super excited to share. There truly are no dumb questions. Don’t think just because things are operating a certain way, that you can’t question it, as there might be a new, better, or different way to do it. Maybe nobody said anything before! If you come with well-thought out questions, there will always be a positive reception. Unicode is a happy and helpful space to work in.

Second, give back. Unicode is an incredibly small organization. The number of contributors is way smaller than the impact Unicode has. And Unicode could do so much more, if only we had more people contributing. Linguistic and cultural support in software could be so much more powerful, if only we had contributions.

Q: Anything else you’d like to share?

A: I’ve spent 25 years at W3C and I’ve been the Chair of the Internationalization Working Group for most of that time. We are a partner organization in promoting internationalization. We need help there too.

Friday, May 31, 2024

New Event on June 25 - Webinar on Bidirectional Text (Part 1): The Basics of Bidi

Registration is Now Open!

A number of scripts, such as Hebrew, Arabic and Urdu, write their letters horizontally on a page or screen, running right to left. A complication for these scripts is that other characters, such as digits, flow left to right and can occur on the same line, sometimes alongside other left-to-right text, such as Latin. Text that mixes right-to-left and left-to-right runs is called “bidirectional” text (“bidi” for short).
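
As a rough illustration of why mixed-direction lines need special handling, the sketch below uses Python's standard `unicodedata.bidirectional()` function to look up the bidirectional class that the Unicode Character Database assigns to each character in a short string. The sample string and the helper name `bidi_classes` are assumptions made for this illustration and are not taken from the webinar.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Hypothetical sample: Latin "a", Hebrew alef, the digit "1", Arabic alef,
# separated by spaces.
sample = "a \u05d0 1 \u0627"

for ch, cls in bidi_classes(sample):
    # Print the code point rather than the glyph so the output itself
    # is not reordered by a bidi-aware terminal.
    print(f"U+{ord(ch):04X} {cls}")
```

The Latin letter is reported as `L`, the Hebrew letter as `R`, the Arabic letter as `AL`, the digit as `EN` (European number), and the spaces as `WS`. Software that displays such a line has to resolve these classes into a visual order, which is the job of the Unicode Bidirectional Algorithm.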

Handling bidi text in browsers and other software is challenging for both general users and implementers. This webinar will describe the basics with examples. It will be followed by a live question-and-answer period. A more in-depth question-and-answer session will take place August 13, 2024.

Who? If you are a translator or localizer, localization tooling maker, i18n infrastructure developer, linguist or language researcher, application developer, or content author, you will want to join us for this webinar. Bring your questions to the people involved for the live Q&A.

When? Tuesday, 25 June 2024 starting at 8am (San Francisco), 11am (New York), and 5pm (Berlin).

Registration is Open Now! Please note this session will also be recorded and available via the Unicode YouTube channel.

Getting Started with Bidirectional Text (Part 1): The Basics of Bidi

Frequently Asked Questions: https://unicode.org/faq/bidi.html

Additional Articles from W3C:

https://www.w3.org/International/articlelist#direction

Unicode Resources

Unicode Technical Quick Start Guide: https://home.unicode.org/technical-quick-start-guide/
Unicode YouTube Playlist - Overview of Internationalization and Unicode Projects: https://www.youtube.com/playlist?list=PLMc927ywQmTNQrscw7yvaJbAbMJDIjeBh

Tuesday, May 21, 2024

Unicode 16.0 Beta Review Open

The beta review period for Unicode® 16.0 has started and is open until July 2, 2024.

The beta is intended primarily for review of character property data and changes to algorithm specifications (Unicode Standard Annexes). Also, for the first time, a complete draft of the core specification text is available for review during the beta period.

At this phase of a release, the character repertoire is considered stable. For this release, 5,185 new characters will be added, bringing the total number of encoded characters in Unicode 16.0 to 154,998. The new additions include seven new scripts:

Garay is a modern-use script from West Africa
Gurung Khema, Kirat Rai, Ol Onal, and Sunuwar are four modern-use scripts from Northeast India and Nepal
Todhri is an historic script used for Albanian
Tulu-Tigalari is an historic script from Southwest India

Other character additions include seven new emoji characters plus 3,995 additional Egyptian Hieroglyphs and over 700 symbols from legacy computing environments. See the delta code charts for details on all the new scripts and characters.

In addition to new characters, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references will be added for over 36,000 CJK unified ideographs. This will be reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column. Note that these glyph additions are not reflected in the delta charts mentioned above, but can be seen in the main (“single-block”) charts for the Unicode 16.0 Beta.

Various changes to properties, algorithms, and Unicode Standard Annexes will be made for Unicode 16.0. This version will add two new Unicode Standard Annexes:

UAX #53, Unicode Arabic Mark Rendering, provides a specification for interoperable font and shaping implementations for Arabic script. (This was previously published separately from the Unicode Standard as a technical report.)
UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet), provides data essential for understanding the identity of over 5,100 Egyptian Hieroglyph characters encoded in Unicode 16.0. (This is similar to data for CJK unified ideographs provided in UAX #38.)

A new UCD file, DoNotEmit.txt, will provide data in machine readable form that can be useful for software implementations but that previously was provided only as tables within the core specification text. See the Unicode 16.0 Beta landing page for other noteworthy property and algorithm changes.

For full details regarding the Beta, see Public Review Issue #502. Feedback should be reported under PRI #502 using the Unicode contact form by July 2, 2024.

Monday, May 20, 2024

Unicode CLDR Version 46 Submission Open

The Unicode CLDR Survey Tool is open for submission for version 46. CLDR provides key building blocks for software to support the world’s languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 46 is focusing on:

Unicode 16 additions: new emoji, script names, collation data (Chinese & Japanese), …
Emoji search keywords: Expanding keyword coverage to make it easier for users to find the right emoji
New Languages targeting Basic:
- Ewe (ee)
- Ga (gaa)
- Kinyarwanda (rw)
- Northern Sotho (nso)
- Oromo (om)
- Sesotho (st)
- Setswana (tn)
Up-leveling: Akan (ak)

Submission of new data opened recently, and is slated to finish on June 11. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on July 1. A public alpha makes the draft data available around August 28, and the final release targets October 16.

Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle.

Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name for that language is also added for translation for all languages at Modern coverage.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.

Thursday, May 2, 2024

Unicode Technical Committee (UTC) Updates from Meeting #179

by Peter Constable, UTC Chair

The Unicode Technical Committee (UTC) met last week (April 23 to 25) in San Jose, California. Thanks to Unicode member company Adobe for hosting. Here are some highlights from the large number of items that were covered.

Preparing Unicode 16.0 Beta

An important objective was to cover all technical decisions that would be needed for the Unicode 16.0 Beta preview. The Beta will be available for public review and comment on May 21, 2024, and will include all charts, data and annexes for The Unicode Standard as well as other synchronized standards, including UTS 10, Unicode Collation Algorithm, and UTS 51, Unicode Emoji. Also, for the first time, the Beta release will include a complete draft of the core text of the standard.

The character repertoire for Unicode 16.0 was slightly adjusted, with the removal of two characters: U+0CDC KANNADA ARCHAIC SHRII and U+0C5C TELUGU ARCHAIC SHRII. These characters were first approved in January 2022 (UTC #170) and assigned for addition in Unicode 16.0 in April 2023 (UTC #175). However, in the ISO process for Amendment 2 of ISO/IEC 10646:2022 (which is to be synchronized with Unicode 16.0), the India national body requested more time for review by experts in India. To avoid a risk of Unicode 16.0 and Amendment 2 of 10646 not being in sync, UTC decided to delay these two characters for a later version.

Various character property (UCD) and algorithm changes were made based on issues reported during the Alpha review or found while the UTC Properties and Algorithms Working Group prepared data files for 16.0. Two notable areas for changes are grapheme cluster segmentation (UAX #29) and line breaking (UAX #14):

For grapheme clusters, some changes will be made to extended grapheme cluster segmentation for improved handling of orthographic syllables in Indic scripts.

For line breaking, several changes will be made to data and rules to fix various edge cases, and to incorporate behaviour for hyphens that has already been implemented in CLDR and ICU for several years.

Also related to properties, the organization of the ScriptExtensions.txt file will be changing. Previously, lines of data were grouped by characters that had the same script extension property values. Going forward, lines will be ordered by code point. (This is only a change in the order the data is listed; the parsing of lines is unchanged.) This will make it much easier to compare changes in property values between different Unicode versions.
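
To make the point about parsing concrete, here is a minimal sketch of reading data lines in the usual UCD layout: a code point or range, a semicolon, the property value, then an optional `#` comment. The sample lines are illustrative stand-ins written in that layout, not quotations from the 16.0 file, and `parse_line` is a hypothetical helper. Because each line is parsed on its own, a parser like this does not depend on the order in which lines appear.

```python
def parse_line(line):
    """Parse one UCD-style data line into (first, last, [script codes]).

    Returns None for blank lines and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not data:
        return None
    cp_field, value_field = (part.strip() for part in data.split(";", 1))
    if ".." in cp_field:
        first, last = (int(cp, 16) for cp in cp_field.split(".."))
    else:
        first = last = int(cp_field, 16)
    return first, last, value_field.split()


# Illustrative sample lines (assumed for this sketch, not copied from the
# Unicode 16.0 data file), deliberately listed out of code point order.
sample = """\
# comment-only line
1CD0..1CD2    ; Beng Deva Gran Knda # Mn  [3] sample range
0640          ; Adlm Arab Mand Mani # Lm  sample single code point
"""

records = [r for r in map(parse_line, sample.splitlines()) if r is not None]
# Sorting by the first code point gives a code-point-ordered listing,
# whatever order the input lines were in.
records.sort(key=lambda r: r[0])
for first, last, scripts in records:
    print(f"{first:04X}..{last:04X} {' '.join(scripts)}")
```

With records in code point order, comparing two versions of the file becomes a simple line-by-line diff, which is the benefit the reordering is meant to deliver.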

In relation to emoji, the set of new emoji for version 16.0 is unchanged. During the Beta review, the draft update for UTS #51, Unicode Emoji, will include some proposed revisions related to recommendations for display of emoji family combinations. These revisions have not yet been reviewed and approved by UTC, so will require careful review and will be subject to confirmation or change at the next UTC meeting, after the Beta review period is over.

UTC action item backlog

UTC has had a growing backlog of open action items, some over ten years old. For this meeting, the various UTC working groups triaged their action items that were five or more years old, and outcomes were discussed by the UTC. Some action items were completed; some were closed as no longer relevant. Many that required more research were closed as UTC action items and replaced by issues in the relevant working group’s GitHub repo. Note that tracking them in this other way doesn’t necessarily mean they will get higher priority. However, since the working groups are using GitHub issues to organize their regular work, this should bring more attention to these issues. UTC will repeat this process at UTC #181, six months from now.

As a side effect of this review of old action items, a document was submitted to UTC (L2/24-123) proposing that UTC transition from the way it has handled action items in the past to tracking issues in a public GitHub repo to allow contributions from a broader set of volunteers. That document identifies some problems and limitations of the existing processes, and suggests that a new process could provide improvements. UTC spent some time discussing this document. It was noted that the idea was valuable, though such a change in processes would not be a small change and would involve some not-so-obvious challenges. It would also be something that affects the Unicode Consortium as a whole, not just UTC. For that reason, this proposal will need to be considered as part of a broader discussion of Consortium processes, resources and infrastructure.

New investigation: automatic space handling at inter-script boundaries

East Asian text often combines different scripts, and a common typographic practice is to insert space between script runs. UTC briefly discussed a new document, L2/24-057, which proposes development of an algorithm for automatic spacing between script runs. The Properties and Algorithms Working Group has assembled experts to discuss this topic. Interested experts are invited to participate in discussion via issues (with "auto-spacing" label) in the public unicodetools repo in GitHub.

SILICON Joins as Supporting Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that SILICON has joined as a Supporting Member.

The Stanford Initiative on Language Inclusion & Conservation in Old & New Media (SILICON) is a humanities-led tech initiative at Stanford University aiming to promote and sustain Digitally Disadvantaged Languages and, more broadly, address digital inequalities. Bridging gaps between Engineering, the Humanities, Computer Science, and the Social Sciences, the initiative seeks to help build tomorrow’s digital tools: improved OCR algorithms and AI generative text models; more globally inclusive text corpora, interfaces, keyboards, and digital fonts.

SILICON is interested in accelerating the timeline for digitally disadvantaged languages to be fully usable by their communities, by facilitating ongoing conversation between people involved in Unicode’s encoding work, designers of the fonts and keyboards, script and language communities, and technical experts, linguists, and technologists. We will also be working towards usable OCR for newly-encoded languages, with an eye towards developing corpora for LLM training.

“In the 21st century, the intertwining fate of language death and digital exclusion underscores a critical challenge: the marginalization and potential extinction of diverse linguistic heritage. With over 98% of the world’s ~7000 languages categorized as ‘Digitally Disadvantaged Languages’ by the Unicode Consortium, the urgency to bridge this digital divide is unmistakable. SILICON is delighted to support the pivotal role played by Unicode, long at the forefront of advancing the cause of Digitally Disadvantaged Languages globally.” - Tom Mullaney, Professor of History at Stanford University and Co-Director of SILICON

“We are excited to welcome SILICON as a Supporting member of the Unicode Consortium. By integrating SILICON’s interdisciplinary expertise, we look forward to working together to advance digital inclusiveness.” - Toral Cowieson, CEO of Unicode

Supporting members of the Consortium have a half vote as well as representation on up to two technical committees. A list of Consortium members can be found here.

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Thursday, April 18, 2024

Unicode CLDR v45 released

The Unicode CLDR v45 is now available and has been integrated into version 75 of ICU. The CLDR v45 release page has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

CLDR 45 did not have a Survey Tool submission phase, and focused on tooling and just a few functional areas:

MessageFormat 2.0 Tech Preview

Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. The goal for MessageFormat 2.0 is to allow developers and translators to create natural-sounding, grammatically correct user interfaces that can appear in any language and support the needs of various cultures.

The new MessageFormat defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formats), and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that adds on to the core capabilities. Its data model provides the means of representing existing syntaxes, thus enabling gradual adoption by users of older formatting systems.
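As a rough illustration of what "grammatical matching (such as plurals)" means in practice, here is a minimal Python sketch. It is not MessageFormat 2.0 syntax and does not use any ICU or CLDR API; the function names, the message dictionary, and the simplified English plural rule are all assumptions made for this example.

```python
# Toy illustration only: not MessageFormat 2.0 syntax and not an ICU API.

def plural_category_en(n: int) -> str:
    """Simplified English cardinal rule: 'one' for exactly 1, otherwise 'other'."""
    return "one" if n == 1 else "other"

# Hypothetical message with one pattern per plural category.
UNREAD_MESSAGES = {
    "one": "You have {count} unread message.",
    "other": "You have {count} unread messages.",
}

def format_message(variants: dict, count: int) -> str:
    """Select the variant matching the count's plural category, then fill the placeholder."""
    pattern = variants[plural_category_en(count)]
    return pattern.format(count=count)

print(format_message(UNREAD_MESSAGES, 1))
print(format_message(UNREAD_MESSAGES, 5))
```

Other languages have more plural categories than English, which is why selection logic of this kind is better driven by locale data than hard-coded as it is here.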
See also:

UTW { } MessageFormat v2 (November 7, 2023)
Message Format Virtual Open House (February 20, 2024)

Keyboard 3.0 stable version

Keyboard support for digitally disadvantaged languages (DDLs) is often lacking or inconsistent between platforms. The updated LDML Keyboard 3.0 format specifies an interchange format for keyboard data. This will allow keyboard authors to create a single mapping file for their language, which implementations can use to provide that language’s keyboard mapping on their own platform. This format allows both physical and virtual (that is, on-screen or touch) keyboard layouts for a language to be defined in a single file.

See also:

CLDR, Beyond Locale Data (June 22, 2023)

Tooling changes

Many tooling changes are difficult to accommodate in a data-submission release, including performance work and UI improvements. The changes in v45 provide faster turn-around for linguists and higher data quality. They are targeted at the v46 submission period, starting in May 2024.

Wednesday, April 17, 2024

ICU 75 Released

Unicode® ICU 75 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). ICU 75 updates to CLDR 45 (beta blog) locale data with new locales and various additions and corrections. C++ code now requires C++17 (C code now requires C11) and is being made more robust.

The CLDR MessageFormat 2.0 specification is now in technology preview, together with a corresponding update of the ICU4J (Java) tech preview and a new ICU4C (C++) tech preview.

For details, please see https://icu.unicode.org/download/75.

Friday, April 5, 2024

Unicode CLDR v45 Beta available for specification review

The Unicode CLDR v45 Beta is now available for specification review and integration testing. The release is planned for April 17th, but any feedback on the specification needs to be submitted well in advance of that date. The specification is available at Draft LDML Modifications. The biggest change is the new Message Formats and Keyboards section; see also the Migration section.

The beta has already been integrated into the development version of ICU. We would especially appreciate feedback from ICU users and non-ICU consumers of CLDR data, and on Migration issues.

Feedback can be filed at CLDR Tickets.

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

CLDR 45 did not have a Survey Tool submission phase, and focused on tooling and just a few functional areas:

MessageFormat 2.0 Tech Preview

UTW { } MessageFormat v2 (November 7, 2023)
Message Format Virtual Open House (February 20, 2024)

Keyboard 3.0 stable version

CLDR, Beyond Locale Data (June 22, 2023)

Tooling changes

For more information

See the draft CLDR v45 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.

Thursday, March 28, 2024

Wikimedia Foundation Joins as an Associate Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Wikimedia Foundation has joined as its latest organizational member.

The Wikimedia Foundation hosts the most multilingual top-10 website on the internet, Wikipedia, which is built collaboratively by people around the world – and in more than 300 languages.

“Our projects have been supported by the resources, technical projects, and forums supported by the Unicode Consortium. Through initiatives like the CLDR project, for example, Language and Internationalization engineers at the Wikimedia Foundation and volunteers on our projects have contributed to the open-source knowledge infrastructure that serves a shared mission with the Consortium of expanding language representation online. We’re looking forward to future collaborations with the Unicode Consortium as the Wikimedia movement continues to prioritize global access to knowledge in the fast-changing digital space.”
— Selena Deckelmann, Chief Product and Technology Officer

“We’re excited to welcome the Wikimedia Foundation as our latest organizational member, enhancing our shared mission to promote global language representation. We greatly appreciate the contributions from the Wikimedia team over the years and look forward to accelerating our collaboration.”
— Toral Cowieson, CEO

Associate members of the Consortium have observation status on one technical committee. The list of Consortium members can be found here.

Volunteer Spotlight — Roozbeh Pournader

“The rewards have been much greater than I could have ever imagined.”

Roozbeh Pournader has been a volunteer at Unicode since 1999. His first exposure to Unicode was in the late 90’s when he and several colleagues were working on a project called FarsiWeb at Sharif University trying to figure out how best to properly support Persian on the internet. They encountered several “home-made” solutions.

They also discovered The Unicode Standard. Quickly realizing Unicode was the right path, Roozbeh reached out and found a welcoming community of supportive and like-minded technologists. Roozbeh recalls engaging with Mark Davis, Ken Whistler, and Rick McGowan who “were there to help anybody who wanted to use or contribute to Unicode.” Roozbeh became an active contributor after recognizing areas within Unicode that he could help improve.

Roozbeh arrived in the United States in 2008, having represented an Iranian standards organization to Unicode before that. Since then, he has also been a technical representative for several US organizations to the Consortium, a Technical Director, and is currently a Vice Chair of the Script Encoding Working Group. He remembers how excited he was to be involved in person with Unicode when he first came to the US, sharing that he walked off the plane on a Wednesday and on Friday he was attending his first UTC meeting.

Asked what he likes most about working with Unicode, he admits that it is hard to say. He knows that what he has done and is doing has an enormous impact on communities around the world. Roozbeh came from an underserved community and is pleased that he can use his skills to support minority languages, historical languages, and writing systems for other underserved communities. He is especially proud of his work in the area of Arabic script and says every day is a joy to work with people who are so dedicated to the Unicode cause.

On a personal note, Roozbeh enjoys eating anything his wife prepares, and says that as scientists (his wife is a biologist), they are fascinated by scientific methods of cooking and experimenting in the kitchen. His favorite dish is Kabab Tabe’i, a mixture of ground meat, onions and tomatoes. When asked about any hobbies outside of work, Roozbeh says that Unicode is really not his work anymore, it is his life’s work and where he spends most of his free time.

His parting words were to encourage anyone who is interested in Unicode to get involved. There are big and small ways to contribute, and he recommends just reaching out as the Unicode community is so welcoming.

Roozbeh says he has made lifetime friends of the highest quality and for that he is forever grateful.

Tuesday, March 26, 2024

Cathy Wissink joins Unicode Board, other Board Updates

Unicode Consortium is excited to welcome Cathy Wissink to the Board effective immediately. Cathy is a 30-year veteran of the global tech industry. Most of her career was spent at Microsoft, with her early tenure devoted to internationalization support for Windows. She then spent 15 years focused on global government and regulatory affairs. In her most recent role at Microsoft, Cathy managed a part of Microsoft’s standards portfolio supporting regulatory needs in forums like ISO/IEC JTC1, CEN/CENELEC, NIST, and also led Microsoft’s product certification process for China.

Cathy led Microsoft’s participation in the Unicode Technical Committee from 2000-2005, and served as UTC vice-chair and INCITS/L2 chair from 2002-2005.

The Board also elected Tim Brandall and Salvo Giammarresi as Chair and Vice Chair of the Finance and Funding Committee. Tim was also named Treasurer of the Board.

Wednesday, March 20, 2024

Emoji submissions intake process opening on April 2, 2024!

It’s that time of year. For some of us that means the birds are chirping and the weather is getting warmer. For others, the leaves are changing color and you’re unpacking your coats. No matter where you are in the world, you can submit an emoji proposal starting April 2, 2024 and the Emoji Standard and Research Working Group is looking forward to receiving your submissions. 📄👀

Whether you’re new to the process or a veteran contributor to the Unicode Standard, please take note: a few changes have been made to the Guidelines for Submitting Unicode® Emoji Proposals. So, as you prepare your documents, consider the following steps to ensure your work meets all the requirements to be considered.

Step 1: See if it’s already been approved or “under consideration”

It seems obvious but if it’s already an emoji (or if we’ve reviewed a similar proposal) your job is done! Scan the list of Emoji Requests and see whether your proposed emoji has previously been submitted.

Emoji that are listed as “Prioritization Pending” or “Under Consideration” do not require additional proposals. 🆕 Emoji declined within the last four years are not eligible for re-review.

Step 2: Familiarize yourself with Criteria for Inclusion

We recommend reading recently successful proposals. It may also be worth your time scrolling through the Emoji Submission FAQ, which includes common questions and answers.

Note: Some types of proposals will not be considered for encoding, including but not limited to flags, logos, UI icons, specific people, or designs that include text. 🆕 A full list of the types that will not be reviewed can be found here.

Hot tip: Advocating for an emoji on behalf of an important social cause? A proposal may be advanced despite a “cause” argument - if other factors are compelling - but will not be advanced because of it.

Step 3: Prepare your proposal document

Completing the Guidelines for Submitting Unicode® Emoji Proposals is critical. Take note of the Selection Factors and Emoji Encoding Principles as well as this section on Limitations on Emoji Encoding. Don’t skip any of the fields in the form! The Emoji Standard and Research Working Group [ESC] receives a lot of submissions, and complete proposals help the ESC best evaluate them.

Your document must contain all of the sections shown in this format, provide empirical evidence, and address all of the questions specified there as completely as possible. Review your proposal document to confirm that it is complete, includes all of the necessary frequency citations, uses images that are not copyrighted, and meets all of the selection factors.

Hot Tip: The ESC has updated the required sources for frequency data. Google and Bing Search results were previously accepted, but those services recently removed the mechanism that quantified queries. Moving forward, submissions should quote Google Books and Google Ngram Viewer stats.

Step 4: Submit your documents

If you have gotten this far, congratulations. You’re almost done and the ESC’s work has just begun.

Submit your proposal as a PDF with reference images using the Unicode Emoji Submission Form. Your complete “Submission” will be made up of the completed form, which includes acceptance of the Emoji Proposal Agreement & License, and your proposal PDF.

Hot Tip: 🆕 You no longer have to submit images separately from the proposal PDF. All images will now be included directly in the PDF.

The submission window will be open April 2 through July 31, 2024 and all submitters will be notified of their status by November 30, 2024.

Good luck and happy emoji-ing! ✨😃🔎✍️✨

Hot Tip: Register now for the upcoming webinar on April 16, “How to make your emoji proposal the best that it can be!”

Friday, March 8, 2024

Breaking the Cycle 🔗💥

by Jennifer Daniel

(This article was originally published on Jennifer’s Substack, January 17, 2023. Republished here with minor revision.)

Phoenix image

In the fall of 2022, the Unicode Technical Committee announced that the 2023 release of the Unicode Standard would be a “dot” release with limited character additions, with the next major release in 2024. This wasn’t without precedent — COVID slowed down the release of Unicode 14.0 in 2020 and the world seemed to survive 😉. Subcommittees were well prepared and adjusted accordingly, discussing what this meant for their respective areas of expertise.

For the Emoji Subcommittee (ESC) — the group responsible for defining the rules, algorithms, and properties necessary to achieve interoperability between different platforms for those smiley faces that appear on your keyboard (Shout out 😁🥰🥹🤔🫣🫡😵‍💫!) — this delay presented an opportunity. Sure, we were so close to exhaling a sigh of relief (the intake period for Emoji 16.0 proposals had just completed). But upon learning we couldn’t ship any new codepoints until 2024 we turned our energy towards recommending new emoji based on existing ones. (These are called emoji ZWJ sequences. That's when a combination of multiple emoji display as a single emoji … like 👩 🏽 +🏭 = 🧑🏽‍🏭).
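To make the idea of an emoji ZWJ sequence concrete, the following Python sketch joins existing emoji with U+200D ZERO WIDTH JOINER, mirroring the factory worker example above. The helper name `zwj_sequence` is an assumption made for illustration, and whether the result displays as a single glyph depends on the fonts and platform in use.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

WOMAN = "\U0001F469"             # 👩
MEDIUM_SKIN_TONE = "\U0001F3FD"  # 🏽 EMOJI MODIFIER FITZPATRICK TYPE-4
FACTORY = "\U0001F3ED"           # 🏭

def zwj_sequence(*elements: str) -> str:
    """Join emoji elements with ZWJ so that supporting platforms can draw one glyph."""
    return ZWJ.join(elements)

# The skin tone modifier directly follows its base; ZWJ then links the person to the factory.
woman_factory_worker = zwj_sequence(WOMAN + MEDIUM_SKIN_TONE, FACTORY)

print(woman_factory_worker)
print([f"U+{ord(ch):04X}" for ch in woman_factory_worker])
```

Platforms that do not recognize a given sequence typically fall back to showing the component emoji side by side, which is part of why recommending new sequences does not require new codepoints.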

When Less is More

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can "do it all". And yet, as the emoji ecosystem has matured over time our keyboards have ballooned and emoji categories are about to hit or have hit a level of saturation. Upon reflecting on how emoji are used, the ESC has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard. Instead, the ESC approves fewer and fewer emoji proposals every year.

But our work is not done. Not by a long shot. Language is fluid and doesn’t stand still. There is more to do! This “off-cycle” gives us a chance to address some long-standing major pain points using emoji. The first one that came to mind: skin-tone.

What is a family?

The encoding of multi-person multi-tone support has matured over the years; however, the implementation can seem random to the average person. While it’s true that all people emoji have toned options (with the exception of characters where you can’t see skin, like 🤺), there are … misfits. Some two-person emoji offer tone support ( 🧑🏻‍❤️‍🧑🏿); others do not ( 👯). A few non-RGI emoji render with tone but with no affordance to change one of the two characters (For example, 🤼🏾‍♂ renders with skintone on Android but as gold on iOS. WHY. This is why we standardize these things, people).
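For readers curious how multi-person tone is represented, here is a small Python sketch using the couple-with-heart example above, in which each person carries its own skin tone modifier (U+1F3FB through U+1F3FF). The helper names are assumptions made for illustration; this is not code from any emoji library.

```python
# The five emoji skin tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}

def count_skin_tones(text: str) -> int:
    """Count how many skin tone modifiers appear in the string."""
    return sum(1 for ch in text if ch in SKIN_TONE_MODIFIERS)

def strip_skin_tones(text: str) -> str:
    """Remove every skin tone modifier, leaving the untoned sequence."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)

# person + light tone, ZWJ, heart + VS16, ZWJ, person + dark tone
couple = "\U0001F9D1\U0001F3FB\u200d\u2764\ufe0f\u200d\U0001F9D1\U0001F3FF"

print(couple, count_skin_tones(couple))
print(strip_skin_tones(couple))
```

Because the tone is attached to each person separately, a keyboard needs a way to let the user pick each one, which is the "affordance" missing from the misfits described above.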

And then ... There is the suite of family emoji (👨‍👦👨‍👦‍👦👨‍👧👨‍👧‍👦👨‍👧‍👧👩‍👦👩‍👦‍👦👩‍👧👩‍👧‍👦👩‍👧‍👧 👨‍👨‍👦👨‍👨‍👦‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👧‍👧👩‍👩‍👦👩‍👩‍👦‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👧‍👧👨‍👩‍👦👨‍👩‍👦‍👦👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👧‍👧👪). These characters include two people, three people, sometimes four and none of them have any tone support (!). We seem to have a lot of family emoji and yet simultaneously not enough.

The 26 “family” emoji can be broken down into four groups:

Families image

Despite the Unicode Standard containing 26 “family” emoji, each one of these glyphs is overly prescriptive with regard to delivering on a visual representation of a family. The inclusion of many permutations of families was well intentioned. But we can’t list them all, and by listing some of the combinations, it calls attention to the ones that are excluded.

What even is a family? For some, family is the people you were raised with. Others have embraced friends as their chosen family. Some families have children, other families have pets. There are multi-generational families, multiracial families and of course many families are any combination of all of these characteristics and more.

Fortunately, we don’t need to add 7000 variants to your keyboards (even this would fall short of capturing the breadth of "family" as a concept). Instead we can juxtapose individual emoji together to capture a concept with some reasonable level of specificity — not too unlike arranging letters together to create words to convey concepts 😉

Different families image

For emoji keyboards to advance in creating more intuitive and personalized experiences, the Emoji Subcommittee is recommending a visual deprecation of the family emoji. This small set of emoji will be redesigned as part of a multi-phase effort to “complete the set” of toned variants for the remaining multi-person emoji. This of course raises the question: when there are as many families as there are people in the world, is there an effective way of conveying the concept of “family” without being overly prescriptive in defining what is and is not a family? Well, thankfully icons can do a lot of heavy lifting without requiring very much detail.

Family symbol image

When is an emoji running for the police or getting chased by them?

Another area the ESC is actively exploring is how the semantics of emoji sequences can differ when writing directionality changes. Some emoji characters have semantics that encode implicit directionality, but when the string is mirrored their meaning may be unintentionally lost or changed.

Left to Right emoji image

Left to Right Emoji Sequence: Quickly running towards an “exciting” police chase

Right to Left emoji image

Right to Left Emoji Sequence: Running away from the coppers

What, if anything, can we do to aid in ensuring that messages are meaningfully translated, be they tiny pictures or tiny letters? As part of 15.1 we’re proposing a small set of emoji with strong directionality (with an initial focus on people) to face the opposite direction. Soon you too can run towards or away from ... excitement.

Emoji 15.1

Given that the intake cycle of emoji proposals for Unicode 16.0 ended last July, the Emoji Subcommittee has also decided to temporarily delay the intake of Unicode Version 17.0 proposals until April 2024. Fortunately, you won’t have to wait until then to get new emoji. (Note: I know it sounds like I’m talking about the past and future simultaneously ... the emoji lifecycle is looooong and as a result overlaps with multiple releases. Expect a future blog post about the Emoji 15.0 candidates landing early this year (Shout out goose, pink heart, and pushing hands). I’ve been holding off writing about this set until you can actually see them on your phones but given that we’re already talking about 2024 maybe it’s time I dust that blog post off).

Emoji 2023 timeline image

Anyways, the list of Emoji 15.1 recommendations for 2024 includes 578 characters (most of them the candidates described above to support directionality). The list also includes a few humble additions: a broken chain, a lime, a non-poisonous mushroom, a nodding and shaking face, and a phoenix bird. Each one of these leverages a unique valid ZWJ sequence of emoji, so while they look like atomic characters made of a single codepoint, they are composed of two or more codepoints.

Broken chain is the result of a 🔗💥 ZWJ and contains a variety of meanings, such as freedom, breaking a cycle, or perhaps a broken url ;-). Nodding face and shaking face are composed of arrows to imply movement in a still image (🙂↔️) and (🙂↕️). Oh, and of course there is a phoenix rising from the ashes (🐦🔥), an ancient metaphor that captures the zeitgeist of today.
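One way to see that such an emoji is not a single codepoint is to list what it is made of. The Python sketch below does this for the phoenix, using the components as written in this article (bird, ZWJ, fire); the sequences as finally published may include different components or variation selectors, and the `describe` helper is an assumption made for illustration.

```python
import unicodedata

def describe(sequence: str) -> list:
    """Return (code point, character name) pairs for each code point in the string."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in sequence
    ]

# Bird + ZERO WIDTH JOINER + fire, following the article's description of the phoenix.
phoenix_as_written = "\U0001F426\u200d\U0001F525"

for code_point, name in describe(phoenix_as_written):
    print(code_point, name)
```

The same helper can be pointed at any emoji copied from a keyboard to check whether it is one codepoint or a sequence.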

The Unicode Technical Committee (UTC) will review the required documents at its first meeting of 2023 in January – and if these candidates move forward, you can expect an update from the UTC later this Spring and Summer.

Tuesday, March 5, 2024

Unicode CLDR v45 Alpha available for testing

The Unicode CLDR v45 Alpha is now available for integration testing.

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.

CLDR 45 is a closed release with no submission period, focusing on just a few areas: