Showing posts with label beta. Show all posts

Tuesday, May 21, 2024

Unicode 16.0 Beta Review Open

[image]
The beta review period for Unicode® 16.0 has started and is open until July 2, 2024.

The beta is intended primarily for review of character property data and changes to algorithm specifications (Unicode Standard Annexes). Also, for the first time, a complete draft of the core specification text is available for review during the beta period.

At this phase of a release, the character repertoire is considered stable. For this release, 5,185 new characters will be added, bringing the total number of encoded characters in Unicode 16.0 to 154,998. The new additions include seven new scripts:
  • Garay is a modern-use script from West Africa
  • Gurung Khema, Kirat Rai, Ol Onal, and Sunuwar are four modern-use scripts from Northeast India and Nepal
  • Todhri is an historic script used for Albanian
  • Tulu-Tigalari is an historic script from Southwest India
Other character additions include seven new emoji characters plus 3,995 additional Egyptian Hieroglyphs and over 700 symbols from legacy computing environments. See the delta code charts for details on all the new scripts and characters.

In addition to new characters, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references will be added for over 36,000 CJK unified ideographs. This will be reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column. Note that these glyph additions are not reflected in the delta charts mentioned above, but can be seen in the main (“single-block”) charts for the Unicode 16.0 Beta.

Various changes to properties, algorithms, and Unicode Standard Annexes will be made for Unicode 16.0. This version will add two new Unicode Standard Annexes:
  • UAX #53, Unicode Arabic Mark Rendering, provides a specification for interoperable font and shaping implementations for Arabic script. (This was previously published separately from the Unicode Standard as a technical report.)
  • UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet), provides data essential for understanding the identity of over 5,100 Egyptian Hieroglyph characters encoded in Unicode 16.0. (This is similar to data for CJK unified ideographs provided in UAX #38.)
A new UCD file, DoNotEmit.txt, will provide data in machine-readable form that can be useful for software implementations but that was previously provided only as tables within the core specification text. See the Unicode 16.0 Beta landing page for other noteworthy property and algorithm changes.
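For implementers who want to try out machine-readable UCD data during the beta, the following is a minimal sketch of a reader for the general UCD text layout (semicolon-separated fields, `#` comments). It assumes that DoNotEmit.txt follows that general layout; the sample rows, the `Example_*` values, and the function names are hypothetical and are not taken from the actual file.

```python
from typing import Iterator, List

# Hypothetical sample in the general UCD layout: fields separated by ';',
# comments introduced by '#'. These rows are illustrative only and are
# NOT copied from DoNotEmit.txt.
SAMPLE = """\
# A comment-only line
0041          ; Example_A   # a single code point
0061..0063    ; Example_B   # a code point range

0391 0301     ; Example_C   # a code point sequence
"""


def parse_ucd_lines(text: str) -> Iterator[List[str]]:
    """Yield the trimmed fields of each data line, skipping comments and blanks."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        yield [field.strip() for field in line.split(";")]


def expand_code_points(field: str) -> List[int]:
    """Turn 'XXXX', 'XXXX..YYYY', or 'XXXX YYYY' into a list of code points."""
    if ".." in field:
        start, end = field.split("..")
        return list(range(int(start, 16), int(end, 16) + 1))
    return [int(cp, 16) for cp in field.split()]


records = list(parse_ucd_lines(SAMPLE))
for fields in records:
    print(expand_code_points(fields[0]), fields[1:])
```

Check the header comments of the actual beta file before relying on any particular field order or meaning.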

For full details regarding the Beta, see Public Review Issue #502. Feedback should be reported under PRI #502 using the Unicode contact form by July 2, 2024.


Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Thursday, March 30, 2023

The Unicode CLDR v43 Beta is now available for integration testing

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU), and the Specification changes, since those are new since the Alpha.

We appreciate feedback from both ICU and non-ICU consumers of CLDR data. (The Beta has already been integrated into the development version of ICU.) Feedback can be filed at CLDR Tickets. Any tickets should be filed as soon as possible, because the target release date is Wednesday, April 12, 2023.

CLDR 43 is a limited-submission release, focusing on just a few areas:
  1. Formatting Person Names
    • Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
  2. Locales
    • Adding substantially to the LikelySubtags data: This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance. The data has been contributed by SIL.
    • Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region
  3. Other data updates
    • Alternate names for Turkey / Türkiye
    • Name for the new timezone Ciudad Juárez
  4. Structure
    • Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
  5. Collation & Searching
    • Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
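To illustrate what the LikelySubtags data in the list above is used for, here is a simplified sketch of filling in a missing script and region for a locale identifier. The table is a tiny illustrative subset, the function name is hypothetical, and the lookup omits steps of the full LDML algorithm (for example variants and the `und_script` fallback), so treat it as a sketch rather than a conformant implementation.

```python
from typing import Dict, Optional

# Tiny illustrative subset; the real CLDR likelySubtags data is far larger.
LIKELY_SUBTAGS: Dict[str, str] = {
    "en": "en_Latn_US",
    "sr": "sr_Cyrl_RS",
    "zh": "zh_Hans_CN",
    "zh_Hant": "zh_Hant_TW",
    "zh_TW": "zh_Hant_TW",
    "und": "en_Latn_US",
}


def add_likely_subtags(locale: str) -> Optional[str]:
    """Fill in missing script/region for a language[_script][_region] identifier."""
    parts = locale.replace("-", "_").split("_")
    language = parts[0].lower()
    # A 4-character subtag is treated as a script, a 2- or 3-character one as a region.
    script = next((p.title() for p in parts[1:] if len(p) == 4), "")
    region = next((p.upper() for p in parts[1:] if len(p) in (2, 3)), "")

    # Try keys from most specific to least specific.
    candidates = [
        "_".join(filter(None, [language, script, region])),
        "_".join(filter(None, [language, region])),
        "_".join(filter(None, [language, script])),
        language,
    ]
    for key in candidates:
        match = LIKELY_SUBTAGS.get(key)
        if match:
            m_lang, m_script, m_region = match.split("_")
            # Keep whatever the caller supplied; take the rest from the match.
            return "_".join([
                m_lang if language == "und" else language,
                script or m_script,
                region or m_region,
            ])
    return None  # no data for this language in the sample table


for tag in ["sr", "zh-TW", "en_GB", "und"]:
    print(tag, "->", add_likely_subtags(tag))
```

Subtags supplied by the caller are preserved and only the missing ones are filled in, which is why `en_GB` keeps its region while gaining a script.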



Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)(3) organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

[badge]

Tuesday, May 31, 2022

Unicode 15.0 Beta Review

The beta review period for Unicode 15.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 15.0 includes a number of changes and 4,489 new characters, including another major extension of CJK unified ideographs. A number of the Unicode Standard Annexes have significant modifications for Unicode 15.0. Two new scripts have been added, and there are also 20 additional emoji characters in Unicode 15.0.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 12, 2022. The review period is only six weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-15.0.0.html for more information about testing the 15.0.0 beta.

See https://www.unicode.org/versions/Unicode15.0.0/ for the current draft summary of Unicode 15.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Amazon, Apple, Emojipedia, Google, Government of Bangladesh, International Emerging Technology Company (ETCO), Meta, Microsoft, Netflix, Salesforce, SAP, Tamil Virtual Academy, The University of California (Berkeley), Yat Labs, plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/.

For more information, please contact the Unicode Consortium https://home.unicode.org/connect/contact-unicode/.


Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Thursday, March 24, 2022

Unicode CLDR v41 Beta available for testing

The Unicode CLDR v41 Beta is now available for testing. The beta has already been integrated into the development version of ICU.

The XML data, JSON data, charts, and specification are available for review. These may change if showstopper bugs are found. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

The release is scheduled for April 6, 2022.

CLDR v41 is a limited-submission release. Most work was on tooling, with only specified updates to the data, namely Phase 3 of the grammatical units of measurement project. The required grammar data for the Modern coverage level increased, with 40 locales adding an average of 4% new data each. Ukrainian grew the most, by 15.6%.
The tooling changes are targeted at the v42 general submission release. They include a number of features and improvements, such as progress meter widgets in the Survey Tool.

Finally, the Basic level has been modified to make it easier to onboard new languages, and easier for implementations to filter locale data based on coverage levels.

The following table shows the number of Languages/Locales in this version. (See the v41 Locale Coverage table for more information.)

Level     Languages  Locales  Notes
Modern    89         361      Suitable for full UI internationalization
Moderate  13         32       Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Basic     22         21       Suitable for locale selection, such as choice of language in mobile phone settings.
Total     124        414      Total of all languages/locales with ≥ Basic coverage.

Beyond the member organizations of the Unicode Consortium, many dedicated communities and individuals regularly contribute to updating their locales, including:
  • Modern: Cherokee, Cantonese, Scottish Gaelic, Sorbian (Lower), Sorbian (Upper)
  • Moderate: Asturian [nearly Modern], Breton, Faroese, Fulah (Adlam), Kaingang, Nheengatu, Quechua, Sardinian
  • Basic: Bosnian (Cyrillic), Interlingua, Kabuverdianu, Māori, Romansh, Tajik, Tatar, Tongan, Uzbek (Cyrillic), Wolof
For details, see the Unicode CLDR v41 Release Note.
The next version of CLDR, version 42, is slated to start General Submission on May 18, 2022.

Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.



Wednesday, October 6, 2021

Unicode CLDR v40 Beta available for testing

The Unicode CLDR v40 Beta is now available for testing. The beta has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

Beta means that the main data, charts, and specification are available for review, but the JSON data is not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:
  • Oct 27 — Release
In CLDR v40, the focus is on:

Grammatical features (gender and case) for units of measurement in additional locales
  • In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, the result can sound as bad as "on top of 3 hours" instead of "in 3 hours".
  • Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv).
  • Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
Emoji v14 names and search keywords
  • These supply short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards
Modernized Survey Tool front end.
  • The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure (very difficult to enhance or even fix bugs) was modernized.
Specification Improvements
  • Notably in the areas of Locale Identifiers, Dates, and Units of Measurement
There are many other changes. To find out more, see the draft CLDR v40 release page, which has information on accessing the data, reviewing charts of the changes, and necessary migration changes.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.



Tuesday, June 8, 2021

Unicode 14.0 Beta Review

The beta review period for Unicode 14.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 14.0 includes a number of changes and 838 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 14.0. Five new scripts have been added in Unicode 14.0. There are also additional emoji characters.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 13, 2021. This will be a slightly shorter review period of only five weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-14.0.0.html for more information about testing the 14.0.0 beta.

See https://www.unicode.org/versions/Unicode14.0.0/ for the current draft summary of Unicode 14.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Microsoft, Netflix, Sultanate of Oman MARA, Salesforce, SAP, Tamil Virtual Academy, The University of California (Berkeley), Yat Labs, plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/

For more information, please contact the Unicode Consortium https://www.unicode.org/contacts.html.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Thursday, March 25, 2021

CLDR v39 Beta 2

The CLDR v39 beta has reached specification freeze, so no further changes will be made to the CLDR specification (aka LDML) except for showstoppers. For more details please see the release page.

The CLDR v39 release is planned for April 7, 2021.



Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many other new emoji sequences, including the transgender flag and polar bear.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-13.0.0.html for more information about testing the 13.0.0 beta.

See http://unicode.org/versions/Unicode13.0.0/ for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/.

Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

[badge]

Monday, November 5, 2018

Unicode 12.0 Beta Review

The beta review period for Unicode 12.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 12.0 includes a number of changes and 554 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 12.0, often in coordination with changes to character properties. In particular, there are minor changes to UAX #29, Unicode Text Segmentation, to account for differences in Georgian casing behavior. Four new scripts have been added in Unicode 12.0. There are also 61 additional emoji characters, as well as very significant enhancements to the representation and behavior of multiperson emoji.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 7, 2019. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-12.0.0.html for more information about testing the 12.0.0 beta.

See http://unicode.org/versions/Unicode12.0.0/ for the current draft summary of Unicode 12.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Shopify, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.


Friday, April 13, 2018

Last Call on Unicode 11.0 Review

The beta review period for Unicode 11.0 and related technical standards will close on April 23, 2018. This is the last opportunity for technical comments before version 11.0 is released in Q2 2018. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments.

Unicode 11.0 adds seven new scripts, including Hanifi Rohingya, and 66 additional emoji characters, including four new components for hair color (for a total of 157 emoji sequences). The set of Georgian Mtavruli capital letters has been added to support modern casing practices.
In addition to the Unicode core specification, five Unicode Standard Annexes and two Unicode Technical Standards have significant specification and/or data file updates that are correlated with the new additions for Unicode 11.0.0. Review of those changes is strongly encouraged during the beta review period.

UAX #14, Unicode Line Breaking Algorithm
  • Uses Extended_Pictographic property for future-proofing
UAX #29, Unicode Text Segmentation
  • New support for Indic virama handling
  • Uses Extended_Pictographic property for future-proofing
  • A new table of formal regex definitions
UAX #31, Unicode Identifier and Pattern Syntax
  • Refines the use of ZWJ in identifiers
  • Broadens the definition of hashtag identifiers
UAX #38, Unicode Han Database (Unihan)
  • Five new fields and improved regular expressions.
  • Document extension of Unihan properties to non-Unihan
UAX #44, Unicode Character Database
  • New property Equivalent_Unified_Ideograph
  • New regular expressions Bidi_Paired_Bracket & Equivalent_Unified_Ideograph
  • More discussion of emoji variation sequences
  • Clarification of values allowed for the Age property
UTS #10, Unicode Collation Algorithm
  • Updates data to Unicode 11.0
  • Clarification of search tailoring in visual-order scripts
UTS #39, Unicode Security Mechanisms
  • Updates data to Unicode 11.0
  • Enhances discussions of joining controls & combining sequences
UTS #46, Unicode IDNA Compatibility Processing
  • Updates data to Unicode 11.0
  • Changes the format of the test file for arbitrary input settings
  • Updates input setting for Transitional_Processing
UTS #51, Unicode Emoji
  • Supplies Extended_Pictographic property for future-proofing
  • Simplifies emoji sequence definitions
  • EBNF and Regex expressions for loose matches
  • More proposed guidelines: gender-neutral emoji, skin-tone modifiers, ZWJ visible fallbacks, hair-style components
  • Mechanism for changing the “facing” direction for emoji
Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are in each public review page. For more information, see the open public review issues.



Wednesday, March 14, 2018

Unicode 11.0 Beta Review

The beta review period for Unicode 11.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 11.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 11.0, often in coordination with changes to character properties. In particular, there are major changes to UAX #29, Unicode Text Segmentation. Seven new scripts have been added in Unicode 11.0, including Hanifi Rohingya. A major adjustment has been made to the Georgian script, with the introduction of uppercase Georgian letters. There are also 66 additional emoji characters.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-11.0.0.html for more information about testing the 11.0.0 beta.

See http://unicode.org/versions/Unicode11.0.0/ for the current draft summary of Unicode 11.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Shopify, Symantec, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.


Monday, December 4, 2017

Unicode Emoji 11.0 Beta

Emoji 11.0 beta is now available for developers, with 130 Draft Candidates, such as:

🧁 🥳 🦸‍♀️ 👨‍🦰 👨🏿‍🦱 🦚 🦞

The list is not final; changes can include removals or additions. For example, new ZWJ sequences could be added. The decisions of the 2017Q4 UTC meeting for emoji have been incorporated into the draft Charts, Specification, and Data, and are now available for testing and feedback. The contents will be finalized in 2018Q1. The following are the expected dates for 2018.
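For developers testing the candidates, it may help to see what a ZWJ sequence looks like at the code point level. The sketch below takes one of the samples shown above (man with red hair) and lists its code points; the helper names are hypothetical, and whether the sequence renders as a single glyph depends on font and platform support.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# U+1F468 MAN, joined by ZWJ to U+1F9B0 (the red hair component).
man_red_hair = "\U0001F468" + ZWJ + "\U0001F9B0"


def describe(sequence):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in sequence]


def split_on_zwj(sequence):
    """Split an emoji ZWJ sequence into the elements that the ZWJ joins."""
    return sequence.split(ZWJ)


print(describe(man_red_hair))           # three code points, one intended glyph
print(len(split_on_zwj(man_red_hair)))  # number of joined elements
```

Because a new ZWJ sequence reuses existing code points, it can be added to the emoji data without encoding any new characters, though fonts and input methods still need updating to show it as one glyph.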

Emoji Set                      Decision date  Announcement of final list  Market availability
Draft Emoji Candidates (2018)  2018-01        2018-03                     2018H2

The version number for the next release of Unicode emoji is jumping from the previously-released Emoji 5.0 to Emoji 11.0 (instead of 6.0). This is due to alignment of the emoji versions in 2018 and beyond with the versions of the Unicode Standard.

The draft emoji 11.0β Charts now show the candidates in context: for example, Emoji Ordering, v11.0β shows the sorting of all the emoji, with the candidates highlighted with rounded-rectangles. Feedback on the sort-order, categories, names, and keywords is welcome.

The draft 11.0β Specification has a number of changes, including proposed guidelines for display, handling gender, handling skin tone, and a proposed mechanism for allowing emoji to point either to the right or left.

The draft 11.0β Emoji Data provides property data, which determines how implementations handle the new characters.
