The world is turning local. Are you?
An Indian perspective on Unicode and Localisation

Unicode 5.0.0 arrives: now more than 99,000 characters strong

The new Unicode version, 5.0.0, defines more than 99,000 characters for the languages of the world, and provides the detailed properties needed for computer software implementations.

By Balendu Sharma Dadhich 18/07/06

The Unicode Consortium has announced a significant update to its widely used Unicode Character Database (UCD). Unicode 5.0.0 is a major version of the Unicode Standard and supersedes all previous versions.

Version 5.0.0 defines more than 99,000 characters for the world's languages and provides the detailed character properties that software implementations need. This latest level of the UCD contains the information needed to update software to support the characters and algorithms that underpin text handling in modern software, including the latest data for Unicode security mechanisms, collation and locales.

Unicode 5.0 adds 1,369 new characters to those already encoded in Unicode 4.1.0. The additions include new characters for Cyrillic, Greek, Hebrew, Kannada, Latin, mathematics, phonetic extensions and symbols, as well as five new scripts: Balinese, N’Ko, Phags-pa, Phoenician and Sumero-Akkadian Cuneiform.
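Software that relies on a bundled copy of the UCD only recognises these scripts once its data is at version 5.0 or later. As a quick, hedged check, the Python sketch below looks up one sample character from each new script with the standard `unicodedata` module. The sample code points and the helper names (`NEW_SCRIPTS_5_0`, `ucd_version`, `describe_new_scripts`) are illustrative choices, not part of the announcement.

```python
import unicodedata

# One assigned code point from each script added in Unicode 5.0.
# These samples are illustrative picks; any assigned character in the block would do.
NEW_SCRIPTS_5_0 = {
    "Balinese": 0x1B05,
    "N'Ko": 0x07C0,
    "Phags-pa": 0xA840,
    "Phoenician": 0x10900,
    "Sumero-Akkadian Cuneiform": 0x12000,
}


def ucd_version() -> tuple:
    """Unicode version of Python's bundled database as a tuple, e.g. (15, 0, 0)."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def describe_new_scripts() -> dict:
    """Map each script to the UCD name of its sample character (None if unknown)."""
    return {
        script: unicodedata.name(chr(cp), None)
        for script, cp in NEW_SCRIPTS_5_0.items()
    }


if __name__ == "__main__":
    print("Bundled UCD version:", unicodedata.unidata_version)
    for script, name in describe_new_scripts().items():
        print(f"{script:28} {name}")
```

Comparing the version as a tuple of integers, rather than as a string, avoids the trap where "15.0.0" sorts before "5.0.0".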

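The Unicode Character Database now classifies all 1,114,112 code points (U+0000 to U+10FFFF) as Graphic (98,884), Format (140), Control (65), Private use (137,468), Surrogate (2,048), Noncharacter (66) and Reserved (875,441). The first three groups are the encoded characters proper, 99,089 in all, which is where the figure of "more than 99,000 characters" comes from.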
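These counts can be cross-checked with simple arithmetic, since together they must cover the whole 1,114,112-code-point codespace. Four of them (control, private use, surrogate and noncharacter) are fixed sets under Unicode's stability policies, so they can also be recomputed from any later UCD. The Python sketch below does both using the standard `unicodedata` module. The dictionary and function names are illustrative. Because Python bundles a newer UCD than 5.0, the graphic, format and reserved counts are deliberately not recomputed.

```python
import unicodedata

# Code point counts reported for Unicode 5.0.0 (figures from the article).
UNICODE_5_0_COUNTS = {
    "Graphic": 98_884,
    "Format": 140,
    "Control": 65,
    "Private use": 137_468,
    "Surrogate": 2_048,
    "Noncharacter": 66,
    "Reserved": 875_441,
}

CODESPACE = 0x110000  # U+0000..U+10FFFF, i.e. 1,114,112 code points

# General_Category values whose code point sets do not change between versions.
STABLE_GC = {"Cc": "Control", "Co": "Private use", "Cs": "Surrogate"}


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points (xxFFFE, xxFFFF) of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def count_stable_types() -> dict:
    """Recount control, private-use, surrogate and noncharacter code points.

    Graphic, format and reserved counts are skipped because Python's bundled
    UCD is newer than 5.0 and those numbers have grown since.
    """
    counts = {name: 0 for name in (*STABLE_GC.values(), "Noncharacter")}
    for cp in range(CODESPACE):
        gc_name = STABLE_GC.get(unicodedata.category(chr(cp)))
        if gc_name is not None:
            counts[gc_name] += 1
        elif is_noncharacter(cp):  # noncharacters are gc=Cn, so no overlap
            counts["Noncharacter"] += 1
    return counts


if __name__ == "__main__":
    total = sum(UNICODE_5_0_COUNTS.values())
    print(f"Sum of all types: {total:,} (codespace: {CODESPACE:,})")
    assigned = sum(UNICODE_5_0_COUNTS[k] for k in ("Graphic", "Format", "Control"))
    print(f"Encoded characters: {assigned:,}")
    for name, n in count_stable_types().items():
        print(f"{name:12} 5.0: {UNICODE_5_0_COUNTS[name]:>7,}  now: {n:>7,}")
```

The full scan touches every code point once and takes well under a few seconds on an ordinary machine.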

The book, The Unicode Standard, Version 5.0, is expected to be published in the fourth quarter of 2006. Currently at the copy-editing stage, it will run to well over a thousand pages. In the first quarter of 2007 it will also be made available on the Internet, with certain restrictions.

The new version brings several major changes over the previous one. As the Unicode Consortium puts it, “For stability of protocols on the Internet and elsewhere, Unicode 5.0 also makes changes to guarantee case-folding stability. Unicode 5.0 incorporates all the changes introduced in Unicode 4.1, including full interoperability with the most recent versions of GB 18030, JIS X 0213, and HKSCS, and support for stable identifiers and pattern syntax characters.

“Unicode 5.0 revises and improves property values and behavioral specifications in areas such as character, word, line, and sentence segmentation, and tightens conformance requirements on Bidi implementations (used for Arabic and Hebrew). The text is significantly revised for clarity and completeness, especially for Unicode conformance.

“Unicode 5.0 covers the full repertoire of ISO/IEC 10646:2003, including Amendments 1 and 2, which add characters required for some languages of India, for mathematicians, for minority languages, and for academic use.

“The Unicode Standard is closely connected with other Unicode software globalization standards in such key areas as collation (used for sorting, searching, and matching), character set conversion, regular expressions, and the interchange and registration of locale data for the world's languages and local cultural conventions [CLDR]. It has been further significantly augmented by several new Unicode Technical Standards that provide recommendations and data to assist in secure implementation of Unicode, and to establish the registration mechanism for Ideographic Variation Sequences needed by the publishing industry for Chinese and Japanese.”
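Case-folding stability matters because case folding is how software compares text without regard to case. The standard's Default Case Algorithms define canonical caseless matching as comparing NFD(toCasefold(NFD(X))) with NFD(toCasefold(NFD(Y))). A minimal Python sketch, assuming that `str.casefold()` (which applies Unicode full case folding) stands in for toCasefold and `unicodedata.normalize("NFD", ...)` for NFD; the function names are illustrative.

```python
import unicodedata


def nfd(text: str) -> str:
    """Canonical decomposition (Normalization Form D)."""
    return unicodedata.normalize("NFD", text)


def canonical_caseless_match(x: str, y: str) -> bool:
    """True if NFD(casefold(NFD(x))) == NFD(casefold(NFD(y))).

    Full case folding maps "ß" to "ss", and NFD makes a precomposed
    "é" (U+00E9) compare equal to "e" + combining acute (U+0301).
    """
    return nfd(nfd(x).casefold()) == nfd(nfd(y).casefold())


if __name__ == "__main__":
    print(canonical_caseless_match("STRASSE", "stra\u00dfe"))   # True
    print(canonical_caseless_match("Caf\u00e9", "CAFE\u0301"))  # True
    print(canonical_caseless_match("cafe", "caf\u00e9"))        # False
```

If case-folding results changed from one Unicode version to the next, stored identifiers or keys compared this way could silently stop matching, which is the problem the stability guarantee addresses.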

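The Bidi conformance requirements concern the Unicode Bidirectional Algorithm (UAX #9), which works from each character's Bidi_Class property in the UCD. As a small, hedged illustration, the Python sketch below uses the standard `unicodedata.bidirectional()` to apply the algorithm's first step for a paragraph (rules P2 and P3): the paragraph is right-to-left if its first strong character (class L, R or AL) is R or AL, and left-to-right otherwise. The function name and the "LTR"/"RTL" labels are illustrative. It is a simplification: it ignores explicit embeddings and does not skip text inside isolate controls, which were added to UAX #9 after Unicode 5.0.

```python
import unicodedata


def base_direction(text: str, default: str = "LTR") -> str:
    """Paragraph direction from the first strong character (UAX #9, rules P2-P3).

    Simplified sketch: explicit embeddings and isolates are not handled.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":           # strong left-to-right, e.g. Latin letters
            return "LTR"
        if bidi_class in ("R", "AL"):   # Hebrew (R) or Arabic letters (AL)
            return "RTL"
    return default                      # no strong character found


if __name__ == "__main__":
    print(base_direction("hello"))                             # LTR
    print(base_direction("\u05e9\u05dc\u05d5\u05dd"))          # RTL (Hebrew)
    print(base_direction("123 \u0645\u0631\u062d\u0628\u0627"))  # RTL: digits are weak (EN)
```

Digits and punctuation are weak or neutral classes, so a line that starts with numbers still takes its direction from the first Arabic or Hebrew letter.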
An effort to promote unhindered use of Indian languages in Information Technology
Copyright: localisationlabs.com, 2006. Since March 2006.
A website by Balendu Sharma Dadhich.