Version 13 of Unicode

Updated 2003-11-11 20:33:54

Purpose: Collect related Wiki references to Unicode (= ISO 10646), the standard multilingual character encoding. If the ASCII code chart is a page, Unicode is a book with hundreds of such pages. Or, put differently: Unicode is to ASCII what rational numbers are to integers - you still can't represent everything, but infinitely more ;-)

http://www.unicode.org/ http://www.wikipedia.org/wiki/unicode http://www.joelonsoftware.com/articles/Unicode.html


http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html has iso10646 fonts that are useful to install on Unix.

http://www.slovo.info/unifonts.htm links to free TrueType fonts for larger or smaller subsets of Unicode.


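Until version 3.0, 16 bits (\u0000-\uFFFD: the "Basic Multilingual Plane", BMP) were sufficient for any Unicode character. From 3.1 on, we must expect longer codes - up to 31 bits long, as specified in ISO 10646 (Unicode itself stops at U+10FFFF, which needs only 21 bits). Why 31 bits? Because that is the maximum that can be expressed in UTF-8 as originally defined: 6 bytes, where the lead byte carries 1 payload bit and each of the 5 following bytes carries 6, and the lead byte stops at \xFD so that the taboo values \xFE and \xFF never occur. (RS)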

 1111110a 10aaaaaa 10bbbbbb 10bbcccc 10ccccdd 10dddddd

(Small letters stand for the "payload" bits of bytes a..d of the code; the highest byte, a, has only 7 bits.)
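
The bit layout above can be tried out with a small sketch. This is only an illustration in Python, not something taken from this page: the function name encode_utf8_31 is made up, and it follows the original UTF-8 scheme with sequences of up to 6 bytes, which current UTF-8 (limited to 4 bytes) no longer allows.

```python
def encode_utf8_31(value: int) -> bytes:
    """Encode an integer of up to 31 bits in the original, up-to-6-byte UTF-8 scheme.

    Hypothetical helper for illustration only; modern UTF-8 stops at 4 bytes.
    """
    if not 0 <= value < 2**31:
        raise ValueError("value must fit in 31 bits")
    if value < 0x80:
        # 7-bit ASCII is encoded as a single byte, unchanged.
        return bytes([value])
    # Each entry: (exclusive upper limit, lead-byte prefix, continuation bytes).
    for limit, prefix, n_cont in (
        (1 << 11, 0xC0, 1),   # 110xxxxx
        (1 << 16, 0xE0, 2),   # 1110xxxx
        (1 << 21, 0xF0, 3),   # 11110xxx
        (1 << 26, 0xF8, 4),   # 111110xx
        (1 << 31, 0xFC, 5),   # 1111110x
    ):
        if value < limit:
            break
    # The lead byte takes the highest payload bits ...
    out = [prefix | (value >> (6 * n_cont))]
    # ... and each continuation byte (10xxxxxx) takes the next 6 bits.
    for i in range(n_cont - 1, -1, -1):
        out.append(0x80 | ((value >> (6 * i)) & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    # The largest 31-bit value fills every payload bit of the 6-byte form.
    print(encode_utf8_31(0x7FFFFFFF).hex(" "))
    # Inside the BMP the result agrees with Python's built-in encoder.
    print(encode_utf8_31(0x20AC) == "\u20ac".encode("utf-8"))
```

Even with all 31 payload bits set, the lead byte comes out as \xFD, so \xFE and \xFF never appear in the output.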


i18n - writing for the world - Arts and crafts of Tcl-Tk programming


Category Characters