Version 39 of Unicode

Updated 2008-04-28 11:35:42 by LV

Purpose: Gather together related Wiki references to Unicode (= ISO 10646), the standard multilingual character encoding - if the ASCII code chart is a page, Unicode is a book with hundreds of such pages. Or, put differently: Unicode is to ASCII what rational numbers are to integers - you still can't represent everything, but infinitely more ;-)

http://www.unicode.org/ http://www.wikipedia.org/wiki/unicode http://www.joelonsoftware.com/articles/Unicode.html


http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html has iso10646 fonts that are useful to install on Unix.

http://www.slovo.info/unifonts.htm links to free TrueType fonts covering larger or smaller subsets of Unicode. Two fonts for writing almost any Latin-script-based language, one of them Times Roman-like, are at [L1 ], together with many articles about Unicode.


Until version 3.0, 16 bits (\u0000-\uFFFD: the "Basic Multilingual Plane", BMP) were sufficient for any Unicode character. From 3.1 on, we must expect longer codes, up to 31 bits long, as specified in ISO 10646. Why 31 bits? Because that is the maximum that can be expressed in UTF-8 as originally defined: 6 bytes, omitting the taboo values \xFE and \xFF. (RS) Unicode and RFC 3629 have since capped the code space at U+10FFFF, so UTF-8 as used today needs at most 4 bytes per character.

 1111110a 10aaaaaa 10bbbbbb 10bbcccc 10ccccdd 10dddddd

(Lowercase letters stand for the "payload" bits of bytes a..d; the most significant byte, a, has only 7 bits.)
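
As a rough illustration of the six-byte layout above, here is a minimal Tcl sketch that packs a value of up to 31 bits into that pattern. The proc name utf8six is a hypothetical helper, not part of Tcl. It always emits six bytes, whereas a real encoder would use shorter forms for small values, and current UTF-8 decoders (RFC 3629) reject sequences this long, so treat it only as a way to see where the bits go.

```tcl
# Hypothetical helper: pack a value of up to 31 bits into the
# historical six-byte UTF-8 layout (not valid in modern UTF-8).
proc utf8six {code} {
    if {$code < 0 || $code > 0x7FFFFFFF} {
        error "value does not fit in 31 bits"
    }
    # Lead byte: marker bits 1111110 plus the top payload bit (bit 30).
    set bytes [list [expr {0xFC | ($code >> 30)}]]
    # Five continuation bytes: marker bits 10 plus six payload bits each.
    foreach shift {24 18 12 6 0} {
        lappend bytes [expr {0x80 | (($code >> $shift) & 0x3F)}]
    }
    return $bytes
}

# Show the largest 31-bit value as hex bytes.
foreach b [utf8six 0x7FFFFFFF] {
    puts -nonewline [format "%02X " $b]
}
puts ""
```

For the largest value, the lead byte comes out as FD and every continuation byte as BF, so neither of the taboo values FE and FF ever appears.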


Character Sets And Code Pages At The Push Of A Button http://www.i18nguy.com/unicode/codepages.html

Lots of other good info at the same site's home page [L2 ]


"ASCII, dammit" [L3 ] goes the other way.

Unicode Explained [L4 ] (book review)


LV What other fonts are useful for a system to have available to display these characters? Many of this wiki's pages look like damaged nonsense (for instance, displaying fraction symbols and the like rather than language characters). I'm using Windows XP, so I would have thought I had the most popular fonts. Or maybe there are some settings I need to change in Firefox 2.x? An example of the type of page I mean (as of this morning, 2007 June 19 10:10 EDT) is http://wiki.tcl.tk/34


From comp.lang.tcl, the following exchange occurred during April, 2008:

Newsgroups: comp.lang.tcl
From: [email protected]
Date: Sat, 26 Apr 2008 11:55:45 -0700 (PDT)
Subject: unicode - get character representation from \uxxx notation
Hello, 

to show my problem, see the following example:

> set tcl_patchLevel
8.5.3b1
> set str "n\u00E4mlich"
nämlich
> set c 0xE4
> set str "n\\u[format %04.4X $c]mlich"
n\u00E4mlich

How do I get the \u00E4 in the character representation, let's say
iso8859-1?

> encoding convertto iso8859-1 $str


Newsgroups: comp.lang.tcl
From: [email protected]
Date: Sat, 26 Apr 2008 14:21:27 -0700 (PDT)
Subject: Re: unicode - get character representation from \uxxx notation

To convert the hex number expressed as a string 0x00e4 to a Unicode 
character, use: 

format "%c" 0x00e4 


You can then use encoding convertto to convert this to another 
encoding, e.g.: 


encoding convertto iso8859-1 [format "%c" 0x00e4]
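
Putting the pieces of this exchange together, the following is a small sketch of two ways to get from the code point 0xE4 to the string, followed by the conversion to iso8859-1. The variable names (escaped, str2, bytes, hex) are only illustrative, and the subst route is an alternative not mentioned in the thread; it relies on subst performing backslash substitution when command and variable substitution are switched off.

```tcl
set c 0xE4

# Build the character straight from its code point.
set str "n[format %c $c]mlich"

# Alternative: keep the textual \uXXXX escape and let subst
# perform only backslash substitution on it.
set escaped "n\\u[format %04.4X $c]mlich"
set str2 [subst -nocommands -novariables $escaped]

# Convert to ISO 8859-1 bytes and show them as hex digits.
set bytes [encoding convertto iso8859-1 $str]
binary scan $bytes H* hex
puts $hex                        ;# 6ee46d6c696368
puts [string equal $str $str2]   ;# 1
```

In ISO 8859-1 the a-umlaut occupies the single byte E4, which is why the second byte pair of the hex dump matches the code point.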


i18n - writing for the world - Arts and crafts of Tcl-Tk programming - some random korean text