Version 14 of Things Chinese

Updated 2008-01-31 15:19:36 by LV

A Chinese website dedicated to Tcl/Tk:

RS 2006-05-30 adds


RS 2007-08-29: Here's a converter from Chinese characters in Unicode to decimal GB2312 numbers:

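 proc c2gb str {
    set res ""
    foreach c [split $str ""] {
       # the two GB2312 bytes of this character, scanned as signed chars
       binary scan [encoding convertto gb2312 $c] cc a b
       # make each byte unsigned, then subtract 160 (0xA0) to get row and cell
       set a [expr {($a+0x100) % 0x100 - 160}]
       set b [expr {($b+0x100) % 0x100 - 160}]
       lappend res [format %02d%02d $a $b]
    }
    set res
 }
 % c2gb 上海
 4147 2603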
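
Going the other way may also be useful. The following is only a minimal sketch, assuming the same row/cell arithmetic as c2gb; the name gb2c is made up for this illustration and is not an existing command. It adds 160 back to each two-digit half and lets the gb2312 encoding turn the two bytes into a character:

```tcl
# Hypothetical inverse of c2gb: decimal GB2312 row/cell codes -> characters.
proc gb2c codes {
    set res ""
    foreach code $codes {
        # split e.g. 4147 into row 41 and cell 47 (%d always reads decimal)
        scan $code %2d%2d row cell
        # add 160 (0xA0) back to get the two GB2312 bytes
        set bytes [binary format cc [expr {$row + 160}] [expr {$cell + 160}]]
        append res [encoding convertfrom gb2312 $bytes]
    }
    return $res
}

puts [gb2c {4147 2603}]
```

On a terminal or widget that can display Chinese, this should print the two characters that were fed to c2gb above.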

ZU 2007-08-09: Could you please tell me how I can use that when I use TclKit to show some text with Chinese characters? I have tried

 puts "\u2603"  ;# the second number above

And I get ☃. May I ask why?

AM (31 August 2007) That is interesting in itself too :) but the problem is really that the \u escape expects a hexadecimal Unicode number, whereas the numbers above are decimal GB2312 codes. - RS: Yes. The Unicodes of these example characters can be scanned out:

 % u2x 上海
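
For readers without u2x at hand, here is a rough sketch of how the hexadecimal Unicodes might be obtained. The name c2u is invented for this example and is not the u2x used above; it relies only on scan with %c, which returns the character's code point:

```tcl
# Hypothetical helper: list the hexadecimal Unicode code point of each character.
proc c2u str {
    set res {}
    foreach c [split $str ""] {
        # scan ... %c yields the code point as an integer
        lappend res [format %04X [scan $c %c]]
    }
    return $res
}

# The two example characters, written as \u escapes so that the script
# does not depend on the encoding of the source file.
puts [c2u "\u4E0A\u6D77"]   ;# 4E0A 6D77
```

These are the numbers that belong after \u, which is why "\u4E0A\u6D77" shows the intended characters, while "\u2603" is read as hexadecimal 2603 and shows a snowman.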

LV 2008 Jan 31 I was asked, today, whether Tcl supports the GBK or GB18030 Chinese character encodings. I don't see either of them among the encodings listed when I type:

 $ tclsh8.5
 % encoding names
 cp860 cp861 cp862 cp863 tis-620 cp864 cp865 cp866 gb12345 gb2312-raw cp949 cp950 cp869 dingbats ksc5601 macCentEuro cp874 macUkraine jis0201 gb2312 euc-cn euc-jp macThai iso8859-10 jis0208 iso2022-jp macIceland iso2022 iso8859-13 jis0212 iso8859-14 iso8859-15 cp737 iso8859-16 big5 euc-kr macRomania macTurkish gb1988 iso2022-kr macGreek ascii cp437 macRoman iso8859-1 iso8859-2 iso8859-3 macCroatian koi8-r iso8859-4 ebcdic iso8859-5 cp1250 macCyrillic iso8859-6 cp1251 macDingbats koi8-u iso8859-7 cp1252 iso8859-8 cp1253 iso8859-9 cp1254 cp1255 cp850 cp1256 cp932 identity cp1257 cp852 macJapan cp1258 shiftjis utf-8 cp855 cp936 symbol cp775 unicode cp857

Has anyone out there worked out the issues?
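
One possible partial answer, offered as a sketch rather than a worked-out solution: the list above does contain cp936, the Microsoft code page for simplified Chinese, which largely corresponds to GBK. Assuming that table is good enough for the text at hand (it does not cover the four-byte range of GB18030), a quick check might look like this:

```tcl
# Report which of the candidate names this Tcl installation knows about.
foreach name {gbk gb18030 cp936} {
    puts "$name: [expr {$name in [encoding names]}]"
}

# Convert the example characters to cp936 bytes and show them in hex.
set bytes [encoding convertto cp936 "\u4E0A\u6D77"]
binary scan $bytes H* hex
puts $hex   ;# c9cfbaa3, the same bytes gb2312 gives for these two characters
```

For characters that are also in GB2312, the bytes are identical, so the difference only shows up with the characters that GBK adds.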