Version 19 of Things Chinese

Updated 2008-04-15 08:33:01 by wjg

A Chinese website dedicated to Tcl/Tk: (2006-05-30)


RS 2007-08-29: Here's a converter from Chinese characters in Unicode to decimal GB2312 numbers:

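 proc c2gb str {
    set res ""
    foreach c [split $str ""] {
       # each character becomes two bytes in the gb2312 encoding
       binary scan [encoding convertto gb2312 $c] cc a b
       # make the signed bytes unsigned, then subtract the 0xA0 (160) offset
       set a [expr {($a+0x100) % 0x100 - 160}]
       set b [expr {($b+0x100) % 0x100 - 160}]
       lappend res [format %02d%02d $a $b]
    }
    set res
 }
 % c2gb 上海
 4147 2603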
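
Going the other way, from decimal GB2312 numbers back to characters, can be sketched as below. The name gb2c is hypothetical (it is not from this page), and the sketch assumes that every item is a four-digit row/cell number of the kind c2gb produces.

```tcl
# Hypothetical inverse of c2gb: decimal GB2312 numbers -> characters.
proc gb2c {codes} {
    set res ""
    foreach code $codes {
        # "%2d%2d" reads row and cell as decimal, so "03" is not taken as octal
        scan $code %2d%2d row cell
        # add back the 160 (0xA0) offset that c2gb subtracted
        set bytes [binary format cc [expr {$row + 160}] [expr {$cell + 160}]]
        append res [encoding convertfrom gb2312 $bytes]
    }
    return $res
}

puts [gb2c {4147 2603}]
```

With the numbers from the example above this should give back the two original characters.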

ZU 2007-08-09: Could you please tell me how I can use that when I use TclKit to show some Chinese characters? I have tried

 puts "\u2603"  ;# the second number above

And I get ☃. May I ask why?

AM (31 August 2007) That is interesting in itself too :) but the problem is really that the \u escape expects a hexadecimal Unicode number, whereas the numbers above are decimal GB2312 positions. - RS: Yes. The Unicodes of these example characters can be scanned out:

 % u2x 上海
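
u2x is not defined on this page. A minimal sketch of what such a helper might do, assuming it simply turns every character into a \u escape with four hexadecimal digits, follows. The name u2x_sketch is hypothetical, and the real u2x may behave differently (for instance, it may leave ASCII characters alone).

```tcl
# Hypothetical stand-in for u2x: every character -> \uXXXX (hexadecimal).
proc u2x_sketch {str} {
    set res ""
    foreach c [split $str ""] {
        # %c scans one character into its Unicode number
        scan $c %c code
        # the braces keep \u literal; %04x prints four hex digits
        append res [format {\u%04x} $code]
    }
    return $res
}

puts [u2x_sketch "\u4e0a\u6d77"]
```

The escapes printed this way are the ones to use with puts instead of the decimal GB2312 numbers.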

LV 2008 Jan 31 I was asked today whether Tcl supports the GBK or GB18030 Chinese character encodings. I don't see them among the encodings listed when I type:

 $ tclsh8.5
 % encoding names
 cp860 cp861 cp862 cp863 tis-620 cp864 cp865 cp866 gb12345 gb2312-raw cp949 cp950 cp869 
 dingbats ksc5601 macCentEuro cp874 macUkraine jis0201 gb2312 euc-cn euc-jp macThai iso8859-10 
 jis0208 iso2022-jp macIceland iso2022 iso8859-13 jis0212 iso8859-14 iso8859-15 cp737 
 iso8859-16 big5 euc-kr macRomania macTurkish gb1988 iso2022-kr macGreek ascii cp437 macRoman 
 iso8859-1 iso8859-2 iso8859-3 macCroatian koi8-r iso8859-4 ebcdic iso8859-5 cp1250 macCyrillic 
 iso8859-6 cp1251 macDingbats koi8-u iso8859-7 cp1252 iso8859-8 cp1253 iso8859-9 cp1254 cp1255 
 cp850 cp1256 cp932 identity cp1257 cp852 macJapan cp1258 shiftjis utf-8 cp855 cp936 symbol cp775 unicode cp857

Has anyone out there worked out the issues?

WJP 2008 Jan 31 iconv supports both GBK and GB18030, so it might be helpful to look at its source, or possibly just to run iconv.
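
The two suggestions can be sketched as follows: first ask the running Tcl which candidate encodings it knows, then fall back to an external iconv. Both proc names are hypothetical. The second proc assumes that an iconv program is on the PATH and that it accepts GB18030 and UTF-8 as encoding names. Note also that cp936 in the list above is, as far as I know, Microsoft's code page for Simplified Chinese and is usually described as GBK, so it may already cover the GBK case; that is worth checking against real data.

```tcl
# Return those candidate encoding names that this Tcl build provides.
proc available_encodings {candidates} {
    set found {}
    foreach name $candidates {
        if {$name in [encoding names]} {
            lappend found $name
        }
    }
    return $found
}

# Hypothetical fallback: let an external iconv (assumed to be installed)
# convert the file to UTF-8, and read the result through a pipe.
proc read_via_iconv {path {from GB18030}} {
    set cmd [list iconv -f $from -t UTF-8 $path]
    set f [open |$cmd r]
    fconfigure $f -encoding utf-8
    set data [read $f]
    close $f
    return $data
}

puts [available_encodings {gbk gb18030 cp936 gb2312 euc-cn}]
```

A call such as read_via_iconv somefile.txt would then return the file contents as an ordinary Tcl string; the file name here is only a placeholder.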