Fun with Chinese Characters

JimG 20050425

I am just starting to learn Tcl, and I find it fun to use with Chinese characters.

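     puts "The next line is in Chinese. Since we don't
           use blank space to separate words, {}s and \"s are not needed!"
     # Meaning: Chinese does not use spaces to separate words, so sentences need no quotes or braces.
     puts 中文不用空格来分词,所以句子不需要用引号或者大括号。
     # A classical poem, stored as a single unquoted word.
     set cs 白日依山尽,黄河入海流。欲穷千里目,更上一层楼。
     # string replace needs both a first and a last index; this swaps the final six characters.
     puts [string replace $cs end-5 end 中文更自由。]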

It's interesting.

There are more to come...

KJN 2005-05-20

Non-ASCII characters are first-class citizens in Tcl (unlike other languages which "support" Unicode as a kind of optional extra). Tcl is therefore a very good foundation for programming in languages other than English. Variables and commands can be named with non-ASCII characters, for example:

  set 子 需要用
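  puts ${子}

works just fine. (The braced form ${子} is used because in Tcl 8.6 and later the bare $name form accepts only ASCII letters, digits and underscores.) Tcl lets you rename and alias commands, so you can also say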

  interp alias {} 用 {} puts
  用 ${子}

or

  rename puts 用
  用 ${子}

If you wish, you can keep the "English" commands in a different place by renaming them:

  rename puts en_puts
  interp alias {} 用 {} en_puts
  用 ${子}

You can do this in a systematic way, by moving the English commands into a namespace.
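
A minimal sketch of that systematic approach, assuming a namespace called ::en and a small, purely illustrative translation table (用 for puts comes from the examples above; 增 for incr is a made-up name). Renaming commands that Tcl's own library procedures rely on can break those procedures, so a real table would need more care than this:

```tcl
# Hypothetical mapping: translated name -> English core command.
set translations {用 puts 增 incr}

# Hypothetical namespace that will hold the original English commands.
namespace eval ::en {}

foreach {local english} $translations {
    rename ::$english ::en::$english            ;# move the English command away
    interp alias {} ::$local {} ::en::$english  ;# expose it under the translated name
}

set count 1
增 count       ;# dispatches to ::en::incr
用 $count      ;# dispatches to ::en::puts
```

After this runs, puts and incr are no longer available under their English names in the global namespace, only as ::en::puts and ::en::incr.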

With just an hour's work, it would be possible to translate every Tcl command into a different natural language. Command options are more difficult, but could be translated in a substitute command that then calls the "real" command. The most difficult problem is with error messages: is there an easy way to modify the error messages that Tcl sends? If so, English could be completely replaced with an arbitrary natural language.
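
The "substitute command" idea for options might look roughly like the following sketch. The command name 排序 and the option names -递增, -递减 and -整数 are invented for illustration; only lsort and its real options -increasing, -decreasing and -integer are part of Tcl:

```tcl
# Hypothetical translated command wrapping the real lsort.
proc 排序 {args} {
    # Hypothetical translated option names mapped to real lsort options.
    set optionMap {-递增 -increasing -递减 -decreasing -整数 -integer}
    set opts {}
    # Everything except the last argument is treated as an option word.
    foreach word [lrange $args 0 end-1] {
        if {[dict exists $optionMap $word]} {
            lappend opts [dict get $optionMap $word]
        } else {
            lappend opts $word   ;# pass anything unrecognised through unchanged
        }
    }
    # The last argument is the list to sort.
    return [lsort {*}$opts [lindex $args end]]
}

puts [排序 -整数 -递减 {3 10 2}]   ;# prints: 10 3 2
```

Error messages raised by lsort itself would still arrive in English, which is the harder problem mentioned above.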

wdb: It is doubtless great fun to nationalise commands via rename, but a drawback remains: you cannot do the same with subcommands. For example, you cannot rename string is integer to something like the German zeichenfolge entspricht ganzzahl. Possibly a TIP to improve the rename command?

Lars H: Rather unlikely, in view of how the parsing of subcommands is often hardcoded in C internally. You can however use namespace ensemble to "rename" the subcommands:

 namespace ensemble create -command sträng -map {
    bytelängd {string bytelength}
    jämför {string compare}
    lika {string equal}
    första {string first}
    index {string index}
    är {{sträng är}}
    sista {string last}
    längd {string length}
    avbilda {string map}
    matchar {string match}
    intervall {string range}
    upprepa {string repeat}
    bytut {string replace}
    medgemena {string tolower}
    somititel {string totitle}
    medversaler {string toupper}
    beskär {string trim}
    beskärvänster {string trimleft}
    beskärhöger {string trimright}
    ordslut {string wordend}
    ordbörjan {string wordstart}
 }
 namespace ensemble create -command {sträng är} -map {
    alfanumerisk {string is alnum}
    bokstäver {string is alpha}
    ascii {string is ascii}
    boolesk {string is boolean}
    styrtecken {string is control}
    siffror {string is digit}
    dubbelprecisionsflyttal {string is double}
    falsk {string is false}
    synlig {string is graph}
    heltal {string is integer}
    gemena {string is lower}
    skrivbar {string is print}
    skiljetecken {string is punct}
    mellanslag {string is space}
    sann {string is true}
    versaler {string is upper}
    ordtecken {string is wordchar}
    hexsiffror {string is xdigit}
 }
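
As a small usage sketch of the same technique, here is a cut-down pair of ensembles using the German names from wdb's question. Only the two mappings shown are defined; the rest would follow the pattern above:

```tcl
# Outer ensemble: "zeichenfolge entspricht ..." forwards to a second ensemble
# whose command name contains a space.
namespace ensemble create -command zeichenfolge -map {
    entspricht {{zeichenfolge entspricht}}
}
# Inner ensemble: maps the translated class name to the real subcommand.
namespace ensemble create -command {zeichenfolge entspricht} -map {
    ganzzahl {string is integer}
}

puts [zeichenfolge entspricht ganzzahl 42]    ;# prints: 1
puts [zeichenfolge entspricht ganzzahl abc]   ;# prints: 0
```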

RAY - 2009-07-22 10:20:51

Fun with Chinese Characters (The Straits Times Collection) is a great book about Chinese characters.

CJK characters are first-class citizens in Tcl. This is very cool. Here is a little experiment with string length in Tcl, Perl, Python and Ruby. (; is my shell prompt.)

  ; tclsh
  % string length I
  % string length 我
  % exit
  ; perl -de 42

  Loading DB routines from version 1.31
  Editor support available.

  Enter h or `h h' for help, or `man perldebug' for more help.

  main::(-e:1):   42
    DB<1> print length 'I'
    DB<2> print length '我'
    DB<3> q
  ; irb
  irb(main):001:0> 'I'.length
  => 1
  irb(main):002:0> '我'.length
  => 3
  irb(main):003:0> quit
  ; python
  Python 2.6.4 (r264:75706, Nov 24 2009, 21:34:34)
  [GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> len('I')
  >>> len('我')
  >>> len(u'我')
  >>> len(u'我'.encode('utf8'))
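
The 3 that irb reports for '我' matches the number of bytes the character occupies in UTF-8. A short sketch of the same distinction in Tcl, where string length counts characters and the byte count only appears after an explicit conversion with encoding convertto (the variable name s is just for illustration):

```tcl
# \u6211 is the character 我, written as an escape so the script does not
# depend on the encoding of the source file.
set s \u6211

puts [string length $s]                              ;# prints: 1 (characters)
puts [string length [encoding convertto utf-8 $s]]   ;# prints: 3 (bytes in UTF-8)
```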