** Summary **

[Richard Suchenwirth] 2002-11-26: Characters are abstractions of writing elements (e.g. letters, digits, punctuation characters, Chinese ideographs, ligatures...). In Tcl since 8.1, characters are internally represented with Unicode (see [Unicode and UTF-8]), so each character can be seen as an unsigned integer between 0 and 65535 (recent Unicode versions have crossed that boundary, but the Tcl implementation at the time of writing uses a maximum of 16 bits). Convert between numeric Unicode values and characters with

======
set char [format %c $int]
set int  [scan $char %c]
======

Be careful: ''int'' values above 65535 wrap around and produce characters from the lower range again, while a negative ''int'' even produces two bogus characters. [format] does not warn in either case, so test the value before calling it.

Sequences of characters are called ''strings''. Characters are not a separate data type in Tcl; they are represented as strings of length one ([everything is a string]). Represented as UTF-8, a character in the 16-bit range takes one to three bytes in memory or in a file. Find out the byte length of a character with

======
string bytelength $c ;# assuming [string length $c]==1
======

String routines can be applied to single characters too, e.g. [[string toupper]] etc. Find out whether a character is in a given set (a character string) with

======
expr {[string first $char $set]>=0}
======

As Unicode values for characters fall into distinct ranges, checking whether a character's code lies within a range allows a rough classification of its category:

======
proc inRange {from to char} {
    # generic range checker
    set int [scan $char %c]
    expr {$int>=$from && $int <= $to}
}
interp alias {} isGreek    {} inRange 0x0386 0x03D6
interp alias {} isCyrillic {} inRange 0x0400 0x04F9
interp alias {} isHangul   {} inRange 0xAC00 0xD7A3
======

** See Also **

   [Unicoded integer sets]:   

   [Characters, glyphs, code-points, and byte-sequences]:   

   [Non-ASCII characters]:   

   [Character]:   

<<categories>> Category Concept | Arts and crafts of Tcl-Tk programming