Characters are abstractions of writing elements (e.g. letters, digits, punctuation characters, Chinese ideographs, ligatures...). In Tcl since 8.1, characters are internally represented with Unicode (see [Unicode and UTF-8]), which can be seen as unsigned integers between 0 and 65535 (Unicode itself has since grown beyond that boundary, but the Tcl 8.x implementation uses at most 16 bits per character). Convert between numeric Unicode and characters with

 set char [format %c $int]
 set int  [scan $char %c]

Watch out: ''int'' values above 65535 wrap around and produce characters from the low end of the range again, while a negative ''int'' even produces two bogus characters. [format] does not warn, so it is better to test the value before calling it.

Sequences of characters are called ''strings''. Characters are not a separate data type in Tcl, but are represented as strings of length one ([everything is a string]). Represented as UTF-8, a character can be one to three bytes long in memory or in a file. Find out the byte length of a character with

 string bytelength $c ;# assuming [string length $c]==1

String routines can be applied to single characters too, e.g. [[string toupper]].

Find out whether a character is in a given set (a character string) with

 expr {[string first $char $set]>=0}

As the Unicode values for characters fall in distinct ranges, checking whether a character's code lies within a range allows a more or less rough classification of its category:

 proc inRange {from to char} {
     # generic range checker
     set int [scan $char %c]
     expr {$int>=$from && $int <= $to}
 }
 interp alias {} isGreek    {} inRange 0x0386 0x03D6
 interp alias {} isCyrillic {} inRange 0x0400 0x04F9
 interp alias {} isHangul   {} inRange 0xAC00 0xD7A3

See also [Unicoded integer sets] - [Characters, glyphs, code-points, and byte-sequences] - [Non-ASCII characters]

----
[Category Concept] | [Arts and crafts of Tcl-Tk programming]