Characters are abstractions of writing elements (e.g. letters, digits, punctuation characters, Chinese ideographs, ligatures...). Since Tcl 8.1, characters are internally represented in Unicode (see [Unicode and UTF-8]), so each one can be viewed as an unsigned integer between 0 and 65535 (Unicode itself has since grown beyond that boundary, but Tcl 8 by default uses a maximum of 16 bits per character; Tcl 9 handles the full range). Convert between numeric Unicode values and characters with
 set char [format %c $int]
 set int  [scan $char %c]
Beware: in a 16-bit build, ''int'' values above 65535 wrap around and yield low-numbered characters again, while a negative ''int'' even produces two bogus characters. [format] does not warn, so it is better to range-check the value before calling it.
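
The range check suggested above could be wrapped up as follows. This is only a minimal sketch: the name ''codeToChar'' and its default upper bound of 0xFFFF are assumptions made for illustration, not part of Tcl.

```tcl
# Hypothetical helper (not a Tcl built-in): convert a code point to a
# character, refusing values outside 0..max instead of returning garbage.
proc codeToChar {int {max 0xFFFF}} {
    if {![string is integer -strict $int] || $int < 0 || $int > $max} {
        error "code point out of range 0..$max: $int"
    }
    return [format %c $int]
}

puts [codeToChar 65]          ;# A
puts [scan A %c]              ;# 65
puts [catch {codeToChar -1}]  ;# 1, i.e. the helper raised an error
```

Callers that know their Tcl build handles larger code points can pass a higher ''max'' as the second argument.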

Sequences of characters are called ''strings''. Characters are not a separate data type in Tcl; they are represented as strings of length one ([everything is a string]). Encoded as UTF-8, a character in the range above takes one to three bytes in memory or in a file. Find out the byte length of a character with
 string bytelength $c ;# assuming [string length $c]==1
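
Where [[string bytelength]] is unavailable or discouraged in your Tcl version, the UTF-8 length can also be measured by encoding the character explicitly. A minimal sketch, in which the proc name ''utf8Length'' is an assumption made for illustration:

```tcl
# Hypothetical helper: number of bytes a string occupies when encoded
# as UTF-8. [encoding convertto] returns the encoded bytes, and
# [string length] of that result counts them.
proc utf8Length {c} {
    string length [encoding convertto utf-8 $c]
}

puts [utf8Length A]        ;# 1 (ASCII)
puts [utf8Length \u00E9]   ;# 2 (e with acute accent)
puts [utf8Length \u20AC]   ;# 3 (euro sign)
```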
String routines can be applied to single characters too, e.g. [[string toupper]]. Find out whether a character is in a given set (represented as a string of characters) with
 expr {[string first $char $set]>=0}
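
For repeated use, the membership test can be given a name. A small sketch; the proc name ''inSet'' is an assumption made for illustration:

```tcl
# Hypothetical wrapper around the [string first] test:
# returns 1 if char occurs in set, else 0.
proc inSet {char set} {
    expr {[string first $char $set] >= 0}
}

puts [inSet e aeiou]   ;# 1
puts [inSet x aeiou]   ;# 0
```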

As the Unicode values of related characters fall into distinct ranges, checking whether a character's code lies within a range allows a more or less rough classification of its category:
 proc inRange {from to char} {
     # generic range checker
     set int [scan $char %c]
     expr {$int>=$from && $int <= $to}
 }
 interp alias {} isGreek {}    inRange 0x0386 0x03D6
 interp alias {} isCyrillic {} inRange 0x0400 0x04F9
 interp alias {} isHangul {}   inRange 0xAC00 0xD7A3
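
The aliases are called like ordinary commands, with the character as the only argument. The sketch below repeats the definitions so that it runs on its own; the proc ''scriptOf'' is a hypothetical addition that reports the first matching range.

```tcl
# Repeated from above so that the snippet is self-contained
proc inRange {from to char} {
    set int [scan $char %c]
    expr {$int >= $from && $int <= $to}
}
interp alias {} isGreek {}    inRange 0x0386 0x03D6
interp alias {} isCyrillic {} inRange 0x0400 0x04F9

# Hypothetical helper: name of the first range that matches, or "other"
proc scriptOf {char} {
    foreach {name test} {Greek isGreek Cyrillic isCyrillic} {
        if {[$test $char]} {return $name}
    }
    return other
}

puts [isGreek \u03B1]     ;# 1 (Greek small letter alpha)
puts [isGreek A]          ;# 0
puts [scriptOf \u0416]    ;# Cyrillic (capital letter zhe)
puts [scriptOf A]         ;# other
```

Since the ranges are only rough, such a classifier is best treated as a heuristic rather than an exact script test.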

See also [Unicoded integer sets] - [Characters, glyphs, code-points, and byte-sequences] - [Non-ASCII characters]


----
[Category Concept] | [Arts and crafts of Tcl-Tk programming]