Version 0 of characters

Updated 2002-11-26 19:12:45

Characters are abstractions of writing elements (e.g. letters, digits, punctuation characters, Chinese ideographs, ligatures...). Since Tcl 8.1, characters are represented internally as Unicode (see Unicode and UTF-8), so each character can be treated as an unsigned integer between 0 and 65535 (recent Unicode versions have crossed that boundary, but the Tcl implementation currently uses at most 16 bits per character). To convert between numeric Unicode values and characters, use

 set char [format %c $int]
 set int  [scan $char %c]

Note that int values above 65535 wrap around and yield characters from the low end of the range again, while a negative int even produces two bogus characters. format does not warn in either case, so it is better to range-check the value before calling it.
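The range test suggested above could be wrapped in a small helper. This is only a sketch: the name int2char and the error message are made up for illustration, and the upper limit of 65535 is the 16-bit assumption described on this page.

```tcl
# Hypothetical helper (not part of Tcl): convert an integer to a character,
# rejecting anything outside the assumed 16-bit range 0..65535.
proc int2char {int} {
    if {![string is integer -strict $int] || $int < 0 || $int > 65535} {
        error "character code out of range 0..65535: $int"
    }
    return [format %c $int]
}

puts [int2char 65]                 ;# prints A
puts [catch {int2char 70000} msg]  ;# prints 1, msg holds the error text
```

Raising an error rather than silently clamping keeps a bad value from turning into an unrelated character further down the line.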

Sequences of characters are called strings. Characters are not a separate data type in Tcl; they are represented as strings of length one (everything is a string). Encoded as UTF-8, a character occupies one to three bytes in memory or in a file. To find the byte length of a character, use

 string bytelength $c ;# assuming [string length $c]==1
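As an illustration of the one-to-three-byte range, the sketch below measures a few sample characters by converting them to UTF-8 explicitly and counting the resulting bytes. The helper name utf8len is hypothetical; it uses encoding convertto instead of string bytelength, so it reports the size of the externally encoded form.

```tcl
# Hypothetical helper: number of bytes a character needs when encoded as UTF-8.
proc utf8len {c} {
    return [string length [encoding convertto utf-8 $c]]
}

# An ASCII letter, a Latin-1 accented letter, and the euro sign.
foreach c [list A \u00e9 \u20ac] {
    puts [format "U+%04X: %d byte(s)" [scan $c %c] [utf8len $c]]
}
# U+0041: 1 byte(s)
# U+00E9: 2 byte(s)
# U+20AC: 3 byte(s)
```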

See also Unicoded integer sets.


Category Concept | Arts and crafts of Tcl-Tk programming