Generally, a character is a unit of text. Examples include letters, digits, punctuation, Chinese ideographs, and ligatures. In Tcl version [Changes in Tcl/Tk 8.1%|%8.1] and later, a character is a [Unicode] character. A sequence of [characters] is a [EIAS%|%string].

** See Also **

   [Characters, glyphs, code-points, and byte-sequences]:   The finer points of the terminology.
   [Unicoded integer sets]:   
   [Non-ASCII characters]:   
   [Character]:   

** Description **

A [Unicode] character in the basic multilingual plane can be seen as an unsigned integer between 0 and 65535. To convert between a character and its corresponding code:

======
set char [format %c $int]
set int [scan $char %c]
======

Watch out: ''int'' values above 65535 produce 'decreasing' characters again, and a negative ''int'' even produces two bogus characters. `[format]` does not warn, so test the value before calling it.

A sequence of characters is called a ''[EIAS%|%string]''. A character is not a separate data type, but is represented as a string of length one ([everything is a string]). Represented as [UTF-8], a character takes one to four bytes of storage under the current standard; the original UTF-8 definition allowed up to six. To determine the number of bytes a character ''c'' occupies when encoded in UTF-8:

======
string length [encoding convertto utf-8 $c]
======

String routines such as `[string toupper]` can be applied to a single character.
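
Because `[format]` does not warn about out-of-range values, a guarded conversion might look like the following minimal sketch. The helper name `charFromCode` and the upper bound of 0xFFFF (the basic multilingual plane) are assumptions made for illustration; they are not part of Tcl itself.

======
# Hypothetical helper: convert an integer code to a character, rejecting
# anything outside the assumed range 0..0xFFFF before calling format.
proc charFromCode int {
    if {![string is integer -strict $int] || $int < 0 || $int > 0xFFFF} {
        error "character code out of range: $int"
    }
    format %c $int
}

set char  [charFromCode 0x03A9]   ;# GREEK CAPITAL LETTER OMEGA
set int   [scan $char %c]         ;# back to the integer code, 937
set bytes [string length [encoding convertto utf-8 $char]]   ;# 2 bytes in UTF-8
======

Calling `charFromCode -1` or `charFromCode 0x10000` raises an error instead of silently returning bogus characters.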

Find out whether a character is in a given set (a character string) with:

======
expr {[string first $char $set] >= 0}
======

A character can be categorized by the range of character codes it falls in:

======
proc inRange {from to char} {
    # generic range checker
    set int [scan $char %c]
    expr {$int >= $from && $int <= $to}
}
interp alias {} isGreek {} inRange 0x0386 0x03D6
interp alias {} isCyrillic {} inRange 0x0400 0x04F9
interp alias {} isHangul {} inRange 0xAC00 0xD7A3
======

** Hexadecimal to Character **

Here are some routines to convert between a character and a hexadecimal number:

======
# probably slower
proc fromhex hex {
    if {![string is xdigit -strict $hex]} {
        error [list {input is not a hexadecimal string}]
    }
    if {[string length $hex] > 8} {
        error [list {hexadecimal string is too long}]
    }
    expr "\"\\U$hex\""
}

# alternative implementation; if both are sourced, this definition replaces the one above
proc fromhex hex {
    # scan returns the number of conversions made: 0 for non-hex input, -1 for empty input
    if {[scan $hex %llx ord] != 1} {
        error [list {input is not a hexadecimal string}]
    }
    format %c $ord
}

proc tohex char {
    scan $char %c cardinal
    format %x $cardinal
}
======

These implementations can be found in `[ycl] convert char`.

** Page Authors **

   [pyk]:   
   [Richard Suchenwirth]:   

** Historical **

The content of [characters] was transferred to this page on `2019 08 11`.

<> Category Concept | Arts and crafts of Tcl-Tk programming | Word and Text Processing