A character is a composable unit of meaning in a text. Categories of meaning include sound, concept, reference, or textual structure. Examples of characters include letters, combining marks, digits, punctuation, whitespace, Chinese ideographs, and ligatures. A character is typically composed along with other characters into a word, but a character may also be composed into a more complex character. Examples include diacritical marks added to a base character in a Latin alphabet, and Han characters which are composed of smaller units known as radicals, where each radical is itself considered a character. In Tcl version 8.1 and later a character is a Unicode character. A sequence of characters is a string, but non-character code points may also occur in a string.
A Unicode character in the basic multilingual plane can be seen as an unsigned integer between 0 and 65535. To convert between a character and its corresponding code:
    set char [format %c $int]
    set int [scan $char %c]
Beware: in Tcl builds that represent characters with 16 bits, as Tcl 8.x typically does, integer values above 65535 wrap around to lower characters again, and negative values produce bogus characters. format does not warn about this, so check the value before calling it.
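A minimal sketch of a guarded conversion, assuming you would rather get an error than a silently wrapped character. The proc name `charFromCode` and its default upper bound of `0xFFFF` (the basic multilingual plane) are choices made for this example, not part of any library; on a Tcl build that supports the full Unicode range you could pass `0x10FFFF` instead.

```tcl
# charFromCode: hypothetical helper that refuses out-of-range codes
# instead of letting [format %c] wrap around or produce bogus characters.
proc charFromCode {int {max 0xFFFF}} {
    if {![string is entier -strict $int]} {
        error "not an integer: $int"
    }
    if {$int < 0 || $int > $max} {
        error "code $int is outside the range 0..$max"
    }
    format %c $int
}

# Round trip: character -> code -> character
set code [scan A %c]          ;# 65
set char [charFromCode $code] ;# A
```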
A sequence of characters is called a string. A character is not a separate data type but is represented as a string of length one (everything is a string). Encoded as UTF-8, a character takes one to four bytes (the original UTF-8 design allowed up to six, but RFC 3629 limits it to four). To determine how many bytes a character or string $c occupies when encoded in UTF-8:
    string length [encoding convertto utf-8 $c]
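Because a character is just a string of length one, `split $str {}` yields the individual characters of a string. The sketch below applies the byte-count idiom to each of them; the proc name `utf8Widths` is made up for this example.

```tcl
# utf8Widths: hypothetical helper returning a flat list of char/byte-count pairs
proc utf8Widths {str} {
    set result {}
    foreach c [split $str {}] {
        # encoding convertto yields a byte array; its length is the byte count
        lappend result $c [string length [encoding convertto utf-8 $c]]
    }
    return $result
}

# A needs 1 byte, e-acute (U+00E9) needs 2, the euro sign (U+20AC) needs 3
puts [utf8Widths "A\u00e9\u20ac"]
```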
String commands such as string toupper can be applied to a single character. To find out whether a character is in a given set (represented as a string of characters), use:
    expr {[string first $char $set] >= 0}
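A short sketch combining the two ideas above: case mapping works on a one-character string just as on a longer one, and membership is tested with `string first`. The proc name `isVowel` and its vowel set are assumptions for illustration.

```tcl
# isVowel: hypothetical predicate; the set is an ordinary string of characters
proc isVowel {char} {
    set vowels aeiou
    # normalise case first, so "E" and "e" are treated the same
    expr {[string first [string tolower $char] $vowels] >= 0}
}

puts [isVowel E]   ;# 1
puts [isVowel x]   ;# 0
```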
A character can be categorized by the range of character codes it is located in:
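    proc inRange {from to char} {
        # generic range checker: is the code of $char within from..to?
        set int [scan $char %c]
        expr {$int >= $from && $int <= $to}
    }
    # These ranges cover only part of the corresponding Unicode blocks
    # (Greek and Coptic is U+0370..U+03FF, Cyrillic is U+0400..U+04FF).
    interp alias {} isGreek {} inRange 0x0386 0x03D6
    interp alias {} isCyrillic {} inRange 0x0400 0x04F9
    interp alias {} isHangul {} inRange 0xAC00 0xD7A3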
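Instead of defining one alias per script, the ranges can be kept in a table and searched in order. This sketch assumes the Unicode block boundaries Greek and Coptic U+0370..U+03FF, Cyrillic U+0400..U+04FF and the assigned Hangul syllables U+AC00..U+D7A3; the variable `::blocks` and the proc name `charBlock` are hypothetical.

```tcl
# Table of script ranges; each entry is: name first-code last-code
set ::blocks {
    Greek    0x0370 0x03FF
    Cyrillic 0x0400 0x04FF
    Hangul   0xAC00 0xD7A3
}

# charBlock: hypothetical lookup returning the name of the first matching range
proc charBlock {char} {
    scan $char %c int
    foreach {name from to} $::blocks {
        if {$int >= $from && $int <= $to} {
            return $name
        }
    }
    return unknown
}

puts [charBlock \u03B1]   ;# Greek (small alpha)
puts [charBlock \u0416]   ;# Cyrillic
puts [charBlock \uAC00]   ;# Hangul
puts [charBlock A]        ;# unknown
```

Keeping the ranges in data rather than in separate aliases makes it easy to add blocks without defining new commands.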
Here are some routines to convert between a character and a hexadecimal number:
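    # Variant 1, probably slower: build a \U escape and let expr evaluate it
    # (\U escapes need Tcl 8.6 or later)
    proc fromhex hex {
        if {![string is xdigit -strict $hex]} {
            error [list {input is not a hexadecimal string}]
        }
        if {[string length $hex] > 8} {
            error [list {hexadecimal string is too long}]
        }
        expr "\"\\U$hex\""
    }

    # Variant 2: parse with scan (redefines fromhex if both are loaded)
    proc fromhex hex {
        # scan returns the number of conversions made; anything but 1 is a failure
        if {[scan $hex %llx ord] != 1} {
            error [list {input is not a hexadecimal string}]
        }
        format %c $ord
    }

    proc tohex char {
        scan $char %c cardinal
        format %x $cardinal
    }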
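The same `scan`/`format` pair also produces the conventional `U+XXXX` notation for each character in a string, which can help when debugging encoding problems. The proc names `codePoints` and `fromCodePoint` are hypothetical; `%04X` pads to at least four uppercase hexadecimal digits, following the usual Unicode convention.

```tcl
# codePoints: hypothetical helper listing U+XXXX for each character of a string
proc codePoints {str} {
    set out {}
    foreach c [split $str {}] {
        scan $c %c code
        lappend out [format U+%04X $code]
    }
    return $out
}

# fromCodePoint: hypothetical inverse for a single "U+XXXX" string
proc fromCodePoint {cp} {
    if {[scan [string range $cp 2 end] %x code] != 1} {
        error "not a U+XXXX code point: $cp"
    }
    format %c $code
}

puts [codePoints "Tcl\u00e9"]   ;# U+0054 U+0063 U+006C U+00E9
```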
These implementations can be found in ycl convert char.
The content of the page characters was transferred to this page on 2019 08 11.