character

A character is a composable unit of meaning in a text. Categories of meaning include sound, concept, reference, or textual structure. Examples of characters include letters, combining marks, digits, punctuation, whitespace, Chinese ideographs, and ligatures. A character is typically composed along with other characters into a word, but a character may also be composed into a more complex character. Examples include diacritical marks added to a base character in a Latin alphabet, and Han characters which are composed of smaller units known as radicals, where each radical is itself considered a character. In Tcl version 8.1 and later a character is a Unicode character. A sequence of characters is a string, but non-character code points may also occur in a string.

See Also

Characters, glyphs, code-points, and byte-sequences
The finer points of the terminology.
Unicoded integer sets
Non-ASCII characters
Character

Description

A Unicode character in the Basic Multilingual Plane can be seen as an unsigned integer between 0 and 65535, its code point. To convert between a character and its corresponding code:

set char [format %c $int]
set int  [scan $char %c]

Beware that in Tcl builds limited to the Basic Multilingual Plane, int values above 65535 wrap around to lower code points again, and a negative int can even produce two bogus characters. format does not warn in either case, so check the range before calling it.
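The range test suggested above could be wrapped in a small helper. This is only a sketch: the name charFromInt and the error messages are made up for illustration, and the accepted range 0..65535 simply follows the description above.

```tcl
# Hypothetical helper: convert an integer to a character, rejecting
# values outside 0..65535 instead of letting format silently produce
# an unexpected result.
proc charFromInt int {
    if {![string is integer -strict $int]} {
        error "not an integer: $int"
    }
    if {$int < 0 || $int > 0xFFFF} {
        error "code point out of range 0..65535: $int"
    }
    format %c $int
}
```

For example, charFromInt 65 returns A, while charFromInt -1 and charFromInt 70000 raise an error.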

A sequence of characters is called a string. A character is not a separate data type, but is represented as a string of length one (everything is a string). Encoded as UTF-8, a character takes one to four bytes of storage under the current standard (the original UTF-8 design allowed up to six). To determine the number of bytes that a character c occupies when encoded in UTF-8:

string length [encoding convertto utf-8 $c]
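To see how the byte count varies, the command above can be wrapped in a proc and tried on a few characters. The name utf8Bytes is an assumption made for this sketch; the characters are written as \u escapes so the example does not depend on the encoding of the script file.

```tcl
# Hypothetical helper: number of bytes a string occupies in UTF-8.
proc utf8Bytes char {
    string length [encoding convertto utf-8 $char]
}

utf8Bytes A        ;# 1 byte:  U+0041, plain ASCII
utf8Bytes \u00e9   ;# 2 bytes: U+00E9, e with acute accent
utf8Bytes \u20ac   ;# 3 bytes: U+20AC, euro sign
```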

String routines such as string toupper can be applied to a single character. Find out whether a character is in a given set (a character string) with

expr {[string first $char $set] >= 0}

A character can be categorized by the range of character codes it is located in:

proc inRange {from to char} {
    # generic range checker
    set int [scan $char %c]
    expr {$int >= $from && $int <= $to}
}
interp alias {} isGreek {}    inRange 0x0386 0x03D6
interp alias {} isCyrillic {} inRange 0x0400 0x04F9
interp alias {} isHangul {}   inRange 0xAC00 0xD7A3
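When several ranges are in play, a table-driven lookup may be handier than one alias per script. The following is a sketch of that alternative, not part of the original routines: the name scriptOf is hypothetical, and the ranges are the same ones used for the aliases above.

```tcl
# Hypothetical helper: return the name of the first listed range that
# contains the character's code, or an empty string if none matches.
proc scriptOf char {
    set ranges {
        Greek    0x0386 0x03D6
        Cyrillic 0x0400 0x04F9
        Hangul   0xAC00 0xD7A3
    }
    set int [scan $char %c]
    foreach {name from to} $ranges {
        if {$int >= $from && $int <= $to} {
            return $name
        }
    }
    return {}
}
```

For example, scriptOf \u03b1 returns Greek, scriptOf \u0416 returns Cyrillic, and scriptOf A returns an empty string.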

Hexadecimal to Character

Here are some routines to convert between a character and a hexadecimal number:

# relies on backslash substitution; probably slower than the scan-based version below
proc fromhex hex {
    if {![string is xdigit -strict $hex]} {
        error [list {input is not a hexadecimal string}]
    }
    if {[string length $hex] > 8} {
        error [list {hexadecimal string is too long}]
    }
    expr "\"\\U$hex\""
}


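# scan-based version
proc fromhex hex {
    # scan returns the number of conversions performed (-1 on empty input),
    # so anything other than 1 means no hexadecimal value was read
    if {[scan $hex %llx ord] != 1} {
        error [list {input is not a hexadecimal string}]
    }
    format %c $ord
}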


proc tohex char {
    scan $char %c cardinal
    format %x $cardinal
}

These implementations can be found in ycl convert char.
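The same scan and format calls extend naturally from a single character to a whole string. The procs below are a sketch with hypothetical names (toHexList, fromHexList); they are not taken from ycl, and they assume every character lies in the Basic Multilingual Plane.

```tcl
# Hypothetical helper: list of hexadecimal code points, one per character.
proc toHexList str {
    set result {}
    foreach char [split $str {}] {
        lappend result [format %x [scan $char %c]]
    }
    return $result
}

# Hypothetical helper: rebuild a string from a list of hexadecimal code points.
proc fromHexList hexes {
    set str {}
    foreach hex $hexes {
        if {[scan $hex %x ord] != 1} {
            error "not a hexadecimal number: $hex"
        }
        append str [format %c $ord]
    }
    return $str
}
```

For example, toHexList Tcl returns the list 54 63 6c, and fromHexList {54 63 6c} returns Tcl again.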

Page Authors

pyk
Richard Suchenwirth

Historical

The content of characters was transferred to this page on 2019-08-11.