Version 22 of string is

string is class ?-strict? ?-failindex varname? string

Returns 1 if string is a valid member of the specified character class, otherwise returns 0. If -strict is specified, then an empty string returns 0, otherwise an empty string will return 1 on any class. If -failindex is specified, then if the function returns 0, the index in the string where the class was no longer valid will be stored in the variable named varname. The varname will not be set if the function returns 1. The following character classes are recognized (the class name can be abbreviated):
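A minimal sketch of how the two options behave, using the digit class. The sample strings and the variable names (idx, idx2) are illustrative only; the values in the comments follow from the description above.

```tcl
# Without -strict, the empty string is accepted by any class.
set emptyLoose  [string is digit ""]
set emptyStrict [string is digit -strict ""]
puts "$emptyLoose $emptyStrict"          ;# 1 0

# -failindex stores the index of the first offending character.
set ok [string is digit -failindex idx "123abc"]
puts "$ok $idx"                          ;# 0 3

# On success the named variable is left untouched.
unset -nocomplain idx2
set ok2 [string is digit -failindex idx2 "123"]
puts "$ok2 [info exists idx2]"           ;# 1 0
```

When validating user input, -strict is usually what you want, since an empty entry field would otherwise pass every class test.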

alnum: Any Unicode alphabet or digit character.

alpha: Any Unicode alphabet character.

ascii: Any character with a value less than \u0080 (those that are in the 7-bit ASCII range).

boolean: Any of the forms allowed to Tcl_GetBoolean.

control: Any Unicode control character.

digit: Any Unicode digit character. Note that this includes characters outside of the [0-9] range.

double: Any of the valid forms for a double in Tcl, with optional surrounding whitespace. In case of under/overflow in the value, 0 is returned and the varname will contain -1.

false: Any of the forms allowed to Tcl_GetBoolean where the value is false.

graph: Any Unicode printing character, except space.

integer: Any of the valid forms for an integer in Tcl, with optional surrounding whitespace. In case of under/overflow in the value, 0 is returned and the varname will contain -1.

lower: Any Unicode lower case alphabet character.

print: Any Unicode printing character, including space.

punct: Any Unicode punctuation character.

space: Any Unicode space character.

true: Any of the forms allowed to Tcl_GetBoolean where the value is true.

upper: Any upper case alphabet character in the Unicode character set.

wordchar: Any Unicode word character. That is, any alphanumeric character and any Unicode connector punctuation character (e.g. underscore).

xdigit: Any hexadecimal digit character ([0-9A-Fa-f]).

In the case of boolean, true and false, if the function returns 0, then varname will always be set to 0, due to the varied nature of a valid boolean value.
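As a small illustration of the boolean classes, here is a sketch with a few spellings that Tcl_GetBoolean accepts. The test strings and the variable name where are made up for the example.

```tcl
# yes/no, on/off, true/false and 1/0 are among the accepted spellings.
set isBool  [string is boolean yes]
set isTrue  [string is true on]
set isFalse [string is false on]    ;# "on" is a boolean, but not a false one
puts "$isBool $isTrue $isFalse"     ;# 1 1 0

# On failure the index is reported as 0, wherever the problem actually sits.
set ok [string is boolean -failindex where "yes!"]
puts "$ok $where"                   ;# 0 0
```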

A comparison with the ctype(3) man page (e.g., [L1 ]) shows much agreement with the string is classes. A not-too-daring guess is that this piece of Tcl heritage also comes from C.

The documentation says that string is digit string returns 1 if string is composed of "Any Unicode digit character," and then adds that "this includes characters outside of the [0-9] range." What other characters, aside from [0-9], are members of the digit character class?

RS As Tcl is great for introspection, a few lines of code give the answer:

 proc udigits max {
    set res {}
    for {set i 0} {$i <= $max} {incr i} {
       # format %c turns the code point into a one-character string
       if {[string is digit [format %c $i]]} {
          append res "\\u[format %04x $i] "
       }
    }
    return $res
 }
 % udigits 65535

\u0030 \u0031 \u0032 \u0033 \u0034 \u0035 \u0036 \u0037 \u0038 \u0039 \u0660 \u0661 \u0662 \u0663 \u0664 \u0665 \u0666 \u0667 \u0668 \u0669 \u06f0 \u06f1 \u06f2 \u06f3 \u06f4 \u06f5 \u06f6 \u06f7 \u06f8 \u06f9 \u0966 \u0967 \u0968 \u0969 \u096a \u096b \u096c \u096d \u096e \u096f \u09e6 \u09e7 \u09e8 \u09e9 \u09ea \u09eb \u09ec \u09ed \u09ee \u09ef \u0a66 \u0a67 \u0a68 \u0a69 \u0a6a \u0a6b \u0a6c \u0a6d \u0a6e \u0a6f \u0ae6 \u0ae7 \u0ae8 \u0ae9 \u0aea \u0aeb \u0aec \u0aed \u0aee \u0aef \u0b66 \u0b67 \u0b68 \u0b69 \u0b6a \u0b6b \u0b6c \u0b6d \u0b6e \u0b6f \u0be7 \u0be8 \u0be9 \u0bea \u0beb \u0bec \u0bed \u0bee \u0bef \u0c66 \u0c67 \u0c68 \u0c69 \u0c6a \u0c6b \u0c6c \u0c6d \u0c6e \u0c6f \u0ce6 \u0ce7 \u0ce8 \u0ce9 \u0cea \u0ceb \u0cec \u0ced \u0cee \u0cef \u0d66 \u0d67 \u0d68 \u0d69 \u0d6a \u0d6b \u0d6c \u0d6d \u0d6e \u0d6f \u0e50 \u0e51 \u0e52 \u0e53 \u0e54 \u0e55 \u0e56 \u0e57 \u0e58 \u0e59 \u0ed0 \u0ed1 \u0ed2 \u0ed3 \u0ed4 \u0ed5 \u0ed6 \u0ed7 \u0ed8 \u0ed9 \u0f20 \u0f21 \u0f22 \u0f23 \u0f24 \u0f25 \u0f26 \u0f27 \u0f28 \u0f29 \u1040 \u1041 \u1042 \u1043 \u1044 \u1045 \u1046 \u1047 \u1048 \u1049 \u1369 \u136a \u136b \u136c \u136d \u136e \u136f \u1370 \u1371 \u17e0 \u17e1 \u17e2 \u17e3 \u17e4 \u17e5 \u17e6 \u17e7 \u17e8 \u17e9 \u1810 \u1811 \u1812 \u1813 \u1814 \u1815 \u1816 \u1817 \u1818 \u1819 \uff10 \uff11 \uff12 \uff13 \uff14 \uff15 \uff16 \uff17 \uff18 \uff19

Here they are "literally":

0 1 2 3 4 5 6 7 8 9 ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ० १ २ ३ ४ ५ ६ ७ ८ ९ ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯ ੦ ੧ ੨ ੩ ੪ ੫ ੬ ੭ ੮ ੯ ૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯ ୦ ୧ ୨ ୩ ୪ ୫ ୬ ୭ ୮ ୯ ௧ ௨ ௩ ௪ ௫ ௬ ௭ ௮ ௯ ౦ ౧ ౨ ౩ ౪ ౫ ౬ ౭ ౮ ౯ ೦ ೧ ೨ ೩ ೪ ೫ ೬ ೭ ೮ ೯ ൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯ ๐ ๑ ๒ ๓ ๔ ๕ ๖ ๗ ๘ ๙ ໐ ໑ ໒ ໓ ໔ ໕ ໖ ໗ ໘ ໙ ༠ ༡ ༢ ༣ ༤ ༥ ༦ ༧ ༨ ༩ ၀ ၁ ၂ ၃ ၄ ၅ ၆ ၇ ၈ ၉ ፩ ፪ ፫ ፬ ፭ ፮ ፯ ፰ ፱ ០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩ ᠐ ᠑ ᠒ ᠓ ᠔ ᠕ ᠖ ᠗ ᠘ ᠙ ０１２３４５６７８９

For instance, \u0660-\u0669 are the "Arabic-Indic" digits as used in Arab countries, \uFF10-\uFF19 are "fullwidth" variants of 0-9, etc.

Lars H: Quite a lot, as you can see; there are plenty of digit sets in Unicode. The authoritative source on the subject should be the Unicode Character Database (see [L2 ] for format and links), but whether Tcl really uses that is another matter. A comparison of the above with the UCD shows that [string is digit] returns 1 for most (but not quite all!) of the characters from the class Nd (decimal digits), so that is probably what it is supposed to test.

escargo 14 Mar 2006 - I had my own peek starting at http://www.unicode.org/charts/symbols.html where there are links to four PDF files under Numbers and Digits: ASCII Digits, Fullwidth ASCII Digits, Number Forms, and Super and Subscripts. All of these might legitimately contain digits.

NEM 14 Mar 2006: Interesting. Note though that most of these "digits" are not valid for expressions fed to expr. So, if you are using [string is digit] to validate arguments to expressions then you probably have a bug.
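A short sketch of the pitfall described here, assuming the input is the single character \u0663 (ARABIC-INDIC DIGIT THREE, one of the characters listed above). The regular expression is one possible ASCII-only check, not something prescribed on this page.

```tcl
set s "\u0663"                        ;# ARABIC-INDIC DIGIT THREE
set isDigit [string is digit $s]      ;# accepted as a Unicode digit
set isAscii [regexp {^[0-9]+$} $s]    ;# the range [0-9] covers ASCII digits only
set failed  [catch {expr {$s + 1}}]   ;# expr raises an error for this operand
puts "$isDigit $isAscii $failed"      ;# 1 0 1
```

If the value is headed for expr, testing with [string is integer -strict] or [string is double -strict] is closer to what expr will actually accept.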

escargo - If there are digits that are not valid when fed to expr, is that a problem in the implementation of expr? It almost makes me think that there ought to be a string normalize that changes Unicode digits of different stripes down to the [0-9] range expected by expr.

If a character looks like a digit to humans but is not one to expr, then the problem is really in expr. (The solution might not be in expr though.)
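A rough sketch of what such a normalization could look like in plain Tcl. The proc name normalizeDigits is hypothetical (there is no string normalize subcommand), and the table of "zero" code points is taken from the digit blocks found by udigits above. The Ethiopic digits \u1369-\u1371 are left alone because that block has no zero.

```tcl
# Hypothetical helper: map Unicode decimal digits onto ASCII 0-9.
proc normalizeDigits {str} {
    # Code point of the digit zero in each block (assumed table, see above).
    set zeros {
        0x0660 0x06f0 0x0966 0x09e6 0x0a66 0x0ae6 0x0b66 0x0be6 0x0c66
        0x0ce6 0x0d66 0x0e50 0x0ed0 0x0f20 0x1040 0x17e0 0x1810 0xff10
    }
    set out ""
    foreach ch [split $str ""] {
        scan $ch %c code
        foreach z $zeros {
            if {$code >= $z && $code <= $z + 9} {
                # Replace the character by its decimal value 0..9.
                set ch [expr {$code - $z}]
                break
            }
        }
        append out $ch
    }
    return $out
}

puts [normalizeDigits "\u0661\u0662\u0663"]                             ;# 123
puts [expr {[normalizeDigits "\u0664"] + [normalizeDigits "\uff12"]}]   ;# 6
```

Characters that are not in any listed block pass through unchanged, so the result can still be checked with [string is integer -strict] before it is handed to expr.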

NEM My gut reaction is the same as yours: that expr should be enhanced to recognize these alternative numeral characters as digits. This seems sensible, and would be backwards compatible, as the characters are apparently not allowed at all by expr currently. The expr(n) manpage could also do with clarification about what counts as an integer or float -- at present, it just says "decimal" (or octal/hexadecimal), as far as I can see.

slebetman 15 Mar 2006: Well, if you are not using a number larger than 18446744073709551615.99, then you can use [string is double $number] as the test instead of [string is digit $number]. Currently under Microsoft Windows on 32-bit Pentium [string is double] returns 0 for 18446744073709551616 and larger (found out by trial and error). Alternatively, if you want to check for integers and you're not using a number larger than 4294967295 then you can use [string is integer $number]:

    (bin) 59 % string is digit \u0669
    1
    (bin) 60 % string is double \u0669
    0