Version 20 of string

Updated 2003-04-24 12:33:34

http://purl.org/tcl/home/man/tcl8.4/TclCmd/string.htm

   string compare [options] string1 string2

Returns -1, 0, or 1, depending on whether string1 is lexicographically less than, equal to, or greater than string2. Options are -nocase or -length integer. (From the Tcl/Tk Reference Guide)

"Lexicographically less, equal, or greater" means in terms of character codes: ASCII values for plain ASCII text, and Unicode code points in general. The lower a character's code, the earlier that character sorts.

Basically it means: . < 0 < 1 < 2 < ... < 9 < A < B < ... < Z < a < b < ... < z

The command compares the strings character by character, beginning at the left side (i.e. the first character), until it either finds a difference or reaches the end of the shorter string.

For example:

   string compare "abc" "xyz"

would return -1, because "a" (the first character of "abc") is less than "x" (the first character of "xyz") in ASCII.

   string compare "abcd" "abc"

would return 1: the strings match up to the end of the shorter one, and the first string is 'greater' than the second because it is longer.

What about

   string compare "3" "10"

While one might be tempted to say "well, 3 is less than 10, so it will return a -1", the true answer is that the above returns a "1" - because lexicographically, 3 comes after 1. This is a reason NOT to use string compare if the arguments may both be numeric and you want to know when one is truly less than the other.
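
A minimal sketch of the difference, using only core commands. The values are made up for illustration; the point is that expr (or lsort -integer) compares numbers as numbers, while string compare and plain lsort compare characters:

```tcl
# String comparison looks at characters; numeric comparison looks at values.
puts [string compare 3 10]       ;# 1, because "3" sorts after "1"
puts [expr {3 < 10}]             ;# 1 (true), because 3 is numerically less than 10

# Sorting shows the same difference.
puts [lsort {3 10 2}]            ;# 10 2 3
puts [lsort -integer {3 10 2}]   ;# 2 3 10
```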

The -nocase option ignores the case of the two strings as they are compared.

The -length NUMBER option compares only the first NUMBER characters. Note that this value is a count, not a zero-based index: -length 3 compares the first three characters.
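
The two options can be tried out like this. The strings are arbitrary samples chosen for this sketch:

```tcl
puts [string compare "Apple" "apple"]              ;# -1: "A" precedes "a"
puts [string compare -nocase "Apple" "apple"]      ;# 0: case is ignored
puts [string compare -length 3 "abcde" "abcxyz"]   ;# 0: only "abc" and "abc" are compared
puts [string compare -length 4 "abcde" "abcxyz"]   ;# -1: "d" precedes "x"
```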


In April, 2003, Jeff Hobbs says:

... string compare should be used sparingly when string equal or string length is an alternative. string equal can do an extra quick check on whether the strings are of equal size first (since we know their size) before any char-by-char comparison must be done.

For general benchmarks see http://wiki.tcl.tk/1611 , and look at the STR ones for string comparisons.


   string index string charIndex

Returns the charIndex'th character in string. If charIndex is end, returns the last character in string, and if charIndex is end-integer, returns the last character minus the specified offset. (From the Tcl/Tk Reference Guide)

For example,

   string index "abcde" "3"

returns "d" . Notice that the index is zero based - the first character is index 0.

   string index "abcde" "10"

returns "" . There is no 10th character. No error is returned here.
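
The end and end-integer forms mentioned in the reference text can be sketched the same way. The variable name s is just for this example:

```tcl
set s "abcde"
puts [string index $s 0]       ;# a
puts [string index $s end]     ;# e
puts [string index $s end-1]   ;# d

# An out-of-range index gives an empty string rather than an error.
puts [string length [string index $s 10]]   ;# 0
```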


   string bytelength string

Example:

   string bytelength "abc"

would return 3.

This command is important because the value returned is the number of bytes of memory used by string. Since Tcl uses UTF-8 for its internal representation, a Unicode character can take from one to three bytes. If your code really cares about the memory size, then this is the function you need.

Note that (perhaps confusingly) [string bytelength] should not be used with binary data. This command measures how long the UTF-8 representation of a string is in bytes. For binary data you don't want conversion to UTF-8, so you don't want [string bytelength] either. Use [string length] instead.

US: Proof for the sceptical:

 for {set n 0} {$n < 256} {incr n} {
   lappend cl $n
 }
 set str [binary format c* $cl]
 puts "len: [string length $str]"
 puts "blen: [string bytelength $str]"

   string equal ?-nocase? ?-length int? string1 string2

This command may seem similar to string compare. The difference is that this is only an equal (return value of 1) or not equal (return value of 0) type comparison.

   string equal -nocase -length 3 "abcde" "abcdefg"

returns 1
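
A few more cases, as a quick sketch. The variable name answer is invented for this example; the last lines show the typical use inside a condition:

```tcl
puts [string equal "abc" "abc"]                  ;# 1
puts [string equal "abc" "ABC"]                  ;# 0
puts [string equal -nocase "abc" "ABC"]          ;# 1
puts [string equal -length 3 "abcde" "abcxyz"]   ;# 1: only the first three characters count

# Typical use: a yes/no test in a condition.
set answer "Yes"
if {[string equal -nocase $answer "yes"]} {
    puts "confirmed"
}
```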


   string first string1 string2 ?startIndex?

Example:

   string first a 0a23456789abcdef

returns 1 (notice this is again a zero based value).

   string first "a" "0a23456789abcdef" 5

returns 10.

This example uses the startIndex to indicate that the search should begin at index 5 (the character 5 in the example, which is the sixth character). However, the result is still a zero-based index from the beginning of the string!

    string first a 0123456789abcdef 11

returns a -1, the standard return if the string being sought is not found. An error is generated if the index is not an integer or end?-integer?.
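
Because startIndex and the -1 return value work together, a loop can collect every match. The following is only a sketch; allIndices is a made-up helper name, not a built-in command:

```tcl
# Hypothetical helper: return the index of every occurrence of needle in haystack.
proc allIndices {needle haystack} {
    set result {}
    set start 0
    # string first returns -1 once no further match exists.
    while {[set i [string first $needle $haystack $start]] >= 0} {
        lappend result $i
        # Resume the search just past the match that was found.
        set start [expr {$i + 1}]
    }
    return $result
}

puts [allIndices a 0a23456789abcdef]   ;# 1 10
```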


   string is class ?-strict? ?-failindex varname? string

where class can be:

  • alnum
  • alpha
  • ascii
  • boolean
  • control
  • digit
  • double
  • false
  • graph
  • integer
  • lower
  • print
  • punct
  • space
  • true
  • upper
  • wordchar
  • xdigit
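
A short sketch of how the options behave, based on the Tcl 8.x manual page. The variable name pos is arbitrary. Note the empty-string case, which is the reason -strict exists:

```tcl
puts [string is integer "42"]         ;# 1
puts [string is integer "4x2"]        ;# 0
puts [string is integer ""]           ;# 1 in Tcl 8.x: an empty string passes unless -strict is given
puts [string is integer -strict ""]   ;# 0

# -failindex stores the index of the first character that fails the test.
string is digit -failindex pos "123a5"
puts $pos                             ;# 3
```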

   string last string1 string2 ?startIndex?

   string length string

At times, people ask which is better for determining whether a string is empty (null):

 [string equal x x$str]
 [string equal "" $str]
 ![string compare "" $str]
 [string length $str] == 0
 $str eq ""

The string length or string equal will be a bit quicker, as they will look to see if the strings are of equal size first.
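
All of the forms give the same answer, as this small check suggests. The negated and infix forms are wrapped in expr here so they can be printed, and the eq operator assumes Tcl 8.4 or later:

```tcl
set str ""
puts [string equal x x$str]                ;# 1
puts [string equal "" $str]                ;# 1
puts [expr {![string compare "" $str]}]    ;# 1
puts [expr {[string length $str] == 0}]    ;# 1
puts [expr {$str eq ""}]                   ;# 1
```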


   string map ?-nocase? charMap string

Here's an example from jenglish that lvirden thought was pretty neat:

        string map -nocase {
            "&lt;"      "<"
            "&gt;"      ">"
            "&le;"      "<="
            "&ge;"      ">="
        } $whatever
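
To see the effect, here is the same map applied to a sample value. The contents of whatever and the variable name result are invented for this sketch; note that -nocase lets &LT; match the &lt; entry:

```tcl
set whatever "a &LT; b and c &ge; d"
set result [string map -nocase {
    "&lt;"      "<"
    "&gt;"      ">"
    "&le;"      "<="
    "&ge;"      ">="
} $whatever]
puts $result
# prints: a < b and c >= d
```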


   string match ?-nocase? pattern string

   string range string first last

   string repeat string count

   string replace string first last ?newstring?

   string tolower string ?first? ?last?

   string toupper string ?first? ?last?

   string totitle string ?first? ?last?

   string trim string ?chars?
   string trimleft string ?chars?
   string trimright string ?chars?

These commands can be used in many ways. One tempting use is to strip pathnames and extensions off of filenames, but for stripping extensions from filenames, use [file rootname] instead of [string trimright]! Supposing you have a directory of .log files and you want to return the filename without its extension, you might try:

   foreach lfn [glob -directory ./Logs -nocomplain -- *.log] {
      lappend fn [string trimright [string trimleft $lfn ./Logs/] .log]
   }

[GG: This only works if you can be sure that the basename of your logfile doesn't end in any combination of l, o and g. For example:

   % set str "smeagol.log"
   % set newstr [string trimright $str ".log"]
   % set newstr
   smea

I like stripping leading and trailing whitespace and field delimiters with 'string trim' commands, but not filename manipulation. ]

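The intent is for trimleft to remove the pathname from the beginning of the string and for trimright to remove the extension, but both commands treat their last argument as a set of characters to strip rather than as a literal prefix or suffix. Remember also that these commands return the trimmed string and do not modify a variable; to keep the result, you must set the same or another variable: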

  % set foo ../returned../
  ../returned../
  % string trim $foo ../
  returned
  % set foo
  ../returned../
  % set foo [string trim $foo ../]
  returned
  % set foo
  returned
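
For the filename case discussed above, a sketch of the safer approach with [file tail] and [file rootname]. The path is a made-up example that reuses GG's file name:

```tcl
set path ./Logs/smeagol.log

# Trimming works on character sets, so it eats too much:
puts [string trimright $path .log]                               ;# ./Logs/smea
puts [string trimright [string trimleft $path ./Logs/] .log]     ;# mea

# file tail drops the directory, file rootname drops the extension:
puts [file rootname [file tail $path]]                           ;# smeagol
```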

   string wordstart string charIndex
   string wordend string charIndex

      lappend fn [string trimright [string trimleft $lfn ./Logs/] .log]

is not a good idea. It will strip any run of ., l, o and g from the end of the file name, and similarly any run of ., /, L, o, g and s from the front. The string trim, trimleft and trimright commands remove any character from the set given as the last argument.

Ian Gay


See also string forward compatibility - Additional string functions - string compare ... - string map

