Version 50 of string

Updated 2012-10-15 05:04:38 by RLE

What is a string?

Within Tcl, the basic philosophy is that everything is a string. That is to say, Tcl's fundamental data type is a dynamically allocated value (its size limited only by the machine's memory and any limits imposed on Tcl by the operating system or parent process) that can contain any byte value, including NUL. Binary-safe strings arrived with Tcl 8.0.

Other data structures (such as proc, dict, list, handle) are built on top of this fundamental assumption.
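As a small illustration of the "everything is a string" idea, the sketch below treats the same value first as a list and then as a plain string, and lets expr parse a string as a number. The variable names are made up for the example.

```tcl
# The same value can be read as a list or as a plain string.
set items "alpha beta gamma"
set count [llength $items]          ;# read as a list: 3 elements
set len   [string length $items]    ;# read as a string: 16 characters

# A number is a string too; expr parses it when it is needed.
set n "10"
set sum [expr {$n + 5}]             ;# 15
set digits [string length $sum]     ;# the result is again a string: 2 characters

puts "$count $len $sum $digits"     ;# prints: 3 16 15 2
```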

The Tcl string command

string - Manipulate strings

http://purl.org/tcl/home/man/tcl8.5/TclCmd/string.htm

An ensemble-style command: a single command whose subcommands manipulate strings.


string bytelength string

string compare ?-nocase? ?-length int? string1 string2

string equal ?-nocase? ?-length int? string1 string2

string first string1 string2 ?startIndex?

string index string charIndex

string is class ?-strict? ?-failindex varname? string

string last string1 string2 ?startIndex?

string length string

string map ?-nocase? charMap string

string match ?-nocase? pattern string

string range string first last

string repeat string count

string replace string first last ?newstring?

string reverse string

string tolower string ?first? ?last?

string totitle string ?first? ?last?

string toupper string ?first? ?last?

string trim string ?chars?

string trimleft string ?chars?

string trimright string ?chars?

string wordend string charIndex

string wordstart string charIndex
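A few of the subcommands listed above in use. This is only a sampler with made-up input, not a replacement for the manual page; string reverse and string repeat assume Tcl 8.5 or later.

```tcl
set s "hello world"

set r(length)    [string length $s]                 ;# 11
set r(index)     [string index $s end]              ;# d
set r(range)     [string range $s 0 4]              ;# hello
set r(first)     [string first "lo" $s]             ;# 3
set r(last)      [string last "o" $s]               ;# 7
set r(map)       [string map {l L} "hello"]         ;# heLLo
set r(match)     [string match "h*d" $s]            ;# 1
set r(equal)     [string equal -nocase "Tcl" "TCL"] ;# 1
set r(compare)   [string compare "apple" "banana"]  ;# -1
set r(replace)   [string replace $s 0 4 "HELLO"]    ;# HELLO world
set r(repeat)    [string repeat "ab" 3]             ;# ababab
set r(reverse)   [string reverse "abc"]             ;# cba
set r(toupper)   [string toupper "hello"]           ;# HELLO
set r(totitle)   [string totitle "hello WORLD"]     ;# Hello world
set r(trim)      [string trim "  hi  "]             ;# hi
set r(trimleft)  [string trimleft "xxhixx" x]       ;# hixx
set r(wordend)   [string wordend $s 2]              ;# 5
set r(wordstart) [string wordstart $s 8]            ;# 6

# Print the results in alphabetical order of subcommand name.
foreach name [lsort [array names r]] {
    puts [format "%-10s %s" $name $r($name)]
}
```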


Is this correct behavior?

 % set str ""
 % string is true $str
 1
 % string is false $str
 1
 % string is integer $str
 1
 % string is alpha $str
 1

Sadly, yes, that is correct. You'll have to use the -strict option to keep empty strings from passing all tests.

This is an unfortunate legacy of the origin of the [string is] command as a tool for entry validation. There, the empty string has to pass every test; otherwise an entry field would be rejected as soon as it is empty, before the user has typed anything.
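A short sketch of the difference -strict makes, together with -failindex for locating the first offending character. The helper isInt is a hypothetical name used only for this example.

```tcl
set str ""

set loose  [string is integer $str]            ;# 1: the empty string passes
set strict [string is integer -strict $str]    ;# 0: -strict rejects it

# -failindex stores the index of the first character that fails the test.
set ok [string is digit -strict -failindex pos "12x4"]   ;# 0, and pos is 2

# Hypothetical helper: accept only non-empty integer strings.
proc isInt {value} {
    return [string is integer -strict $value]
}

puts "$loose $strict $ok $pos [isInt 42] [isInt {}]"     ;# prints: 1 0 0 2 1 0
```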


Using string functions for binary data

The following subcommands check for the ByteArray object type internally based on their bytecode versions (as of 8.5.0):

  • string range
  • string index
  • string match
  • string length
  • string compare (both objects must be ByteArrays)

The following subcommands force promotion to Unicode strings:

  • string first
  • string last
  • string map
  • string replace
  • string reverse
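To make the first group concrete, here is a minimal sketch that builds a three-byte value with binary format and inspects it with string length, string index and string range. It only shows the visible results. Whether the value keeps its ByteArray internal representation along the way is an implementation detail that this script does not observe.

```tcl
# Three bytes: 0x00 0xff 0x10
set data [binary format H* 00ff10]

set n [string length $data]          ;# 3, one "character" per byte

# Pick out the second byte and read its numeric value.
set second [string index $data 1]
scan $second %c code                 ;# code is 255

# Take the last two bytes and show them as hex again.
set tail [string range $data 1 end]
binary scan $tail H* hex             ;# hex is ff10

puts "$n $code $hex"                 ;# prints: 3 255 ff10
```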

MG Since Tcl 8.5, an index in the string commands can include simple integer addition or subtraction:

 string range $string $startChar+1 $endChar-1

is now equivalent to

 string range $string [expr {$startChar + 1}] [expr {$endChar - 1}]

The first form may be clearer, but for me it is noticeably slower, running 8.5a6:

 % set string "This is a test string"
 This is a test string
 % set startChar 3
 3
 % set endChar 12
 12

 % time {string range $string [expr {$startChar+1}] [expr {$endChar-1}]} 500000
 2.55498 microseconds per iteration
 % time {string range $string $startChar+1 $endChar-1} 500000
 5.092856 microseconds per iteration

In this test, using expr is roughly twice as fast...
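The numbers above come from an alpha release, so it may be worth repeating the comparison on whatever interpreter is at hand. The sketch below assumes Tcl 8.5 or later. The proc names viaExpr and viaIndex and the iteration count are arbitrary choices for this example, and the timings will differ from machine to machine.

```tcl
set string "This is a test string"
set startChar 3
set endChar 12

# Wrap each form in a proc so both are measured under similar conditions.
proc viaExpr {s a b} {
    return [string range $s [expr {$a + 1}] [expr {$b - 1}]]
}
proc viaIndex {s a b} {
    return [string range $s $a+1 $b-1]
}

# Both forms select characters 4 through 11 of the string.
set r1 [viaExpr  $string $startChar $endChar]
set r2 [viaIndex $string $startChar $endChar]
puts [list $r1 $r2]

# time reports the average number of microseconds per iteration.
puts "expr:  [time {viaExpr  $string $startChar $endChar} 100000]"
puts "index: [time {viaIndex $string $startChar $endChar} 100000]"
```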


See also: