Version 31 of string match

Updated 2015-05-07 00:53:31 by AMG

Synopsis

string match ?-nocase? pattern string

Description

Determine whether pattern matches string, returning 1 if it does and 0 if it doesn't. If -nocase is specified, the pattern is matched against the string case-insensitively.

string equal compares strings literally, whereas string match interprets pattern as a pattern expression and matches string against it.
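A minimal illustration of that difference, using made-up strings:

```tcl
# string equal treats both arguments as plain strings.
string equal a* abc    ;# -> 0
string equal a* a*     ;# -> 1
# string match treats its first argument as a pattern.
string match a* abc    ;# -> 1
```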

For the two strings to match, their contents must be identical except that the following special sequences may appear in pattern:

*
Matches any sequence of characters in string, including a null string.
?
Matches any single character in string.
[chars]
Matches any character in the set given by chars. If a sequence of the form x-y appears in chars, then any character between x and y, inclusive, will match. When used with -nocase, the end points of the range are converted to lower case first. Whereas {[A-z]} matches '_' when matching case-sensitively ('_' falls between the 'Z' and 'a'), with -nocase this is considered like {[A-Za-z]} (and probably what was meant in the first place).
\x
Matches the single character x. This provides a way of avoiding the special interpretation of the characters *?[]\ in pattern.
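The special sequences above can be tried out as follows. This is only a sketch with made-up strings; the patterns are braced so that Tcl itself does not interpret the brackets or backslashes.

```tcl
# "*" matches any run of characters, including none.
string match {a*} a               ;# -> 1
string match {a*} abc             ;# -> 1
# "?" matches exactly one character.
string match {a?c} abc            ;# -> 1
string match {a?c} ac             ;# -> 0
# "[chars]" matches one character from a set or range.
string match {[a-c]x} bx          ;# -> 1
string match {[a-c]x} dx          ;# -> 0
# "\x" matches x literally, here a literal asterisk.
string match {\*} *               ;# -> 1
string match {\*} x               ;# -> 0
# The [A-z] range with and without -nocase.
string match {[A-z]} _            ;# -> 1
string match -nocase {[A-z]} _    ;# -> 0
```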

Beware that the parsing of strings inside [] is not particularly robust: neither the manual, the tests, nor the code takes pains to specify how combinations of []*?- inside brackets are interpreted. If you need a character class that includes any of these special characters, you are probably better off with regexp.

string match does not use the same code as glob

JJM - With the primary notable difference being that glob supports the notion of (optionally nested?) curly braces allowing for a logical OR-style operation in the pattern.
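To illustrate: string match treats curly braces in a pattern as ordinary characters, so an OR over several patterns has to be spelled out by hand. The matchAny proc below is a hypothetical helper written for this example, not part of Tcl.

```tcl
# Braces are literal to string match; the outer pair here is only Tcl quoting.
string match {{a,b}c} ac         ;# -> 0
string match {{a,b}c} "{a,b}c"   ;# -> 1

# Hypothetical helper: return 1 if str matches any pattern in the list.
proc matchAny {patterns str} {
    foreach p $patterns {
        if {[string match $p $str]} {
            return 1
        }
    }
    return 0
}

matchAny {a*.txt b*.txt} beta.txt    ;# -> 1
matchAny {a*.txt b*.txt} gamma.txt   ;# -> 0
```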

Example

string match *is* "this test is" ;# -> 1

Layers of Quoting

To match a single left bracket, the pattern should be a backslash followed by a left bracket, so that string match sees the left bracket as a literal character. One possibility is to enclose the backslash and left bracket in braces so that Tcl leaves them alone:

string match {\[} {[}

Alternatively, the backslash and the left bracket can each be preceded by a backslash, so that Tcl's own substitution yields the same two-character pattern:

string match \\\[ \[
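One way to check that the quoting styles agree is to store each form in a variable and compare what string match actually receives. The variable names are chosen for this sketch only.

```tcl
# Braces: Tcl passes the two characters through untouched.
set braced {\[}
# Bare word: Tcl turns \\ into \ and \[ into [.
set escaped \\\[
# Double quotes: same backslash substitution as a bare word.
set quoted "\\\["

string length $braced           ;# -> 2
string equal $braced $escaped   ;# -> 1
string equal $braced $quoted    ;# -> 1
string match $braced {[}        ;# -> 1
```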

Pattern Ending in Backslash

A pattern ending in a backslash doesn't match a string ending in a backslash. Bug?

string match a\\ a\\
# -> 0

PL 2015-01-26: one can match explicitly against a backslash character, though:

string match {a[\]} a\\
# -> 1

MG: The reason the first fails is that Tcl passes the pattern a\ to string match, whose parser eats that trailing backslash as an escape character, leaving the pattern as just 'a' against the string 'a\'. You'd need to use

string match a\\\\ a\\
# -> 1

aspect: I was about to suggest "conventional escaping with backslash", but MG beat me to it!

The manual leaves this in nasal demon territory, so the simple answer is: don't use such strings for patterns. glob seems to have the same issue.
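If patterns come from outside the program, one way to stay out of that territory is to reject those that end in an unescaped backslash. The proc name hasDanglingBackslash is made up for this sketch; it only counts the backslashes at the end of the pattern.

```tcl
# Hypothetical helper: 1 if the pattern ends in an unpaired backslash.
proc hasDanglingBackslash {pattern} {
    # Capture the run of backslashes (possibly empty) at the end.
    regexp {(\\*)$} $pattern -> tail
    # An odd count means the last backslash has nothing left to escape.
    return [expr {[string length $tail] % 2 == 1}]
}

hasDanglingBackslash a\\      ;# -> 1
hasDanglingBackslash a\\\\    ;# -> 0
hasDanglingBackslash abc      ;# -> 0
```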

Another unspecified case:

string match {[ab} a
# -> 1
string match {[ab} b
# -> 1
string match {[ab} c
# -> 0

... I note also that a generic escaping routine for these patterns is not simply a matter for string map, as metacharacters lose their special meaning within groups (as PL's example above shows).

AMG: [string map] should be fine, or [regsub]. If you want to match the literal sequence a[\], you would prefix each character other than the first with a backslash, giving a\[\\\]. All PL is demonstrating is that backslash is not the only possible way to quote a metacharacter.

set needle {a[\]}
# a[\]
set pattern [string map {* \\* ? \\? [ \\[ ] \\] \\ \\\\} $needle]
# a\[\\\]
string match $pattern $needle
# 1
set pattern [regsub -all {[][\\*?]} $needle {\\&}]
# a\[\\\]
string match $pattern $needle
# 1
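The regsub form can be wrapped in a small proc and used to build a "contains this literal text" test. The names globEscape and containsLiteral are invented for this sketch and are not Tcl built-ins.

```tcl
# Hypothetical helper: quote every string match metacharacter in str.
proc globEscape {str} {
    return [regsub -all {[][\\*?]} $str {\\&}]
}

# Hypothetical helper: 1 if needle occurs literally anywhere in haystack.
proc containsLiteral {haystack needle} {
    return [string match *[globEscape $needle]* $haystack]
}

globEscape {a[\]}                 ;# -> a\[\\\]
containsLiteral {xa[\]y} {a[\]}   ;# -> 1
containsLiteral abc {a*c}         ;# -> 0
containsLiteral {a*c} {a*c}       ;# -> 1
```

Only the needle is escaped; the surrounding * characters keep their special meaning.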

See Also