This list of regular expression examples serves several purposes.

  • Library of useful expressions which you can include in your code
  • Examples of well written expressions

For advanced examples, see Advanced Regular Expression Examples. You can also find some regular expressions on the Regular Expressions and Bag of algorithms pages.

Here's a good source-book for useful regular expressions for programmers:

Simple examples demonstrating the [regexp] command

The regexp command has syntax:

    regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?

If matchVar is specified, its value will be only the part of the string that was matched by the exp. As an example:

    regexp {c.*g} "abcdefghi" matched
    puts $matched       ;# ==> cdefg

If any subMatchVars are specified, their values will be the parts of the string that were matched by the parenthesized subexpressions in the exp, counting open parentheses from left to right. For example:

    regexp {c((.*)g)(.*)} "abcdefghi" matched sub1 sub2 sub3
    puts $matched       ;# ==> cdefghi
    puts $sub1          ;# ==> defg
    puts $sub2          ;# ==> def
    puts $sub3          ;# ==> hi

Many times, people only care about the subMatchVars and want to ignore matchVar. They use a "dummy" variable as a placeholder in the command for the matchVar. You will often see things like

    regexp $exp $string -> sub1 sub2

where ${->} holds the matched part. It is a sneaky but legal Tcl variable name.
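As a small illustration of the idiom (the sample string and the variable names key and value are invented for this sketch):

```tcl
# Illustrative only: the input string and variable names are made up.
set line "width=42"
if {[regexp {^(\w+)=(\d+)$} $line -> key value]} {
    puts "$key is $value"        ;# prints: width is 42
    puts "whole match: ${->}"    ;# prints: whole match: width=42
}
```

The braces in ${->} are needed when reading the variable, because -> is not made of the characters that a plain $name substitution accepts.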

Splitting a String Into Words

"How do I split an arbitrary string into words?" is a frequently asked question. If you use [split $string " "], then multiple spaces will produce a list with empty elements. If you try to use [foreach] or [lindex] or some other list operation, then you must be sure that the string is a well-formed list. (Braces could cause problems.) Instead, use a regular expression such as this very simple shorthand for a run of non-space characters:

  {\S+}

You can even split a string of text with arbitrary spaces and special characters into a list of words by using [regexp]'s -inline and -all switches.

  set text "Some arbitrary text which might include \$ or {"
  set wordList [regexp -inline -all -- {\S+} $text]
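
To see the difference from [split], here is a short comparison on a made-up string with a doubled space:

```tcl
# Illustrative input: two words separated by two spaces.
set sample "two  spaces"
set viaSplit  [split $sample " "]
set viaRegexp [regexp -inline -all -- {\S+} $sample]
puts [llength $viaSplit]     ;# 3: the list contains an empty element
puts [llength $viaRegexp]    ;# 2: only the words
```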

Floating Point Number

This expression includes options for leading +/- character, digits, decimal points, and a trailing exponent. Note the use of nearly duplicate expressions joined with the or operator "|" to permit the decimal point to lead or follow digits. This was posted to comp.lang.tcl by Roland B. Roberts.
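
A minimal sketch of an expression along those lines follows. It is an assumption built from the description above, not necessarily the pattern that was posted, and the variable name floatRE is made up.

```tcl
# Sketch only: optional sign, digits with an optional decimal point
# (or a leading decimal point followed by digits), optional exponent.
set floatRE {^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$}

foreach s {12 -3.5 +.25 7. 1e10 6.02E+23 . abc 1.2.3} {
    puts [format "%-10s %d" $s [regexp -- $floatRE $s]]
}
```

The first six inputs should report 1 and the last three 0: a lone "." has no digits on either side, and "1.2.3" has a second decimal point that neither branch allows.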



Thanks to Brent Welch for these examples, showing the difference between traditional character matching and "the Unicode way."

 {^[A-Za-z]+$}   Only letters.

  {^[[:alpha:]]+$} Only letters, the Unicode way.
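
A quick way to see the difference is to try both patterns on a word with an accented letter (the test word is just an example):

```tcl
# "caf" followed by e-acute, written as an escape to avoid encoding issues
set word "caf\u00e9"
set asciiOnly [regexp {^[A-Za-z]+$} $word]
set unicode   [regexp {^[[:alpha:]]+$} $word]
puts $asciiOnly    ;# 0: the accented letter is outside A-Z and a-z
puts $unicode      ;# 1: it is alphabetic as far as [:alpha:] is concerned
```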

Special Characters

Thanks again to Brent Welch for these two examples.

 {[][${}\\]} The set of Tcl special characters: ] [ $ { } \

 {[][$^?+*()|\\]} The set of regular expression special characters: ] [ $ ^ ? + * ( ) | \

I don't understand these examples. Why have [, ], and then the rest of the characters inside a [] - that just makes the string have [ and ] there twice, right?

LV: the first regular expression should be seen like this:

 { ... } - protect the 9 inner characters
  [ ... ] - these two define a set of characters to process
   ] - if your set of characters is going to include the right bracket character (]) as a specific matching character, then it needs to be first in the set/class definition.
   [${} - these are more individual characters
   \\ - this is doubled because when regexp goes to evaluate the characters, it would otherwise treat a single backslash (\) as a request to quote the next character, the ending right bracket of the set/class.

The second regular expression is interpreted in a similar fashion. There are more characters because there are more metacharacters.

Also, not all characters are there - where are the period, equals, bang (exclamation sign), dash, colon, alphas that are a part of character entry escapes or classes, 0, hash/pound sign, and angle brackets (< and >)? These special characters all have meta meanings within regular expressions...

LV Apparently no one has come along and updated the above expression to cover these.
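
When the goal is to match a literal string, one way to sidestep an incomplete character set is to backslash-escape every non-word character instead of listing the metacharacters. A sketch, assuming a hypothetical helper named quoteRegexp:

```tcl
# Hypothetical helper: put a backslash in front of every character
# that is not a letter, digit, or underscore.
proc quoteRegexp {s} {
    return [regsub -all {\W} $s {\\&}]
}

set literal "a.b*c"
set pattern [quoteRegexp $literal]
puts $pattern                        ;# a\.b\*c
puts [regexp -- $pattern "a.b*c"]    ;# 1: matches the literal text
puts [regexp -- $pattern "aXbbbc"]   ;# 0: . and * are no longer special
```

This relies on the rule that, in Tcl's regular expressions, a backslash followed by a non-alphanumeric character matches that character literally.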

Example posted by KC:

     {[\<\>]} - defines a set containing both angle brackets

IP Numbers

You can create a regular expression to check an IP address for correct syntax. Note that this regular expression only checks for groups of 1-3 digits separated by periods. If you want to ensure that the digit groups are from 0-255, or that you have a valid IP address, you'll have to do additional (non-regexp) work. This code was posted to comp.lang.tcl by George Peter Staplin.

  set str 192.0.2.1 ;# illustrative test address

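  # Braces keep \. intact; inside double quotes Tcl would reduce \. to a bare . that matches any character
  regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} $str all first second third fourth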

  puts "$all \n $first \n $second \n $third \n $fourth \n"

The above regular expression matches any string where there are four groups of 1-3 digits separated by periods. Since it's not anchored to the start and end of the string (with ^ and $), it will also match any string that merely contains such a sequence, such as "1.2.3.4.5".

If you don't mind a longer regexp, there is no reason you can't ensure that each group of 1-3 digits is in the range of 0-255. For example (broken up a bit to make it more readable):

    set octet {(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])}
    set RE "^[join [list $octet $octet $octet $octet] {\.}]\$"
    regexp $RE $str all first second third fourth ;# Michael A. Cleverly
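
To check that the range restriction behaves as described, the expression can be run over a few candidates (the addresses below are arbitrary test values):

```tcl
set octet {(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])}
set RE "^[join [list $octet $octet $octet $octet] {\.}]\$"

# Arbitrary test values: two valid addresses, then three invalid ones.
foreach candidate {192.168.0.1 255.255.255.255 256.1.1.1 1.2.3 01.2.3.4} {
    puts "$candidate -> [regexp $RE $candidate]"
}
```

The first two candidates should report 1. The others report 0: 256 is out of range, 1.2.3 has only three groups, and the pattern does not accept a leading zero as in 01.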

Recently on comp.lang.tcl, someone mentioned a page that talks about matching IP addresses.

Domain names First shot


This code does NOT attempt, obviously, to ensure that the last level of the regular expression matches a known domain...

E-mail addresses: No warranty, just a first shot:

 {^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+$} ;# RS

Understand that this expression is an attempt to see if a string has a format that is compatible with normal RFC SMTP email address formats. It does not attempt to see whether the email address is correct. Also, it does not account for comments embedded within email addresses, which are defined even though seldom used.

XML-like data

To match something similar to XML-tags you can use regular-expressions, too. Let's assume we have this text:

  % set text {<bo>s</bo><it><bo>M</bo></it>}

We can match the body of bo with this regexp:

  % regexp "<(bo)>(.*?)</bo>" $text dummy tag body

Now we extend our XML-text with some attributes for the tags, say:

  % set text2 {<bo h="m">s</bo><it><bo>M</bo></it>}

If we try to match this with:

  % regexp "<(bo)\\s+(.+?)>(.*?)</bo>" $text2 dummy tag attributes body

it won't work anymore. This is because \\s+ is greedy (in contrast to the non-greedy (.+?) and (.*?)), and that one greedy operator makes the whole expression greedy. (Refer to a Sept. 1999 posting from Henry Spencer to c.l.t.)

The correct way is:

  % regexp "<(bo)\\s+?(.+?)>(.*?)</bo>" $text2 dummy tag attributes body

Now we can write a more general XML-to-whatever translator like this:

  1. Substitute [ and ] with their corresponding entities &#91; and &#93; to avoid confusion with "subst" in 3.
  2. Substitute the tags and attributes with commands
  3. Do a "subst" on the whole text, thereby calling the inserted commands

  proc xml2whatever {text userCallback} {
    set text [string map {[ &#91; ] &#93;} $text]
    # replace all tags with a call to userCallback
    # this has to be done multiple times, because of nested tags
    # match each tag (everything not space after <)
    # and all the attributes (everything behind the tag until >)
    # then match body and the end-tag (which should be the same as the
    # first matched one (\1))
    while {[regsub -all {<(\S+?)(.*?)>(.*?)</\1>} $text "\[[list $userCallback \\1 \\2 \\3]\]" text]} {
      # do nothing
    }
    return [subst -novariables -nobackslashes $text]
  }

  # is called from xml2whatever with
  # element: the xml-element
  # attributes: the attributes of xml-element
  # body: body of xml-element
  proc myTranslate {element attributes body} {
    # map bo -> b; it -> i (leave rest alone)
    # do a subst for the body, because of possible nested tags
    switch -- $element {
      bo { return "<b>[subst -novariables -nobackslashes $body]</b>"}
      it { return "<i>[subst -novariables -nobackslashes $body]</i>"}
      default { return "<$element$attributes>[subst -novariables -nobackslashes $body]</$element>" }
    }
  }

Call the parser with:

  % xml2whatever $text2 myTranslate

You have to be careful, though. Don't do this for large texts or texts with many nested XML tags, because the regular expression engine is not the right tool to parse large, nested files efficiently. (Stefan Vogel)

DKF - I agree with that last point. If you are really dealing with XML, it is better to use a proper tool like TclDOM or tDOM.

Negated string:

Bruce Hartweg wrote in comp.lang.tcl: You can't negate a regexp, but you CAN negate a regexp that is only a simple string. Logically, it's the following:

  • match any single char except first letter in the string.
  • match the first char in string if followed by any letter except the 2nd
  • match the first two if followed by any but the third, et cetera

Then the only thing more is to allow a partial match of the string at end of line. So a regexp that matches any line that DOES NOT have the word "foo" is:

 set exp {^([^f]|f[^o]|fo[^o])*.{0,2}$}

The following proc will build the expression for any given string

 proc reg_negate {str} {
    set partial ""
    set branches [list]
    foreach c [split $str ""] {
        lappend branches [format {%s[^%s]} $partial $c]
        append partial $c
    }
    # the value of the final [set] is also the proc's return value
    set exp [format {^(%s)*.{0,%d}$} [join $branches "|"] \
        [expr {[string length $str] - 1}]]
 }


Donal Fellows followed up with:

That's just set me thinking; you can do this by specifying that the whole string must be either not the character of the antimatch*, or the first character of the antimatch so long as it is not followed by the rest of the antimatch. This leads to a fairly simply expressed pattern.

  set exp {^(?:[^f]|f(?!oo))*$}

In fact, this allows us to strengthen what you say above to allow the matching of any negated regexp directly so long as the first component of the antimatch is a literal, and the rest of the antimatch is expressible in an ERE lookahead constraint (which imposes a number of restrictions, but still allows for some fairly sophisticated patterns.)
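
The two approaches can be compared side by side on a few made-up inputs. This is only a sketch; the variable names lookahead and branches are invented here.

```tcl
set lookahead {^(?:[^f]|f(?!oo))*$}            ;# Donal Fellows' pattern
set branches  {^([^f]|f[^o]|fo[^o])*.{0,2}$}   ;# Bruce Hartweg's pattern

foreach s {bar fo food ffoo} {
    puts [format "%-5s lookahead=%d branches=%d" $s \
        [regexp $lookahead $s] [regexp $branches $s]]
}
```

Both patterns accept bar and fo and reject food. They disagree on ffoo: the branch-based pattern can consume ff as f[^o] and then no longer notices the foo that overlaps it, while the lookahead version rejects the string.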

* Anything's better than overloading 'string' here!

JMN 2005-12-22 Could someone please explain what is meant by a 'negated string' here? Specifically - what do the above achieve that isn't satisfied by the simpler:

 set exp {^(?!(.*foo.*))}

Doesn't the following snippet from the regexp manpage indicate that a regexp can be negated? where does(or did?) the 'simple string' requirement come in? - is this info no longer current?

 (?!re) negative lookahead (AREs only), matches at any point where no substring matching re begins

Lars H: It indeed seems the entire problem is rather trivial. In Tcl 7 (before AREs) one sometimes had to do funny tricks like the ones Bruce Hartweg performs above, but his use of {0,2} means he must be assuming AREs. Perhaps there was a transitory period where one was available but not the other.

Oleg 2009-12-11 If one needs to match any string but 'foo', then the following will do the work:

 set exp {^((?!foo).*)|^(foo.+)}

And in general case when one needs to match any string that is neither 'foo' nor 'bar', then the following will do the work:

 set exp {^((?!(foo|bar)).*)|^((foo|bar).+)}
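
A short check of the first of these expressions on a few sample strings (the samples are arbitrary):

```tcl
set exp {^((?!foo).*)|^(foo.+)}
set results {}
foreach s {foo food bar} {
    lappend results [regexp $exp $s]
    puts "$s -> [lindex $results end]"
}
# foo -> 0, food -> 1, bar -> 1
```

Only the exact string foo is rejected; food passes through the second branch, which requires at least one character after foo.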

VisualRegExp is a good way to learn about REs.

Visual Regexp is a terrific way to learn about REs.

Redet is another tool for learning about and working with REs.

Turn a string into %hex-escaped (url encoded) characters:

 e.g. Csan -> %43%73%61%6E

 regsub -all -- {(.)} $string {%[format "%02lX" [scan \1 "%c"]]} new_string
 subst $new_string

This demonstrates the power of using regsub together with subst, which is regarded as one of the most powerful ways to use regular expressions in Tcl.

Turn a string into %hex-escaped (url encoded) characters (part 2)

This one makes the result more readable and still quite safe to use in URLs, e.g. http://wiki.tcl.tk -> http%3A%2F%2Fwiki%2Etcl%2Etk

 regsub -all -- {([^A-Za-z0-9_-])} $string {%[format "%02lX" [scan \1 "%c"]]} new_string
 subst $new_string


Joe Mistachkin

The inverse of the above (not optimized):

 regsub -all -- {%([0123456789ABCDEF][0123456789ABCDEF])} $string {[format "%c" 0x\1]} new_string
 subst $new_string

Caveats about using [regsub] with [subst]

glennj 20081216: It can be dangerous to blindly apply [subst] to the results of [regsub], particularly if you have not validated the input string. Here's an example that's not too contrived:

 set string {[some malicious command]}
 regsub -all {\w+} $string {[string totitle &]} result
 subst $result

This results in "invalid command name "Some"". What if $string was {[exec format c:]}?

See DKF's "proc regsub-eval" contribution in regsub to properly prepare the input string for substitution. Paraphrased:

 set string {[some malicious command]}
 set escaped [string map {\[ \\[ \] \\] \$ \\$ \\ \\\\} $string]
 regsub -all {\w+} $escaped {[string totitle &]} result
 subst $result

which results in what you'd expect: the string "[Some Malicious Command]"
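
Another way to avoid the problem is to skip [subst] altogether and build the result character by character. A sketch for the URL-encoding example above, assuming a hypothetical proc name urlEncode; like the regsub versions, it formats the character code rather than UTF-8 bytes, so it is only appropriate for ASCII input:

```tcl
# Hypothetical helper: percent-encode everything except letters,
# digits, underscore and hyphen, without ever evaluating the input.
proc urlEncode {string} {
    set result ""
    foreach char [split $string ""] {
        if {[regexp {^[A-Za-z0-9_-]$} $char]} {
            append result $char
        } else {
            scan $char %c code
            append result [format %%%02X $code]
        }
    }
    return $result
}

puts [urlEncode "http://wiki.tcl.tk"]   ;# http%3A%2F%2Fwiki%2Etcl%2Etk
puts [urlEncode {[exec format c:]}]     ;# %5Bexec%20format%20c%3A%5D
```

Because the input is only ever treated as data, brackets and dollar signs in it are encoded like any other character.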

Maintain proper spacing when formatting for HTML

DG got this from Kevin Kenny on c.l.t.

 regsub -all { (?= )} $line {\&nbsp;} line

 set line {this is an    example}
 regsub -all { (?= )} $line {\&nbsp;} line
 set line
 this is an&nbsp;&nbsp;&nbsp; example

And tabs require replacement, too:

 set tabFill "[string repeat \\&nbsp\; 7] "
 regsub -all {\t} $line $tabFill line

glennj taken from comp.lang.perl.misc, transform variable names into StudlyCapsNames:

 set old_vars {VARIABLE_ONE VARIABLE_NUMBER_TWO a_really_long_VARIABLE_name}
 set NewVars {}
 foreach v $old_vars {
    regsub -all {_?(.)([^_]*)} $v {[string toupper "\1"][string tolower "\2"]} new
    lappend NewVars [subst $new]
 }

When using [ASED]'s syntax checker you get an error if you don't insert " -- " after "regexp". Instead of "regexp {([^A-Za-z0-9_-])} $string" you have to write "regexp -- {([^A-Za-z0-9_-])} $string"

LV A user recently asked:

I have a string that I'm trying to parse. Why doesn't this seem to work?

 % set str {Acc No: 12345}
 % set num [regexp {.*?(\d+).*} $str junk result]
 % puts $result

It looks to me like the *? causes the subsequent \d+ to also be non-greedy and only match the first digit. Did I figure that out correctly? I presume that we currently don't have a way to turn that behaviour off for just one part of the expression?

Of course, in this simplified problem, one could just drop the non-greedy prefix and code

 % set num [regexp {(\d+)} $str junk result]
 % puts $result

I'll let the user decide if that suffices.

How do you select from two words?

 % set word "foo"
 % set result [regexp {(foo|bar)} match zzz]
 % set zzz
 can't read "zzz": no such variable

LES: You got the regexp syntax wrong and tried to match the regular expression with the string "match". There is no "zzz" variable (the actual match variable in your code) because your regular expression does not match the string "match". Try this:

 % set word "foo"
 % set result [regexp {(foo|bar)} $word match zzz]
 % set match

Note that I could have dropped the "zzz" variable, but left it there as a second match variable, as an exercise to you. You should understand why and what it does if you read the regexp page and assimilate the syntax.

RUJ: Could you match the following string, which has an arbitrary number of spaces at the start and end?

 % set str "  sjkhf sdhj   "

LV try

   set rest [regexp {^ +.* +$} $str match]
   puts $rest

which should have a value of 1 (in other words, it matched). Of course, if those leading and trailing spaces are optional, then change the + to a *.

See also re_syntax, URI detector for arbitrary text as a regular expression.

Arts and crafts of Tcl-Tk programming - Regular Expressions - Regular Expression Debugging Tips

