Version 23 of regsub

Updated 2012-08-07 02:22:51 by AMG

Perform substitutions based on regular expression pattern matching.
http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/regsub.htm

regsub ?switches? exp string subSpec ?varName?

This uses exp (a regular expression) to find a part of string to replace with subSpec, and either returns the resulting string or stores it in varName if that is given (in which case the number of substitutions performed is returned). The substitution process can be modified through switches. The supported switches are:

-all
-expanded
-line
-linestop
-lineanchor
-nocase
-start index
--

These are all similar to those for regexp, except for -all which causes regsub to perform the replacement in as many places as possible (given that it will only scan through the string once) rather than just once.
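
As a minimal sketch of the two calling forms and of -all (the sample string and variable names are made up for illustration):

```tcl
set text "the quick brown fox"

# No varName: regsub returns the modified string.
# Without -all only the first match is replaced.
set once [regsub {o} $text 0]

# -all replaces every match.
set every [regsub -all {o} $text 0]

# With varName: the result goes into the variable and the
# number of substitutions is returned instead.
set count [regsub -all {o} $text 0 result]

# -nocase ignores case while matching.
set shout [regsub -nocase {THE} $text THE]

puts $once    ;# the quick br0wn fox
puts $every   ;# the quick br0wn f0x
puts $count   ;# 2
puts $result  ;# the quick br0wn f0x
puts $shout   ;# THE quick brown fox
```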

Examples

See also Regular Expression Examples and Advanced Regular Expression Examples

[Feel free to add below various examples, demonstrating the use of the various flags, etc.]


One example of using regsub, from Brent Welch's book Practical Programming in Tcl and Tk, is:

 regsub -- {([^\.]*)\.c} file.c {cc -c & -o \1.o} ccCmd

The & is replaced by file.c, and the \1 is replaced by file.
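
To see the effect, the same call can be run and its variable printed. The last two lines are additions for illustration: \0 is a synonym for &, and a backslash in front of & yields a literal ampersand.

```tcl
# The call from the book; with a varName the match count is returned.
set n [regsub -- {([^\.]*)\.c} file.c {cc -c & -o \1.o} ccCmd]
puts $n       ;# 1
puts $ccCmd   ;# cc -c file.c -o file.o

# \0 stands for the whole match, just like &.
set whole [regsub {b+} aabbcc {<\0>}]
puts $whole   ;# aa<bb>cc

# \& inserts a literal ampersand instead of the match.
set amp [regsub {and} "salt and pepper" {\&}]
puts $amp     ;# salt & pepper
```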


Recently on the Tcler's Wiki chat room, someone wanted to convert a string like this:

 rand ||=> this is some text <=|| rand

to

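 rand ||=> some other text <=|| rand

One way to do it:

 set unique1 {\|\|=>}
 set unique2 {<=\|\|}
 set string {rand ||=> this is some text <=|| rand}
 set replacement {some other text}
 # The pattern consumes the spaces next to the markers, so the replacement puts them back.
 # Because a varName is given, regsub returns the match count and stores the result in string.
 set new [regsub -- "($unique1) .* ($unique2)" $string "\\1 $replacement \\2" string]
 puts $new      ;# 1
 puts $string   ;# rand ||=> some other text <=|| rand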

Note that the regular expression metacharacters in unique1 and unique2 need to be quoted so they are not treated as metacharacters.
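
When the markers are not known in advance, the quoting can be automated. The sketch below uses a hypothetical helper, reQuote (not a built-in command), that puts a backslash in front of every non-word character; it assumes Tcl's default advanced regular expression syntax, in which a backslash followed by a non-alphanumeric character matches that character literally.

```tcl
# Hypothetical helper: backslash-escape every non-word character so the
# text can be embedded in a regular expression as a literal.
proc reQuote {literal} {
    return [regsub -all {\W} $literal {\\&}]
}

set unique1 [reQuote {||=>}]
set unique2 [reQuote {<=||}]
set string {rand ||=> this is some text <=|| rand}

set new [regsub -- "($unique1) .* ($unique2)" $string {\1 some other text \2}]
puts $new   ;# rand ||=> some other text <=|| rand
```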

NOTE: assuming the above example is for some type of template system, remember that the expression is greedy and will not do what you expect when the string contains multiple instances of unique1 and unique2. For example:

 left ||=> this is some text <=|| middle ||=> and some more text <=|| right

Will be converted to:

 left ||=> some other text <=|| right

Everything between the first instance of unique1 and the last instance of unique2 will be thrown away.
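
One possible fix, sketched here rather than taken from the original discussion, is the non-greedy quantifier .*? together with -all, so that each marker pair is handled on its own. The pattern contains only this one quantifier, which keeps Tcl's rules for mixing greedy and non-greedy quantifiers out of the picture.

```tcl
set unique1 {\|\|=>}
set unique2 {<=\|\|}
set string {left ||=> this is some text <=|| middle ||=> and some more text <=|| right}
set replacement {some other text}

# .*? matches as little as possible, so each match ends at the
# nearest closing marker; -all then processes every marker pair.
set new [regsub -all -- "($unique1) .*? ($unique2)" $string "\\1 $replacement \\2"]
puts $new
# left ||=> some other text <=|| middle ||=> some other text <=|| right
```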


AM (7 October 2003): I asked about a complicated substitution in the chatroom.

Here is the question:

I have a fixed substring that delimits a variable number of characters. Anything in between (including the delimiters) must be replaced by a repetition of another string. For example:

        1234A000aadA12234 --> 1234BXBXBXBX12234

(A000aadA is 8 characters, my replacing string fits 4 times in that)

arjen: I do not think I can use some clever regexp to do this ... (note: things will always fit)

arjen: The regexp to identify the substring could be: {A[^A]*A}

arjen: But now to get the replacing string ...

CoderX2: easy... one sec

CoderX2:

   set string "1234A000aadA12234"
   set substring "BX"
   regsub -all {(A[^A]*A)} $string {[string repeat $substring [expr {[string length "\1"] / [string length $substring]}]]} new_string
   set new_string [subst $new_string]

(conversation edited to highlight this wonderful gem!)
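
The subst step above evaluates anything in the input that looks like a command or variable reference, so it is only safe for trusted strings. An alternative that avoids subst is sketched below; fillDelimited is a hypothetical helper name, and it assumes, as in the question, that the filler length divides the length of each delimited run.

```tcl
# Hypothetical helper: replace every A...A run (delimiters included)
# with as many copies of filler as fit into the run.
proc fillDelimited {string filler} {
    set start 0
    # -indices reports positions relative to the whole string, even with -start.
    while {[regexp -indices -start $start {A[^A]*A} $string range]} {
        lassign $range first last
        set count [expr {($last - $first + 1) / [string length $filler]}]
        set fill [string repeat $filler $count]
        set string [string replace $string $first $last $fill]
        # Resume the search just after the text that was inserted.
        set start [expr {$first + [string length $fill]}]
    }
    return $string
}

puts [fillDelimited 1234A000aadA12234 BX]   ;# 1234BXBXBXBX12234
puts [fillDelimited {x[exit]AabA} BX]       ;# x[exit]BXBX
```

Because nothing is passed to subst, brackets and dollar signs in the input are left untouched.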


Has a -eval flag to regsub ever been suggested? It would apply in the above example, and some other common idioms, e.g., url-decoding:

  regsub -all -eval {%([[:xdigit:]][[:xdigit:]])} $str {binary format H2 \1} str

The idea is that the replacement string gets eval-ed after expanding the \1, instead of just being substituted in. Doing this safely without such a flag needs an extra call to regsub beforehand (to protect existing []s) and a call to subst afterwards to do the evaluation.

-JR

DKF: Yes, and I mean to do something about it sometime (too many things to do, too little time). Meantime, try this:

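 proc regsub-eval {re string cmd} {
    # Escape brackets, dollar signs and backslashes in the input so subst leaves them alone,
    # wrap every match in a bracketed command, then let subst run the commands.
    subst [regsub -all $re [string map {\[ \\[ \] \\] \$ \\$ \\ \\\\} $string] "\[$cmd\]"]
 }
 regsub-eval {%([[:xdigit:]][[:xdigit:]])} $str {binary format H2 \1}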

JH: Once upon a time I coded up regsub -eval in full in C (still have the patch around somewhere). I decided to not push it forward since it was actually slower than the full subst work-around. I believe this was due to the overhead of many small Tcl_Eval calls versus a one-time subst-pass that could be more effective. There are some newer Tcl_Eval* APIs to try and we should resuscitate this one.
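
The protect, substitute, subst sequence discussed above can be spelled out step by step. This is a sketch only; urlDecode is a hypothetical name, and it handles nothing beyond %XX escapes. Tcl 8.7 and later also provide a regsub -command switch for this kind of job; the sketch sticks to commands available in 8.5.

```tcl
# Hypothetical helper: decode %XX escapes in str.
proc urlDecode {str} {
    # Step 1: put a backslash in front of every character that subst
    # would act on: brackets, dollar signs and backslashes.
    regsub -all {[][$\\]} $str {\\&} str
    # Step 2: turn each %XX escape into a bracketed command.
    regsub -all {%([[:xdigit:]]{2})} $str {[binary format H2 \1]} str
    # Step 3: run the commands and undo the protection from step 1.
    return [subst $str]
}

puts [urlDecode {a%20b%5Bc%5D}]   ;# a b[c]
puts [urlDecode {[exit]%41}]      ;# [exit]A
```

The second call shows why step 1 matters: without it, subst would try to run exit.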


elfring 2003-10-29: A Tcl variable can be marked as holding a compiled regular expression. REs can be pre-compiled by the call "regexp $RE {}" [L1 ].

DKF: Technically, the compiled RE is cached in the internal representation of the RE value and not the variable. The effect is pretty much indistinguishable though (in all sane programs).
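
In practice this suggests keeping a pattern in one variable and reusing that value, as in the sketch below (the sample data is made up). The warm-up call is the one elfring mentions; whether it gains anything measurable is not shown here.

```tcl
set re {\s+}

# Optional warm-up from the note above: compiles the RE before first real use.
regexp $re {}

set lines [list "a   b" "c \t d"]
set cleaned {}
foreach line $lines {
    # The same value is passed each time, so its cached compiled form can be reused.
    lappend cleaned [regsub -all $re $line " "]
}
puts $cleaned   ;# {a b} {c d}
```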


AMG: The -all option interacts strangely with the * quantifier when no ^ anchor is used. Take this example:

 regsub -all .* foo <&>

This returns <foo><>. First .* matches all the text ("foo" in this case), then it matches the empty string at the end of the text. Adding $ to the end of the pattern doesn't help. To fix, either lose the -all option (it's not desirable in this case), or add the ^ anchor to the beginning (prevents matching anything at end of string).
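
The two fixes can be tried side by side. The third line is an extra option not mentioned above: .+ cannot match the empty string, so it also avoids the trailing empty match.

```tcl
# Fix 1: anchor the pattern; ^ cannot match again at the end of the string.
set anchored [regsub -all {^.*} foo <&>]

# Fix 2: drop -all; only the first match is replaced.
set once [regsub {.*} foo <&>]

# Extra option: require at least one character.
set plus [regsub -all {.+} foo <&>]

puts $anchored   ;# <foo>
puts $once       ;# <foo>
puts $plus       ;# <foo>
```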