'''regsub''', a [Tcl Commands%|%built-in Tcl command], performs substitutions based on [regular expression] pattern matching.

** See Also **

   [regular expressions]:   information about Tcl regular expressions that is not unique to any particular command
   [string map]:
   [subst]:

** Synopsis **

   :   '''regsub''' ?''switches''? ''exp string subSpec'' ?''varName''?

** Documentation **

   [http://www.tcl.tk/man/tcl/TclCmd/regsub.htm%|%official reference]:

** Description **

`regsub` matches ''string'' against the regular expression ''exp'', performs substitutions according to ''subSpec'', and returns the resulting string. If ''varName'' is given, the resulting string is stored in that variable and the number of substitutions is returned instead.

The substitution process can be modified through the following ''switches'':

   :   '''-all'''
   :   '''-expanded'''
   :   '''-line'''
   :   '''-linestop'''
   :   '''-lineanchor'''
   :   '''-nocase'''
   :   '''-start''' ''index''
   :   '''--'''

These are all similar to those for `[regexp]`, except for '''-all''', which causes '''`regsub`''' to perform the replacement in as many non-overlapping places as possible.

** Basic Examples **

One example of using regsub, from [Brent Welch]'s [BOOK Practical Programming in Tcl and Tk]:

======
regsub -- {([^\.]*)\.c} file.c {cc -c & -o \1.o} ccCmd
======

`&` is replaced by `file.c`, and `\1` is replaced by `file`.

----

Recently in the Tcler's Wiki chat room, someone wanted to convert a string like this:

======none
rand ||=> this is some text <=|| rand
======

to

======none
rand ||=> some other text <=|| rand
======

======
set unique1 {\|\|=>}
set unique2 {<=\|\|}
set string {rand ||=> this is some text <=|| rand}
set replacement {some other text}
# Because a variable name is given, regsub returns the number of
# substitutions (stored in new) and writes the result back into string.
# The spaces around $replacement restore the ones matched outside the groups.
set new [regsub -- "($unique1) .* ($unique2)" $string "\\1 $replacement \\2" string]
puts $new
puts $string
======

The regular expression metacharacters in `$unique1` and `$unique2` need to be quoted so they are not treated as metacharacters.
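As a small sketch of the two calling conventions described above: without ''varName'', `regsub` returns the new string; with ''varName'', it returns the number of substitutions and stores the new string in the variable. The variable names `input`, `result`, `count`, and `stored` are illustrative only, and the sketch assumes a Tcl version in which ''varName'' is optional (8.5 or later, as far as I know).

```tcl
set input {rand ||=> this is some text <=|| rand}

# Without varName: the substituted string is the return value.
set result [regsub -- {(\|\|=>) .* (<=\|\|)} $input {\1 some other text \2}]
puts $result

# With varName: the return value is the number of substitutions,
# and the substituted string is stored in the named variable.
set count [regsub -- {(\|\|=>) .* (<=\|\|)} $input {\1 some other text \2} stored]
puts $count
puts $stored
```

Bracing the pattern and the replacement avoids the doubled backslashes needed inside double quotes, at the cost of not being able to interpolate variables into them.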
NOTE: assuming the above example is for some type of template system, remember that the expression is greedy and will not do what you expect when there are multiple instances of unique1 and unique2. For example:

======none
left ||=> this is some text <=|| middle ||=> and some more text <=|| right
======

will be converted to:

======none
left ||=> some other text <=|| right
======

Everything between the first instance of unique1 and the last instance of unique2 will be thrown away.

----

[AM] 2003-10-07: I asked about a complicated substitution in the chatroom:

Here is the question: I have a fixed substring that delimits a variable number of characters. Anything in between (including the delimiters) must be replaced by a repetition of another string. For example:

======none
1234A000aadA12234 -> 1234BXBXBXBX12234
======

(A000aadA is 8 characters, my replacing string fits 4 times in that)

arjen: I do not think I can use some clever regexp to do this ... (note: things will always fit)

arjen: The regexp to identify the substring could be: `{A[[^A]*A}`

arjen: But now to get the replacing string ...

CoderX2 easy... one sec

CoderX2

======
set string 1234A000aadA12234
set substring BX
regsub -all {(A[^A]*A)} $string {[string repeat $substring [expr {[string length "\1"] / [string length $substring]}]]} new_string
set new_string [subst $new_string]
======

(conversation edited to highlight this wonderful gem!)

** `-all` Caveats **

[AMG]: The `-all` option interacts strangely with the `*` quantifier when no `^` anchor is used. Take this example:

======
regsub -all .* foo <&>
======

This returns `<foo><>`. First `.*` matches all the text (`foo` in this case), then it matches the empty string at the end of the text. Adding `$` to the end of the pattern doesn't help. To fix, either drop the `-all` option (it's not desirable in this case), or add the `^` anchor to the beginning (which prevents a second match at the end of the string).

** TODO: `-eval` **

Has an `-eval` switch to regsub ever been suggested?
It would apply in the above example, and some other common idioms, e.g., url-decoding:

======
regsub -all -eval {%([[:xdigit:]][[:xdigit:]])} $str {binary format H2 \1} str
======

The idea is that the replacement string gets eval-ed after expanding the \1 instead of just being substituted in. Doing this safely otherwise needs an extra call to regsub beforehand (to protect existing [[]]s) and a call to subst afterwards to do the evaluation. -JR

[DKF]: Yes, and I mean to do something about it sometime (too many things to do, too little time). Meantime, try this:

======
proc regsub-eval {re string cmd} {
    # Protect existing brackets, dollars and backslashes, wrap each match
    # in a command substitution, then let subst evaluate the result.
    subst [regsub -all $re [string map {[ \\[ ] \\] $ \\$ \\ \\\\} $string] \[$cmd\]]
}
regsub-eval {%([[:xdigit:]][[:xdigit:]])} $str {binary format H2 \1}
======

[JH]: Once upon a time I coded up `regsub -eval` in full in C (still have the patch around somewhere). I decided not to push it forward since it was actually slower than the full subst work-around. I believe this was due to the overhead of many small Tcl_Eval calls versus a one-time subst pass that could be more effective. There are some newer Tcl_Eval* APIs to try and we should resuscitate this one.

[DKF] (2017-02-18): I'm now working on this for TIP #463. It turns out that `[Tcl_EvalObjv]` is fast enough (and the cost of [subst] is pretty high from 8.6 onwards). A full [string map] is cheaper, but computing the map itself could be really quite expensive.

** Preparing a string for literal matching **

[AMG]: If you're building up a regular expression that's supposed to match various fixed strings, you must backslash-quote any special characters in those strings. [[[string map]]] will do the job, but [[regsub]] will get you there with much less typing:

======
regsub -all {[][*+?{}()<>|.^$\\]} $string {\\&}
======

Here's the [[string map]] version:

======
string map {\\ {\\} \] {\]} {[} {\[} * {\*} + {\+} ? {\?} \{ {\{} \} {\}} ( {\(} ) {\)} < {\<} > {\>} | {\|} .
{\.} ^ {\^} {$} {\$}} $string
======

** Converting Mustache-Style Handlebars to Variables **

[Napier]: I had a situation where I needed to convert handlebars into Tcl variables within a string for substitution. This is a nice little one-liner that should achieve that goal:

======
proc handlebarsToVariables str {
    regsub -all {\{\{([^\}]*)\}\}} $str \$\{\\1\}
}
======

======
set str { my name is {{name}} and I like to play {{sport}}! }
set name "Foo"
set sport "Football"
% set parsed [handlebarsToVariables $str]
 my name is ${name} and I like to play ${sport}! 
% puts [subst $parsed]
 my name is Foo and I like to play Football! 
======

[PYK] 2016-11-06: Since this [Templates and subst%|%templating] approach adds an additional layer to translate `{{` into `$` substitution, why not just use `$` from the outset?

** Feature request: per-branch replacement specifications **

[AMG]: I wish [[regsub]] would let me specify different replacement patterns depending on which branch of the pattern matched. Ideally, the patterns and replacements would be given as an alternating list. Internally, the regular expression engine would treat each pattern argument as a branch of a combined regular expression, and whichever branch matches determines which replacement is used.

Unfortunately, the existing [[regsub]] syntax is not amenable to this approach because it puts the string ''between'' the pattern and the replacement. It would be nice if the search string were the final argument, but it's not. But for a moment, pretend:

======
% regsub -all a+ A b+ B aaaacccbbddda
AcccBdddA
======

Of course, that's not the only problem. Where would the compiled form of the regular expression go? There's no one [Tcl_Obj] to hold it, and combining separately compiled recognizer objects is decidedly nontrivial. So the combined regular expression must remain a single argument.
The replacement patterns can still be separate, and which one is used would be determined positionally by the corresponding (top-level) branch.

======
% regsub -all {a+|b+} aaaacccbbddda A B
AcccBdddA
======

Thus the number of replacement arguments can either be one (current behavior) or equal to the number of branches (proposed behavior).

There's still trouble. [[regsub]] takes an optional `varName` argument. How to disambiguate? The above code executes right now, and we must not break compatibility by redefining existing behavior. If you run it, it returns `3` and sets `$B` to `AcccAdddA`.

I think the best bet is to explicitly enable this alternate mode with a `-branch` option. We can't call it `-multi` since that's too easy to confuse with `-all`. In `-branch` mode, the number of replacement arguments must be equal to the number of top-level branches in the regular expression.

======
% regsub -all -branch {a+|b+} aaaacccbbddda A B
AcccBdddA
======

What about capturing parentheses and back references? Well, by design neither works across branches, other than the fact that the counting spans the entire regular expression. Let's not change that.

======
% regsub -all -branch {([xy])\1|([uv])\2} xxyyxyuuvvuv {a\1} {b\2}
axayxybubvuv
======

To assist in debugging, I wish we could make it an error for a replacement argument to contain a back reference to any branch other than its own. However, it's currently not an error for the replacement argument to contain nonexistent back references, so this change, while desirable, really would not be appropriate.

Now, a useful example. This should replace doubled brackets with single brackets, while putting backslashes in front of single brackets.

======
regsub -all -branch {([][])\1|([][])(?!\2)} $script {\1} {\\&}
======

Oh never mind, lookahead constraints cannot contain back references.
For your curiosity, here's how to get the promised behavior:

======
string map {[[ [ [ \\[ ]] ] ] \\]} $script
======

I'll have to think up a better demonstration!

[DKF]: The `-command` option to `regsub` in 8.7 makes this fairly easy to do.

======
% proc addPrefix {all sub1 sub2} {
    if {$sub1 ne ""} {return "a$sub1"}
    if {$sub2 ne ""} {return "b$sub2"}
    error "should be unreachable"
}
% regsub -all -command {([xy])\1|([uv])\2} xxyyxyuuvvuv addPrefix
axayxybubvuv
======

Better yet, since you have access to all of Tcl in there (current implementation restriction: can't [yield]) you should be able to do any kind of complex processing you want. It'll also be easier to debug.

(I wrote it originally to make handling HTML entities easier, and the key insight was using command prefixes instead of magic variables. And yes, I've wanted something like this for about as long as I've used Tcl; the equivalent in Perl was something I missed.)

<<categories>> Arts and Crafts of Tcl-Tk Programming | Command | String Processing