[Keith Vetter] 2017-08-10 : On the [string map] page, there was a
request for a version of string map that takes regular expressions
instead of fixed strings. There's a quick and dirty solution on that page but it doesn't handle the tricky
case of double substitutions, e.g. string map {a b b c} "a" should return just "b".
Since the request was from over 11 years ago, it's obviously not a high demand
item. But I thought it would be a fun intellectual challenge. This isn't the most efficient
method--it does too many regex searches and replacements, but the more efficient
solution gets too complicated when you start allowing references.
You can even add it into the [string] ensemble via
======
namespace ensemble configure string -map [dict merge [namespace ensemble configure string -map] {mapx string_mapx}]
======
----
======
##+##########################################################################
#
# string mapx -- like string map but with regular expressions
# see also http://wiki.tcl.tk/2819
# by Keith Vetter 2017-08-07
#
proc string_mapx {mapping str} {
if {[llength $mapping] % 2 != 0} {
return -code error "regex map list unbalanced"
}
# Difficulty is we must avoid double substition, meaning we cannot
# keep matching in replaced text. e.g. string mapx {A. BB B. CC} AA
# should return "BB" not "CC". To do so, we maintain a list of
# <unmatched text>, <matched text> pairs and only do matching on the
# first part. When done we join all the pieces.
set pieces [list $str ""]
foreach {regex replace_with} $mapping {
set new_pieces {}
foreach {unmatched already_matched} $pieces {
set mapping [_string_mapx_one $unmatched $regex $replace_with]
lset mapping end $already_matched
lappend new_pieces {*}$mapping
}
set pieces $new_pieces
}
set result [join $pieces ""]
return $result
}
proc _string_mapx_one {str regex replace_with} {
# Replaces all occurrences of regex in str with replace_with
# The result is a list of prefix and replace_with pairs
# e.g. _string_mapx_one "xAByACz" {A.} "M" => {x M y M z ""}
set result {}
set start 0
while {1} {
set n [regexp -start $start -indices $regex $str index]
if {! $n} break
lassign $index lo hi if {$hi < $lo} {
error "regex $regex matches the empty string"
}
set prefix [string range $str $start $lo-1]
set match [string range $str $lo $hi]
lappend result $prefix [regsub $regex $match $replace_with]
set start [expr {$hi + 1}]
}
lappend result [string range $str $start end] ""
return $result
}
################################################################
#
# testing code
#
proc test {expected mapping str} {
set got [string_mapx $mapping $str]
puts "string_mapx $mapping '$str'"
if {$got ne $expected} {
puts "BAD: expected: '$expected' got: '$got'"
}
}
test xxMyyMzz {A. M} xxAByyACzz
test xxyyzz {A. ""} xxAByyABzz
test xNxMyNyMzz {A. M CC N} xCCxAByCCyABzz
test bca {a b b c c a} abc
test "NOW IS THE TIME" {{n[^ ]*} NOW {now is} BAD {i.*the} "IS THE" {t.*time} BAD time TIME} "now is the time"
test "dbca dbca" {{(a)(bc)(d)} {\3\2\1}} {abcd abcd}
test "dbcadbca" {{(.)(..)(.)} {\3\2\1}} {abcdabcd}
test "dbca dbca" {{(a)(..)(.)} {\3\2\1}} {abcd abcd}
return
======
[RKzn] 2020-02-29: I think this has some bug. While running
======
test "xxAxx" {{A*} {A}} "xxAAAAxx"
======
On windows I got a "out of memory error", on linux (at a different machine) it ran for a long time (minutes) until I killed it.
Maybe some infinite (recursive)loop...?
I admit I haven't tried to read the code.
Other than that, the few times I used this it worked as expected.
[KPV] Fixed. What was wrong is that the regex '''A*''' can match the null string. This means that the ''regexp'' call inside the loop always succeeds, matching the null string at the start. But since the ending index is -1, the start index never progresses, and we get the infinite loop.
It's unclear what the proper return value should be. You could argue that is should be an infinite string a 'A's. But I think the best response is to throw an error.
If you change the test to '''test "xxAxx" {{A+} {A}} xxAAAAxx''', then everything works find.
<<categories>> String Processsing