[NJG] January 15, 2005

Recently I posted the page [A regexp twist]. Having received no feedback, I have no idea how much attention it has received. It has, however, occurred to me that neither the title nor the style of the prose conveyed its essence very well. So here is a more direct attempt.

[A regexp twist] extends the functionality of the '''regexp''' command with the form

 '''regexp -inline''' ''?other options?'' '''''pattern string script'''''

where ''script'' is executed each time a pattern match occurs. (Note that in current Tcl this form is illegal, since '''-inline''' does not allow any arguments after ''string'', so it represents a compatible extension.) The result of the current match is available to ''script'' in the global match variables ''mVar0'', ''mVar1'', ..., ''mVar9''.

For Windows users, the '''''downloadable zip file''''' contains the source and the extension DLL. Those on Linux may either replace the ''.dll''-specific part of the source with whatever is needed to compile it into an ''.so'' module, or replace the function ''Tcl_RegexpObjCmd'' in the file ''tclCmdMZ.c'' of their Tcl source distribution with the one in the provided source and recompile Tcl. Lacking time, I cannot help those who need a Linux binary.

Please note: I do not know the oldest Tcl version with which this extension works. It certainly works with Tcl 8.4.x.

----

[NJG] January 23, 2005

A speed-tuned version can now be downloaded from [A regexp twist]!

----

[MG] takes a quick shot at this in pure Tcl on 8.4.9 ...

 proc regexpScriptPre8.5 {args} {
    if { [llength $args] < 3 } {
       error "wrong # args"
    }
    set rArgs [lrange $args 0 end-3]
    set cmd [lindex $args end]
    eval "foreach x \[regexp -inline $rArgs \[lindex \$args end-2\] \[lindex \$args end-1\]\] \{ \
       uplevel 1 \[list [list $cmd]\] \[list \$x\] \
    \}"
 }

Or, in 8.5, simplified with {expand} (untested, as I don't have 8.5):

 proc regexpScript8.5 {args} {
    if { [llength $args] < 3 } {
       error "wrong # args"
    }
    set rArgs [lrange $args 0 end-3]
    set cmd [lindex $args end]
    foreach x [regexp -inline {expand}$rArgs [lindex $args end-2] [lindex $args end-1]] {
       uplevel 1 [list $cmd] [list $x]
    }
 }

[Lars H]: A backport of the 8.5 version to 8.4 with less [quoting hell]:

 proc regexpScript {args} {
    if { [llength $args] < 3 } {
       error "wrong # args"
    }
    set rArgs [lrange $args 0 end-3]
    set cmd [lindex $args end]
    foreach x [eval [list regexp -inline] $rArgs [lrange $args end-2 end-1]] {
       uplevel 1 [list $cmd] [list $x]
    }
 }

This can of course be optimised further by concatenating the [lrange]s. But '''note''' that none of these does the same thing as the command at the top of the page: here the last argument is a command prefix, whereas the compiled command takes an arbitrary script that accesses the match through global variables.

----

[MG] Just decided to do a quick test to see what, if anything, the speed difference was...

 (Desktop) 6 % time {regexpScriptPre8.5 -all . {This is a test string} bleh} 500
 635 microseconds per iteration
 (Desktop) 7 % time {regexpScriptPre8.5 -all . {This is a test string} bleh} 5000
 664 microseconds per iteration
 (Desktop) 8 % time {regexpScriptPre8.5 -all . {This is a test string} bleh} 50000
 730 microseconds per iteration
 (Desktop) 9 % load xregexp.dll
 Extended regexp handling is in place
 (Desktop) 10 % time {regexp -inline -all . {This is a test string} bleh} 500
 897 microseconds per iteration
 (Desktop) 11 % time {regexp -inline -all . {This is a test string} bleh} 5000
 903 microseconds per iteration
 (Desktop) 12 % time {regexp -inline -all . {This is a test string} bleh} 50000
 1017 microseconds per iteration

As you can see, I did the tests using the pre-8.5 version (with eval) rather than the {expand} version. All the tests were done on Tcl 8.4.9, and the Tcl-only version used the "normal" regexp; i.e., I only loaded [NJG]'s package after I'd tested the plain-Tcl code. Surprisingly, the plain-Tcl version comes out slightly faster. Oh, the 'bleh' script used there was just:

 proc bleh {x} {
    set ::tmp $x
    return
 }

----

[NJG] January 19, 2005

'''MG''', I would not call a 40-50% difference slight, so I looked into the code again. I found that most of the difference must come from my code saving 10 match variables at each match, while your ''regexp -inline'' produces only as many elements as there are subexpression matches. In the actual test, 9 of the saves are superfluous! This is easy to remedy, so I will shortly post the corrected version.

I stated in the original posting ([A regexp twist]) that executing the script with ''Tcl_Eval'' at each match, as the code currently does, is the least efficient approach. However, this cost is least pronounced when the script consists of a single parameterless procedure call, as in your test. Anyway, I did this hack because it was easy and I found the result aesthetically pleasing. Perhaps I should create a solution that is efficient as well...

Finally, your test does not take into account the time needed to fetch the match and submatch values from the list returned by ''regexp -inline''! (That costs at least one '''set ''var'' [[lindex $x ''i'']]''' or '''lindex $x''' ''i'' command.)

Thanks for the feedback!

----

[Category Discussion]
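----

As [Lars H] points out, the pure-Tcl procedures on this page pass each match to a command rather than running an arbitrary script that reads the match from global variables. The following is a minimal sketch of the latter semantics in pure Tcl. It is not [NJG]'s C implementation; it assumes that ''mVar0'' holds the whole match and ''mVar1'' ... ''mVar9'' hold the submatches (empty when absent). The name ''regexpEach'' is hypothetical, and only the three arguments ''pattern string script'' are supported (no extra regexp switches).

```tcl
# Hypothetical pure-Tcl sketch: for each match of $pattern in $string,
# set the globals mVar0 (whole match, assumed) and mVar1..mVar9
# (submatches, "" when absent), then evaluate $script at global level.
proc regexpEach {pattern string script} {
    set start 0
    set len [string length $string]
    while {$start <= $len} {
        # Index pairs for the whole match and for each subexpression.
        set idx [regexp -inline -indices -start $start -- $pattern $string]
        if {[llength $idx] == 0} {
            break
        }
        for {set i 0} {$i < 10} {incr i} {
            set pair [lindex $idx $i]
            if {$pair eq "" || [lindex $pair 0] < 0} {
                # Subexpression does not exist or did not participate.
                set ::mVar$i ""
            } else {
                set ::mVar$i [string range $string [lindex $pair 0] [lindex $pair 1]]
            }
        }
        uplevel #0 $script
        # Continue after this match; step one character past an empty match.
        set s [lindex $idx 0 0]
        set e [lindex $idx 0 1]
        set start [expr {$e >= $s ? $e + 1 : $s + 1}]
    }
}
```

For example, ''regexpEach {(\w)(\w*)} {ab cde} {puts "$mVar1|$mVar2"}'' prints ''a|b'' and then ''c|de''. Like the compiled extension, this pays for setting all ten variables at every match, which is exactly the overhead [NJG] identifies above.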
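----

The ''{expand}'' syntax used in ''regexpScript8.5'' was renamed ''{*}'' before Tcl 8.5 was released, so that version will not run as written on a released 8.5 or later. Here is a sketch of the same procedure for Tcl 8.5+; the name ''regexpScriptStar'' is hypothetical. One deliberate difference: the callback is expanded with ''{*}$cmd'', so the last argument really is a command prefix (e.g. ''lappend ::seen''), whereas the ''[[list $cmd]]'' form above treats it as a single command name.

```tcl
# Hypothetical Tcl 8.5+ port of regexpScript8.5, using {*} instead of {expand}.
# Usage: regexpScriptStar ?regexp switches? pattern string cmdPrefix
proc regexpScriptStar {args} {
    if {[llength $args] < 3} {
        error "wrong # args: should be \"regexpScriptStar ?switches? exp string cmdPrefix\""
    }
    set rArgs [lrange $args 0 end-3]
    lassign [lrange $args end-2 end] pattern string cmd
    # With -all and subexpressions, regexp -inline returns a flat list,
    # so cmd is called once per list element, not once per match.
    foreach x [regexp -inline {*}$rArgs $pattern $string] {
        # Append the element to the command prefix and run it in the caller.
        uplevel 1 [list {*}$cmd $x]
    }
}
```

For instance, ''regexpScriptStar -all {\d+} a12b3 {lappend ::seen}'' appends ''12'' and then ''3'' to ''::seen''.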
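----

On [NJG]'s last point: with '''-all''', ''regexp -inline'' returns one flat list holding each match followed by its submatches, so a callback that wants the submatches of one match must first regroup them. A minimal sketch, using the first element of ''regexp -about'' (the subexpression count) to determine the group size; the name ''regexpGroups'' is hypothetical:

```tcl
# Hypothetical helper: split the flat result of regexp -all -inline into
# one sublist per match, of the form {whole sub1 sub2 ...}.
proc regexpGroups {pattern string} {
    # The first element of regexp -about is the number of subexpressions.
    set n [expr {[lindex [regexp -about -- $pattern] 0] + 1}]
    set flat [regexp -all -inline -- $pattern $string]
    set groups {}
    for {set i 0} {$i < [llength $flat]} {incr i $n} {
        lappend groups [lrange $flat $i [expr {$i + $n - 1}]]
    }
    return $groups
}
```

Each group can then be unpacked with ''foreach {all first rest} $group break'' on 8.4, or with ''lassign'' on 8.5+. Either way, this is the per-match extraction cost that MG's timing above leaves out.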