Recently a discussion arose concerning a way of determining how many occurances of a character appeared in a string. Several methods were proposed:
#! /usr/tcl84/bin/tclsh set mystring "line1\nline2\nline3\nline4\n" proc splitline {string} { set rc [llength [split $string "\n"]] incr rc -1 return $rc } proc regline {string} { set rc [regexp -line -all "\n" $string] return $rc } proc splitchar {string} { foreach x [split $string ""] {if {[catch {incr counters($x)}]} {set counters($x) 1}} set rc $counters(\n) return $rc } puts [time {splitline $mystring} 10000] puts [time {regline $mystring} 10000] puts [time {splitchar $mystring} 10000]
On my SPARC Solaris, using Tcl 8.4, I get
49 microseconds per iteration 154 microseconds per iteration 963 microseconds per iteration
as the result. - RS: Surprising, but good to know, that split is by far the fastest way...
Brett Schwarz added this proc as well:
proc strmap {str} { return [expr {[string length $str]-[string length [string map {"\n" ""} $str]]}] };
on my SuSE Linux 7.3 machine (2X333MHz PII) with Tcl 8.4, I get these results:
34 microseconds per iteration 87 microseconds per iteration 782 microseconds per iteration 34 microseconds per iteration
So, it gives the same performance of the split version. However, if the string length is increased, to lets say 100 lineX's (i.e. 100 \n), then this proc using string map actually scales better. Here are the numbers with 100:
263 microseconds per iteration 1729 microseconds per iteration 2573 microseconds per iteration 141 microseconds per iteration
There is quite an improvement....
However, there seems to be a difference in interpreters used. Here is the same timing with 8.3.4.4:
288 microseconds per iteration 1775 microseconds per iteration 8514 microseconds per iteration 578 microseconds per iteration
Appears string map got byte compiled in 8.4 (according to Miguel Sofer)
RS contributes maybe not the fastest, but for now the shortest variation:
interp alias {} countCharsA {} regexp -all
to be called with the wanted character and the string as additional arguments
Brett Schwarz Some people (please correct me, if I got the wrong person) added some other procs, and here they are with some new times:
proc splitline2 {string} { return [expr {[llength [split $string {\n}]] - 1}] }
proc regsubline {string} { return [regsub -all -- {\n} $string {\n} string] }
proc splitchar2 string { set rc 0 foreach x [split $string ""] { if {$x == "\n"} { incr rc } } return $rc }
proc strfirst {string} { set rc [set ndx 0] while {[set ndx [expr { [string first \n $string $ndx] + 1}]]} { incr rc } return $rc } puts "splitline [time {splitline $mystring} 10000]" puts "splitline2 [time {splitline2 $mystring} 10000]" puts "regline [time {regline $mystring} 10000]" puts "regsubline [time {regsubline $mystring} 10000]" puts "splitchar [time {splitchar $mystring} 10000]" puts "splitchar2 [time {splitchar2 $mystring} 10000]" puts "strmap [time {strmap $mystring} 10000]" puts "countCharA [time {countCharsA "\n" $mystring} 10000]" puts "strfirst [time {strfirst $mystring} 10000]"
Tcl 8.4.0.1
splitline 265 microseconds per iteration splitline2 307 microseconds per iteration regline 1798 microseconds per iteration regsubline 1996 microseconds per iteration splitchar 2672 microseconds per iteration splitchar2 2872 microseconds per iteration strmap 141 microseconds per iteration countCharA 1781 microseconds per iteration strfirst 602 microseconds per iteration forcount 4164 microseconds per iteration
On Win2k
splitline 160 microseconds per iteration splitline2 192 microseconds per iteration regline 1310 microseconds per iteration regsubline 1410 microseconds per iteration splitchar 1461 microseconds per iteration splitchar2 1692 microseconds per iteration strmap 55 microseconds per iteration countCharA 1305 microseconds per iteration strfirst 295 microseconds per iteration forcount 1944 microseconds per iteration
Tcl 8.3.4.4
splitline 284 microseconds per iteration splitline2 324 microseconds per iteration regline 1814 microseconds per iteration regsubline 2082 microseconds per iteration splitchar 8723 microseconds per iteration splitchar2 3204 microseconds per iteration strmap 597 microseconds per iteration countCharA 1810 microseconds per iteration strfirst 4185 microseconds per iteration forcount 2027 microseconds per iteration
On Win2k
splitline 163 microseconds per iteration splitline2 190 microseconds per iteration regline 1324 microseconds per iteration regsubline 1434 microseconds per iteration splitchar 3786 microseconds per iteration splitchar2 1969 microseconds per iteration strmap 242 microseconds per iteration countCharA 1342 microseconds per iteration strfirst 3027 microseconds per iteration forcount 794 microseconds per iteration
As an aside, this illustrates the speed improvements in Tcl 8.4