Arjen Markus (13 december 2005) In the chatroom yesterday we discussed (or should I say mused over) the pros and cons of using Tcl for biomathematics. One of the subjects was how to reverse a string. Several possibilities have already been published on the Wiki, but this is a comparison of the performance of several on short and long strings.
I find the results remarkable: the behaviour for short strings is completely different than for moderately long strings ... well, judge for yourself.
# Experiments: # Reversing a string - several ways # proc rev_by_string {str} { set rev "" set length [string length $str] for {set i 0} {$i < $length} {incr i} { set rev "[string index $str $i]$rev" } return $rev } proc rev_by_append {str} { set rev "" set length [string length $str] for {set i [expr {$length-1}]} {$i >= 0} {incr i -1} { append rev [string index $str $i] } return $rev } proc rev_by_list {str} { set rev "" foreach c [split $str ""] { set rev "$c$rev" } return $rev } proc rev_recursive {str} { set rev "" set length [string length $str] if { $length <= 4 } { return "[string index $str 3][string index $str 2][string index $str 1][string index $str 0]" } elseif { $length % 2 == 0 } { set half [expr {$length/2}] set halfp1 [expr {$length/2+1}] return "[rev_recursive [string range $str $halfp1 end]][rev_recursive [string range $str 0 $half]]" } else { set half [expr {$length/2}] set halfp1 [expr {$length/2+1}] return "[string index $str end][rev_recursive [string range $str $halfp1 end-1]][rev_recursive [string range $str 0 $half]]" } } # Test the procedures ... puts [rev_by_string "abcdefg"] puts [rev_by_append "abcdefg"] puts [rev_by_list "abcdefg"] puts [rev_recursive "abcdefg"] puts [rev_recursive "abcdef"] # Measure the performance foreach len {1 10 100 1000 10000} repeat {10000 10000 1000 1000 10} { set str [string repeat ABCDE $len] puts "String length: [string length $str]" puts "rev_by_string: [time {rev_by_string $str} $repeat]" puts "rev_by_append: [time {rev_by_append $str} $repeat]" puts "rev_by_list:\ [time {rev_by_list $str} $repeat]" puts "rev_recursive: [time {rev_recursive $str} $repeat]" }
Here is the result:
gfedcba gfedcba gfedcba gfedcba fedcba String length: 5 rev_by_string: 7 microseconds per iteration rev_by_append: 8 microseconds per iteration rev_by_list: 11 microseconds per iteration rev_recursive: 15 microseconds per iteration String length: 50 rev_by_string: 58 microseconds per iteration rev_by_append: 44 microseconds per iteration rev_by_list: 48 microseconds per iteration rev_recursive: 152 microseconds per iteration String length: 500 rev_by_string: 707 microseconds per iteration rev_by_append: 403 microseconds per iteration rev_by_list: 480 microseconds per iteration rev_recursive: 1561 microseconds per iteration String length: 5000 rev_by_string: 16722 microseconds per iteration rev_by_append: 3671 microseconds per iteration rev_by_list: 13910 microseconds per iteration rev_recursive: 13971 microseconds per iteration String length: 50000 rev_by_string: 1051731 microseconds per iteration rev_by_append: 37300 microseconds per iteration rev_by_list: 1032877 microseconds per iteration rev_recursive: 137735 microseconds per iteration
The solution with append is the definitive winner, but quite surprisingly the solution with recursion, which is slow for short strings, is a good second for long strings.
Lars H, 14 Dec 2005: Why on earth do you find it surprising that the recursive solution comes in second for long strings? This is precisely the result a trivial complexity analysis of the procedures would produce! Remember that string concatenation (as performed in e.g. commands like
set rev "$c$rev"
above) is linear in the sum of the sizes of the strings concatenated. This implies rev_by_string and rev_by_list both do an O(n) operation at each iteration of the loop, giving a total O(n^2) complexity. rev_by_append on the other hand takes advantage of the fact that append is on average an O(1) (constant time) operation, whence the total complexity there is a mere O(n). rev_recursive manages an O(n log n) total complexity through the same recursive halving trick as a classical mergesort -- most of its concatenations are on short strings, so even though there are the same number of concatenations as in rev_by_string, the average cost is much lower. See also quadratic time.
RS The canonical recursive solution would be
proc rev_rec2 str { expr {[string length $str]<2? $str: "[string index $str end][rev_rec2 [string range $str 0 end-1]]"} }
but that hits the recursion limit when strings get longer than 500 or so...
JMN The following two solutions show that a list-based approach is not necessarily slow as suggested by the above timings. These two appear to be ever so slightly faster than the append solution for medium & large lists.
proc rev_by_lset {str} { set len [string length $str] foreach c [set str [split $str ""]] { lset str [incr len -1] $c } return [join $str ""] } proc rev_by_lappend {str} { set len [llength [set str [split $str ""]]] while {$len} { lappend res [lindex $str [incr len -1]] } return [join $res ""] }
For a teeny extra oomph in rev_by_lappend, preallocate the res list:
proc rev_by_lappend_2 {str} { set len [llength [set str [split $str ""]]] set res [lreplace [lrepeat $len ""] 0 end] ;# == {} while {$len} { lappend res [lindex $str [incr len -1]] } return [join $res ""] }
The saving then comes from not having to periodically realloc the internal list structure as it gets lappend'd to. Whether this saving outweighs the cost of calling lreplace & lrepeat... well YMMV :-)
% proc main {} { set N 10 foreach len { 999 9999 99999 999999 } { set STR [string repeat "abcde" $len] time { rev_by_lappend $STR } 1 ;# flush the timing puts "STR len [string length $STR]" puts "1 : [time { rev_by_lappend $STR } $N]" puts "2 : [time { rev_by_lappend_2 $STR } $N]" } } % % main STR len 4995 1 : 1652.4 microseconds per iteration 2 : 1608.6 microseconds per iteration STR len 49995 1 : 17144.6 microseconds per iteration 2 : 17155.3 microseconds per iteration STR len 499995 1 : 170775.5 microseconds per iteration 2 : 167460.7 microseconds per iteration STR len 4999995 1 : 1729771.0 microseconds per iteration 2 : 1726618.5 microseconds per iteration
See also: [string reverse]
bsg This is all kind-of moot since the core does it already, but I wanted to add the obvious combination missing from above. Combine the append version with the list version:
proc rev_by_append_list {str} { set rev "" set length [string length $str] set lstr [split $str ""] for {set i [expr {$length-1}]} {$i >= 0} {incr i -1} { append rev [lindex $lstr $i] } return $rev }
This variation mixes in nicely with the others.
Tcl patchLevel: 8.5.6 String length: 5 rev_by_string_reverse: 1.8265948 microseconds per iteration rev_by_append: 3.2407781 microseconds per iteration rev_by_string: 3.1731415000000003 microseconds per iteration rev_by_list: 6.9053605 microseconds per iteration rev_by_append_list: 6.885185099999999 microseconds per iteration rev_by_lappend: 7.940962699999999 microseconds per iteration rev_by_lset: 7.6243322000000004 microseconds per iteration rev_by_lappend_2: 8.6450237 microseconds per iteration rev_recursive: 10.6051011 microseconds per iteration String length: 50 rev_by_string_reverse: 2.2105929 microseconds per iteration rev_by_append_list: 17.4244228 microseconds per iteration rev_by_append: 18.8571488 microseconds per iteration rev_by_lappend: 20.859946400000002 microseconds per iteration rev_by_lappend_2: 21.3999276 microseconds per iteration rev_by_lset: 22.044439 microseconds per iteration rev_by_list: 27.5186891 microseconds per iteration rev_by_string: 32.9303397 microseconds per iteration rev_recursive: 96.4504848 microseconds per iteration String length: 500 rev_by_string_reverse: 2.172858 microseconds per iteration rev_by_append_list: 103.158742 microseconds per iteration rev_by_lappend_2: 123.131986 microseconds per iteration rev_by_lset: 149.26255600000002 microseconds per iteration rev_by_lappend: 162.248617 microseconds per iteration rev_by_append: 165.596969 microseconds per iteration rev_by_list: 243.793305 microseconds per iteration rev_by_string: 369.550072 microseconds per iteration rev_recursive: 961.125404 microseconds per iteration String length: 5000 rev_by_string_reverse: 5.58321 microseconds per iteration rev_by_lappend: 1132.712496 microseconds per iteration rev_by_append_list: 1182.556687 microseconds per iteration rev_by_lappend_2: 1392.8486950000001 microseconds per iteration rev_by_lset: 1466.0263300000001 microseconds per iteration rev_by_append: 1639.695497 microseconds per iteration rev_by_list: 2891.902023 microseconds per iteration rev_by_string: 3694.58081 microseconds per iteration rev_recursive: 10026.851468 microseconds per iteration String length: 50000 rev_by_string_reverse: 39.9818 microseconds per iteration rev_by_lappend_2: 15873.6687 microseconds per iteration rev_by_lappend: 16857.907199999998 microseconds per iteration rev_by_append: 17876.612100000002 microseconds per iteration rev_by_append_list: 24683.5453 microseconds per iteration rev_by_lset: 29211.918199999996 microseconds per iteration rev_recursive: 112051.476 microseconds per iteration rev_by_list: 159145.1982 microseconds per iteration rev_by_string: 164660.5375 microseconds per iteration