[Arjen Markus] (13 december 2005) In the chatroom yesterday we discussed (or should I say mused over) the pros and cons of using Tcl for biomathematics. One of the subjects was how to reverse a string. Several possibilities have already been published on the Wiki, but this is a comparison of the performance of several on short and long strings. I find the results remarkable: the behaviour for short strings is completely different than for moderately long strings ... well, judge for yourself. ---- # Experiments: # Reversing a string - several ways # proc rev_by_string {str} { set rev "" set length [string length $str] for {set i 0} {$i < $length} {incr i} { set rev "[string index $str $i]$rev" } return $rev } proc rev_by_append {str} { set rev "" set length [string length $str] for {set i [expr {$length-1}]} {$i >= 0} {incr i -1} { append rev [string index $str $i] } return $rev } proc rev_by_list {str} { set rev "" foreach c [split $str ""] { set rev "$c$rev" } return $rev } proc rev_recursive {str} { set rev "" set length [string length $str] if { $length <= 4 } { return "[string index $str 3][string index $str 2][string index $str 1][string index $str 0]" } elseif { $length % 2 == 0 } { set half [expr {$length/2}] set halfp1 [expr {$length/2+1}] return "[rev_recursive [string range $str $halfp1 end]][rev_recursive [string range $str 0 $half]]" } else { set half [expr {$length/2}] set halfp1 [expr {$length/2+1}] return "[string index $str end][rev_recursive [string range $str $halfp1 end-1]][rev_recursive [string range $str 0 $half]]" } } # Test the procedures ... puts [rev_by_string "abcdefg"] puts [rev_by_append "abcdefg"] puts [rev_by_list "abcdefg"] puts [rev_recursive "abcdefg"] puts [rev_recursive "abcdef"] # Measure the performance foreach len {1 10 100 1000 10000} repeat {10000 10000 1000 1000 10} { set str [string repeat ABCDE $len] puts "String length: [string length $str]" puts "rev_by_string: [time {rev_by_string $str} $repeat]" puts "rev_by_append: [time {rev_by_append $str} $repeat]" puts "rev_by_list: [time {rev_by_list $str} $repeat]" puts "rev_recursive: [time {rev_recursive $str} $repeat]" } ---- Here is the result: gfedcba gfedcba gfedcba gfedcba fedcba String length: 5 rev_by_string: 7 microseconds per iteration rev_by_append: 8 microseconds per iteration rev_by_list: 11 microseconds per iteration rev_recursive: 15 microseconds per iteration String length: 50 rev_by_string: 58 microseconds per iteration rev_by_append: 44 microseconds per iteration rev_by_list: 48 microseconds per iteration rev_recursive: 152 microseconds per iteration String length: 500 rev_by_string: 707 microseconds per iteration rev_by_append: 403 microseconds per iteration rev_by_list: 480 microseconds per iteration rev_recursive: 1561 microseconds per iteration String length: 5000 rev_by_string: 16722 microseconds per iteration rev_by_append: 3671 microseconds per iteration rev_by_list: 13910 microseconds per iteration rev_recursive: 13971 microseconds per iteration String length: 50000 rev_by_string: 1051731 microseconds per iteration rev_by_append: 37300 microseconds per iteration rev_by_list: 1032877 microseconds per iteration rev_recursive: 137735 microseconds per iteration The solution with append is the definitive winner, but quite surprisingly the solution with recursion, which is slow for short strings, is a good second for long strings. [Lars H], 14 Dec 2005: Why on earth do you find it surprising that the recursive solution comes in second for long strings? This is precisely the result a trivial complexity analysis of the procedures would produce! Remember that string concatenation (as performed in e.g. commands like set rev "$c$rev" above) is linear in the sum of the sizes of the strings concatenated. This implies '''rev_by_string''' and '''rev_by_list''' both do an O(''n'') operation at each iteration of the loop, giving a total O(''n''^2) complexity. '''rev_by_append''' on the other hand takes advantage of the fact that [append] is on average an O(1) (constant time) operation, whence the total complexity there is a mere O(''n''). '''rev_recursive''' manages an O(''n'' log ''n'') total complexity through the same recursive halving trick as a classical mergesort -- most of its concatenations are on short strings, so even though there are the same number of concatenations as in '''rev_by_string''', the average cost is much lower. See also [quadratic time]. ---- [RS] The canonical recursive solution would be proc rev_rec2 str { expr {[string length $str]<2? $str: "[string index $str end][rev_rec2 [string range $str 0 end-1]]"} } but that hits the recursion limit when strings get longer than 500 or so... ---- [JMN] The following two solutions show that a list-based approach is not necessarily slow as suggested by the above timings. These two appear to be ever so slightly faster than the append solution for medium & large lists. proc rev_by_lset {str} { set len [string length $str] foreach c [set str [split $str ""]] { lset str [incr len -1] $c } return [join $str ""] } proc rev_by_lappend {str} { set len [llength [set str [split $str ""]]] while {$len} { lappend res [lindex $str [incr len -1]] } return [join $res ""] } ---- For a teeny extra oomph in rev_by_lappend, preallocate the res list: proc rev_by_lappend_2 {str} { set len [llength [set str [split $str ""]]] set res [lreplace [lrepeat $len ""] 0 end] while {$len} { lappend res [lindex $str [incr len -1]] } return [join $res ""] } The code [[set res [[lreplace [[lrepeat $len ""]] 0 end]] ]] is equivalent to [[ set res {} ]], ''([Lars H]: Don't you rather mean [[set res [[lrepeat $len ""]] ]]? Or had you intended to have the [lreplace] from 0 to -1 or something like that?)'' but leaves the internal list structure with space for $len elements. The lrepeat is able to allocate space in one big lump. Otherwise the res list needs to be periodically realloc'd as it grows. It's not a very big win though: % proc main {} { set N 10 foreach len { 999 9999 99999 999999 } { set STR [string repeat "abcde" $len] time { rev_by_lappend $STR } 1 ;# flush the timing puts "STR len [string length $STR]" puts "1 : [time { rev_by_lappend $STR } $N]" puts "2 : [time { rev_by_lappend_2 $STR } $N]" } } % % main STR len 4995 1 : 1652.4 microseconds per iteration 2 : 1608.6 microseconds per iteration STR len 49995 1 : 17144.6 microseconds per iteration 2 : 17155.3 microseconds per iteration STR len 499995 1 : 170775.5 microseconds per iteration 2 : 167460.7 microseconds per iteration STR len 4999995 1 : 1729771.0 microseconds per iteration 2 : 1726618.5 microseconds per iteration ---- [[ [Category Algorithm] | [Category Performance] ]]