Version 16 of Performance of string reversing algorithms

Updated 2011-05-05 02:38:38 by AMG

Arjen Markus (13 december 2005) In the chatroom yesterday we discussed (or should I say mused over) the pros and cons of using Tcl for biomathematics. One of the subjects was how to reverse a string. Several possibilities have already been published on the Wiki, but this is a comparison of the performance of several on short and long strings.

I find the results remarkable: the behaviour for short strings is completely different than for moderately long strings ... well, judge for yourself.


 # Experiments:
 #    Reversing a string - several ways
 #
 proc rev_by_string {str} {
     set rev ""
     set length [string length $str]

     for {set i 0} {$i < $length} {incr i} {
         set rev "[string index $str $i]$rev"
     }
     return $rev
 }
 proc rev_by_append {str} {
     set rev ""
     set length [string length $str]

     for {set i [expr {$length-1}]} {$i >= 0} {incr i -1} {
         append rev [string index $str $i]
     }
     return $rev
 }
 proc rev_by_list {str} {
     set rev ""

     foreach c [split $str ""] {
         set rev "$c$rev"
     }
     return $rev
 }
 proc rev_recursive {str} {
     set rev ""
     set length [string length $str]

     if { $length <= 4 } {
         return "[string index $str 3][string index $str 2][string index $str 1][string index $str 0]"
     } elseif { $length % 2 == 0 } {
         set half   [expr {$length/2}]
         set halfp1 [expr {$length/2+1}]
         return "[rev_recursive [string range $str $halfp1 end]][rev_recursive [string range $str 0 $half]]"
     } else {
         set half   [expr {$length/2}]
         set halfp1 [expr {$length/2+1}]
         return "[string index $str end][rev_recursive [string range $str $halfp1 end-1]][rev_recursive [string range $str 0 $half]]"
     }
 }

 # Test the procedures ...
 puts [rev_by_string   "abcdefg"]
 puts [rev_by_append   "abcdefg"]
 puts [rev_by_list     "abcdefg"]
 puts [rev_recursive   "abcdefg"]
 puts [rev_recursive   "abcdef"]

 # Measure the performance
 foreach len {1 10 100 1000 10000} repeat {10000 10000 1000 1000 10} {
     set str [string repeat ABCDE $len]
     puts "String length: [string length $str]"
     puts "rev_by_string:  [time {rev_by_string   $str} $repeat]"
     puts "rev_by_append:  [time {rev_by_append   $str} $repeat]"
     puts "rev_by_list:\    [time {rev_by_list     $str} $repeat]"
     puts "rev_recursive:  [time {rev_recursive   $str} $repeat]"
 }

Here is the result:

 gfedcba
 gfedcba
 gfedcba
 gfedcba
 fedcba
 String length: 5
 rev_by_string:  7 microseconds per iteration
 rev_by_append:  8 microseconds per iteration
 rev_by_list:    11 microseconds per iteration
 rev_recursive:  15 microseconds per iteration 
 String length: 50
 rev_by_string:  58 microseconds per iteration
 rev_by_append:  44 microseconds per iteration
 rev_by_list:    48 microseconds per iteration
 rev_recursive:  152 microseconds per iteration 
 String length: 500
 rev_by_string:  707 microseconds per iteration
 rev_by_append:  403 microseconds per iteration
 rev_by_list:    480 microseconds per iteration
 rev_recursive:  1561 microseconds per iteration 
 String length: 5000
 rev_by_string:  16722 microseconds per iteration
 rev_by_append:  3671 microseconds per iteration
 rev_by_list:    13910 microseconds per iteration
 rev_recursive:  13971 microseconds per iteration 
 String length: 50000
 rev_by_string:  1051731 microseconds per iteration
 rev_by_append:  37300 microseconds per iteration
 rev_by_list:    1032877 microseconds per iteration
 rev_recursive:  137735 microseconds per iteration

The solution with append is the definitive winner, but quite surprisingly the solution with recursion, which is slow for short strings, is a good second for long strings.

Lars H, 14 Dec 2005: Why on earth do you find it surprising that the recursive solution comes in second for long strings? This is precisely the result a trivial complexity analysis of the procedures would produce! Remember that string concatenation (as performed in e.g. commands like

  set rev "$c$rev"

above) is linear in the sum of the sizes of the strings concatenated. This implies rev_by_string and rev_by_list both do an O(n) operation at each iteration of the loop, giving a total O(n^2) complexity. rev_by_append on the other hand takes advantage of the fact that append is on average an O(1) (constant time) operation, whence the total complexity there is a mere O(n). rev_recursive manages an O(n log n) total complexity through the same recursive halving trick as a classical mergesort -- most of its concatenations are on short strings, so even though there are the same number of concatenations as in rev_by_string, the average cost is much lower. See also quadratic time.


RS The canonical recursive solution would be

 proc rev_rec2 str {
   expr {[string length $str]<2? $str: 
   "[string index $str end][rev_rec2 [string range $str 0 end-1]]"} 
 }

but that hits the recursion limit when strings get longer than 500 or so...


JMN The following two solutions show that a list-based approach is not necessarily slow as suggested by the above timings. These two appear to be ever so slightly faster than the append solution for medium & large lists.

 proc rev_by_lset {str} {
    set len [string length $str]
    foreach c [set str [split $str ""]] {
        lset str [incr len -1] $c
    }
    return [join $str ""]
 }
 proc rev_by_lappend {str} {
    set len [llength [set str [split $str ""]]]
    while {$len} {
        lappend res [lindex $str [incr len -1]]
    }
    return [join $res ""]
 }

For a teeny extra oomph in rev_by_lappend, preallocate the res list:

 proc rev_by_lappend_2 {str} {
    set len [llength [set str [split $str ""]]]
    set res [lreplace [lrepeat $len ""] 0 end] ;# == {}
    while {$len} {
        lappend res [lindex $str [incr len -1]]
    }
    return [join $res ""]
 } 

The saving then comes from not having to periodically realloc the internal list structure as it gets lappend'd to. Whether this saving outweighs the cost of calling lreplace & lrepeat... well YMMV :-)

 % proc main {} {
        set N 10
        foreach len { 999 9999 99999 999999 } {
                set STR [string repeat "abcde" $len]
                time { rev_by_lappend $STR } 1 ;# flush the timing
                puts "STR len [string length $STR]"
                puts "1 : [time { rev_by_lappend $STR } $N]"
                puts "2 : [time { rev_by_lappend_2 $STR } $N]"
        }
 }
 %
 % main
 STR len 4995
 1 : 1652.4 microseconds per iteration
 2 : 1608.6 microseconds per iteration
 STR len 49995
 1 : 17144.6 microseconds per iteration
 2 : 17155.3 microseconds per iteration
 STR len 499995
 1 : 170775.5 microseconds per iteration
 2 : 167460.7 microseconds per iteration
 STR len 4999995
 1 : 1729771.0 microseconds per iteration
 2 : 1726618.5 microseconds per iteration