## Performance of string reversing algorithms

Arjen Markus (13 december 2005) In the chatroom yesterday we discussed (or should I say mused over) the pros and cons of using Tcl for biomathematics. One of the subjects was how to reverse a string. Several possibilities have already been published on the Wiki, but this is a comparison of the performance of several on short and long strings.

I find the results remarkable: the behaviour for short strings is completely different than for moderately long strings ... well, judge for yourself.

``` # Experiments:
#    Reversing a string - several ways
#
proc rev_by_string {str} {
set rev ""
set length [string length \$str]

for {set i 0} {\$i < \$length} {incr i} {
set rev "[string index \$str \$i]\$rev"
}
return \$rev
}
proc rev_by_append {str} {
set rev ""
set length [string length \$str]

for {set i [expr {\$length-1}]} {\$i >= 0} {incr i -1} {
append rev [string index \$str \$i]
}
return \$rev
}
proc rev_by_list {str} {
set rev ""

foreach c [split \$str ""] {
set rev "\$c\$rev"
}
return \$rev
}
proc rev_recursive {str} {
set rev ""
set length [string length \$str]

if { \$length <= 4 } {
return "[string index \$str 3][string index \$str 2][string index \$str 1][string index \$str 0]"
} elseif { \$length % 2 == 0 } {
set half   [expr {\$length/2}]
set halfp1 [expr {\$length/2+1}]
return "[rev_recursive [string range \$str \$halfp1 end]][rev_recursive [string range \$str 0 \$half]]"
} else {
set half   [expr {\$length/2}]
set halfp1 [expr {\$length/2+1}]
return "[string index \$str end][rev_recursive [string range \$str \$halfp1 end-1]][rev_recursive [string range \$str 0 \$half]]"
}
}

# Test the procedures ...
puts [rev_by_string   "abcdefg"]
puts [rev_by_append   "abcdefg"]
puts [rev_by_list     "abcdefg"]
puts [rev_recursive   "abcdefg"]
puts [rev_recursive   "abcdef"]

# Measure the performance
foreach len {1 10 100 1000 10000} repeat {10000 10000 1000 1000 10} {
set str [string repeat ABCDE \$len]
puts "String length: [string length \$str]"
puts "rev_by_string:  [time {rev_by_string   \$str} \$repeat]"
puts "rev_by_append:  [time {rev_by_append   \$str} \$repeat]"
puts "rev_by_list:\    [time {rev_by_list     \$str} \$repeat]"
puts "rev_recursive:  [time {rev_recursive   \$str} \$repeat]"
}```

Here is the result:

``` gfedcba
gfedcba
gfedcba
gfedcba
fedcba
String length: 5
rev_by_string:  7 microseconds per iteration
rev_by_append:  8 microseconds per iteration
rev_by_list:    11 microseconds per iteration
rev_recursive:  15 microseconds per iteration
String length: 50
rev_by_string:  58 microseconds per iteration
rev_by_append:  44 microseconds per iteration
rev_by_list:    48 microseconds per iteration
rev_recursive:  152 microseconds per iteration
String length: 500
rev_by_string:  707 microseconds per iteration
rev_by_append:  403 microseconds per iteration
rev_by_list:    480 microseconds per iteration
rev_recursive:  1561 microseconds per iteration
String length: 5000
rev_by_string:  16722 microseconds per iteration
rev_by_append:  3671 microseconds per iteration
rev_by_list:    13910 microseconds per iteration
rev_recursive:  13971 microseconds per iteration
String length: 50000
rev_by_string:  1051731 microseconds per iteration
rev_by_append:  37300 microseconds per iteration
rev_by_list:    1032877 microseconds per iteration
rev_recursive:  137735 microseconds per iteration```

The solution with append is the definitive winner, but quite surprisingly the solution with recursion, which is slow for short strings, is a good second for long strings.

Lars H, 14 Dec 2005: Why on earth do you find it surprising that the recursive solution comes in second for long strings? This is precisely the result a trivial complexity analysis of the procedures would produce! Remember that string concatenation (as performed in e.g. commands like

`  set rev "\$c\$rev"`

above) is linear in the sum of the sizes of the strings concatenated. This implies rev_by_string and rev_by_list both do an O(n) operation at each iteration of the loop, giving a total O(n^2) complexity. rev_by_append on the other hand takes advantage of the fact that append is on average an O(1) (constant time) operation, whence the total complexity there is a mere O(n). rev_recursive manages an O(n log n) total complexity through the same recursive halving trick as a classical mergesort -- most of its concatenations are on short strings, so even though there are the same number of concatenations as in rev_by_string, the average cost is much lower. See also quadratic time.

RS The canonical recursive solution would be

``` proc rev_rec2 str {
expr {[string length \$str]<2? \$str:
"[string index \$str end][rev_rec2 [string range \$str 0 end-1]]"}
}```

but that hits the recursion limit when strings get longer than 500 or so...

JMN The following two solutions show that a list-based approach is not necessarily slow as suggested by the above timings. These two appear to be ever so slightly faster than the append solution for medium & large lists.

``` proc rev_by_lset {str} {
set len [string length \$str]
foreach c [set str [split \$str ""]] {
lset str [incr len -1] \$c
}
return [join \$str ""]
}
proc rev_by_lappend {str} {
set len [llength [set str [split \$str ""]]]
while {\$len} {
lappend res [lindex \$str [incr len -1]]
}
return [join \$res ""]
}```

For a teeny extra oomph in rev_by_lappend, preallocate the res list:

``` proc rev_by_lappend_2 {str} {
set len [llength [set str [split \$str ""]]]
set res [lreplace [lrepeat \$len ""] 0 end] ;# == {}
while {\$len} {
lappend res [lindex \$str [incr len -1]]
}
return [join \$res ""]
} ```

The saving then comes from not having to periodically realloc the internal list structure as it gets lappend'd to. Whether this saving outweighs the cost of calling lreplace & lrepeat... well YMMV :-)

``` % proc main {} {
set N 10
foreach len { 999 9999 99999 999999 } {
set STR [string repeat "abcde" \$len]
time { rev_by_lappend \$STR } 1 ;# flush the timing
puts "STR len [string length \$STR]"
puts "1 : [time { rev_by_lappend \$STR } \$N]"
puts "2 : [time { rev_by_lappend_2 \$STR } \$N]"
}
}
%
% main
STR len 4995
1 : 1652.4 microseconds per iteration
2 : 1608.6 microseconds per iteration
STR len 49995
1 : 17144.6 microseconds per iteration
2 : 17155.3 microseconds per iteration
STR len 499995
1 : 170775.5 microseconds per iteration
2 : 167460.7 microseconds per iteration
STR len 4999995
1 : 1729771.0 microseconds per iteration
2 : 1726618.5 microseconds per iteration```

See also: [string reverse]

bsg This is all kind-of moot since the core does it already, but I wanted to add the obvious combination missing from above. Combine the append version with the list version:

```proc rev_by_append_list {str} {
set rev ""
set length [string length \$str]
set lstr [split \$str ""]
for {set i [expr {\$length-1}]} {\$i >= 0} {incr i -1} {
append rev [lindex \$lstr \$i]
}
return \$rev
}```

This variation mixes in nicely with the others.

```Tcl patchLevel: 8.5.6
String length: 5
rev_by_string_reverse:  1.8265948 microseconds per iteration
rev_by_append:  3.2407781 microseconds per iteration
rev_by_string:  3.1731415000000003 microseconds per iteration
rev_by_list:  6.9053605 microseconds per iteration
rev_by_append_list:  6.885185099999999 microseconds per iteration
rev_by_lappend:  7.940962699999999 microseconds per iteration
rev_by_lset:  7.6243322000000004 microseconds per iteration
rev_by_lappend_2:  8.6450237 microseconds per iteration
rev_recursive:  10.6051011 microseconds per iteration
String length: 50
rev_by_string_reverse:  2.2105929 microseconds per iteration
rev_by_append_list:  17.4244228 microseconds per iteration
rev_by_append:  18.8571488 microseconds per iteration
rev_by_lappend:  20.859946400000002 microseconds per iteration
rev_by_lappend_2:  21.3999276 microseconds per iteration
rev_by_lset:  22.044439 microseconds per iteration
rev_by_list:  27.5186891 microseconds per iteration
rev_by_string:  32.9303397 microseconds per iteration
rev_recursive:  96.4504848 microseconds per iteration
String length: 500
rev_by_string_reverse:  2.172858 microseconds per iteration
rev_by_append_list:  103.158742 microseconds per iteration
rev_by_lappend_2:  123.131986 microseconds per iteration
rev_by_lset:  149.26255600000002 microseconds per iteration
rev_by_lappend:  162.248617 microseconds per iteration
rev_by_append:  165.596969 microseconds per iteration
rev_by_list:  243.793305 microseconds per iteration
rev_by_string:  369.550072 microseconds per iteration
rev_recursive:  961.125404 microseconds per iteration
String length: 5000
rev_by_string_reverse:  5.58321 microseconds per iteration
rev_by_lappend:  1132.712496 microseconds per iteration
rev_by_append_list:  1182.556687 microseconds per iteration
rev_by_lappend_2:  1392.8486950000001 microseconds per iteration
rev_by_lset:  1466.0263300000001 microseconds per iteration
rev_by_append:  1639.695497 microseconds per iteration
rev_by_list:  2891.902023 microseconds per iteration
rev_by_string:  3694.58081 microseconds per iteration
rev_recursive:  10026.851468 microseconds per iteration
String length: 50000
rev_by_string_reverse:  39.9818 microseconds per iteration
rev_by_lappend_2:  15873.6687 microseconds per iteration
rev_by_lappend:  16857.907199999998 microseconds per iteration
rev_by_append:  17876.612100000002 microseconds per iteration
rev_by_append_list:  24683.5453 microseconds per iteration
rev_by_lset:  29211.918199999996 microseconds per iteration
rev_recursive:  112051.476 microseconds per iteration
rev_by_list:  159145.1982 microseconds per iteration
rev_by_string:  164660.5375 microseconds per iteration```

 Category Algorithm Category Performance