Version 6 of Custom sorting

Updated 2003-08-11 12:16:58

Richard Suchenwirth 2002-09-03 - The lsort command allows many useful ways of sorting, but sometimes one needs other ways. For instance, to sort strings according to their length; or alphabetically, but reversed (whole dictionaries are produced for this, e.g for finding rhymes). lsort takes a -command option, but see that page for why it is "a pig in performance".

The recommended way is rather to produce from the original input a list of pairs consisting of sorting key and original element; sort using -index according to the key; and extract the original elements from the result. Here is a little wrapper for this approach, which takes as first argument a sorting function name (possibly multi-word like string length) and then the usual arguments of lsort, the last of which is the list to be sorted:

 proc lsortby {sortf args} {
    set list [lindex $args end] ;# list to be sorted is last
    set args [lrange $args 0 end-1]
    set t {}
    foreach element $list {
        lappend t [list [eval $sortf [list $element]] $element]
    }
    set res {}
    foreach i [eval lsort $args -index 0 [list $t]] {
        lappend res [lindex $i 1]
    }
    set res
 }
 #----------- testing (and usage examples):
 puts [lsortby {string length} -int {long longer short shorter {very long} x}]
 proc reverse s {
    set i [string length $s]
    set res {}
    while {$i} {append res [string index $s [incr i -1]]}
    set res
 }
 puts [lsortby reverse -unique {this is the rest of the best test beast}]

if 0 {results in:

 x long short longer shorter {very long}
 the of is this beast best rest test

Nike 8.12.2002

Today i needed new way of sorting, shortest first in alphabetical order.

My version:

 proc nsort {list} {
   set final "" ; # for empty lists
   foreach temp $list { lappend length([string length $temp]) $temp }
   foreach temp [lsort -integer [array names length]] { lappend final [lsort $length($temp)] }
   return [join $final]
 }

Death wrote:

 proc sortlength {wordlist} {
   set l 0;foreach word $wordlist {lappend words([set s [string length $word]]) $word;if {$s>$l} {set l $s}}
   for {set c 0} {$c<=$l} {incr c} {if {[info exists words($c)]} {lappend newlist [lsort $words($c)]}}
   return [join $newlist]
 }

Death's version seems to be almost 10% slower

RS: ...and my proposal takes more than double of your time:

 % time {nsort {fool foo bart bar grill grillroom}} 1000
 91 microseconds per iteration
 % time {lsortby {string length} [lsort {fool foo bart bar grill grillroom}]} 1000
 228 microseconds per iteration

That goes to show that custom implementations can be tuned for best performance. But if runtime is not the bottleneck, using lsortby is more convenient at coding time...

Lars H: The comparison is not quite correct, as you should do integer comparison (not the default string comparison) on the results of the string length. Furthermore you can speed up lsortby even more by making sure that the "sort function" is a list, i.e., say

 lsortby [list string length] -integer ...

instead of

 lsortby {string length} ...

as above. For me, this cuts away 35% of the running time!

A convenient alternative to having special "sort functions" would be to have a foreach-like syntax, i.e.,

 proc forsort {options body list} {
    set L [list]
    foreach item $list {
        lappend L [list $item [if 1 then $body]]
    }
    set res [list]
    foreach item [eval [list lsort] [lrange $options 0 end] [list -index 1 $L]] {
        lappend res [lindex $item 0]
    }
    set res
 }

using which one could write

 forsort {-integer} {string length $item} [lsort {fool foo bart bar grill grillroom}]

I time this as being a bit faster still, probably due to the [if 1 then $body] instead of [eval $body] to force byte-compilation.


KBK 2002-12-10 - One important aspect of the lsort command that people overlook is that it is stable. In other words, sorting a list first on one criterion A and then another criterion B will preserve the A ordering for records that have equal B. In other words, we can first alphabetize the list, and then sort it by the length of the strings, and get the right answer.

 proc sortlength2 { wordlist } {
     set words {}
     foreach word $wordlist {
         lappend words [list [string length $word] $word]
     }
     set result {}
     foreach pair [lsort -integer -index 0 [lsort -ascii -index 1 $words]] {
         lappend result [lindex $pair 1]
     }
     return $result
 }

This method emerges the clear winner in this particular case. On my machine:

 set wordlist [list fool foo bart bar grill grillroom]
 foreach test {
     { nsort $wordlist }
     { sortlength $wordlist }
     { lsortby {string length} [lsort $wordlist] }
     { sortlength2 $wordlist }
 } {
     puts $test
     puts [time $test 10000]
 }

gives

 nsort $wordlist 
 82 microseconds per iteration
 sortlength $wordlist 
 133 microseconds per iteration
 lsortby {string length} [lsort $wordlist] 
 207 microseconds per iteration
 sortlength2 $wordlist 
 50 microseconds per iteration

Arts and crafts of Tcl-Tk programming