Version 4 of Custom sorting

Updated 2002-12-09 15:52:50

Richard Suchenwirth 2002-09-03 - The lsort command allows many useful ways of sorting, but sometimes one needs other ways. For instance, to sort strings according to their length; or alphabetically, but reversed (whole dictionaries are produced for this, e.g for finding rhymes). lsort takes a -command option, but see that page for why it is "a pig in performance".

The recommended way is rather to produce from the original input a list of pairs consisting of sorting key and original element; sort using -index according to the key; and extract the original elements from the result. Here is a little wrapper for this approach, which takes as first argument a sorting function name (possibly multi-word like string length) and then the usual arguments of lsort, the last of which is the list to be sorted:

 proc lsortby {sortf args} {
    set list [lindex $args end] ;# list ot be sorted is last
    set args [lrange $args 0 end-1]
    set t {}
    foreach element $list {
        lappend t [list [eval $sortf [list $element]] $element]
    }
    set res {}
    foreach i [eval lsort $args -index 0 [list $t]] {
        lappend res [lindex $i 1]
    }
    set res
 }
 #----------- testing (and usage examples):
 puts [lsortby {string length} -int {long longer short shorter {very long} x}]
 proc reverse s {
    set i [string length $s]
    set res {}
    while {$i} {append res [string index $s [incr i -1]]}
    set res
 }
 puts [lsortby reverse -unique {this is the rest of the best test beast}]

if 0 {results in:

 x long short longer shorter {very long}
 the of is this beast best rest test

Nike 8.12.2002

Today i needed new way of sorting, shortest first in alphabetical order.

My version:

 proc nsort {list} {
   set final "" ; # for empty lists
   foreach temp $list { lappend length([string length $temp]) $temp }
   foreach temp [lsort -integer [array names length]] { lappend final [lsort $length($temp)] }
   return [join $final]
 }

Death wrote:

 proc sortlength {wordlist} {
   set l 0;foreach word $wordlist {lappend words([set s [string length $word]]) $word;if {$s>$l} {set l $s}}
   for {set c 0} {$c<=$l} {incr c} {if {[info exists words($c)]} {lappend newlist [lsort $words($c)]}}
   return [join $newlist]
 }

Death's version seems to be almost 10% slower

RS: ...and my proposal takes more than double of your time:

 % time {nsort {fool foo bart bar grill grillroom}} 1000
 91 microseconds per iteration
 % time {lsortby {string length} [lsort {fool foo bart bar grill grillroom}]} 1000
 228 microseconds per iteration

That goes to show that custom implementations can be tuned for best performance. But if runtime is not the bottleneck, using lsortby is more convenient at coding time...

Lars H: The comparison is not quite correct, as you should do integer comparison (not the default string comparison) on the results of the string length. Furthermore you can speed up lsortby even more by making sure that the "sort function" is a list, i.e., say

 lsortby [list string length] -integer ...

instead of

 lsortby {string length} ...

as above. For me, this cuts away 35% of the running time!

A convenient alternative to having special "sort functions" would be to have a foreach-like syntax, i.e.,

 proc forsort {options body list} {
    set L [list]
    foreach item $list {
        lappend L [list $item [if 1 then $body]]
    }
    set res [list]
    foreach item [eval [list lsort] [lrange $options 0 end] [list -index 1 $L]] {
        lappend res [lindex $item 0]
    }
    set res
 }

using which one could write

 forsort {-integer} {string length $item} [lsort {fool foo bart bar grill grillroom}]

I time this as being a bit faster still, probably due to the [if 1 then $body] instead of [eval $body] to force byte-compilation.


Arts and crafts of Tcl-Tk programming