Custom sorting

Richard Suchenwirth 2002-09-03 - The lsort command allows many useful ways of sorting, but sometimes one needs other ways. For instance, to sort strings according to their length; or alphabetically, but reversed (whole dictionaries are produced for this, e.g. for finding rhymes). lsort takes a -command option, but see that page for why it is "a pig in performance".

The recommended way is rather to produce from the original input a list of pairs consisting of sorting key and original element; sort using -index according to the key; and extract the original elements from the result. Here is a little wrapper for this approach, which takes as first argument a sorting function name (possibly multi-word like string length) and then the usual arguments of lsort, the last of which is the list to be sorted:

 proc lsortby {sortf args} {
    set list [lindex $args end] ;# list to be sorted is last
    set args [lrange $args 0 end-1]
    set t {}
    foreach element $list {
        lappend t [list [eval $sortf [list $element]] $element]
    }
    set res {}
    foreach i [eval lsort $args -index 0 [list $t]] {
        lappend res [lindex $i 1]
    }
    set res
 }

#----------- testing (and usage examples):

 puts [lsortby {string length} -int {long longer short shorter {very long} x}]
 proc reverse s {
    set i [string length $s]
    set res {}
    while {$i} {append res [string index $s [incr i -1]]}
    set res
 }
 puts [lsortby reverse -unique {this is the rest of the best test beast}]

results in:

 x long short longer shorter {very long}
 the of is this beast best rest test

RHO 04.12.2008 I added some code to lsortby, so the sort function can be applied to a sub-element only:

 proc lsortby1 {sortf args} {
    set list [lindex $args end] ;# list to be sorted is last
    set args [lrange $args 0 end-1]
    if {[set i [lsearch -exact $args -index]] >= 0} {
      set j [lindex $args [incr i]]
      set args [lreplace $args [expr {$i - 1}] $i]
    }
    set t {}
    if {[info exists j]} {
      foreach element $list {
        lappend t [list [eval $sortf [list [lindex $element $j]]] $element]
      }
    } else {
      foreach element $list {
        lappend t [list [eval $sortf [list $element]] $element]
      }
    }
    set res {}
    foreach i [eval lsort $args -index 0 [list $t]] {
        lappend res [lindex $i 1]
    }
    set res
 }

I need this to sort information from a database call according to the revision sequence "0, A, B, ..., Z, AA, AB, ..., AZ, ..."

Test:

 set list {{Revision: B} {Revision: 0} {Revision: BA} {Revision: AA} {Revision: A} {Revision: AC} {Revision: Z} {Revision: AB} {Revision: AZ}  {Revision: BB}} 
 lsortby1 {string length} -integer -index 1 [lsort -index 1 $list]

results in

 {Revision: 0} {Revision: A} {Revision: B} {Revision: Z} {Revision: AA} {Revision: AB} {Revision: AC} {Revision: AZ} {Revision: BA} {Revision: BB}

Nike 8.12.2002

Today i needed new way of sorting, shortest first in alphabetical order.

My version:

 proc nsort {list} {
   set final "" ; # for empty lists
   foreach temp $list { lappend length([string length $temp]) $temp }
   foreach temp [lsort -integer [array names length]] { lappend final [lsort $length($temp)] }
   return [join $final]
 }

Death wrote:

 proc sortlength {wordlist} {
   set l 0;foreach word $wordlist {lappend words([set s [string length $word]]) $word;if {$s>$l} {set l $s}}
   for {set c 0} {$c<=$l} {incr c} {if {[info exists words($c)]} {lappend newlist [lsort $words($c)]}}
   return [join $newlist]
 }

Death's version seems to be almost 10% slower

RS: ...and my proposal takes more than double of your time:

 % time {nsort {fool foo bart bar grill grillroom}} 1000
 91 microseconds per iteration
 % time {lsortby {string length} [lsort {fool foo bart bar grill grillroom}]} 1000
 228 microseconds per iteration

That goes to show that custom implementations can be tuned for best performance. But if runtime is not the bottleneck, using lsortby is more convenient at coding time...

Lars H: The comparison is not quite correct, as you should do integer comparison (not the default string comparison) on the results of the string length. Furthermore you can speed up lsortby even more by making sure that the "sort function" is a list, i.e., say

 lsortby [list string length] -integer ...

instead of

 lsortby {string length} ...

as above. For me, this cuts away 35% of the running time!

A convenient alternative to having special "sort functions" would be to have a foreach-like syntax, i.e.,

 proc forsort {options body list} {
    set L [list]
    foreach item $list {
        lappend L [list $item [if 1 then $body]]
    }
    set res [list]
    foreach item [eval [list lsort] [lrange $options 0 end] [list -index 1 $L]] {
        lappend res [lindex $item 0]
    }
    set res
 }

using which one could write

 forsort {-integer} {string length $item} [lsort {fool foo bart bar grill grillroom}]

I time this as being a bit faster still, probably due to the [if 1 then $body] instead of [eval $body] to force byte-compilation.


KBK 2002-12-10 - One important aspect of the lsort command that people overlook is that it is stable. In other words, sorting a list first on one criterion A and then another criterion B will preserve the A ordering for records that have equal B. In other words, we can first alphabetize the list, and then sort it by the length of the strings, and get the right answer.

 proc sortlength2 { wordlist } {
     set words {}
     foreach word $wordlist {
         lappend words [list [string length $word] $word]
     }
     set result {}
     foreach pair [lsort -integer -index 0 [lsort -ascii -index 1 $words]] {
         lappend result [lindex $pair 1]
     }
     return $result
 }

This method emerges the clear winner in this particular case. On my machine:

 set wordlist [list fool foo bart bar grill grillroom]
 foreach test {
     { nsort $wordlist }
     { sortlength $wordlist }
     { lsortby {string length} [lsort $wordlist] }
     { sortlength2 $wordlist }
 } {
     puts $test
     puts [time $test 10000]
 }

gives

 nsort $wordlist 
 82 microseconds per iteration
 sortlength $wordlist 
 133 microseconds per iteration
 lsortby {string length} [lsort $wordlist] 
 207 microseconds per iteration
 sortlength2 $wordlist 
 50 microseconds per iteration

RS 2006-09-07 - Here's a sort function for dates like 1-JAN-05, 02-Dec-06 etc:

 proc reorder_date date {
    foreach {day month year} [split $date -] break
    scan $day %d day
    set monthnames {- Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec}
    set month [lsearch -regexp $monthnames (?i)^$month]
    format %s-%02d-%02d $year $month $day
 }

Testing:

 % lsortby reorder_date {3-apr-06 2-jan-06 5-feb-06 11-dec-04}
 11-dec-04 2-jan-06 5-feb-06 3-apr-06

MG would personally do something like...

  proc lsort_dates {one two} {
    set one [clock scan $one]
    set two [clock scan $two]
    if { $one < $two } {
         return -1
       } elseif { $one > $two} {
         return 1
       } else {
         return 0
       }
  }
 lsort -command lsort_dates {3-apr-06 2-jan-06 5-feb-06 11-dec-04}

RS: clock scan is a good idea (I wasn't aware that it accepts that format), but lsort -command is not (see above). So things get simpler (which I always love :^) :

 % lsortby {clock scan} {3-apr-06 2-jan-06 5-feb-06 11-dec-04}
 11-dec-04 2-jan-06 5-feb-06 3-apr-06

See also collation