Richard Suchenwirth 2002-09-03 - The lsort command allows many useful ways of sorting, but sometimes one needs other ways. For instance, to sort strings according to their length; or alphabetically, but reversed (whole dictionaries are produced for this, e.g. for finding rhymes). lsort takes a -command option, but see that page for why it is "a pig in performance".
The recommended way is rather to produce from the original input a list of pairs consisting of sorting key and original element; sort using -index according to the key; and extract the original elements from the result. Here is a little wrapper for this approach, which takes as first argument a sorting function name (possibly multi-word like string length) and then the usual arguments of lsort, the last of which is the list to be sorted:
proc lsortby {sortf args} { set list [lindex $args end] ;# list to be sorted is last set args [lrange $args 0 end-1] set t {} foreach element $list { lappend t [list [eval $sortf [list $element]] $element] } set res {} foreach i [eval lsort $args -index 0 [list $t]] { lappend res [lindex $i 1] } set res }
#----------- testing (and usage examples):
puts [lsortby {string length} -int {long longer short shorter {very long} x}] proc reverse s { set i [string length $s] set res {} while {$i} {append res [string index $s [incr i -1]]} set res } puts [lsortby reverse -unique {this is the rest of the best test beast}]
results in:
x long short longer shorter {very long} the of is this beast best rest test
RHO 04.12.2008 I added some code to lsortby, so the sort function can be applied to a sub-element only:
proc lsortby1 {sortf args} { set list [lindex $args end] ;# list to be sorted is last set args [lrange $args 0 end-1] if {[set i [lsearch -exact $args -index]] >= 0} { set j [lindex $args [incr i]] set args [lreplace $args [expr {$i - 1}] $i] } set t {} if {[info exists j]} { foreach element $list { lappend t [list [eval $sortf [list [lindex $element $j]]] $element] } } else { foreach element $list { lappend t [list [eval $sortf [list $element]] $element] } } set res {} foreach i [eval lsort $args -index 0 [list $t]] { lappend res [lindex $i 1] } set res }
I need this to sort information from a database call according to the revision sequence "0, A, B, ..., Z, AA, AB, ..., AZ, ..."
Test:
set list {{Revision: B} {Revision: 0} {Revision: BA} {Revision: AA} {Revision: A} {Revision: AC} {Revision: Z} {Revision: AB} {Revision: AZ} {Revision: BB}} lsortby1 {string length} -integer -index 1 [lsort -index 1 $list]
results in
{Revision: 0} {Revision: A} {Revision: B} {Revision: Z} {Revision: AA} {Revision: AB} {Revision: AC} {Revision: AZ} {Revision: BA} {Revision: BB}
Nike 8.12.2002
Today i needed new way of sorting, shortest first in alphabetical order.
My version:
proc nsort {list} { set final "" ; # for empty lists foreach temp $list { lappend length([string length $temp]) $temp } foreach temp [lsort -integer [array names length]] { lappend final [lsort $length($temp)] } return [join $final] }
Death wrote:
proc sortlength {wordlist} { set l 0;foreach word $wordlist {lappend words([set s [string length $word]]) $word;if {$s>$l} {set l $s}} for {set c 0} {$c<=$l} {incr c} {if {[info exists words($c)]} {lappend newlist [lsort $words($c)]}} return [join $newlist] }
Death's version seems to be almost 10% slower
RS: ...and my proposal takes more than double of your time:
% time {nsort {fool foo bart bar grill grillroom}} 1000 91 microseconds per iteration % time {lsortby {string length} [lsort {fool foo bart bar grill grillroom}]} 1000 228 microseconds per iteration
That goes to show that custom implementations can be tuned for best performance. But if runtime is not the bottleneck, using lsortby is more convenient at coding time...
Lars H: The comparison is not quite correct, as you should do integer comparison (not the default string comparison) on the results of the string length. Furthermore you can speed up lsortby even more by making sure that the "sort function" is a list, i.e., say
lsortby [list string length] -integer ...
instead of
lsortby {string length} ...
as above. For me, this cuts away 35% of the running time!
A convenient alternative to having special "sort functions" would be to have a foreach-like syntax, i.e.,
proc forsort {options body list} { set L [list] foreach item $list { lappend L [list $item [if 1 then $body]] } set res [list] foreach item [eval [list lsort] [lrange $options 0 end] [list -index 1 $L]] { lappend res [lindex $item 0] } set res }
using which one could write
forsort {-integer} {string length $item} [lsort {fool foo bart bar grill grillroom}]
I time this as being a bit faster still, probably due to the [if 1 then $body] instead of [eval $body] to force byte-compilation.
KBK 2002-12-10 - One important aspect of the lsort command that people overlook is that it is stable. In other words, sorting a list first on one criterion A and then another criterion B will preserve the A ordering for records that have equal B. In other words, we can first alphabetize the list, and then sort it by the length of the strings, and get the right answer.
proc sortlength2 { wordlist } { set words {} foreach word $wordlist { lappend words [list [string length $word] $word] } set result {} foreach pair [lsort -integer -index 0 [lsort -ascii -index 1 $words]] { lappend result [lindex $pair 1] } return $result }
This method emerges the clear winner in this particular case. On my machine:
set wordlist [list fool foo bart bar grill grillroom] foreach test { { nsort $wordlist } { sortlength $wordlist } { lsortby {string length} [lsort $wordlist] } { sortlength2 $wordlist } } { puts $test puts [time $test 10000] }
gives
nsort $wordlist 82 microseconds per iteration sortlength $wordlist 133 microseconds per iteration lsortby {string length} [lsort $wordlist] 207 microseconds per iteration sortlength2 $wordlist 50 microseconds per iteration
RS 2006-09-07 - Here's a sort function for dates like 1-JAN-05, 02-Dec-06 etc:
proc reorder_date date { foreach {day month year} [split $date -] break scan $day %d day set monthnames {- Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec} set month [lsearch -regexp $monthnames (?i)^$month] format %s-%02d-%02d $year $month $day }
Testing:
% lsortby reorder_date {3-apr-06 2-jan-06 5-feb-06 11-dec-04} 11-dec-04 2-jan-06 5-feb-06 3-apr-06
MG would personally do something like...
proc lsort_dates {one two} { set one [clock scan $one] set two [clock scan $two] if { $one < $two } { return -1 } elseif { $one > $two} { return 1 } else { return 0 } }
lsort -command lsort_dates {3-apr-06 2-jan-06 5-feb-06 11-dec-04}
RS: clock scan is a good idea (I wasn't aware that it accepts that format), but lsort -command is not (see above). So things get simpler (which I always love :^) :
% lsortby {clock scan} {3-apr-06 2-jan-06 5-feb-06 11-dec-04} 11-dec-04 2-jan-06 5-feb-06 3-apr-06
See also collation