**lsort - basic documentation** See the official documentation at http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/lsort.htm . ---- lsort - Sort the elements of a [list] lsort ?options? list This command sorts the elements of list, returning a new list in sorted order. The implementation of the lsort command uses the merge-sort algorithm which is a stable sort that has O(n log n) performance characteristics. By default ASCII sorting is used with the result returned in increasing order. However, any of the following options may be specified before list to control the sorting process (unique abbreviations are accepted): '''-ascii''': Use string comparison with ASCII collation order. This is the default. '''-dictionary''': Use dictionary-style comparison. This is the same as -ascii except (a) case is ignored except as a tie-breaker and (b) if two strings contain embedded numbers, the numbers compare as integers, not characters. For example, in -dictionary mode, bigBoy sorts between bigbang and bigboy, and x10y sorts between x9y and x11y. '''-integer''': Convert list elements to integers and use integer comparison. '''-real''': Convert list elements to floating-point values and use floating comparison. '''-command''' ''command'': Use command as a comparison command. To compare two elements, evaluate a Tcl script consisting of command with the two elements appended as additional arguments. The script should return an integer less than, equal to, or greater than zero if the first element is to be considered less than, equal to, or greater than the second, respectively. '''-increasing''': Sort the list in increasing order ("smallest" items first). This is the default. '''-decreasing''': Sort the list in decreasing order ("largest" items first). '''-index''' ''index'': If this option is specified, each of the elements of list must itself be a proper Tcl sublist. Instead of sorting based on whole sublists, lsort will extract the index'th element from each sublist and sort based on the given element. The keyword end is allowed for the index to sort on the last sublist element. For example, lsort -integer -index 1 {{First 24} {Second 18} {Third 30}} returns {Second 18} {First 24} {Third 30}. This option is much more efficient than using -command to achieve the same effect. (From: [TclHelp]) '''-unique''': If this option is specified, then only the last set of duplicate elements found in the list will be retained. Note that duplicates are determined relative to the comparison used in the sort. Thus if -index 0 is used, {1 a} and {1 b} would be considered duplicates and only the second element, {1 b}, would be retained. (From: [TclHelp]) ---- **lsort -dictionary warning** WARNING: ''lsort -dictionary'' can not be used for sensibly sorting signed integers. It returns first the negatives, then zero and the positives, but '''both in ascending absolute value order''' ([RS]): % lsort -dic {-5 -10 -1 0 3 1 2} -1 -5 -10 0 1 2 3 So for a numeric maximum function, use the numeric qualifiers -integer or -real (which does ''decimal'' integers as well, just might take some longer). Note that ''lsort -real'' cannot sort non-decimal (i.e. hex or octal) integers, and ''lsort -integer'' cannot sort any number containing a decimal point or exponent (which cannot, in any case, be mixed with a hex specifier.) ([willdye]) Perhaps this behavior comes from trying to accomodate strings which use dashes as a general separator. For example, here's a list of dates in the widely-used ISO8601 (YYYY-MM-DD) format: % lsort -dictionary {2005-01-01 2005-01-02 2005-01-03 2005-01-04} 2005-01-01 2005-01-02 2005-01-03 2005-01-04 (See [Parsing ISO8601 dates and times] if you want details about the format). I agree with [RS], however, that the list {-5 -10 -1 0 3 1 2} should ideally sort to {-10 -5 -1 0 1 2 3}. Fortunately, the behavior is easy to work around. ---- **lsort performance** ''KBK (14 February 2001)'' -- One thing to remember when using [[lsort]] is that the performance of [[lsort -command]] is pretty awful. If you can pre-convert your list into something that can be sorted with another form of [[lsort]], it's almost always faster to do so. Example: Over in [Things British], [RS] posted code to sort a list of surnames according to the rules for a British telephone directory. The code worked something like: proc comp.uk.phone {x y} { regsub {Ma?c ?([A-Z])} $x {M_\1} x regsub {Ma?c ?([A-Z])} $y {M_\1} y string compare $x $y } proc sort.uk.phone { list } { return [lsort -command comp.uk.phone $list] } It's a good bit faster to sort the list by making a list of sort key and value, sorting this list with [[lsort -index]], and then extracting just the values, thus: proc sort.uk.phone2 { list } { foreach name $list { regsub {Ma?c ?([A-Z])} $name {M_\1} key lappend list2 [list $key $name] } foreach pair [lsort -index 0 -ascii $list2] { lappend list3 [lindex $pair 1] } return $list3 } OK, let's run the two methods head to head: set list {MacDonald McArthur McEwan Lyttle Mabbs Jones} set l 6 puts " Length | Method 1 | Method 2 | Ratio (m1 / m2)" puts " -------+----------+----------+----------------" while { $l <= 25000 } { if { $l > 100 } { set count 1 } else { set count 100 } set m1 [lindex [time { sort.uk.phone $list } $count] 0] set m2 [lindex [time { sort.uk.phone2 $list } $count] 0] set ratio [expr { double( $m1 ) / double( $m2 ) }] puts [format " %6d | %8d | %8d | %9.2f" $l $m1 $m2 $ratio] incr l $l eval lappend list $list } The results are shown below: Length | Method 1 | Method 2 | Ratio (m1 / m2) -------+----------+----------+---------------- 6 | 1410 | 500 | 2.82 12 | 4200 | 1000 | 4.20 24 | 10820 | 1900 | 5.69 48 | 27140 | 3710 | 7.32 96 | 65890 | 7310 | 9.01 192 | 160000 | 10000 | 16.00 384 | 361000 | 30000 | 12.03 768 | 771000 | 60000 | 12.85 1536 | 1783000 | 120000 | 14.86 3072 | 3805000 | 251000 | 15.16 6144 | 8352000 | 571000 | 14.63 12288 | 18166000 | 1061000 | 17.12 24576 | 39077000 | 2153000 | 18.15 ''DKF'' - This technique is (I believe) known as creating a collation key. Also note that this comparison is not entirely fair, as it is performing additional '''regsub'''s in the -command case, though it is easy to see from a comparison of '''lsort $someList''' with '''lsort -command {string compare} $someList''' that it is still painful to use -command when you don't have to. - See [Custom sorting] on a wrapper for collation key sorting. ---- **Sorting on multiple indexes** How does one sort a multi-index? By that is meant something like, "sort by last name, then first name, then age". lsort doesn't offer direct support for multi-indexes. The straightforward approach, therefore, is to use -command. However, [Bruce Hartweg] wisely observes that, as lsort sorts ''stably'', an efficient multi-index sort is achieved by cascading lsort applications--least significant first, of course. See [Additional list functions] for ''multisort'' which packages such cascaded sorts into one call. In the same vein, [Andreas Kupries] wrote on 2001-03-21 in [the comp.lang.tcl newsgroup]: The lsort command uses a 'stable' mergesort. This means that it is possible to sort something after more than one key and the later sorts will not destroy the order of the previous sorts/keys. Quicksort for example is not stable (''DKF:'' it also has some rotten worst-case behaviour, unlike mergesort.) So, just sort after the secondary key first and then after the primary key. like so: set f {{12 11} {12 13} {12 12} {11 14} {13 12}} puts [lsort -index 1 $f] puts [lsort -index 0 [lsort -index 1 $f]] => {12 11} {12 12} {13 12} {12 13} {11 14} !! {11 14} {12 11} {12 12} {12 13} {13 12} ''DKF'' - If you are doing this on a lot of data, it is far more efficient to convert the keys into an additional collation key (as described above in relation to performance-measuring) and sort on that, as this adds two ''O(n)'' (i.e. linear) operations to generate and strip the collation keys and removes an ''O(n log n)'' sort. [KBK] - [Custom sorting] has a worked example of doing what [DKF] recommends above. ---- **lsort -command note** One thing '''lsort -command''' is good for is to introspect how lsort works. Consider this silly sort function: proc foo {a b} {puts [list $a $b]; return 0} which constantly says its inputs are equal, but logs them to stdout: lsort -command foo {1 2 3 4 5 6 7 8 9 10} 1 2 3 4 1 3 2 3 5 6 7 8 5 7 6 7 1 5 2 5 3 5 4 5 9 10 1 9 2 9 3 9 4 9 5 9 6 9 7 9 8 9 Makes it pretty clear how a merge sort works, doesn't it? [RS] ---- Why is -command so slow, anyway? -[FW] [NEM] 2009-03-22: I profiled this at one point. The overhead comes partly from calling a command: looking it up, checking for any command/execution [trace]s, etc. You can eliminate some of this, but not all. Another big killer is simply having to pack/unpack the comparison result (-1,0,1) into a Tcl_Obj. That puts some allocs/frees into the critical loop. Making this run fast would probably require some heavy-duty re-engineering of large parts of how Tcl deals with values and commands. ---- **lsort unicode** [LV] June 2, 2003 Does lsort support [l10n] considerations - such as sorting \u0063 equivalent to \u00e7 - at least in a French locale? - [RS]: No, sorry. One would have to introduce [custom sorting] explicitly depending on requirements (see e.g. [Things British] for British phone-book sorting). ---- [HolgerJ] 2003-10-13: I'm still amazed by the fact that lsort -dictionary does ''not'' sort by Unicode dictionary order. I know that the algorithm is quite complex and '''language dependent''' (one would have to read the environment to find out the language), but at least the most common things could be considered - like Umlauts and French accents. lsort -dictionary should be renamed lsort -ignorecase or the like. So, if you want to sort according to dictionary rules, you have to supply your own strcmp-like function or re-implement the whole lsort command. The latter might be faster, but not so convenient. ---- [RS] Is there a Unicode dictionary order, except for the numeric codes (which [lsort] does)? As far as I know, different languages have different collation orders, e.g. "Ä" might be sorted just as A, or as AE, in German, while in Swedish it sorts after Å... which sorts after Z... ---- [HolgerJ] 2003-10-19: No, see above. "Ä" does principally sort like "A" in German, but the trema (dots) are used as a tiebreaker. This is called secondary sorting order. For details see http://www.unicode.org/unicode/reports/tr10/ Quotation: 'This is worth emphasizing: the sorting weight of characters is not provided by their position in the Unicode code charts. ' Hopefully, this will be implemented in C the Tcl core. In Tcl it would bee too slow, I think. The following idea of Lars H. looks fine, but lacks support for the secondary (and tertiary) order. ---- [Lars H]: Whereas I agree the Unicode support in Tcl is incomplete, I don't agree on the particular example (dated 2003-10-13 by Holger J above). The Unicode Standard [http://www.unicode.org/versions/Unicode4.0.0] is littered with remarks that issues such as this should ''not'' be handled by Unicode as such, but by a higher level mechanism. One could have an option to [lsort] that specifies a command that should be applied to the sort keys before they are compared, or perhaps a [string map] style list to normalize the characters, but the Tcl core has no business implementing language-specific sorting as such. Also, -dictionary does far more than the old (Tcl7) -ignore option of [lsort] did (e.g., it places a2 before a10). Currently, this is what you should do to get language-specific sorting set temp_list [list] foreach item $list_to_sort { lappend temp_list [list $item [string map $collateList $item]] # $collateList is what handles language-dependence } set sorted_list [list] foreach item [lsort -dictionary -index 1 $temp_list] { lappend sorted_list [lindex $item 0] } $sorted_list is now sorted in the custom order. For Swedish, the collateList should probably be something on the lines of set collateList {å \x80 Å \x80 ä \x81 Ä \x81 ö \x82 Ö \x82 W V w v é e É E} I need to do something special for ÅÄÖåäö, because the native order in latin-1 (and hence in Unicode) is wrong for Swedish. What I miss most in Tcl's Unicode support is some way of normalizing strings (at least to some of the standard normalisation forms defined in the standard and its annexes). '''decomposed''' and '''precomposed''' subcommands of [string] appear to me as the most obvious way to do this. ---- [LV] If someone is looking for a project, research http://www.unicode.org/reports/tr10/tr10-10.html (or whatever variation is the latest of the document) - this is a technical note by the Unicode folks on the ''fun'' of collation of unicode strings. A Tcl extension providing an interface to the algorithm would be a wonderful contribution. ---- **lsort in-place reference** [AMG] If you want to modify a sorted list in-place and keep the result sorted, check out [Sorted Lists] for alternatives to repeatedly calling lsort. ---- **lsort merge** ''[KBK] in the [Tcl chatroom] on 2005-03-29:'' Since Tcl's merge exploits natural runs, you can also merge two lists without an extra [[merge]] command to do it: [lsort [concat $list1 $list2]] works in O(n) time if $list1 and $list2 are already sorted. ----- **multisort** [MG] July 29th 2005 - I have a list, where each element is itself a two-element list, and I want to sort based on ''both'' elements, one after the other. It seems that the -index option to lsort can only take a single index, though - I'd been thinking of trying something like this % lsort -index "1 0" [list "f 5" "a 5" "b 3"] {b 3} {a 5} {f 5} but it raises an error. Should I just use a custom proc and the -command option, or is there a better way that I'm not aware of? Thanks in advance for your help :) [Peter Newman] 29 July 2005: Search for '''multisort''' above in this page and in [Additional list functions]. ---- Note that before Tcl 8.5, some argument combinations were not correct, but were not reported as an error. For example ====== lsort -integer numlist ====== which is a mistake in a couple of ways, generates an error now where in the past the fact that no integer arguments were provided was not raised as an error. ---- [AMG]: Here's how to sort a list by ascending [[[string length]]]. It demonstrates the use of [[[apply]]] for [lambda]s. ====== set data {this is tcl} lsort -command {apply {{a b} {expr {[string length $a] - [string length $b]}}}} $data # returns "is tcl this" ====== ---- See also [list], [lappend], [lindex], [linsert], [llength], [lrange], [lreplace], [lsearch], [Topological sort], [How fast can we sort 5 elements] ---- [sort index values] ---- !!!!!! [Tcl syntax help] - [Arts and crafts of Tcl-Tk programming] %| [Category Command] |% !!!!!!