Version 5 of unsort

Updated 2005-03-16 14:12:19 by lwv

AMG: unsort is a simple UNIXy stdin->stdout pipeline tool for randomizing line order. It's simple, works quite nicely, and is surprisingly useful.

Without further ado, I present to you the source:

 #!/bin/sh
 # \
 exec tclsh "$0" "$@"

 proc main {} {
     set lines [lrange [split [read stdin] \n] 0 end-1]
     set count [llength $lines]

     for {set idx_1 0} {$idx_1 < $count} {incr idx_1} {
         set idx_2 [expr {int($count * rand())}]
         set temp [lindex $lines $idx_1]
         lset lines $idx_1 [lindex $lines $idx_2]
         lset lines $idx_2 $temp
     }

     puts [join $lines \n]
 }

 main

 # vim: set ts=4 sts=4 sw=4 tw=80 et ft=tcl:

Even though the above really is good enough, I'm still interested in seeing performance improvements. Any ideas?

Just in case you're into ancient history, here's the slow, old, real-sorting version of main I wrote a few hours before the shiny new swappy version:

 proc main {} {
     set lines [list]
     foreach line [lrange [split [read stdin] \n] 0 end-1] {
          lappend lines [list [expr {rand()}] $line]
     }

     foreach line [lsort -real -index 0 $lines] {
         puts [lindex $line 1]
     }
 }

Lars H: This is probably nit-picking, but I'm a bit concerned by the way you permute the lines. You $count times pick one-out-of $count, so you are effectively choosing one out of $count**$count, which is not a multiple of the number of permutations of $count objects. Hence the unsorting isn't perfectly uniform.

If you do instead

     for {} {$count>1} {incr count -1} {
         set idx_1 [expr {$count-1}]
         set idx_2 [expr {int($count * rand())}]
         set temp [lindex $lines $idx_1]
         lset lines $idx_1 [lindex $lines $idx_2]
         lset lines $idx_2 $temp
     }

then (for a perfectly random rand (ha!)) the routine will pick any permutation with equal probability.


Let's give it a try, shall we?

 [andy@blender|~/devel/misc]$ unsort <<EOF
 > alpha
 > bravo
 > charlie
 > delta
 > EOF
 charlie
 bravo
 delta
 alpha

I usually use unsort for creating a randomized playlist:

 [andy@blender|~]$ mplayer $(find ~/media/music -type f | unsort)

Works great! I'm using it right now, as a matter of fact.


See also Shuffling a list


Category Application Category Word and Text Processing

Under which category or categories should this page be filed?

Ideas: [ [Category Sorting] | [Category UNIX] | [Category Command Line Tools] | [Category Text Processing] | [Category Entropy Source] :^) ]

LV Hmm - a sorting category might be useful to have. Perhaps a filter category as well.