AMG: unsort is a simple UNIXy stdin->stdout pipeline tool for randomizing line order. It's simple, works quite nicely, and is surprisingly useful.
Without further ado, I present to you the source:
#!/bin/sh # \ exec tclsh "$0" "$@" proc main {} { set lines [lrange [split [read stdin] \n] 0 end-1] set count [llength $lines] for {set idx_1 0} {$idx_1 < $count} {incr idx_1} { set idx_2 [expr {int($count * rand())}] set temp [lindex $lines $idx_1] lset lines $idx_1 [lindex $lines $idx_2] lset lines $idx_2 $temp } puts [join $lines \n] } main # vim: set ts=4 sts=4 sw=4 tw=80 et ft=tcl:
Even though the above really is good enough, I'm still interested in seeing performance improvements. Any ideas?
Just in case you're into ancient history, here's the slow, old, real-sorting version of main I wrote a few hours before the shiny new swappy version:
proc main {} { set lines [list] foreach line [lrange [split [read stdin] \n] 0 end-1] { lappend lines [list [expr {rand()}] $line] } foreach line [lsort -real -index 0 $lines] { puts [lindex $line 1] } }
Lars H: This is probably nit-picking, but I'm a bit concerned by the way you permute the lines. You $count times pick one-out-of $count, so you are effectively choosing one out of $count**$count, which is not a multiple of the number of permutations of $count objects. Hence the unsorting isn't perfectly uniform.
If you do instead
for {} {$count>1} {incr count -1} { set idx_1 [expr {$count-1}] set idx_2 [expr {int($count * rand())}] set temp [lindex $lines $idx_1] lset lines $idx_1 [lindex $lines $idx_2] lset lines $idx_2 $temp }
then (for a perfectly random rand (ha!)) the routine will pick any permutation with equal probability.
Let's give it a try, shall we?
[andy@blender|~/devel/misc]$ unsort <<EOF > alpha > bravo > charlie > delta > EOF charlie bravo delta alpha
I usually use unsort for creating a randomized playlist:
[andy@blender|~]$ mplayer $(find ~/media/music -type f | unsort)
Works great! I'm using it right now, as a matter of fact.
See also Shuffling a list
Under which category or categories should this page be filed?
Ideas: [ [Category Sorting] | [Category UNIX] | [Category Command Line Tools] | [Category Text Processing] | [Category Entropy Source] :^) ]