Version 2 of Shuffling a list

Updated 2005-03-16 08:56:49

AMG: I guess RS wanted this page, so here goes... :^)

This page discusses shuffling algorithms and implementations thereof.


Version 1: sorting random reals

Here's a proc, adapted from unsort:

 proc shuffle {data} {
     set rand_data [list]
     foreach elem $data {
          lappend rand_data [list [expr {rand()}] $elem]
     }

     set result [list]
     foreach elem [lsort -real -index 0 $rand_data] {
         lappend result [lindex $elem 1]
     }

     return $result
 }

It works, but it's seems suboptimal.


Version 2: sorting random integers

For starters, the real sort bothers me. I imagine sorting by integer would be more efficient, but then again producing random integers is (currently) slower than producing random reals. Change rand() for int(0x7fffffff * rand()) and -real for -integer and it winds up taking longer. See the benchmark below. (By the way, how am I supposed to portably get the maximum and minimum integer?)

Next, having up to three copies of the list (one of which is augmented) appears to be a waste of RAM.


Version 3: in-place swaps

So would it be better to do an in-place shuffle? For each element of $data it could swap with another, randomly-selected element of $data. I'll take a shot at it:

 proc shuffle {data} {
     set length [llength $data]
     for {set idx_1 0} {$idx_1 < $length} {incr idx_1} {
         set idx_2 [expr {int($length * rand())}]
         set temp [lindex $data $idx_1]
         lset data $idx_1 [lindex $data $idx_2]
         lset data $idx_2 $temp
     }
     return $data
 }

Big improvement.


Benchmarking 1, 2, and 3

Here's my magical test data:

 set data [lrepeat 100000 x y z z y]

If you're trying this at home from your interactive tclsh, you might want to type "; puts -nonewline {}" after the end of the above line or else its printed result will scroll for a very long time.

Now I time the shuffles:

 ..................................................
 :                    :   400 MHz   :  1500 MHz   :
 :....................:.............:.............:
 : 1, random reals    : 21.847024 s :  9.541117 s :
 : 2, random integers : 24.649960 s :  9.857447 s :
 : 3, in-place swaps  :  7.650484 s :  2.328508 s :
 :....................:.............:.............:

Wow, I think I will go with #3! Hold on while I update unsort... :^)


More?

I just thought of another algorithm. Basically I randomly select an element from the input list, remove it from the input list, and place it at the end of the output list. The reason I thought of it is, for unsort it's not necessary to maintain an output list; simply puts the elements as they come.

But this algorithm involves a lot of copying--- lset is nice for in-place modifies, but there's no in-place element removal. So it's liable to be slower than #3.


[ [Category Something] - please classify ]