Version 2 of Shuffle a list

Bob Techentin posted to comp.lang.tcl, in response to someone who wanted to get a list of numbers in a random order, the following:

To get a unique list of numbers (say, 1-1000) in a random order, you can first create an ordered list, then randomize the list by walking through once and performing a random "swap". Something like this:

  for {set i 0} {$i<1000} {incr i} {lappend nums $i}
  for {set i 0} {$i<1000} {incr i} {
    set j [expr {int(rand()*1000)}]
    set temp [lindex $nums $j]
    set nums [lreplace $nums $j $j [lindex $nums $i]]
    set nums [lreplace $nums $i $i $temp]
  }

This got me thinking about a variety of methods, and performance for each.

The cool thing about Bentley 's method is that it works in linear time; it's essentially selection sort where a random oracle determines the lement to be selected. Alas, lreplace isn't the greatest performer on the planet. The problem is that it copies the list every time, meaning that Bob's code degenerates into quadratic behavior, as I discovered when I timed it.

I then hacked up the following stuff to experiment with the behavior:

An afterthought: lreplace isn't the problem. The problem is a bizarre interaction with lreplace and the lifetime of a Tcl_Obj. Following up on an idea of Donal Fellows, I hacked up the shuffle1a procedure below to make sure that I was updating the list in place, and the performance improved incredibly!

Another afterthought: Bob Techentin posted to comp.lang.tcl the shuffle3 procedure with the message:

I've tried timing this on my WinNT machine, and it runs 3x slower than shuffle1a, which I don't completely understand. It should just be moving the elements from one list to another, but something else must be going on.

My (KBK) explanation is that lreplace has a special case when you're replacing elements of the list with the same number of elements. In this case, it is able to avoid copying. Otherwise, it winds up needing to slide the elements down, leading to quadratic behavior once again.

One more time... Steve Cohen posted:

This is a close approximation of/exactly the code that Kevin ran in his test, ins't it? I wonder if we could convince him to run *just one more routine* through his gammut. Here it is: <snip>

Note that the only big modification is to simply swap the list element selected from the first list with the nth element from the original list. By not modifying the size of the list, Tcl seems to leave it alone (in memory). On my (really slow P133 laptop) machine, this runs linearly with list size. I would be interested to see what Kevin gets with his big-hootie computer.

KBK: Congratulations, Steve, shuffle4 is the best so far, for large lists. Numbers below.

Christoph Bauer adds a significant improvement for short lists, in the shuffle5a procedure below. Note that it, too requires the K combinator for adequate performance. Like shuffle3, it begins to show the strain of quadratic behavior once the list has many thousands of elements.

MS: for slightly faster performance (5-10% on Tcl8.4a3, as of 2001/05/23), and even uglier appearance, replace the K combinator with the equivalent idiom:

    [K x y]   <=>  [lindex [list x y] 0]

Procedure to generate a list of n numbers:

  proc iota { n } {
      for { set i 0 } { $i < $n } { incr i } {
          lappend retval $i
      }
      return $retval
  }

Several of the procedures rely on Donal Fellows's K combinator:

  proc K { x y } { set x }

shuffle0 is the obvious method of generating random keys, then sorting the list according to those keys. TFW 4/18/01 This was broken due to inital newlist 0 (should be empty). Also, if you insert the set retval [list] you get a non trivial speedup. Moral, don't let lappend initialize your lists for you.

  proc shuffle0 { list } {
      set newlist [list]
      foreach element $list {
          lappend newlist [list [expr { rand() }] $element]
      }
      set retval [list]
      foreach pair [lsort -real -index 0 $newlist] {
          foreach { random item } $pair {
          lappend retval $item
          }
      }
      return $retval
  }

shuffle1 is Techentin's implementation of Bentley's method.

   proc shuffle1 { list } {
      set n [llength $list]
      for { set i 0 } { $i < $n } { incr i } {
          set j [expr {int(rand()*$n)}]
          set temp [lindex $list $j]
          set list [lreplace $list $j $j [lindex $list $i]]
          set list [lreplace $list $i $i $temp]
      }
      return $list
  }

shuffle1a is Techentin's code, with a clever hack (due to Donal Fellows) for managing the lifetime of the Tcl_Obj that represents the list so that it doesn't get copied needlessly.

  proc shuffle1a { list } {
      set n [llength $list]
      for { set i 0 } { $i < $n } { incr i } {
          set j [expr {int(rand()*$n)}]
          set temp1 [lindex $list $j]
          set temp2 [lindex $list $i]
          set list [lreplace [K $list [set list {}]] $j $j $temp2]
          set list [lreplace [K $list [set list {}]] $i $i $temp1]
      }
      return $list
  }

shuffle2 implements Bentley's method, unpacking the list to an array first.

  proc shuffle2 { list } {
      set n 0
      foreach element $list {
          set data($n) $element
          incr n
      }
      for { set i 0 } { $i < $n } { incr i } {
          set j [expr { int( rand() * $n ) }]
          set temp $data($j)
          set data($j) $data($i)
          set data($i) $temp
      }
      for { set i 0 } { $i < $n } { incr i } {
          lappend retval $data($i)
      }
      return $retval
  }

shuffle3 is Bob Techentin's implementation of Stephen D. Cohen's proposed method.

  proc shuffle3 { list } {
      set n [llength $list]
      while {$n>0} {
          set j [expr {int(rand()*$n)}]
          lappend slist [lindex $list $j]
          set list [lreplace [K $list [set list {}]] $j $j]
          incr n -1
      }
      return $slist
  }

shuffle4 is Steve Cohen's improved implementation:

  proc shuffle4 { list } {
      set n [llength $list]
      while {$n>0} {
          set j [expr {int(rand()*$n)}]
          lappend slist [lindex $list $j]
          incr n -1
          set temp [lindex $list $n]
          set list [lreplace [K $list [set list {}]] $j $j $temp]
      }
      return $slist
  }

shuffle5 and shuffle5a are from Christoph Bauer. They differ only in the use of the K combinator.

 proc shuffle5 { list } {
    set n 1
    set slist {}
    foreach item $list {
        set index [expr {int(rand()*$n)}]
        set slist [linsert $slist $index $item]
        incr n
    }
    return $slist
 }


 proc shuffle5a { list } {
    set n 1
    set slist {}
    foreach item $list {
        set index [expr {int(rand()*$n)}]
        set slist [linsert [K $slist [set slist {}]] $index $item]
        incr n
    }
    return $slist
 }

The test harness times the various methods and prints the results.

  puts "                Times in usec for shuffle methods"
  puts "    Method                    List length"
  puts "                    1      10     100    1000   10000"
  puts "    -------------------------------------------------"
  foreach method { 
      shuffle0 shuffle1 shuffle1a shuffle2 shuffle3 shuffle4
      shuffle5 shuffle5a
  } {
     set line "    "
     append line $method
     foreach n { 1 10 100 1000 10000 } {
         set list [iota $n]
         if { [string equal $method shuffle1] && $n > 1000 } {
             append line "  ------"
         } else {
             set t [lindex [time { $method $list } [expr 100000/$n]] 0]
             append line [format "%8d" $t]
         }
     }
     puts $line
  }

The results are summarized in the following table (8.3.2, WinNT 4.0 SP6, 550 MHz PIII-SpeedStep):

                Times in usec for shuffle methods
    Method                    List length
                   1      10     100    1000   10000
    -------------------------------------------------
    shuffle0      78     253    2053   21730  287400
    shuffle1      37     212    3675  215110  ------
    shuffle1a     47     287    2754   28140  286400
    shuffle2      58     271    2513   26240  301400
    shuffle3      47     252    2363   29140  890200
    shuffle4      50     274    2504   25240  256300
    shuffle5      28     104    1342   68800  ------
    shuffle5a     33     153    1382   18830  785100

If anyone wants to see a graph of the result, you can get one by running the code at Shuffle a list: graph results.

(All times were measured using Tcl 8.3.2 on a 550 MHz P-III SpeedStep laptop, with a modified tclWinTime.c that uses the Windows performance counter as a resolution multiplier. I'll be posting a patch for that shortly.)

Summary:

For short lists (up to a hundred elements or so is "short"), Bauer's shuffle5 procedure is a clear win, because it exploits the speed of foreach. (This tradeoff may change, given that the maintainer is contemplating bytecoding other list operations in version 8.4.)
For lists longer than a few hundred elements, any implementation that does list surgery without the K combinator runs into extremely expensive quadratic behavior. What happens is that linsert or lreplace discovers that it has a Tcl_Obj with a reference count greater than 1 and is forced to copy it. For medium-sized lists, say 100-1000 elements, shuffle5a, which is shuffle5 with the K combinator added, is a clear winner.
Similarly, shuffle3 causes lreplace to copy, on average, half the list. This copying leads to quadratic behavior as well, although it's not nearly as bad as 'shuffle1'. The shuffle5 procedure runs into the same behavior in linsert.
The shuffle4 procedure emerges a winner for the longest lists,

although shuffle0, shuffle1a, and shuffle2 are all extremely close. The lesson here is that it's important to structure the code so that the list never gets resized -- otherwise, there's a lot of memory copying involved. Some of the bytecode changes between 8.3.2 and 8.4 appear to favor shuffle1a instead.

lsort is surprisingly inexpensive. All of these procedures seem to be dominated by the linear overheads.

Kevin Kenny

Jeffrey Hobbs: I have added this to the tclbench suite, but I find after quite a bit of testing, that shuffle1a consistently performas as well as or better than shuffle4, with shuffle5a being a good general solution for lists up to 1000 or so items. My last timings with 8.3.3:

Win2K/P3-800:

  Tcl8.3.3      Times in usec for shuffle methods
  Method                    List length
                  1     10      100     1000    10000
 --------------------------------------------------
 shuffle0        50     155    1315   13920  214300
 shuffle1-s      24     125    2191  130895  ------
 shuffle1a       30     170    1578   15725  162750
 shuffle2        36     176    1515   15870  200800
 shuffle3        30     150    1378   16825  535800
 shuffle4        32     165    1478   14675  151700
 shuffle5-s      18      66     839   38505  ------
 shuffle5a       22      96     864   12015  478200

Linux/P3-???:

  Tcl8.3.3      Times in usec for shuffle methods
  Method                    List length
                  1     10      100     1000    10000
 --------------------------------------------------
 shuffle0       143     405    3089   32796  396658
 shuffle1-s      64     296    3579  398035  ------
 shuffle1a       79     407    3653   36153  363324
 shuffle2       104     431    3598   36537  402357
 shuffle3        87     406    3470   37679  778334
 shuffle4        92     436    3734   36761  368350
 shuffle5-s      47     182    1854   89727  ------
 shuffle5a       54     239    2033   22177  616730

Janos Holanyi

 Linux 2.4.14-686 / P3-667
 (almost no processes*, single user):

    Tcl 8.3.3     Times in usec for shuffle methods
    Method                    List length
                    1      10     100    1000   10000
    -------------------------------------------------
    shuffle0      34     152    1382   14706  205798
    shuffle1      25     151    1889   82133  ------
    shuffle1a     30     198    1854   18546  189828
    shuffle2      37     182    1632   16255  195863
    shuffle3      32     173    1592   16757  280023
    shuffle4      34     186    1700   17024  175342
    shuffle5      17      73     759   25910 3702327
    shuffle5a     20      92     786    8732  194077

 *'almost no processes' means:

  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:00 init [S]
    2 ?        SW     0:00 [keventd]
    3 ?        SWN    0:00 [ksoftirqd_CPU0]
    4 ?        SW     0:00 [kswapd]
    5 ?        SW     0:00 [bdflush]
    6 ?        SW     0:00 [kupdated]
  161 tty1     S      0:00 init [S]
  162 tty1     S      0:00 bash

 # cat /proc/cpuinfo
 processor       : 0
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 8
 model name      : Pentium III (Coppermine)
 stepping        : 3
 cpu MHz         : 666.545
 cache size      : 256 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 2
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge                     mca cmov pat pse36 mmx fxsr sse
 bogomips        : 1330.38