Bob Techentin posted to comp.lang.tcl, in response to someone who wanted to get a list of numbers in a random order, the following:
To get a unique list of numbers (say, 1-1000) in a random order, you can first create an ordered list, then randomize the list by walking through once and performing a random "swap". Something like this:
for {set i 0} {$i < 1000} {incr i} {lappend nums $i} for {set i 0} {$i < 1000} {incr i} { set j [expr {int(rand() * 1000)}] set temp [lindex $nums $j] set nums [lreplace $nums $j $j [lindex $nums $i]] set nums [lreplace $nums $i $i $temp] }
KPV: I'd like to emphasize here a point that JE makes below which is that most of the algorithms here are biased. Specifically, not all permutations are equally likely. Only shuffle3, shuffle4, shuffle5a, and shuffle10 are unbiased.
Here's a quick proof that the algorithm above is biased: it iterates over N items each time selecting from N possible random numbers for a total of N**N possible random sequences. Each random sequence generates a permutation, but since N! is less than N**N, some permutations must be generated more than once. Furthermore, N! does not divide N**N, so some permutations must be generated more often than others.
The folklore unbiased algorithm is to iterate over the list and swapping the ith entry with a random entry between i and the end of the list. You can easily show that this produces exactly N! equally likely possible results.
JRW: Not really folklore -- it's in Knuth vol 2, and occurs as an exercise in Cormen-Leiserson-Rivest. Also see willdye's references below.
willdye: If you're here merely to find a fast, pre-written shuffle algorithm, be advised, dear coder, that shuffling is like sorting -- there is no single "correct" answer, and it's very easy to find answers which look to be correct but in fact have subtle (and occasionally serious) problems. One of my favorite discussions of the matter is at: [L1 ] and: [L2 ]. You'll probably be OK if you use an established algorithm such as Knuth-Fisher-Yates (shown in links above), AND take care to initialize your random number generator properly. As the long discussion on the rest of this page indicates, however, the subject simply does not have a single correct answer which applies to everyone in every situation.
---
JRW: I disagree with willdye. If you want an algorithm that will produce all possible permutations of the input list with equal probability in linear time, then there's only one alternative: the nice, neat implementation of Knuth-Fisher-Yates given in shuffle10a.
The other algorithms on this page are, for the most part, variations on the following strategy: assign each input element a random number in the range 1 to N, and then sort. Naive implementations use lsort (n log n comparisons); smarter ones use bucket sort (linear time). This strategy does NOT produce a uniform distribution. In fact, it's quite likely two or more list elements will be assigned the same random number, in which case they stay in input order. There's a few example implementations that may be faster than shuffle10a, but they have their own problems. Most everything else is pointless -- they don't give you a uniform distribution of permutations, they don't run any faster than shuffle10a, and they're no easier to code.
Off topic to the rest of the page - but an alternative to get the original request (list of numbers in random order) you could use ::RandomOrg::getSequence 1 1000 - see True Random Numbers for details
This got me thinking about a variety of methods, and performance for each.
The cool thing about Bentley 's method is that it works in linear time; it's essentially selection sort where a random oracle determines the element to be selected. Alas, lreplace isn't the greatest performer on the planet. The problem is that it copies the list every time, meaning that Bob's code degenerates into quadratic behavior, as I discovered when I timed it.
I then hacked up the following stuff to experiment with the behavior:
An afterthought: lreplace isn't the problem. The problem is a bizarre interaction with lreplace and the lifetime of a Tcl_Obj. Following up on an idea of Donal Fellows, I hacked up the shuffle1a procedure below to make sure that I was updating the list in place, and the performance improved incredibly!
Another afterthought: Bob Techentin posted to comp.lang.tcl the shuffle3 procedure with the message:
I've tried timing this on my WinNT machine, and it runs 3x slower than shuffle1a, which I don't completely understand. It should just be moving the elements from one list to another, but something else must be going on.
KBK: My explanation is that lreplace has a special case when you're replacing elements of the list with the same number of elements. In this case, it is able to avoid copying. Otherwise, it winds up needing to slide the elements down, leading to quadratic behavior once again.
One more time... Steve Cohen posted:
This is a close approximation of/exactly the code that Kevin ran in his test, isn't it? I wonder if we could convince him to run *just one more routine* through his gamut. Here it is: <snip>
Note that the only big modification is to simply swap the list element selected from the first list with the nth element from the original list. By not modifying the size of the list, Tcl seems to leave it alone (in memory). On my (really slow P133 laptop) machine, this runs linearly with list size. I would be interested to see what Kevin gets with his big-hootie computer.
KBK: Congratulations, Steve, shuffle4 is the best so far, for large lists. Numbers below.
Christoph Bauer adds a significant improvement for short lists, in the shuffle5a procedure below. Note that it, too requires the K combinator for adequate performance. Like shuffle3, it begins to show the strain of quadratic behavior once the list has many thousands of elements.
MS: for slightly faster performance (5-10% on Tcl8.4a3, as of 2001/05/23), and even uglier appearance, replace the K combinator with the equivalent idiom:
[K x y] <=> [lindex [list x y] 0]
Shuffle a list correctly: JE Note that of the procedures below, only shuffle3, shuffle4, and shuffle5(a) return a random permutation with uniform distribution (that is, each permutation is equally likely). With shuffle1(a), shuffle2, and shuffle6 some orders are more likely than others.
Also be aware that if you're using any of these to simulate shuffling a deck of cards, you should use a different random number generator. expr rand() uses a 32-bit seed, so the routines below return one of at most 2^32 different permutations; on the other hand there are 52! ways to shuffle a deck of cards (which is a lot bigger than 2^32).
proc iota n { for {set i 0} {$i < $n} {incr i} { lappend retval $i } return $retval }
proc K {x y} {set x}
proc shuffle0 list { set newlist [list] foreach element $list { lappend newlist [list [expr {rand()}] $element] } set retval [list] foreach pair [lsort -real -index 0 $newlist] { foreach {random item} $pair { lappend retval $item } } return $retval }
proc shuffle1 list { set n [llength $list] for {set i 0} {$i < $n} {incr i} { set j [expr {int(rand() * $n)}] set temp [lindex $list $j] set list [lreplace $list $j $j [lindex $list $i]] set list [lreplace $list $i $i $temp] } return $list }
proc shuffle1a list { set n [llength $list] for {set i 0} {$i < $n} {incr i} { set j [expr {int(rand()*$n)}] set temp1 [lindex $list $j] set temp2 [lindex $list $i] set list [lreplace [K $list [set list {}]] $j $j $temp2] set list [lreplace [K $list [set list {}]] $i $i $temp1] } return $list }
proc shuffle2 list { set n 0 foreach element $list { set data($n) $element incr n } for {set i 0} {$i < $n} {incr i} { set j [expr {int(rand() * $n)}] set temp $data($j) set data($j) $data($i) set data($i) $temp } for {set i 0} {$i < $n} {incr i} { lappend retval $data($i) } return $retval }
proc shuffle3 list { set n [llength $list] while {$n>0} { set j [expr {int(rand() * $n)}] lappend slist [lindex $list $j] set list [lreplace [K $list [set list {}]] $j $j] incr n -1 } return $slist }
proc shuffle4 list { set n [llength $list] while {$n > 0} { set j [expr {int(rand() * $n)}] lappend slist [lindex $list $j] incr n -1 set temp [lindex $list $n] set list [lreplace [K $list [set list {}]] $j $j $temp] } return $slist }
proc shuffle5 list { set n 1 set slist {} foreach item $list { set index [expr {int(rand() * $n)}] set slist [linsert $slist $index $item] incr n } return $slist }
proc shuffle5a list { set n 1 set slist {} foreach item $list { set index [expr {int(rand() * $n)}] set slist [linsert [K $slist [set slist {}]] $index $item] incr n } return $slist }
proc shuffle6 list { set n [llength $list] for {set i 1} {$i < $n} {incr i} { set j [expr {int(rand() * $n)}] set temp [lindex $list $i] lset list $i [lindex $list $j] lset list $j $temp } return $list }
puts " Times in usec for shuffle methods" puts " Method List length" puts " 1 10 100 1000 10000" puts " -------------------------------------------------" set methods [lsort -dictionary [info procs shuffle*]] foreach method $methods { puts -nonewline { } puts -nonewline [format %-9s $method] foreach n {1 10 100 1000 10000} { set list [iota $n] if { $method in {shuffle1 shuffle5} && $n > 1000 } { puts -nonewline { ------} } else { set t [lindex [time {$method $list} [expr {100000/$n}]] 0] puts -nonewline [format %8d $t] } } puts {} }
The results are summarized in the following table (8.4a4, WinNT 4.0 SP6, 550 MHz PIII-SpeedStep):
Times in usec for shuffle methods Method List length 1 10 100 1000 10000 ------------------------------------------------- shuffle0 33 118 997 11045 183446 shuffle1 28 169 2055 105000 ------ shuffle1a 33 187 1717 17243 176520 shuffle2 39 203 1852 19034 229225 shuffle3 28 128 1097 12195 267413 shuffle4 30 138 1182 11808 121336 shuffle5 20 95 1023 35988 ------ shuffle5a 24 108 908 10266 250796 shuffle6 8 74 633 6409 66740
More results (8.4.4 Win2k SP3)
Times in usec for shuffle methods Method List length 1 10 100 1000 10000 -------------------------------------------------- shuffle0 31 99 834 10114 168250 shuffle1 28 181 2733 188349 ------ shuffle1a 49 180 1659 16585 169083 shuffle2 32 154 1430 15825 200545 shuffle3(*) 59 131 1170 19785 798204 shuffle4(*) 29 139 1211 11937 122958 shuffle5 19 122 1261 46737 ------ shuffle5a(*) 23 104 917 12966 542084 shuffle6 9 67 609 4958 52110 shuffle7 23 56 291 2588 28875 shuffle9 22 59 431 5566 99201 shuffle10(*) 10 73 629 6190 64879 (*) unbiased shuffling
If anyone wants to see a graph of the result, you can get one by running the code at Shuffle a list: graph results.
Summary:
Among the other methods:
although shuffle0, shuffle1a, and shuffle2 are all extremely close. The lesson here is that it's important to structure the code so that the list never gets resized -- otherwise, there's a lot of memory copying involved. Some of the bytecode changes between 8.3.2 and 8.4 appear to favor shuffle1a instead.
Jeffrey Hobbs: I have added this to the tclbench suite, but I find after quite a bit of testing, that shuffle1a consistently performs as well as or better than shuffle4, with shuffle5a being a good general solution for lists up to 1000 or so items. My last timings with 8.3.3:
Win2K/P3-800:
Tcl8.3.3 Times in usec for shuffle methods Method List length 1 10 100 1000 10000 -------------------------------------------------- shuffle0 50 155 1315 13920 214300 shuffle1-s 24 125 2191 130895 ------ shuffle1a 30 170 1578 15725 162750 shuffle2 36 176 1515 15870 200800 shuffle3 30 150 1378 16825 535800 shuffle4 32 165 1478 14675 151700 shuffle5-s 18 66 839 38505 ------ shuffle5a 22 96 864 12015 478200
Linux/P3-???:
Tcl8.3.3 Times in usec for shuffle methods Method List length 1 10 100 1000 10000 -------------------------------------------------- shuffle0 143 405 3089 32796 396658 shuffle1-s 64 296 3579 398035 ------ shuffle1a 79 407 3653 36153 363324 shuffle2 104 431 3598 36537 402357 shuffle3 87 406 3470 37679 778334 shuffle4 92 436 3734 36761 368350 shuffle5-s 47 182 1854 89727 ------ shuffle5a 54 239 2033 22177 616730
Csan 2002-11-12: I've made a comparison between Tcl 8.3.3 and Tcl 8.4.0 on the same hardware. Hardware specs:
/proc/cpuinfo: vendor_id : GenuineIntel cpu family : 6 model : 8 model name : Pentium III (Coppermine) stepping : 3 cpu MHz : 666.545 cache size : 256 KB fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips : 1330.38
OS specs:
Debian GNU/Linux sid, using linux kernel 2.4.19-686 / P3-667 Times in usec for shuffle methods (A - Tcl 8.3.3 B - Tcl 8.4.0 C - B improvement over A, if A is 100%) List length | 1 | 10 | 100 | 1000 | 10000 | -----------+----+----+-----+-----+-----+-----+------+------+-----+-------+-------+-----+---------+--------+-----+ Method | A B C | A B C | A B C | A B C | A B C | -----------+----+----+-----+-----+-----+-----+------+------+-----+-------+-------+-----+---------+--------+-----+ shuffle0 | 34 21 -39% | 157 76 -52% | 1439 703 -52% | 15774 8454 -47% | 215209 147609 -32% | shuffle1 | 27 17 -38% | 144 102 -30% | 2013 1491 -26% | 87920 82390 -7% | ------ ------ ---- | shuffle1a | 32 22 -32% | 173 150 -14% | 1588 1413 -12% | 15662 14168 -10% | 158882 142911 -11% | shuffle2 | 40 22 -45% | 184 102 -45% | 1627 908 -45% | 16590 9488 -43% | 199949 126498 -37% | shuffle3 | 33 19 -43% | 160 106 -34% | 1412 967 -32% | 15059 10516 -31% | 273615 226784 -18% | shuffle4 | 35 20 -43% | 174 112 -36% | 1547 1012 -35% | 15137 9952 -35% | 154841 102327 -34% | shuffle5 | 20 12 -40% | 85 59 -31% | 953 710 -26% | 30021 28559 -5% | 4957511 4243210 -15% | shuffle5a | 23 15 -35% | 103 82 -21% | 897 745 -17% | 9694 8335 -15% | 215593 202902 -6% | shuffle6 | -- 6 ---- | --- 47 ---- | ---- 418 ---- | ----- 4134 ---- | ------- 42905 ---- | shuffle7 | 24 16 -33% | 78 40 -49% | 537 231 -57% | 4912 2027 -59% | 51494 22621 -56% |
(See shuffle7 procedure below) My tests show a rough 33% improvement of Tcl 8.4.0 over Tcl 8.3.3, not counting in the overall and stunning winner shuffle6!
DKF - Note for future work: It should be possible to produce a much more efficient version of the list shuffling with the new [lset] command in 8.4, but it is too late right now for me to actually write it (especially as I've not got a compiled version of 8.4 to hand right now anyway.)
KBK Done. Can you update the picture?
Csan Updated mine. ;)
I quote DKF: "A good way that is easy to understand is to convert the list into a list of pairs by pairing (converting to a two-element list) each element with a random number from [expr rand()]. Then use [lsort -index] to sort on those random numbers and remember to ditch them as you extract..."
RS wonders whether in shuffle6, testing for equality of i and j would scrape out some microseconds (swapping the same element would be wasted time, even with lset):
proc shuffle6a list { set n [llength $list] for {set i 1} {$i < $n} {incr i} { set j [expr {int(rand() * $n)}] if {$i != $j} { set temp [lindex $list $i] lset list $i [lindex $list $j] lset list $j $temp } } return $list }
suchenwi Jó napot, János! Csan I'm benchmarking your new shuffle6a... suchenwi Is it faster? Csan nope Csan shuffle6 6 47 417 4111 43409 shuffle6a 6 50 462 4602 47802
suchenwi: Ah. So the time the if eats on every turn is more that how much is won in case of i==j. This speaks for the efficiency of lindex and lset... Well, if the probability of i==j is 1/N (length of list), it can be expected to occur once in the iteration. OK, so withdraw that variant...
Csan 2002-11-13: I'd like to request for comments on my shuffle7 procedure below... it is the best so far (updated comparison charts). It is for certain that the resulting list shuffled. Thanks to RS and dkf, a few optimizations to the proc give further dramatical speed-ups!:
`{rand() < .5} instead of `{[expr rand() > .5}` - the {} argument is passed to [expr] anyway... [concat $l1 $l2 $l3 $l4] instead of "$l1 $l2 $l3 $l4" - to avoid list->string conversion
The magnitude of the improvement is:
shuffle7 | 25 16 -36% | 83 44 -47% | 583 286 -51% | 5415 2594 -52% | 58367 29838 -49% | shuffle7 | 24 16 -33% | 78 40 -49% | 537 231 -57% | 4912 2027 -59% | 51494 22621 -56% |
shuffle7:
proc shuffle7 list { set l1 {} set l2 {} set l3 {} set l4 {} foreach le $list { if {rand() < .5} { if {rand() < .5} { lappend l1 $le } else { lappend l2 $le } } { if {rand() < .5} { lappend l3 $le } else { lappend l4 $le } } } return [concat $l1 $l2 $l3 $l4] }
For example, given a 10 element list, shuffle7 has given the following results (10 consecutive runs):
1 7 9 4 8 0 2 3 5 6 4 6 7 9 0 3 5 8 1 2 0 7 3 6 8 1 2 5 4 9 5 7 9 4 6 0 1 2 3 8 0 3 9 2 6 4 5 7 1 8 1 2 0 9 3 6 7 4 5 8 1 7 0 4 8 2 6 3 5 9 0 4 5 1 3 6 8 9 2 7
I find one of the advantages of shuffle7 to be the backwards compatibility for the earlier versions of Tcl.
RS: Very impressive. However, I have the vague feeling that this shuffling is biased - the first list elements have the best chance to come out first again, while the last tend to come in the end - as your example above shows. With four "buckets", the first element of a list of any length has a 25% chance to be first again, against 1/N as a purely random shuffle would be.
Csan: RS: Yes, I agree, it only shuffles (almost like you would do when playing card games with friends... :)), but doesn't tend to return a perfectly randomly shuffled list. One could maybe use shuffle7 to simulate real-life behavior when writing card games in Tcl by adjusting the number of buckets.
JRW: shuffle7 only generates 4^N permutations of N elements, which is a vanishingly small fraction of N!. It is equivalent to randomly assigning each input element a bucket number in the range 1 to 4, and then sorting the elements by bucket number. (Using, obviously, bucket sort.) Elements in the same bucket retain their input order. shuffle7 cannot, for example, produce the permutation 2 1 4 3 6 5 8 7.
RS has this recursive variation, which does shuffle (without initial list length), but appears not blindingly fast, and has the same recursion depth problem:
proc shuffle8r list { if {[llength $list] > 1} { set i [expr {int([llength $list] * rand())}] set pick [lindex $list $i] set res [shuffle8r [lreplace $list $i $i]] lappend res $pick } else { set list } }
KPV: Here's yet a totally different way to shuffle a list. The shuffling is not perfectly random, but it's very easy to code, fairly fast and more often than not it's good enough.
proc shuffle9 list { foreach a $list { set arr($a) 1 } return [array names arr] }
RS: Note however that duplicates in the input list are removed. JRW: You could fix that by doing "incr arr($a)" instead of "set arr($a) 1", and then appending each element $arr($a) times to the output list. But that would clunk up the algorithm to the point where it might not be useful.
KPV: Here's an unbiased version that uses lset.
proc shuffle10 list { set len [llength $list] set len2 $len for {set i 0} {$i < $len-1} {incr i} { set n [expr {int($i + $len2 * rand())}] incr len2 -1 # Swap elements at i & n set temp [lindex $list $i] lset list $i [lindex $list $n] lset list $n $temp } return $list }
MS: here's a slightly faster one - it does essentially the same thing, but with fewer ops in the inner loop. KPV's chooses randomly the element to put at spot i among all elements with index >=i, starting at i=0 and going up: i.e., it chooses first element 0, then element 1 among those not already chosen, ... then element i among those not already chosen, ...
This one performs the same algorithm, but backwards: it chooses first the last element, then the second to last among those not already chosen, ...
proc shuffle10a list { set len [llength $list] while {$len} { set n [expr {int($len * rand())}] set tmp [lindex $list $n] lset list $n [lindex $list [incr len -1]] lset list $len $tmp } return $list }
TFW Here is a variation of shuffle10 that can morph a list
## # Morph a list (i.e. slightly shuffle it) # proc morph {list {perc 100}} { set modulo [expr {100 / $perc}] expr {srand([clock clicks])} set len [llength $list] set len2 $len incr len -1 for {set i 0} {$i < $len} {incr i} { set n [expr {int($i + $len2 * rand())}] incr len2 -1 if {$i%$modulo==0} { # Swap elements at i & n set temp [lindex $list $i] lset list $i [lindex $list $n] lset list $n $temp } } return $list }
%morph {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20} 100 7 15 1 9 11 19 6 5 4 13 3 10 20 2 16 8 17 12 18 14 %morph {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20} 10 ;# only 10% change 12 2 3 4 5 6 7 8 9 10 18 1 13 14 15 16 17 11 20 19
AMG: RS's innocuous typo on the unsort page caused me to write another version of this page under the title Shuffling a list. Shrug!
Using TclX functions:
fortinal 2007-09-18: This is a compact way to shuffle a list using random and lvarpop functions provided by TclX package.
package require Tclx # create data_list_rand by shuffling data_list while {[set list_length [llength $data_list]] > 0} { lappend data_list_rand [lvarpop data_list [random $list_length]] }
Note that the initial data_list is destroyed.
JRW: How is lvarpop implemented? I can't believe it's a constant time operation, and I worry this has hidden n log n or quadratic behavior.
Stu: 2008-11-02 No comment.
proc shuffle list { set sl {}; set i -1 foreach i [regsub -all {.\..+?%} [ lsort [subst [lrepeat [llength $list] \[expr\ rand()\]%\[incr\ i\]]] ] {}] { lappend sl [lindex $list $i] } return $sl }
aricb 2009-02-04 Exploiting [lsort -command].
proc randomsort {a b} { return [expr {int(rand()*2) * 2 - 1}] } proc shuffle list { return [lsort -command randomsort $list] }
JRW: I think you're being optimistic when you write a comparison function that violates the requirements of lsort. The behavior of lsort is no longer guaranteed. Another issue: lsort probably uses the quicksort algorithm. The recommended implementation of quicksort uses insertion sort for small sublists. But using your comparison function with insertion sort, each new element will on average be moved 1/2 a step from its initial position. So I think it's likely you'll get outputs that are not all that shuffled.
Lars H: Actually, lsort uses mergesort. But lsort -command is infamous for being slow, and the above should not produce a uniform distribution. From looking at the implementation , I get the feeling that for a list with 2^n+1 elements, this procedure would put the last element first with probability 0.5, second with probability 0.25, third with probability 0.125, etc.
BenGoldberg 2010-10-10 12:20:50:
Here's a way to shuffle a list which is based (loosely) on quicksort.
proc shuffle list { if {[llength $list] < 2} {return $list} set lft {} set rgt {} foreach item $list { if {rand() > 0.5} { lappend lft $item } else { lappend rgt $item } } concat [shuffle $lft] [shuffle $rgt] }
Caution: This code was typed from memory, and has not been tested.
Assuming that rand is not biased, correlated, or otherwise statically flawed, this procedure can be proven to be able to produce a uniform distribution.
TRA 2019-01-21
Adding my approach as there seems to be nothing quite like it here yet. Quickly checking it seems to be pretty robust.
proc shuffle list { set r {} foreach i $list { set r [linsert $r [expr {int(rand()*[incr n])}] $i] } return $r }
wdb 2019-10-14 My one is optimized for short code:
proc shuffle list { while {[llength $list] > 0} { set index [expr {int(rand() * [llength $list])}] lappend result [lindex $list $index] set list [lreplace $list $index $index] } lappend result }