## Shuffle a list: graph results

Difference between version 14 and 15 - Previous - Next
```** Description **
[Bob Techentin] posted to [comp.lang.tcl], in response to someone who
wanted to get a [list] of numbers in a random order, the following:

This graph shows the results of a tradeoff study on different procedures to
shuffle a list into random sequence.  The abscissa shows the length of the
list, and the ordinate the time taken (on a 550 MHz PIII) to shuffle it.  The
curves correspond to the ''shuffle0'', ''shuffle1'', ''shuffle1a'',
''shuffle2'', ''shuffle3'' and ''shuffle4'', ''shuffle5'', ''shuffle5a'', and
''shuffle6'' procedures, all of which appear on the page entitled [Shuffle a
list].
----

([KBK] Note that the vertical scale needed to be changed to accommodate
[[shuffle6]].)
''To get a unique list of numbers (say, 1-1000) in a random order, you can first create an ordered list, then randomize the list by walking through once and performing a random "swap".  Something like this:''

Please look below the image for the code that generated it.  Thanks to DKF for
making the image available!
======
for {set i 0} {\$i < 1000} {incr i} {lappend nums \$i}
for {set i 0} {\$i < 1000} {incr i} {
set j [expr {int(rand() * 1000)}]
set temp [lindex \$nums \$j]
set nums [lreplace \$nums \$j \$j [lindex \$nums \$i]]
set nums [lreplace \$nums \$i \$i \$temp]
}
======

[MG] wonders if the image-creation code could be redone using something like
[KPV]: I'd like to emphasize here a point that [JE] makes below which
is that most of the algorithms here are biased. Specifically, not all
permutations are equally likely. Only shuffle3, shuffle4, shuffle5a,
and shuffle10 are unbiased.

Here's a quick proof that the algorithm above is biased: it iterates over N items each time selecting from N
possible random numbers for a total of N**N possible random sequences.  Each random sequence generates a permutation, but since N! is less than N**N, some permutations must be generated more than once.  Furthermore, N! does not divide N**N, so some permutations must be generated more often than others.

The folklore unbiased algorithm is to iterate over the list
and swapping the ''i''th entry with a random entry between
''i'' and the end of the list. You can easily
show that this produces exactly N! equally likely possible
results.

[JRW]: Not really folklore -- it's in Knuth vol 2, and occurs as an exercise in Cormen-Leiserson-Rivest.  Also see [willdye]'s references below.

----

[willdye]: If you're here merely to find a fast, pre-written shuffle algorithm,
be advised, dear coder, that shuffling is like sorting -- there is no single
"correct" answer, and it's very easy to find answers which look to be correct
but in fact have subtle (and occasionally serious) problems.
One of my favorite discussions of the matter is at:
[http://www.codinghorror.com/blog/archives/001015.html] and:
[http://www.codinghorror.com/blog/archives/001008.html].
You'll probably be OK if you use an established algorithm such as
Knuth-Fisher-Yates (shown in links above), AND take care to initialize your
random number generator properly.  As the long discussion on the rest of this
page indicates, however, the subject simply does not have a single correct
answer which applies to everyone in every situation.

---

[JRW]: I disagree with [willdye].  If you want an algorithm that will produce all possible permutations of the input list with equal probability in linear time, then there's only one alternative: the nice, neat implementation of Knuth-Fisher-Yates given in shuffle10a.

The other algorithms on this page are, for the most part, variations on the following strategy:  assign each input element a random number in the range 1 to N, and then sort.  Naive implementations use lsort (n log n comparisons); smarter ones use bucket sort (linear time).  This strategy does NOT produce a uniform distribution.  In fact, it's quite likely two or more list elements will be assigned the same random number, in which case they stay in input order.  There's a few example implementations that may be faster than shuffle10a, but they have their own problems.  Most everything else is pointless -- they don't give you a uniform distribution of permutations, they don't run any faster than shuffle10a, and they're no easier to code.

----

Off topic to the rest of the page - but an alternative to get the original request (list of numbers in random order) you could use
'''::RandomOrg::getSequence 1 1000''' - see [True Random Numbers] for details

----

This got me thinking about a variety of methods, and performance
for each.

The cool thing about Bentley 's method is that it works in linear time; it's essentially selection sort where a random oracle determines the element to be selected.  Alas, ''[lreplace]'' isn't the greatest performer on the planet.  The problem is that it copies the list every time, meaning that Bob's code degenerates into quadratic behavior, as I discovered when I timed it.

I then hacked up the following stuff to experiment with the
behavior:

'''An afterthought:''' ''lreplace'' isn't the problem.  The problem is a bizarre interaction with ''lreplace'' and the lifetime of a ''Tcl_Obj''.  Following up on an idea of Donal Fellows, I hacked up the ''shuffle1a'' procedure below to make sure that I was updating the list in place, and the performance improved incredibly!

'''Another afterthought:''' Bob Techentin posted to comp.lang.tcl
the ''shuffle3'' procedure with the message:

I've tried timing this on my WinNT machine, and it runs 3x slower than
shuffle1a, which I don't completely understand.  It ''should'' just be
moving the elements from one list to another, but something else must be
going on.

[KBK]: My explanation is that ''lreplace'' has a special case when you're replacing elements of the list with the same number of elements.  In this case, it is able to avoid copying.  Otherwise, it winds up needing to slide the elements down, leading to quadratic behavior once again.

'''One more time...''' Steve Cohen posted:

This is a close approximation of/exactly the code that Kevin ran
in his test, isn't it?  I wonder if we could convince him to run *just
one more routine* through his gamut.  Here it is: <snip>

Note that the only big modification is to simply swap the list
element selected from the first list with the ''n''th element from the
original list.  By not modifying the size of the list, Tcl seems to
leave it alone (in memory).  On my (really slow P133 laptop) machine,
this runs linearly with list size.  I would be interested to see what
Kevin gets with his big-hootie computer.

[KBK]: Congratulations, Steve, ''shuffle4'' is the best so far,
for large lists.  Numbers below.

[Christoph Bauer] adds a significant improvement for short lists, in
the ''shuffle5a'' procedure below.  Note that it, too requires the
''K'' combinator for adequate performance.  Like ''shuffle3'',
it begins to show the strain of quadratic behavior once the list
has many thousands of elements.

[MS]: for slightly faster performance (5-10% on Tcl8.4a3, as of 2001/05/23), and even uglier appearance, replace the ''K'' combinator with the equivalent idiom:

[K x y]   <=>  [lindex [list x y] 0]

----

'''Shuffle a list correctly:'''  [JE] Note that of the procedures below, only shuffle3, shuffle4, and shuffle5(a) return a random permutation with uniform distribution (that is, each permutation is equally likely). With shuffle1(a), shuffle2, and shuffle6 some orders are more likely than others.

Also be aware that if you're using any of these to simulate shuffling a deck of cards, you should use a different random number generator.  ''expr rand()'' uses a 32-bit seed, so the routines below return one of at most 2^32 different permutations; on the other hand there are 52! ways to shuffle a deck of cards (which is a ''lot'' bigger than 2^32).

----

* Procedure to generate a list of ''n'' numbers:

======
package require Img
set graph [image create photo -data .c -format window]
\$graph write shufflegraph.gif -format gif
proc iota n {
for {set i 0} {\$i < \$n} {incr i} {
lappend retval \$i
}
return \$retval
}
======

to make it work in a few more places?
* Several of the procedures rely on Donal Fellows's ''K'' combinator:

(NOTE: The image is missing the curve for [[shuffle6]].)
======
proc K {x y} {set x}
======

[http://www.man.ac.uk/~zzcgudf/tcl/wikipix/shufflegraph.gif]
* ''shuffle0'' is the obvious method of generating random keys, then sorting the list according to those keys. ''TFW 4/18/01'' This was broken due to initial newlist 0 (should be empty). Also, if you insert the set retval [[list]] you get a non trivial speedup. Moral, don't let lappend initialize your lists for you.

======
proc shuffle0 list {
set newlist [list]
foreach element \$list {
lappend newlist [list [expr {rand()}] \$element]
}
set retval [list]
foreach pair [lsort -real -index 0 \$newlist] {
foreach {random item} \$pair {
lappend retval \$item
}
}
return \$retval
}
======

* [AMG]: [Shuffling a list]
* [AMG]: [unsort]
* ''shuffle1'' is Techentin's implementation of Bentley's method.

** Code **
======
proc shuffle1 list {
set n [llength \$list]
for {set i 0} {\$i < \$n} {incr i} {
set j [expr {int(rand() * \$n)}]
set temp [lindex \$list \$j]
set list [lreplace \$list \$j \$j [lindex \$list \$i]]
set list [lreplace \$list \$i \$i \$temp]
}
return \$list
}
======

* ''shuffle1a'' is Techentin's code, with a clever hack (due to Donal Fellows) for managing the lifetime of the Tcl_Obj that represents the list so that it doesn't get copied needlessly.

======
#! /bin/env tclsh
proc shuffle1a list {
set n [llength \$list]
for {set i 0} {\$i < \$n} {incr i} {
set j [expr {int(rand()*\$n)}]
set temp1 [lindex \$list \$j]
set temp2 [lindex \$list \$i]
set list [lreplace [K \$list [set list {}]] \$j \$j \$temp2]
set list [lreplace [K \$list [set list {}]] \$i \$i \$temp1]
}
return \$list
}
======

package require Tk
* ''shuffle2'' implements Bentley's method, unpacking the list to an array first.

# Graph the results of a tradeoff study on list shuffle procedures
======
proc shuffle2 list {
set n 0
foreach element \$list {
set data(\$n) \$element
incr n
}
for {set i 0} {\$i < \$n} {incr i} {
set j [expr {int(rand() * \$n)}]
set temp \$data(\$j)
set data(\$j) \$data(\$i)
set data(\$i) \$temp
}
for {set i 0} {\$i < \$n} {incr i} {
lappend retval \$data(\$i)
}
return \$retval
}
======

canvas .c -width 640 -height 480 -bg white
* ''shuffle3'' is Bob Techentin's implementation of Stephen D. Cohen's proposed method.

array set color {
0  black
1  blue
1a forestgreen
2  red
3  magenta
4  darkcyan
5  gray50
5a brown
6  orange
======
proc shuffle3 list {
set n [llength \$list]
while {\$n>0} {
set j [expr {int(rand() * \$n)}]
lappend slist [lindex \$list \$j]
set list [lreplace [K \$list [set list {}]] \$j \$j]
incr n -1
}
return \$slist
}
======

set yscale [expr { 400 / ( log( 1000000 ) - log ( 1 ) ) }]
set xscale [expr { 560 / ( log( 10000 ) - log ( 1 ) ) }]
* ''shuffle4'' is Steve Cohen's improved implementation:

.c create line 60 440 620 440
.c create line 60 440 60 40
for { set n 1 } { \$n <= 10000 } { set n [expr { \$n * 10 }] } {
set x [expr { 60 + \$xscale * log ( \$n ) }]
.c create text \$x 450 -text \$n -anchor n -font {Helvetica -10}
for { set nn \$n } { \$nn < 10*\$n && \$nn <= 10000 } { incr nn \$n } {
set x [expr { 60 + \$xscale * log ( \$nn ) }]
.c create line \$x 440 \$x 450
======
proc shuffle4 list {
set n [llength \$list]
while {\$n > 0} {
set j [expr {int(rand() * \$n)}]
lappend slist [lindex \$list \$j]
incr n -1
set temp [lindex \$list \$n]
set list [lreplace [K \$list [set list {}]] \$j \$j \$temp]
}
return \$slist
}
for { set t 1 } { \$t <= 1000000 } { set t [expr { \$t * 10 }] } {
puts "t = \$t"
set y [expr { 440 - \$yscale * log (\$t ) }]
.c create text 50 \$y -text [expr { 0.000001 * \$t}] -anchor e \
-font {Helvetica -10}
for { set tt \$t } { \$tt < 10*\$t && \$tt <= 1000000 } { incr tt \$t } {
set y [expr { 440 - \$yscale * log (\$tt ) }]
.c create line 60 \$y 50 \$y
======

* ''shuffle5'' and ''shuffle5a'' are from Christoph Bauer.  They differ only in the use of the ''K'' combinator.

======
proc shuffle5 list {
set n 1
set slist {}
foreach item \$list {
set index [expr {int(rand() * \$n)}]
set slist [linsert \$slist \$index \$item]
incr n
}
return \$slist
}
# default font is {Helvetica -12}
.c create text 320 5 -text "Comparison of list shuffle methods" -anchor n \
-font {Helvetica -12 bold}
.c create text 330 475 -text "List size" -anchor s -font {Helvetica -10}
.c create text 5 210 -text "Time (s)" -anchor w -font {Helvetica -10}
======

set y0 42
set dy [font metrics {Helvetica -12} -linespace]
.c create text 75 \$y0 -anchor nw -text "Key:" -fill black -tag key \
-font {Helvetica -12 bold}
incr y0 \$dy
foreach { method tm(1) tm(10) tm(100) tm(1000) tm(10000) } {
shuffle0       33     118     997   11045  183446
shuffle1       28     169    2055  105000  ------
shuffle1a      33     187    1717   17243  176520
shuffle2       39     203    1852   19034  229225
shuffle3       28     128    1097   12195  267413
shuffle4       30     138    1182   11808  121336
shuffle5       20      95    1023   35988  ------
shuffle5a      24     108     908   10266  250796
shuffle6        8      74     633    6409   66740
} {
set ab [string range \$method 7 end]
set y1 [expr {\$y0+\$dy/2}]
.c create line 75 \$y1 95 \$y1  -fill \$color(\$ab)  -tag key
.c create text 100 \$y0  -anchor nw  -text \$method  -fill \$color(\$ab) \
-tag key
incr y0 \$dy
foreach n { 1 10 100 1000 10000 } {
set t \$tm(\$n)
if { [string compare ------ \$t] } {
set x [expr { 60 + \$xscale * log ( \$n ) }]
set y [expr { 440 - \$yscale * log (\$t ) }]
.c create text \$x \$y  -text \$ab  -anchor center \
-fill \$color(\$ab)  -font {Helvetica -10}
if { [info exists lastx(\$ab)] } {
.c create line \$lastx(\$ab) \$lasty(\$ab) \$x \$y \
-fill \$color(\$ab)

======
proc shuffle5a list {
set n 1
set slist {}
foreach item \$list {
set index [expr {int(rand() * \$n)}]
set slist [linsert [K \$slist [set slist {}]] \$index \$item]
incr n
}
return \$slist
}
======

* The 'shuffle6' procedure uses `[lset]`, which is in Tcl [Changes in Tcl/Tk 8.4%|%8.4]:

======
proc shuffle6 list {
set n [llength \$list]
for {set i 1} {\$i < \$n} {incr i} {
set j [expr {int(rand() * \$n)}]
set temp [lindex \$list \$i]
lset list \$i [lindex \$list \$j]
lset list \$j \$temp
}
return \$list
}
======

* The test harness times the various methods and prints the results.

======
puts "                Times in usec for shuffle methods"
puts "    Method                    List length"
puts "                    1      10     100    1000   10000"
puts "    -------------------------------------------------"
set methods [lsort -dictionary [info procs shuffle*]]
foreach method \$methods {
puts -nonewline {    }
puts -nonewline [format %-9s \$method]
foreach n {1 10 100 1000 10000} {
set list [iota \$n]
if {
\$method in {shuffle1 shuffle5}
&& \$n > 1000
} {
puts -nonewline {  ------}
} else {
set t [lindex [time {\$method \$list} [expr {100000/\$n}]] 0]
puts -nonewline [format %8d \$t]
}
}
puts {}
}
======

----
The results are summarized in the following table (8.4a4, WinNT 4.0 SP6, 550 MHz PIII-SpeedStep):

Times in usec for shuffle methods
Method                    List length
1      10     100    1000   10000
-------------------------------------------------
shuffle0       33     118     997   11045  183446
shuffle1       28     169    2055  105000  ------
shuffle1a      33     187    1717   17243  176520
shuffle2       39     203    1852   19034  229225
shuffle3       28     128    1097   12195  267413
shuffle4       30     138    1182   11808  121336
shuffle5       20      95    1023   35988  ------
shuffle5a      24     108     908   10266  250796
shuffle6        8      74     633    6409   66740

More results (8.4.4 Win2k SP3)

Times in usec for shuffle methods
Method                     List length
1      10     100    1000   10000
--------------------------------------------------
shuffle0        31      99     834   10114  168250
shuffle1        28     181    2733  188349  ------
shuffle1a       49     180    1659   16585  169083
shuffle2        32     154    1430   15825  200545
shuffle3(*)     59     131    1170   19785  798204
shuffle4(*)     29     139    1211   11937  122958
shuffle5        19     122    1261   46737  ------
shuffle5a(*)    23     104     917   12966  542084
shuffle6         9      67     609    4958   52110
shuffle7        23      56     291    2588   28875
shuffle9        22      59     431    5566   99201
shuffle10(*)    10      73     629    6190   64879

(*) unbiased shuffling

If anyone wants to see a graph of the result, you can get one by
running the code at [Shuffle a list: graph results].

----
'''[Kevin Kenny]:'''

Summary:
* Using `[lset]` wins over everything else.  This benchmark is just made for `[lset]`.

Among the other methods:

* For short lists (up to a hundred elements or so is "short"), Bauer's ''shuffle5'' procedure is a clear win, because it exploits the speed of ''foreach''.  (This tradeoff may change, given that the maintainer is contemplating bytecoding other list operations in version 8.4.)
* For lists longer than a few hundred elements, any implementation that does list surgery without the ''K'' combinator runs into extremely expensive quadratic behavior.  What happens is that ''linsert'' or ''lreplace'' discovers that it has a ''Tcl_Obj'' with a reference count greater than 1 and is forced to copy it.  For medium-sized lists, say 100-1000 elements, ''shuffle5a'', which is ''shuffle5'' with the K combinator added, is a clear winner.
* Similarly, ''shuffle3'' causes ''lreplace'' to copy, on average, half the list.  This copying leads to quadratic behavior as well, although it's not nearly as bad as 'shuffle1'.  The ''shuffle5'' procedure runs into the same behavior in ''linsert.''
* The ''shuffle4'' procedure emerges a winner for the longest lists,
although ''shuffle0'', ''shuffle1a'', and ''shuffle2'' are all extremely close.  The lesson here is that it's important to structure the code so that the list never gets resized -- otherwise, there's a lot of memory copying involved.  Some of the bytecode changes between 8.3.2 and 8.4 appear to favor ''shuffle1a'' instead.
* ''lsort'' is surprisingly inexpensive.  All of these procedures seem to be dominated by the linear overheads.

----

[Jeffrey Hobbs]:
I have added this to the tclbench suite, but I find after quite a bit of testing, that ''shuffle1a'' consistently performs as well as or better than ''shuffle4'', with ''shuffle5a'' being a good general solution for lists up to 1000 or so items.  My last timings with 8.3.3:

Win2K/P3-800:
Tcl8.3.3      Times in usec for shuffle methods
Method                    List length
1     10      100     1000    10000
--------------------------------------------------
shuffle0        50     155    1315   13920  214300
shuffle1-s      24     125    2191  130895  ------
shuffle1a       30     170    1578   15725  162750
shuffle2        36     176    1515   15870  200800
shuffle3        30     150    1378   16825  535800
shuffle4        32     165    1478   14675  151700
shuffle5-s      18      66     839   38505  ------
shuffle5a       22      96     864   12015  478200

Linux/P3-???:
Tcl8.3.3      Times in usec for shuffle methods
Method                    List length
1     10      100     1000    10000
--------------------------------------------------
shuffle0       143     405    3089   32796  396658
shuffle1-s      64     296    3579  398035  ------
shuffle1a       79     407    3653   36153  363324
shuffle2       104     431    3598   36537  402357
shuffle3        87     406    3470   37679  778334
shuffle4        92     436    3734   36761  368350
shuffle5-s      47     182    1854   89727  ------
shuffle5a       54     239    2033   22177  616730

----

[Csan] 2002-11-12: I've made a comparison between Tcl 8.3.3 and Tcl 8.4.0 on the same hardware.
Hardware specs:
/proc/cpuinfo:
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 666.545
cache size      : 256 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1330.38

OS specs:
Debian GNU/Linux sid, using linux kernel 2.4.19-686 / P3-667

Times in usec for shuffle methods
(A - Tcl 8.3.3   B - Tcl 8.4.0    C - B improvement over A, if A is 100%)
List length
|       1       |       10        |        100        |         1000        |           10000        |
-----------+----+----+-----+-----+-----+-----+------+------+-----+-------+-------+-----+---------+--------+-----+
Method     |  A    B    C  |  A     B     C  |   A      B     C  |   A       B      C  |    A        B       C  |
-----------+----+----+-----+-----+-----+-----+------+------+-----+-------+-------+-----+---------+--------+-----+
shuffle0   | 34   21  -39% | 157    76  -52% | 1439    703  -52% | 15774    8454  -47% |  215209   147609  -32% |
shuffle1   | 27   17  -38% | 144   102  -30% | 2013   1491  -26% | 87920   82390   -7% |  ------   ------  ---- |
shuffle1a  | 32   22  -32% | 173   150  -14% | 1588   1413  -12% | 15662   14168  -10% |  158882   142911  -11% |
shuffle2   | 40   22  -45% | 184   102  -45% | 1627    908  -45% | 16590    9488  -43% |  199949   126498  -37% |
shuffle3   | 33   19  -43% | 160   106  -34% | 1412    967  -32% | 15059   10516  -31% |  273615   226784  -18% |
shuffle4   | 35   20  -43% | 174   112  -36% | 1547   1012  -35% | 15137    9952  -35% |  154841   102327  -34% |
shuffle5   | 20   12  -40% |  85    59  -31% |  953    710  -26% | 30021   28559   -5% | 4957511  4243210  -15% |
shuffle5a  | 23   15  -35% | 103    82  -21% |  897    745  -17% |  9694    8335  -15% |  215593   202902   -6% |
shuffle6   | --    6  ---- | ---    47  ---- | ----    418  ---- | -----    4134  ---- | -------    42905  ---- |
shuffle7   | 24   16  -33% |  78    40  -49% |  537    231  -57% |  4912    2027  -59% |   51494    22621  -56% |

(See shuffle7 procedure below)
My tests show a rough 33% improvement of Tcl 8.4.0 over Tcl 8.3.3, not counting in the overall and stunning winner shuffle6!

----

'''[DKF]''' ''- Note for future work:'' It should be possible to produce a much more efficient version of the list shuffling with the new [[lset]] command in 8.4, but it is too late right now for me to actually write it (especially as I've not got a compiled version of 8.4 to hand right now anyway.)

[KBK] Done. Can you update the picture?

[Csan] Updated mine. ;)

----

I quote DKF:  "A good way that is easy to understand is to convert the list into a list of
pairs by pairing (converting to a two-element list) each element with a random
number from [[expr rand()]].  Then use [[lsort -index]] to sort on those random
numbers and remember to ditch them as you extract..."

----

[RS] wonders whether in shuffle6, testing for equality of i and j would scrape out some microseconds (swapping the same element would be wasted time, even with [lset]):

======
proc shuffle6a list {
set n [llength \$list]
for {set i 1} {\$i < \$n} {incr i} {
set j [expr {int(rand() * \$n)}]
if {\$i != \$j} {
set temp [lindex \$list \$i]
lset list \$i [lindex \$list \$j]
lset list \$j \$temp
}
}
return \$list
}
======

suchenwi        Jó napot, János!
Csan         I'm benchmarking your new shuffle6a...
suchenwi        Is it faster?
Csan        nope
Csan         shuffle6       6      47     417    4111   43409
shuffle6a       6      50     462    4602   47802

[suchenwi]: Ah. So the time the [if] eats on every turn is more that how much is won in case of i==j.
This speaks for the efficiency of [lindex] and [lset]...
Well, if the probability of i==j is 1/N (length of list), it can be expected to occur once in the iteration. OK, so withdraw that variant...

----

[Csan] 2002-11-13: I'd like to request for comments on my shuffle7 procedure below... it is the best so far (updated comparison charts).
It is for certain that the resulting list shuffled.
Thanks to [RS] and [dkf], a few optimizations to the proc give further dramatical speed-ups!:

`{rand() < .5} instead of `{[expr rand() > .5}` - the {} argument is passed to [expr] anyway...
[concat \$l1 \$l2 \$l3 \$l4] instead of "\$l1 \$l2 \$l3 \$l4" - to avoid list->string conversion

The magnitude of the improvement is:
shuffle7   | 25   16  -36% |  83    44  -47% |  583    286  -51% |  5415    2594  -52% |   58367    29838  -49% |
shuffle7   | 24   16  -33% |  78    40  -49% |  537    231  -57% |  4912    2027  -59% |   51494    22621  -56% |

shuffle7:

======
proc shuffle7 list {
set l1 {}
set l2 {}
set l3 {}
set l4 {}
foreach le \$list {
if {rand() < .5} {
if {rand() < .5} {
lappend l1 \$le
} else {
lappend l2 \$le
}
set lastx(\$ab) \$x
set lasty(\$ab) \$y
} {
if {rand() < .5} {
lappend l3 \$le
} else {
lappend l4 \$le
}
}
}
return [concat \$l1 \$l2 \$l3 \$l4]
}
======

foreach {x1 y1 x2 y2} [.c bbox key] {}
.c create rectangle [incr x1 -2] [incr y1 -2] [incr x2 2] [incr y2 2] \
-fill grey95  -outline black  -tags {key keybg}
.c lower keybg
For example, given a 10 element list, shuffle7 has given the following results (10 consecutive runs):

bind .c <Map> {
if {[tk_messageBox -type yesno  -title "Snapshot?"  -icon question \
-message "Take snapshot of graph?"] == "yes"} {
1 7 9 4 8 0 2 3 5 6
4 6 7 9 0 3 5 8 1 2
0 7 3 6 8 1 2 5 4 9
5 7 9 4 6 0 1 2 3 8
0 3 9 2 6 4 5 7 1 8
1 2 0 9 3 6 7 4 5 8
1 7 0 4 8 2 6 3 5 9
0 4 5 1 3 6 8 9 2 7

#[pyk] 2012-11-28: are these updates really needed?
#update
#update
I find one of the advantages of shuffle7 to be the backwards compatibility for the earlier versions of Tcl.

# DKF - And Now a little code to convert this graph to a GIF
# image file.  Obviously, this doesn't work everywhere and the
# filename is hardcoded...
exec xwd -id [winfo id .c] | xwdtopnm 2>@stderr | ppmquant 256 | \
ppmtogif >shufflegraph.gif 2>@stderr
exit
----

[RS]: Very impressive. However, I have the vague feeling that this shuffling is biased - the first list elements have the best chance to come out first again, while the last tend to come in the end - as your example above shows. With four "buckets", the first element of a list of any length has a 25% chance to be first again, against 1/N as a purely random shuffle would be.

----

[Csan]: RS: Yes, I agree, it only shuffles (almost like you would do when playing card games with friends... :)), but doesn't tend to return a perfectly randomly shuffled list. One could maybe use shuffle7 to simulate real-life behavior when writing card games in Tcl by adjusting the number of buckets.

[JRW]: shuffle7 only generates 4^N permutations of N elements, which is a vanishingly small fraction of N!.  It is equivalent to randomly assigning each input element a bucket number in the range 1 to 4, and then sorting the elements by bucket number.  (Using, obviously, bucket sort.)  Elements in the same bucket retain their input order.  shuffle7 cannot, for example, produce the permutation 2 1 4 3 6 5 8 7.

----

[RS] has this recursive variation, which does shuffle (without initial list length), but appears not blindingly fast, and has the same recursion depth problem:

======
proc shuffle8r list {
if {[llength \$list] > 1} {
set i [expr {int([llength \$list] * rand())}]
set pick [lindex \$list \$i]
set res [shuffle8r [lreplace \$list \$i \$i]]
lappend res \$pick
} else {
set list
}
}
======

grid .c
----

[KPV]: Here's yet a totally different way to shuffle a list. The
shuffling is '''not''' perfectly random, but it's very easy to
code, fairly fast and more often than not it's good enough.

======
proc shuffle9 list {
foreach a \$list {
set arr(\$a) 1
}
return [array names arr]
}
======

<<categories>> Mathematics | Plotting
[RS]: Note however that duplicates in the input list are removed.
[JRW]: You could fix that by doing "incr arr(\$a)" instead of "set arr(\$a) 1", and then appending each element \$arr(\$a) times to the output list.  But that would clunk up the algorithm to the point where it might not be useful.

----

[KPV]: Here's an unbiased version that uses [lset].

======
proc shuffle10 list {
set len [llength \$list]
set len2 \$len
for {set i 0} {\$i < \$len-1} {incr i} {
set n [expr {int(\$i + \$len2 * rand())}]
incr len2 -1

# Swap elements at i & n
set temp [lindex \$list \$i]
lset list \$i [lindex \$list \$n]
lset list \$n \$temp
}
return \$list
}
======

[MS]: here's a slightly faster one - it does essentially the same thing, but with fewer ops in the inner loop.
[KPV]'s chooses randomly the element to put at spot ''i'' among all elements with index ''>=i'', starting at ''i=0'' and going up: i.e., it chooses first element ''0'', then element ''1'' among those not already chosen, ... then element ''i'' among those not already chosen, ...

This one performs the same algorithm, but backwards: it chooses first the last element, then the second to last among those not already chosen, ...

======
proc shuffle10a list {
set len [llength \$list]
while {\$len} {
set n [expr {int(\$len * rand())}]
set tmp [lindex \$list \$n]
lset list \$n [lindex \$list [incr len -1]]
lset list \$len \$tmp
}
return \$list
}
======

----

[TFW] Here is a variation of shuffle10 that can ''morph'' a list

======
##
# Morph a list (i.e. slightly shuffle it)
#
proc morph {list {perc 100}} {
set modulo [expr {100 / \$perc}]
expr {srand([clock clicks])}
set len [llength \$list]
set len2 \$len
incr len -1
for {set i 0} {\$i < \$len} {incr i} {
set n [expr {int(\$i + \$len2 * rand())}]
incr len2 -1
if {\$i%\$modulo==0} {
# Swap elements at i & n
set temp [lindex \$list \$i]
lset list \$i [lindex \$list \$n]
lset list \$n \$temp
}
}
return \$list
}
======

======none
%morph {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20} 100
7 15 1 9 11 19 6 5 4 13 3 10 20 2 16 8 17 12 18 14
%morph {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20} 10 ;# only 10% change
12 2 3 4 5 6 7 8 9 10 18 1 13 14 15 16 17 11 20 19
======

----

[AMG]: [RS]'s innocuous typo on the [unsort] page caused me to write another version of this page under the title [Shuffling a list].  Shrug!

----

'''Using TclX functions:'''

[fortinal] 2007-09-18: This is a compact way to shuffle a list using [random] and [lvarpop] functions provided by [TclX] package.

======
package require Tclx

# create data_list_rand by shuffling data_list
while {[set list_length [llength \$data_list]] > 0} {
lappend data_list_rand [lvarpop data_list [random \$list_length]]
}
======

Note that the initial ''data_list'' is destroyed.

[JRW]: How is lvarpop implemented?  I can't believe it's a constant time operation, and I worry this has hidden n log n or quadratic behavior.

----

[Stu]: 2008-11-02 No comment.

======
proc shuffle list {
set sl {}; set i -1
foreach i [regsub -all {.\..+?%} [
lsort [subst [lrepeat [llength \$list] \[expr\ rand()\]%\[incr\ i\]]]
] {}] {
lappend sl [lindex \$list \$i]
}
return \$sl
}
======

----

[aricb] 2009-02-04 Exploiting [[lsort -command]].

======
proc randomsort {a b} {
return [expr {int(rand()*2) * 2 - 1}]
}
proc shuffle list {
return [lsort -command randomsort \$list]
}
======

[JRW]: I think you're being optimistic when you write a comparison function that violates the requirements of lsort.  The behavior of lsort is no longer guaranteed.  Another issue: lsort probably uses the quicksort algorithm.  The recommended implementation of quicksort uses insertion sort for small sublists.  But using your comparison function with insertion sort, each new element will on average be moved 1/2 a step from its initial position.  So I think it's likely you'll get outputs that are not all that shuffled.

[Lars H]: Actually, [lsort] uses mergesort. But '''lsort -command''' is infamous for being slow, and the above should not produce a uniform distribution. From looking at the [http://tcl.cvs.sourceforge.net/viewvc/tcl/tcl/generic/tclCmdIL.c?revision=1.181&view=markup%|%implementation%|%], I get the feeling that for a list with 2^n+1 elements, this procedure would put the last element first with probability 0.5, second with probability 0.25, third with probability 0.125, etc.

----

'''[BenGoldberg] 2010-10-10 12:20:50''':

Here's a way to shuffle a list which is based (loosely) on quicksort.

======
proc shuffle list {
if {[llength \$list] < 2} {return \$list}
set lft {}
set rgt {}
foreach item \$list {
if {rand() > 0.5} {
lappend lft \$item
} else {
lappend rgt \$item
}
}
concat [shuffle \$lft] [shuffle \$rgt]
}
======

Caution: This code was typed from memory, and has not been tested.

Assuming that rand is not biased, correlated, or otherwise statically flawed,
this procedure can be proven to be able to produce a uniform distribution.

----

[TRA] 2019-01-21

Adding my approach as there seems to be nothing quite like it here yet. Quickly checking it seems to be pretty robust.

======
proc shuffle list {
set r {}
foreach i \$list {
set r [linsert \$r [expr {int(rand()*[incr n])}] \$i]
}
return \$r
}
======

[wdb] 2019-10-14 My one is optimized for short code:

======
proc shuffle list {
while {[llength \$list] > 0} {
set index [expr {int(rand() * [llength \$list])}]
lappend result [lindex \$list \$index]
set list [lreplace \$list \$index \$index]
}
lappend result
}
======

<<categories>> Algorithm | Performance

```