Version 5 of Dict VS Array Speed

Updated 2005-03-16 18:27:09 by RoyT

RHS 16March2005

I was doing some coding for the Language Shootout, and implemented the nsieve test using both arrays and dicts. While I was aware that dicts are faster than arrays, I was amazed at the difference.

UPDATE: The diff was due to using pattern match on array unset - see end notes RT

The array version:

 proc main {n} {
    foreach value [list $n [incr n -1] [incr n -1]] {
        set num [expr { int(pow(2, $value) * 10000) }]
        puts [format "Primes up to %8d %8d" $num [nsieve $num]]
    }
 }

 proc nsieve {n} {
    set inData {}
    for {set i 2} {$i <= $n} {incr i} {
        lappend inData $i {}
    }
    set data {}; unset data
    array set data $inData

    for {set i 2} {$i <= $n} {incr i} {
        if { ! [info exists data($i)] } {
            continue
        }
        for {set j [expr {$i * $i}]} {$j <= $n} {incr j $i} {
            array unset data $j
        }
    }

    llength [array names data]
 }

 main [lindex $argv 0]

The dict version:

 proc main {n} {
    foreach value [list $n [incr n -1] [incr n -1]] {
        set num [expr { int(pow(2, $value) * 10000) }]
        puts [format "Primes up to %8d %8d" $num [nsieve $num]]
    }
 }

 proc nsieve {n} {
    set data {}
    for {set i 2} {$i <= $n} {incr i} {
        lappend data $i {}
    }

    for {set i 2} {$i <= $n} {incr i} {
        if { ! [dict exists $data $i] } {
            continue
        }
        for {set j [expr {$i * $i}]} {$j <= $n} {incr j $i} {
            dict unset data $j
        }
    }

    llength [dict keys $data]
 }

 main [lindex $argv 0]

The timings:

 (11)>> time ./nsieve.tcl 2
 Primes up to    40000     4203
 Primes up to    20000     2262
 Primes up to    10000     1229

 real    12m3.968s
 user    12m1.470s
 sys     0m0.030s

 (8)>> time ./nsieve.dict.tcl 2
 Primes up to    40000     4203
 Primes up to    20000     2262
 Primes up to    10000     1229

 real    0m4.115s
 user    0m2.020s
 sys     0m0.000s

DGP Is most of the difference due to [array unset] versus [dict unset] ? Note that [aray unset] operates on a pattern argument that must be matched against all elements of the array. While [dict unset] does not respect patterns and operates on exactly one key. That's an N vs N-squared difference in the algorithms.

Try replacing [array unset data $j] with [unset -nocomplain data($j)] and see whether that brings the two implementations into closer alignment.

RHS Indeed, that change alters the array timing to almost as fast as the dict:

 (13)>> time ./nsieve.tcl 2
 Primes up to    40000     4203
 Primes up to    20000     2262
 Primes up to    10000     1229

 real    0m2.737s
 user    0m2.710s
 sys     0m0.030s

Morale of the story: Don't use pattern matching operations unless you need to.


Category Performance