Version 10 of iterating through an array

Updated 2003-01-24 01:59:28

With large arrays, it is sometimes better not to loop with foreach over [array get], because that builds a flat key/value list copy of the whole array, at a fairly hefty memory cost:

 foreach {key value} [array get arrName] {
     # do something with $key and $value
 }

An alternative that copies only the keys is

 foreach key [array names arrName] {
     # get the current value with $arrName($key)
 }

or, to visit the keys in sorted order,

 foreach key [lsort -dictionary [array names arrName]] { ... }

Better, then, to do it like this, using while and the array search facilities. (Those are a direct mapping of the C-level functionality for a Tcl_HashTable.)

 set searchToken [array startsearch largeArray]
 while {[array anymore largeArray $searchToken]} {
     set key [array nextelement largeArray $searchToken]
     set value $largeArray($key)

     # do something with $key and $value
 }
 array donesearch largeArray $searchToken
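
Note that if the loop body throws an error, the final [array donesearch] is never reached and the search token is left dangling. A sketch of a guarded variant, using catch so the cleanup runs either way:

 set searchToken [array startsearch largeArray]
 set rc [catch {
     while {[array anymore largeArray $searchToken]} {
         set key [array nextelement largeArray $searchToken]
         # do something with $key and $largeArray($key)
     }
 } result]
 array donesearch largeArray $searchToken
 if {$rc} {error $result}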

Question: how big does an array need to be to make this the better approach?

KBK - How big is your physical memory? The [array startsearch]/[array nextelement] loop is awfully slow. I see it as a desperate move to be used only if the application thrashes horribly without it. The lists generated by [array names] and [array get] are more compact than the hashtable itself, because neither the keys nor the elements are copied, only pointers to them. Moreover, the lists are in contiguous memory, so they have good locality. The eight bytes per array element are generally negligible compared with the size of the array. In short, if your array is big enough that [array get] thrashes, it's getting dangerously close to thrashing without it.

One-line summary: "When is [array startsearch] the better approach?" "Almost never."
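
The speed claim is easy to check directly with [time] on your own data. A rough sketch (the array name largeArray and the repetition count are placeholders):

 puts "foreach over array get: [time {
     foreach {key value} [array get largeArray] {}
 } 5]"
 puts "startsearch loop: [time {
     set t [array startsearch largeArray]
     while {[array anymore largeArray $t]} {
         set key [array nextelement largeArray $t]
         set value $largeArray($key)
     }
     array donesearch largeArray $t
 } 5]"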

Often, what people really want when searching an array for specific keys can be done with [array get] and a glob pattern.

Example:

  foreach {key value} [array get ::largeArray customer*] {
      # do something for each customer
  }
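
If copying the matching values is also a concern, the same glob pattern works with [array names], which builds a list of the matching keys only. A sketch:

  foreach key [array names ::largeArray customer*] {
      set value $::largeArray($key)
      # do something for each customer
  }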

Michael Schlenker: 2 gigabytes. Perhaps my old code was horribly broken (maybe), but exchanging foreach for this construct did help. Some numbers: I had three arrays totalling 4.2 million entries. Using the foreach approach, my script ate up the whole 2 GB and died. Using array startsearch it used only 580 MB. Maybe it is because of 8.4b2, but I do not think so (I will test it soon). (This was with Tcl 8.4b2 and TclX loaded, under SuSE 8.0 on a dual Athlon MP 1.2 GHz.)

In script one I used this code (inside a larger proc; it dumps an array to disk):

    set fid1 [open $::config(wordlist) w+]
    fconfigure $fid1 -buffering line
    foreach {key count} [array get ::words] {
        # map embedded newlines to \u2028 so each record stays on one line
        set word [string map [list \n \u2028] $key]
        puts $fid1 [list $word $count]
    }
    close $fid1
    array unset ::words

This died horribly.

In script two I used this code:

    set fid1 [open $::config(wordlist) w+]
    fconfigure $fid1 -buffering line
    set t [array startsearch ::words]
    # [array nextelement] returns "" once the search is exhausted
    while {![string equal [set key [array nextelement ::words $t]] ""]} {
        set count $::words($key)
        set word [string map [list \n \u2028] $key]
        puts $fid1 [list $word $count]
    }
    array donesearch ::words $t
    close $fid1
    array unset ::words

This worked.