iterating through an array

Sometimes, when you have a large array, it is better not to use foreach over [array get] to loop through it, as that builds a flat key/value list copy of your whole array at a quite hefty memory cost:

 foreach {key value} [array get arrName] {
     # do something with key and value
 }

An alternative is to iterate over the keys and look up each value as needed:

 foreach key [array names arrName] {
     # get the current value with $arrName($key)
 }

or, for a traversal in sorted key order:

 foreach key [lsort -dictionary [array names arrName]] { ... }

Better to do it like this, using while and the array search facilities. (These are a direct mapping of the C-level functionality for a Tcl_HashTable.)

 set searchToken [array startsearch largeArray]
 while {[array anymore largeArray $searchToken]} {
     set key [array nextelement largeArray $searchToken]
     set value $largeArray($key)

     # do something with key/value

 }
 array donesearch largeArray $searchToken

Question: how big does an array need to be to make this the better approach?

KBK - As a guideline, [array names] and [array get] will allocate:

  • a C array of up to 2N (for [array names]) or 4N (for [array get]) Tcl_Obj pointers, where N is the size of the Tcl array.
  • N Tcl_Obj's to hold the array keys.
  • A copy of the string representation of each of the array keys.

This overhead can be substantial if the array already nearly fills memory.

Using startsearch, anymore, nextelement and donesearch allocates only a constant amount of memory (plus the size of the largest key). On the other hand, it's atrociously slow, because it executes so much Tcl code, and because it has to revalidate the search handle on every call. [array get] and [array names] are much, much faster because they copy the data - and don't need to worry about someone changing the array while they're executing. (If you know why this statement is an oversimplification, you don't need to read this discussion!)

So, I think that element-at-a-time array iteration is a desperate move, to be used only when the application would otherwise consume excessive memory. For day-to-day array operations, [array get] is both clearer and faster.

One-line summary: "When is [array startsearch] the better approach?" "Almost never, unless the array is so huge that a second copy would crash the program or thrash the virtual memory."
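
A minimal timing sketch for checking the tradeoff yourself (the array name big and the element count are arbitrary choices here; absolute numbers will vary with machine and Tcl version):

 # build a test array
 for {set i 0} {$i < 100000} {incr i} {
     set big(key$i) $i
 }

 # copy-based iteration over a snapshot of the whole array
 puts [time {
     foreach {key value} [array get big] {}
 } 5]

 # search-based iteration, constant extra memory
 puts [time {
     set s [array startsearch big]
     while {[array anymore big $s]} {
         set key [array nextelement big $s]
         set value $big($key)
     }
     array donesearch big $s
 } 5]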


Often, what people are really looking for when searching an array for specific keys can be done with [array get] and a glob pattern.

Example:

  foreach {key value} [array get ::largeArray customer*] {
    # do something for each customer
  }
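
[array names] accepts the same glob pattern, so a key-only loop can be filtered the same way:

  foreach key [array names ::largeArray customer*] {
    # do something with $::largeArray($key)
  }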

Q: "When is [array startsearch] the better approach?" ...

A: When you have a huge array where each element must be visited. Memory wasn't so much the issue as building and passing a list of array names.

I have an array (arr) of 500,000 iTcl objects and must invoke a method on each:

 time {foreach n [array names arr] {$arr($n) dosomething}} 5
 2602631 microseconds per iteration

Now, consider this ad-hoc foreach_name proc (it's a hack and can be improved):

 proc foreach_name {key array cmd} {
    # start the search in the caller's scope
    set id [uplevel "array startsearch $array"]
    # loop until [array nextelement] returns the empty string (search exhausted)
    uplevel "while {\[set $key \[array nextelement $array $id]] != {}} { $cmd }"
    uplevel "array donesearch $array $id"
 }

 time {foreach_name n arr {$arr($n) dosomething}} 5
 1896987 microseconds per iteration

As implemented in Tcl 8.4, on a Pentium 4 under Windows 2000.

-- Todd Coram
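
A somewhat safer variant of the same idea (a sketch only; the name foreach_name2 is made up here and was not part of the timings above) avoids building command strings by linking the caller's variables with upvar:

 proc foreach_name2 {key array cmd} {
     # link the caller's key variable and array into this scope
     upvar 1 $key k $array a
     set id [array startsearch a]
     while {[array anymore a $id]} {
         set k [array nextelement a $id]
         uplevel 1 $cmd ;# run the body in the caller's scope
     }
     array donesearch a $id
 }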

This looks a bit like unequal starting conditions: the second example uses a proc, the first does not. Did you time the difference with the plain foreach loop also put into a proc?


Martin Lemburg 31.01.2003:

Why should he?

He demonstrates very well some qualities of the byte compiler!

The first timing example is not byte-compiled and is slower than the byte-compiled proc in the second timing example!

My opinion after this example: the array iteration interface is not as slow as claimed, if byte-compiled. Let's byte-compile!
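
For a like-for-like comparison, the plain foreach loop can be wrapped in a proc so that it is byte-compiled too; a minimal sketch (no timing for this variant was reported here):

 proc visit_all {} {
     global arr
     # byte-compiled equivalent of the first timing example
     foreach n [array names arr] {
         $arr($n) dosomething
     }
 }
 time {visit_all} 5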


Don't overlook the expense of [array names arr] itself, and of passing the resulting list around (here to foreach, or to noop below). This is not just a memory expense, but a time one too. Consider,

 time {array names arr} 5
 517208 microseconds per iteration

and

 proc noop {l} {}; time {noop [array names arr]} 5
 557274 microseconds per iteration

Schnexel Array search sucks (I'd even say it is buggy). Sure, deleting elements within an array search loop might be too much to ask for. But why can't I even do [info exists]? Example:

 set arr(a) 123
 set arr(b) 456

 # works:
 set searchId [array startsearch arr]
 while { [array anymore arr $searchId] } {
         set key [array nextelement arr $searchId]
         set arr($key) 789
 }
 array donesearch arr $searchId

 # works:
 set searchId [array startsearch arr]
 while { [array anymore arr $searchId] } {
         set key [array nextelement arr $searchId]
         puts [info exists arr($key)] ;# always 1
 }
 array donesearch arr $searchId

 # BREAKS: (&%$*!!!)
 set searchId [array startsearch arr]
 while { [array anymore arr $searchId] } {
         set key [array nextelement arr $searchId]
         puts [info exists arr(X$key)] ;# no! even this lookup invalidates the search
 }
 array donesearch arr $searchId

 # SAME SHAME:
 set searchId [array startsearch arr]
 while { [array anymore arr $searchId] } {
         set key [array nextelement arr $searchId]
         if { [catch {set fff $arr(X$key)} errMsg] } { puts *$errMsg* }
 }

Luckily I've got 512M of RAM to waste on a copy of my array...
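
A possible workaround (a sketch, assuming Tcl 8.4 or later for the -exact option of [array names]): test for the element with [array names] instead, which only reads the hash table and so should not invalidate the search:

 set searchId [array startsearch arr]
 while { [array anymore arr $searchId] } {
         set key [array nextelement arr $searchId]
         puts [llength [array names arr -exact X$key]] ;# 0, and the search stays valid
 }
 array donesearch arr $searchId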


Category Performance