Version 28 of Arrays / Hash Maps

Updated 2010-03-23 00:13:10 by AMG

Purpose: to discuss the benefits and tricks to using Tcl arrays.


What Tcl refers to as an array has other names in other languages. Awk calls them associative arrays, Perl calls them hashes, and other languages call them hash maps. In Python, dictionaries are pretty much the same.

The reason this is an important concept up front is that Tcl does not provide a simple numerically indexed array in the traditional sense. Instead, the index for a Tcl array is a Tcl string.

Tcl arrays are one dimensional. However, as Laurent Duperval recently mentioned in news:comp.lang.tcl , if you are careful you can simulate a multi-dimensional array like this:

 for {set i 0} {$i < 10} {incr i} {
   for {set j 0} {$j < 10} {incr j} {
     set a($i,$j) "Value for i = $i and j = $j"
     puts "$a($i,$j)"
   }
 }

Note however that "$i,$j" is parsed into a string that differs from the string "$i, $j" for instance -- it is best never to put blanks in the text of your array keys.

Also, for this type of application, it is much better to use integer values for i and j rather than characters. Otherwise, as in the previous point, you might run into cases where white space varies within the values of i and j, resulting in missed values.
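
A small sketch of the pitfall, using a hypothetical array named grid. The keys "1,2" and "1, 2" are different strings, so they name different elements:

```tcl
# Hypothetical array "grid"; keys are plain strings, so spacing matters.
set i 1
set j 2
set grid($i,$j) "stored without a blank"

set key "$i, $j"                 ;# note the blank after the comma
puts [info exists grid($i,$j)]   ;# 1
puts [info exists grid($key)]    ;# 0: "1, 2" is a different key than "1,2"
```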


Note that if you really do want to have spaces (or other Tcl-significant characters) in your array indexes, you can still do this by putting the index into a variable and then using the dereferenced variable as your index:

   % set idx "The quick brown fox"
   % set ary($idx) "jumped over the lazy dogs."
   % parray ary
   ary(The quick brown fox) = jumped over the lazy dogs.

The name of the array can even contain a space, but accessing it then takes an [upvar] to alias it to a more usable name...

DKF - RS: .. or just backslashes: set foo\ bar(baz\ grill) 1

Ken Jones -- You can also include whitespace characters in a key by quoting the entire argument with double quotes:

    % set capital(New Mexico) "Santa Fe"
    wrong # args: should be "set varName ?newValue?"
    % set "capital(New Mexico)" "Santa Fe"
    Santa Fe
    % puts "$capital(New Mexico)"
    Santa Fe
    % 

Of course, it's still best to avoid whitespace characters in keys whenever possible...


It's also important to know that arrays have no implicit string representation other than their name (and are thus mostly passed around by name-dropping and upvar). Explicit conversion from and to pairlists with alternating key/value is done with

   set L [array get A] ;# resp.
   array set A $L

You can use this technique for complex structures, e.g. put arrays (transformed to pairlists) into array elements and unpack them into a temporary array as needed. Copying an array to another is also easy with a combination of the two:

 array set B [array get A]

A Java programmer would recognise this sort of technique as being the analog of serialization.
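
Here is a minimal sketch of the nesting idea, assuming a hypothetical array people whose elements each hold another array in pairlist form. A record is unpacked into a temporary array, changed, and packed again:

```tcl
# Hypothetical data: each element of "people" holds a serialized array.
array set alice {age 30 city Paris}
set people(alice) [array get alice]

# Unpack one record into a temporary array, modify it, and store it back.
array set tmp $people(alice)
set tmp(age) 31
set people(alice) [array get tmp]
array unset tmp

array set check $people(alice)
puts "$check(age) $check(city)"   ;# 31 Paris
```

The round trip costs a copy each time, so this is probably best kept for records that are small or rarely touched.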

Also, you can use this sort of scheme to parse key-value lists quickly (assuming you are not bothered about handling missing values gracefully) and, if you preload the array with your default values, this becomes even more efficient.

DKF
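
The option-parsing idea might look like the following sketch. The proc name connect and its option names are made up for illustration. Defaults are loaded first, then the caller's pairs overwrite them:

```tcl
# Hypothetical proc: options arrive as a flat key/value list in args.
proc connect {args} {
    # Preload defaults, then let the caller's pairs overwrite them.
    array set opt {-host localhost -port 80 -timeout 30}
    array set opt $args
    return "$opt(-host):$opt(-port) (timeout $opt(-timeout))"
}

puts [connect -port 8080]   ;# localhost:8080 (timeout 30)
```

Note that array set raises an error if the list has an odd number of elements, which is the ungraceful handling of missing values mentioned above.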


Unlike lists, hashes don't retain the sequence in which they were filled. You can retrieve the (complete or filtered) list of keys with array names and lsort that. By constructing keys with a running number, you can have the "history" too - see Numbered arrays. RS
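
For instance, a sketch with a hypothetical array stock, walked in sorted key order:

```tcl
array set stock {pears 7 apples 3 kiwis 12}

# array names gives the keys in no particular order; lsort fixes one.
foreach name [lsort [array names stock]] {
    puts "$name = $stock($name)"
}
# apples = 3
# kiwis = 12
# pears = 7
```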


Paul Duffin points out that this can also be done by keeping a chronological list of array keys. Adding elements goes like this (the key list is kept in a(), the element whose key is the empty string):

        if {![info exists a($e)]} {
            set a($e) $v
            lappend a() $e
        }

and this is how to retrieve them in chronological order:

        foreach e $a() {
            # use $a($e) here
        }
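
Wrapped into a helper, the same technique could be sketched as follows. The proc name remember is hypothetical, and unlike the fragment above this variant also updates the value of a key that already exists:

```tcl
# Hypothetical helper: remember insertion order in the element with the empty key.
proc remember {arrName key value} {
    upvar 1 $arrName a
    if {![info exists a($key)]} {
        lappend a() $key
    }
    set a($key) $value
}

remember seen zebra 1
remember seen apple 2
remember seen mango 3
foreach key $seen() {
    puts "$key -> $seen($key)"
}
# zebra -> 1
# apple -> 2
# mango -> 3
```

One thing to keep in mind is that array names will also report the empty key used for bookkeeping.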

Clearing an array: here are a few ways to do it, courtesy Bob Techentin:

Be aware that array set will add elements to an array, but will not remove any of the old elements. So a simple array set arr {} will not do anything useful. Instead, if you've got Tcl 8.3 or newer:

        array unset arr

You can also supply a pattern argument to delete only selected array elements. If you're using or supporting older Tcls, then you can clear the entire array by unsetting the array variable and recreating the array, like this:

        unset arr
        array set arr {}

Or, if you've got a lot of CPU time on your hands, you can use a much less efficient technique, getting a list of names and removing array entries one at a time.

        foreach key [array names arr] { unset arr($key) }

The only reason for handling the array this last way is if the array is large and expected to become large again. In this case, deleting the array elements one-by-one could save quite a few realloc/rehash operations. But this might not gain you very much. It depends on where you want to spend the time... (DKF via RS)
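
A short sketch of array unset with and without a pattern, using made-up keys. It assumes Tcl 8.3 or newer, as noted above:

```tcl
array set arr {tmp,1 a tmp,2 b keep,1 c}

array unset arr tmp,*      ;# remove only the keys matching the glob pattern
puts [array names arr]     ;# keep,1

array unset arr            ;# no pattern: the whole array goes away
puts [array exists arr]    ;# 0
```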


On May 10, 2000, Federic BONNET wrote in news:comp.lang.tcl :

Tcl arrays shouldn't be seen as data types but as variable types opposed to scalar variables. Arrays are collections of scalar variables. A variable cannot be put in another variable (only its name), the same for arrays. Only a variable's value can be used to store information, and only scalar variables have a value, each array element being a scalar variable. An array variable has no value by itself. Thus the nature of Tcl arrays and the way they are implemented.
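
In practice this means a proc receives an array by name and links to it with upvar. A minimal sketch, with a hypothetical proc total and array prices:

```tcl
# Hypothetical proc: receives the array's name, not a value.
proc total {arrName} {
    upvar 1 $arrName arr
    set sum 0
    foreach key [array names arr] {
        incr sum $arr($key)
    }
    return $sum
}

array set prices {tea 3 cake 5}
puts [total prices]   ;# 8
```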


Morten Skaarup Jensen wrote in news:comp.lang.tcl :

 What I want is really more like data.members[x-1].children[y+1].age=15;

At first I didn't believe it, but you can have nested parens inside array indices too, so how's

 set x 3
 3
 set y 4
 4
 set data(members([expr $x-1]).children([expr $y+1]).age) 15
 # which the parser simplifies to:
 set data(members(2).children(5).age) 15
 15

The only drawback is the clumsy expr style. It's as easy as this:

  • Whatever you can give a C compiler as directions to a struct element, is a string (in the C source at least).
  • And Tcl arrays are addressed with strings.
  • So: Tcl arrays can be addressed with any string that corresponds to a C addressing, plus the huge set of all other strings the C compiler would bark at ;-) -- RS

 What: dictionary
 Where: http://www.purl.org/net/bonnet/pub/dictionary.tar.gz
 Description: Implementation of a Tcl dictionary object type.  A
  dictionary is equivalent to an array that is a first class object
  which can be used as proc arguments, inside other objects, etc.  Its
  contents look like a list to Tcl commands, but internally things are
  stored similar to a hash.
  Requires Tcl 8.2 or newer.  Currently at v1.0.1.
 Updated: 01/2000
 Contact: mailto:[email protected]

The TclX extension also has a construct called keyed lists. Here's an example:

    keylset ttyFields ttyName tty1a
    keylset ttyFields baudRate 57600
    keylset ttyFields parity strip

Then when you type

    echo $ttyFields

you would see:

    {ttyName tty1a} {baudRate 57600} {parity strip}

Using arrays, one would set things up as:

    set ttyFields(ttyName)  tty1a
    set ttyFields(baudRate) 57600
    set ttyFields(parity)   strip

Is there any practical or theoretical difference between the following?

    array set foo [list bar baz]

    set foo(bar) baz

The only thing that comes to mind is that in the latter example, set may not have been able to handle arrays in earlier versions of Tcl. But if that's the case, then neither would array. :P I generally prefer the second form as it is easier to read but upon further reflection, I can see how the first example would make it easier to set a bunch of keys at once.
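
A quick check of the two forms (the names foo2 and empty are only for this sketch). One difference that does show up is that array set can create an array with no elements at all, which a plain set of an element cannot:

```tcl
array set foo [list bar baz]
set foo2(bar) baz
puts [expr {[array get foo] eq [array get foo2]}]   ;# 1

# array set with an empty list still creates the array variable.
array set empty {}
puts [array exists empty]   ;# 1
puts [array size empty]     ;# 0
```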


One Perl function that I've not seen in a Tcl library is the reverse function - when given a hash, it returns a new hash, whose keys are the argument hash's values and whose values are the argument hash's keys.

How would that work in Tcl? Somehow, one would need to pass the name of an array to the proc, and then, from that, one would have to somehow work through the array.

 proc reversehash { hash } {
     variable working
     set old [array get $hash]
     set new [list]
     foreach {key value} $old {
         lappend new $value $key
     }
     array set newhash $new
 }

But how would you return the new list? RS: You have to give both names, as an array cannot be returned as a value:

 proc array'reverse {oldName newName} {
    upvar 1 $oldName old $newName new
    foreach {key value} [array get old] {set new($value) $key}
 }

#---- Testing:

 % array set a1 {1 eins 2 zwei 3 drei}
 % array'reverse a1 a2
 % array get a2
 drei 3 zwei 2 eins 1
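
Another option, sketched here with a hypothetical proc reversePairs, is to return the reversed pairs as a list and let the caller feed them to array set:

```tcl
# Hypothetical variant: takes the array by name, returns the reversed pairs as a list.
proc reversePairs {arrName} {
    upvar 1 $arrName arr
    set result {}
    foreach {key value} [array get arr] {
        lappend result $value $key
    }
    return $result
}

array set a1 {1 eins 2 zwei 3 drei}
array set a3 [reversePairs a1]
puts $a3(zwei)   ;# 2
```

Either way, if two keys share the same value, only one of them survives in the reversed array.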

slebetman 10 Sep 2004: One thing I like about Tcl arrays is that it is trivial to do simple glob-style searches on the keys:

 % array set test {1.1 first 1.2 second 2.1 third}
 % array get test 1.*
 1.1 first 1.2 second

This feature is surprisingly powerful because it allows you to group sets of data in a single array. In the above example, the format of the key is group.element, and we got all the elements of group 1. To get the first element of every group, we can simply do:

 % array get test *.1
 1.1 first 2.1 third

garynkill 27 Oct 2005: Sorry, I have a question regarding the array set command. I ran the commands below and noticed that the list I got from 'puts' comes out in an abysmal order. Why does this happen?

 % array set NodeR [list D1 0 D2 0 D3 0 D4 0 D5 0 Power 0]
 % puts [array get NodeR]
 D4 0 Power 0 D5 0 D1 0 D2 0 D3 0

MS An array is not a list, the different elements have no order. Note that you could define exactly the same array with

 % array set NodeR [list D1 25 Power 0 D2 0 D1 0 D5 0 D4 0 D3 0]

That means: array set creates an array (hash map) from the key-value pairs in its list argument, taken in order, so later entries overwrite earlier ones when keys are duplicated. These are stored internally in a hash table. OTOH, array get traverses the hash table and retrieves key-value pairs to store in the result; the order of traversal is the in-memory order of the keys within the hash table, which is essentially random (if it were too far from random, that would suggest that the hashing algorithm is not very good).
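
Both points can be checked with a small sketch: the duplicate D1 entry is overwritten, and a predictable order for display has to be built explicitly (here by sorting the keys):

```tcl
array set NodeR [list D1 25 Power 0 D2 0 D1 0 D5 0 D4 0 D3 0]
puts $NodeR(D1)   ;# 0, the later D1 entry overwrote the earlier 25

# Build a key-sorted pair list for display.
set sorted {}
foreach key [lsort [array names NodeR]] {
    lappend sorted $key $NodeR($key)
}
puts $sorted   ;# D1 0 D2 0 D3 0 D4 0 D5 0 Power 0
```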


kostix just ran into a problem (resulting from sloppy programming) that the next newcomer should know about:

 foreach key [array names foo] {
  if {$foo($key) ...} {
   ...
   array unset foo $key
  }
 }

In some cases, the if statement inside the loop failed because of a nonexistent element. The problem was that some keys in my array contained glob characters, like

 someprefix*

Of course, the

 array unset foo $key

with such a value in the key variable treated it as a glob pattern and so deleted all the keys sharing the someprefix prefix.

So always unset array values using [unset array($key)] syntax to be safe. (It also should work faster, I think.)
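
A sketch that reproduces the difference, with made-up keys:

```tcl
array set foo {someprefix* 1 someprefixA 2 someprefixB 3 other 4}

# Intended to delete one element, but the key is treated as a glob pattern.
set key "someprefix*"
array unset foo $key
puts [lsort [array names foo]]   ;# other

# unset addresses exactly one element, whatever characters the key contains.
array set bar {someprefix* 1 someprefixA 2}
unset bar($key)
puts [array names bar]           ;# someprefixA
```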


avi: Is there a way one could pass an object from C/C++ such that Tcl treats it as an array [apart from sending a list of lists and further manipulating it]?

AMG: Yes, use the Tcl_*SetVar*() functions to create the array, setting one element at a time. This is a somewhat messy way to do things, since the C/C++ code has to give the object a name; it can't just pass it around as an anonymous value. But this approach will do as you ask. Tcl arrays are groups of variables; Tcl dicts are groups of values. You almost certainly want dict. The only time I ever want arrays is when I really need the properties of variables, e.g. I am creating traces on the elements.
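
On the script side, the variables-versus-values distinction can be sketched like this, assuming Tcl 8.5 or newer for dict. The names makePoint, logWrite, cfg and log are hypothetical:

```tcl
# dict: an ordinary value that can be returned from a proc.
proc makePoint {x y} {
    return [dict create x $x y $y]
}
set p [makePoint 3 4]
puts [dict get $p x]   ;# 3

# array: each element is a variable, so it can carry a trace.
set log {}
proc logWrite {name1 name2 op} {
    global log
    lappend log "${name1}($name2) $op"
}
trace add variable cfg(level) write logWrite
set cfg(level) debug
puts [lindex $log 0]   ;# cfg(level) write
```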


See also: A simple database in an array