dict

Difference between version 219 and 222 - Previous - Next
'''[http://www.tcl.tk/man/tcl/TclCmd/dict.htm%|%dict]''' is a [Tcl Commands%|%built-in]  command for creating and manipulating '''dictionaries''' ('''dicts''').



** Documentation **

   [http://www.tcl.tk/man/tcl/TclCmd/dict.htm%|%official reference]:   

   [Changes in Tcl/Tk 8.5]:   Links to the [TIP]s that specify this command.

   [TIP] [http://www.tcl.tk/cgi-bin/tct/tip/111.html%|%111]:   



** Reference **

   [http://en.wikipedia.org/wiki/Associative_array%|%Associative Array], Wikipedia:   

   [http://en.wikipedia.org/wiki/Comparison_of_programming_languages_(mapping)#Tcl%|%Comparison of programming languages (mapping)], Wikipedia:   



** See Also **

   [container]:   A guide to containers in Tcl.

   [deep dict]:   A [deep list] as a dictionary.
   [dict extensions]:   

   [dict tips and tricks]:   

   [forward-compatible dict]:   A [pure-Tcl] 8.4 implementation of `dict`.

   [dictn]:   An experimental wrapper over `dict`, focused on nested lists.

   [dict in snit]:   

   [dictionaries as arrays]:   

   [dictionaries as arrays by Stubs]:   

   [everything is a dict]:   

   [dict probe]:   

   [My dict with]:   Add keys inside dict with. 

   [dict vs array speed]:   

   [eDictor]:   A visual editor for nested Tcl dicts and lists saved as data files.

   [dictree]:   Another visual editor for nested Tcl dicts using [ttk::treeview].

   [dict lappend2]:   
   [http://stevehavelka.com/tcl-dict-operation-nested-merge/%|%dict merge], by [st3ve] :   A merge routine that handles nested dictionaries

   [http://stevehavelka.com/tcl-dict-operations-difference-symmetric/%|%diff], by [st3ve]:   nested dictionary diff routines:

   [most-recently-used]:   Leverages the fact that the value `[dict keys]` is sorted by order of insertion to maintain a cache of values.
   [pdict: Pretty print a dict%|%pdict]:   Pretty-print a dictionary

   [nxs]:   Nested heterogeneous data structures.

   [shorthand dict set]:   Merge `[dict set]` into `[set]`.

   [http://stevehavelka.com/blog/2014/07/tcl-dict-operation-nested-merge/%|%Another Tcl dict operation: nested merge]:   

   [ycl eav], by [PYK]:   `dset`, `dget`, and `dexists` implement a nested dictionary backed by a database.  Currently, [sqlite] is supported. 
   [ycl%|%ycl env], by [PYK]:   Environments are like dictionaries but can be searched upwards as well as downwards, can contain out-of-band information, and can be traversed with [xpath] queries.  Since they are implemented over [tdom] they can be serialized into [XML] and other [data format%|%formats].


** Synopsis **

*** Reading  ***

    :   '''[dict create]''' ?''key value ...''?

    :   '''[dict exists]''' ''dictionaryValue key'' ?''key ...''?

    :   '''[dict filter]''' ''dictionaryValue filterType arg'' ?''arg ...''?

    :   '''[dict for]''' '''{'''''keyVar valueVar'''''}''' ''dictionaryValue body''

    :   '''[dict get]''' ''dictionaryValue'' ?''key ...''?

    :   '''[dict info]''' ''dictionaryValue''

    :   '''[dict keys]''' ''dictionaryValue'' ?''globPattern''?

    :   '''[dict map]''' '''{'''''keyVar valueVar'''''}''' ''dictionaryValue body''

    :   '''[dict merge]''' ?''dictionaryValue ...''?

    :   '''[dict remove]''' ''dictionaryValue'' ?''key ...''?

    :   '''[dict replace]''' ''dictionaryValue'' ?''key value ...''?

    :   '''[dict size]''' ''dictionaryValue''

    :   '''[dict values]''' ''dictionaryValue'' ?''globPattern''?

*** Writing ***

    :   '''[dict append]''' ''dictionaryVariable key'' ?''string ...''?

    :   '''[dict incr]''' ''dictionaryVariable key'' ?''increment''?

    :   '''[dict lappend]''' ''dictionaryVariable key'' ?''value ...''?

    :   '''[dict set]''' ''dictionaryVariable key'' ?''key ...''? ''value''

    :   '''[dict unset]''' ''dictionaryVariable key'' ?''key ...''?

    :   '''[dict update]''' ''dictionaryVariable key varName'' ?''key varName ...''? ''body''

    :   '''[dict with]''' ''dictionaryVariable'' ?''key ...''? ''body''



*** Additional Commands ***

   '''`[dictutils%|%dictutils apply]`''':   Like `[apply]`, but unpack a dictionary into the new local [level] first.

   '''`[dictutils%|%dictutils capture]`''':   Reflect the variable available at some [level] into a new dictionary.

   '''`[dictutils%|%dictutils dotpath]`''':   Treat a value as keys delimited by `.` and accesse its value.

   '''`[dictutils%|%dictutils equal]`''':   Determine whether two dictionaries are equal.

   '''`[dictutils%|%dictutils nlappend]`''':   Like `[dict append]`, but can target nested dictionaries.

   '''`[dictutils%|%dictutils switch]`''':   Perform the given transformations on specified values in a dictionary.

   '''`[dictutils%|%dictutils transmute]`''':   Like `[dictutils%|%dictutils switch]`, but return a new dictionary rather than muting the original. 

   '''`[dictutils%|%dictutils witharray]`''':   Like `[dict with]`, but use a temporary array rather than creating variables.

   '''`[pdict: Pretty print a dict%|%pdict]`''':   Pretty-print a `dict`.

   '''`[ycl%|%ycl dict var]`''':   Link a variable to a path in a dictionary.

   '''`[ycl%|%ycl dict varu]`''':   Like `[ycl%|%ycl dict var]`, but also delete the path in the dictionary when the variable is deleted.
   
   '''`[ycl%|%ycl dict search]`''':   Like `[lsearch]`, but match keys in the dictionary and return results in reverse order.

   '''`[ycl%|%ycl proc checkargs]`''':   Unpack a dictionary into the local level, processing the dictionary along the way.  Specify mandatory, optional, and default values, validate fields, constrain the number of occurrences, and transform keys and values.



** Description **

a '''dict''', or '''dictionary''', is a `[list]` containing an even number of
words. All dictionaries are lists, but not all lists are dictionaries.  
Words at even indices are keys, and words at odd indices are the
corresponding values.  When an item is added to a dictionary, the string
representation for that item is generated if it does not already exist.

A dictionary is the Tcl analogue of what other [programming
language%|%languages] call a ''associative array'', ''[hash]'', or ''map''.
Internally, Tcl uses a [hash] table to implement a dictionary, so its
performance characteristics are quite different from those of a plain [list].
To avoid the performance cost of [shimmering], use only `dict` or ˇlistˇ
commands to modify a dictionary.

The conversion between internal representations of a dictionary and a list is
lossless. A round-trip conversion from `dict` to `[list]` and back again yields
the original value.  Duplicate keys are retained during such a conversion.  Converting a
list to a dictionary has the effect of generating string representations for
all values in the list at even indices that didn't previously have string
representations.

The order of elements in a dictionary is the order in which they were
added.  `[list]` operations on a dictionary value reflect this.   With `[dict
create]`, keys and their corresponding values are added from left to right as
they occur in the command.

'''dict''' was introduced in Tcl 8.5.

[CecilWesterhof] Default there is no way to check if a value is a dict, that is why I created [isDict].



** Etymology **

[LV] 2007-12-21: Anyone have an explanation why the data structure managed by
this command is called '''dictionary'''? What is the relationship between the
name and the behavior? Just curious...

[Lars H]: This is what they're called in for example [PostScript], so I suppose
the usage is fairly old and wide-spread. Etymologically the origin is probably
the word translation dictionary as shown below, but there is also the question
of where that made it into the realm of computer science.

[LV]: Thanks. Knowing that it is the name used in other languages helps. When I
hear people refer to a [Tcl] [array] as a [hash], I wonder where the references
come from and why.

[Lars H]: Perhaps [Perl]? Basically, a ''dictionary'' corresponds to the
set-theoretic function concept: The domain is the set of keys, and to each key
is assigned exactly one value, period. A ''hash'' is a data structure that can
be used to efficiently implement dictionaries, but there are alternatives such
as balanced trees and [skiplist]s.



** Simple example **

Example dictionary value (English-French):

======
% set e_f [dict create dog chien cat chat cow vache horse cheval]
dog chien cat chat cow vache horse cheval
======

The resulting value reflects the order of the arguments.



** More examples **

[pcm]: Here is a useful example: an array of dicts.

======
array set U {
    tom  { Name {Tom Brown}  Sex M  Age 19  Class {4 5} }
    mary { Name {Mary Brown} Sex F  Age 16  Class {5}   }
    sam  { Name {Sam Spade}  Sex M  Age 19  Class {3 4} }
}

dict set     U(tom)  Sex F
dict append  U(sam)  Name { Jr}
dict lappend U(sam)  Class 5
dict incr    U(mary) Age
dict set     U(tom)  Sax Y;  # Creates a new key.
dict set     U(bill) Sax N;  # Creates a new entry.

parray U
======

[AMW]: Here is another example, storing filesystem information in a dict that
reflects the hierarchy (taken from '''[dictree]'''):

======
proc dictdir {{dir .}} {
    set d {} 
    foreach subdir [lsort [glob -directory $dir -nocomplain -types d *]] {
        dict set d [file tail $dir]/ [dictdir $subdir]
    }
    foreach fname [lsort [glob -directory $dir -nocomplain -types f *]] {
        file stat $fname fstat
        # unsorted but faster:
        # dict set d [file tail $dir]/ [file tail $fname] [array get fstat]
        # sorted:
        foreach item [lsort [array names fstat]] {
            dict set d [file tail $dir]/ [file tail $fname] $item $fstat($item)
        }
    }
    return $d
}
======


** Implementation of ordered `dict` **

With revision 1.53 of tclDictObj.c (2007-11-20 20:43:11), `dict` was changed
to preserve item order.  The log entry by [dkf] says:

    :   Changed the underlying implementation of the hash table used in dictionaries to additionally keep all entries in the hash table in a linked list, which is only ever added to at the end. This makes iteration over all entries in the dictionary in key insertion order a trivial operation, and so cleans up a great deal of complexity relating to dictionary representation and stability of iteration order.

This change makes the string representations of pure dicts ''more'' sensitive
to the way the dictionary was created than they used to be, but the dependency
was always there.

[DKF]: Technically, sort of. No more so than with lists. What it does do is
eliminate the fact that certain sequences of operations previously could
(without changing the final mapping from the initial mapping) cause the
reordering of all elements in the dictionary. Basically, the sequence of
operations was to add dummy mappings until the backing hash was rebuilt,
causing the allocation of mapping entries differently to the hash table
buckets. Strip out the dummy mappings then to get back the "original" dict,
except now with the elements in a different order. Moreover the representation
of a dict with a new mapping added to it would probably be different if you
started with a dict that had gone through this process than if you had not.

To summarize, the old dict code's iteration order (which was the natural
iteration order of the underlying Tcl_HashTable) was unstable and exposed far
too much information about the history of the dictionary value. The new dict's
stable iteration order depends only on the order in which new keys are inserted
and old keys removed, and could be modeled (inefficiently) using lists or
strings; its behaviour requires no understanding of the implementation to
explain.



** Dictionaries and arrays **

The result of `[array get]` and the argument expected by `[array set]`
are in the same format as a dict, so you can naturally switch between them:

======
set myDict [array get myArray]
dict set myDict foo bar bob $newValue
array set myArray $myDict
======

[RHS]: was doing some coding for the Language Shootout, and implemented the
nsieve test using both arrays and dicts. Results can be found on the [Dict vs
Array Speed] page

[AMG]: Be aware that `dict` preserves key order but `[array]` does not.

** Shimmering between dict and list **


[AMG]: I wonder if shimmering between dict and list can be minimized by
unifying their [Tcl_Obj] internal data structures.  Just add a Tcl_HashTable
pointer to struct List, and get rid of struct Dict.  If the Tcl_HashTable
pointer is NULL, a dict representation doesn't currently exist for the list.
If the pointer isn't NULL, it points to a hash table that indexes the list.
Dict operations performed on a list Tcl_Obj will use the Tcl_HashTable,
creating it if it doesn't already exist.  Read-only list operations leave the
Tcl_HashTable untouched.  Destructive list operations set the Tcl_HashTable
pointer to NULL.  (Or maybe they only do this when changing an even-numbered,
i.e. key, element.)  struct List and struct Dict are Tcl internals, so (as far
as I know) this change can be made without breaking compatibility.

This would be very useful to me in [Wibble] for creating dict-like objects with
potential (but rare) key duplication.  `[lappend]` is used to make the data
structure.  If duplicate elements need to be kept separate, ordinary list
operations are used to extract the data.  If duplicate elements don't exist or
can be ignored (the common case), dict operations are used instead, and the
first such operation automatically creates the hash table.  Actually this mode
of operation is currently available, but the hash table would be regenerated
every time a `dict` operation is performed.

Hmm, would epoch and chain also need to be imported into struct List?  If so,
instead of directly putting a `Tcl_HashTable` pointer into struct List, I would
instead use a struct Dict pointer, to avoid growing struct List more than
necessary.  struct Dict would have everything removed that's already present in
struct List, so it would be reduced to table, epoch, and chain.



** Invalid and Noncanonical Returned Values **

[AMG]: When given only one argument, `[dict merge]`, `[dict remove]`, and `[dict replace]` return that argument directly.  Therefore, they can return non-dicts or noncanonical dicts, which is contrary to the documentation.  See [http://core.tcl.tk/tcl/info/1b0266d8bbc649bcb4df5dc4a1edca238536262b] for the bug writeup.  A fix to this bug would introduce a potential incompatibility, in case any existing programs depend on this direct pass-through behavior even in the case of bad dicts.

[AMG]: This has now been fixed [http://core.tcl.tk/tcl/ci/847fa2261d016aac0c70ea1c39075827f50e3603].  [[[dict merge]]] still doesn't force canonicalization, but "fixing" that would cost more than it's worth.



** Care needed when extracting a dict from a list **

[LV] 2007-10-17: [RS] wrote on comp.lang.tcl the following helpful advice:

Be careful with the argument "args" - that is a list. If you pass one dict in
args, retrieve it with:

======
set dict [lindex $args 0]
======

Because dicts always need an even number of key value key value..



** `dict` vs TclX's [keyed list]? **

[LV]:  Has anyone compared [TclX]'s '''key list''' commands to ''dict'' to see
how much of the functionality of a key list is missing if one were to try to
use a dict in place of a TclX key list?

[Lars H]: This might be a subject for the [Complex data structures] page.



** `dict` for Tcl 8.4 **

If you want `dict` support in Tcl 8.4, download and compile
http://pascal.scheffers.net/software/tclDict-8.5.2.tar.gz (Windows binary
[http://pascal.scheffers.net/stan/dict/dict-8.5.2-win32.zip]).

Version 8.5.3 of this extension is available from the [Tcl Extension Archive]
(teapot) [http://teapot.activestate.com/entity/name/dict/ver/8.5.3/index].

[AMG]: Does anyone have a [pure-Tcl] implementation of `dict` for Tcl 8.4.7?

[RLE]: What about [forward-compatible dict]?

[AMG]: That works, thanks!

** Stubs version of dict for Tcl 8.4 **

[PS] 2004-04-14: (updated 2004-05-12): After a bit of chatting on the chat
between dgp, dkf and myself, I have created a stubs package version of `dict`
for tcl-8.4. You can download it from

http://pascal.scheffers.net/software/tclDict-8.5.2.tar.gz also see
[http://pascal.scheffers.net/software/]

Status:

   * Builds, installs, passes all tests on SuSE/Linux 9.0 and Windows (MingW)
   * Has an extension stubs table (untested), public C API exported.
   * Needs more testing

The package will only load in tcl-8.4.x, it uses features first introduced in
Tcl-8.4. I have no idea what it would take to make it work in 8.3. It will
gracefully not load in tcl-8.5+, and not raise an error.

Update 2005-11-18:

I have updated the tclDict code, by syncing it with Tcl8.5 CVS. The 8.5.2 Url
above is the new version. This version behaves just like the real thing, except
for bignum support by `[dict incr]`, which I regressed to wide int support from
the previous version of tclDict.

A [Microsoft Windows] binary version can be downloaded from
http://pascal.scheffers.net/stan/dict/dict-8.5.2-win32.zip.

The dict extension is being maintained in my subversion archive at
http://svn.scheffers.net/misc

- Pascal.

----

I tried installing this ... but get the following error:

======none
./configure --with-tcl=/usr/local/lib/tcl8.4 --with-tclinclude=/usr/local/include/tcl8.4
.................. skip some stuff ....................
checking for Tcl public headers... /usr/local/include/tcl8.4
checking for building with threads... no (default)
checking how to build libraries... shared
checking if 64bit support is enabled... no
checking if 64bit Sparc VIS support is requested... no
checking system version (for dynamic loading)... ./configure: line 9892: syntax error near unexpected token `('
./configure: line 9892: `    case `(ac_space=' '; set | grep ac_space) 2>&1` in'

(/tmp/tclDict-8.5.2)-> uname -sr
FreeBSD 5.4-RELEASE-p16
======

I'll keep checking here for any help.  this is using tclDict-8.5.2.tar.gz (it
says use that for tcl 8.4...)

[NEM]: Is this comment still in need of answering? See also [comp.lang.tcl] for general questions.

[MJ]: It seems it is. This is caused by tclDict using an outdated tcl.m4 which
has a quoting issue with [bash]. The tcl.m4 (and generated configure) in the
download should be fixed.



** Anyone considered dict equal? **

[LV]:  Has anyone considered either submitting a [TIP] for a '''dict equal'''
command or at least writing up a first draft of the script for [tcllib]?  It
would have to have a '''callback''-ish type feature, ala [lsort], because you
couldn't know, in general, whether two things are "equal" or not...

[NEM]: Such a command is not too hard to write, see [dictutils] for one
implementation. Note that a general scheme for equality of arbitrary data
structures is quite tricky to write. See [unification] for one approach.




** Original `dict` implementation **

 What: dict
 Where: http://home.earthlink.net/~m-patton/dict-0.01.tar.gz
        http://purl.org/tcl/tip/111.html
 Description: Tcl implementation of TIP 111 - a new Tcl data type called
        dictionary, which consists of an array of values and manipulators
        of those values.
        Currently at version 0.1 .
 Updated: 12/2002
 Contact: See web site



** Archived discussion **

[NEM] 2007-02-12: Moved lots of discussion on the design of dict to [dict
discussion] in an attempt to focus this page more on the actual uses and
techniques for working with dicts. Apologies if I moved too much by mistake.

----

** Internal copies of dicts **

[HaO] 2015-10-14:

If you use dicts as a data store, be aware, that one changed key will cause the whole dict to be copied, if another reference exists.
This is only important for huge dicts...

Example:

======tcl
set d {}
for {set i 1} {i<1000000} {incr i} {
    dict set d k$i d$i
}
proc m {dictIn} {
    # This will copy the whole dict as the caller still has a reference which must stay unmodified
    dict set dictIn d 1
}
m $d
======

It is often wise to use an array of dicts as data store.

[PYK] 2015-10-14:  This is a great example of why procedures that modify inputs
often take the name of a variable rather than the value itself, and also of
where the [unshared value idiom] might be useful.  Also, a simple example of
the "array of dicts as data store" would be a nice addition here.


----

**Error handling in dict unset**

***No error if the key does not exist***

[HaO] 2018-07-02: Unsetting keys which do not exist do not raise any error:

======tcl
% set d {}
% dict unset d a
======

***dict unset raises error if a parent key of a nested dict is not present***

Nevertheless, an error is raised, if a parent key is not present:

======tcl
% set d {}
% dict unset d a b
key "a" not known in dictionary
======

(see [https://core.tcl.tk/tcl/info/d22ad20205509329%|%ticket d22ad202%|%])

***Error if value is not a dict***

If an addressed dict (or subdict) may not converted to a dict, an error is rised.

======tcl
% set d {1}
% dict unset d a
missing value to go with key
% set d {a 1}
% dict unset d a b
missing value to go with key
======

----

**unset item in nested dict: artefacts and recursive unset**

[HaO] 2018-07-02: Given a tree-like storage within a dict:

======tcl
% set d {}
% dict set d a b c 1
a {b {c 1}}
% dict set d a2 b c 1
a {b {c 1}} a2 {b {c 1}}
% dict set d a2 b2 c 1
a {b {c 1}} a2 {b {c 1} b2 {c 1}}
======

Now unsetting an item may leave an empty parent list as artefact:

======tcl
% dict unset d a b c
a {b {}} a2 {b {c 1} b2 {c 1}}
% dict unset d a2 b c
a {b {}} a2 {b {} b2 {c 1}}
======

See [dictutils] for the utility "dictTreeUnset" to remove those artefacts.

<<categories>> Command | Package | Data Structure