Arjen Markus Quite often the question arises: how to model complex data structures? As Tcl allows various solutions, it seemed to me a good idea to start a new page on this subject.
escargo 2003-01-17: Just for the sake of illustration, could you give some examples of these complex data structures? Are we trying to talk about AVL trees, red-black trees, tries, DAGs, petri nets, bags, or what all else?
AM 2003-01-24: When I set up this page, I was thinking of C structs, Fortran arrays and other "passive" data structures. Note that the Tcllib contains a number of modules for managing data structures - matrix, tree, ...
awalczak: I deal with tcl code generated by idl2tcl. In my case IDL code is modeling struct with other structs and collections inside. I would be very thankfull for example how to build in TCL a big struct-like construct step-by-step (having smaller units defined earlier and combining them together before return form a function).
RBTree is a Tcl extension that implements Red-Black trees, having features of an ordered list and direct access as array. - RS found explanations here (link broken 2013-05-12), but from there I gather it's just well-balanced binary trees, and on insertion are (by possible rotation) brought back to well-balancedness.
DRH: Complex data structures can be modeled using an in-memory SQLite database. (Or use a standard disk-resident database if persistence is important.) Think of each table as a class and each row as a structure instance or object. With this in mind, it is trivial to build arrays, queues, stacks, or graphs of objects.
escargo 2005-09-23: Possible, yes, but not necessarily efficient. If what I want is a Tclish version of Judy arrays tries, or a PATRICIA tree , I lose all the specialized performance advantages if I have to build them out of the heavy plumbing of an in-memory database.
DRH: escargo seems to confuse abstract data structures (ex: arrays, bags, DAGs, queues, etc) with their implementations (red/black trees, tries, hash tables, linked lists, etc). The Zen of Tcl is that you don't worry about the details of implementation, you focus instead on the higher-level aspects of the problem. An in-memory SQL database provides O(logN) search and O(N) traversal of data using a variety of underlying algorithms. The specific algorithms used (B-tree, hashing, linked lists, etc) are irrelevant to the programmer using SQL. All she cares about is that the results come up quickly. And an SQL database provides the same asymptotic performance as the exotic data structures mentioned by escargo above. The exotic data structures may be faster by a constant factor in certain scenarios. But generally speaking, that constant is measured in percent. And the speedup only applies for narrowly defined problems. If performance is so critical that I am willing to write intricate, customized code to achieve a speedup of 20 or 30%, then I'm going to code the problem in C, not in TCL. For 95% of the things that I and most other people do, where ease and generality of implementation and simplicity of maintenance are the driving concerns, then a generalized abstraction such TCL's built-in arrays or an in-memory SQL database is a far better choice than trying to employ an exotic data storage and retrieval algorithm.
escargo: I can't program with abstract data structures; I need to use their implementations. (And I do understand the differences between policy and mechanism, too.) I never said that the data structures would be implemented in Tcl. I fully expect that the implementations would be in C (or C++) with a wrapper that allows Tcl to use them. An in-memory SQL database seems far more exotic to me than Tcl access to an existing C library for Judy arrays, etc. I certainly believe that the KISS principle applies. That constant k that differentiates performance between an ideal implementation of a complex data structure and one layered on an in-memory SQL database can be significant.
And there are times when I have a program that is in the 5% where such things matter. (Also, I'm aware that writing a simpler but naive implementation and profiling it to see where it's slow is better than picking a complex algorithm just to speed up part of a program guessing that you know ahead of time where the bottlenecks will be.)
L: I agree wholeheartedly with escargo. It is nice in many cases to ignore the "details of implementation and focus on the higher-level aspects of the problem". However, sometimes this is impossible. Example: I have a 15GB text file that I want to index by creating a trie of all the words in it to record at which offset in the file this word first occurs. Because I know that the same word will be repeated a million times
kanryu 2008-06-08: huddle is a complex container with abstract container algorithms (such as dict and list) which can contain multirank nodes.
Ordered roughly after how "exotic" they are, taking into account requirements, user base, and history:
When you have a bunch of related data, collect it in an object (e.g. separate fields are separate instance variables).
Larry Smith 2008-06-08: Lost and Most are very compact, likely solutions to this problem since their overhead for object-orientation is minimal.
ID = 106 NAME: FIRST = Frank LAST = Zappa
[So, what commands create this keyed list?]
The thing I like best about keyedlist is how it uses the dot notation when defining the key. You can build nested structures with a single command. The previous example would be create like this.
keylset mykeylist ID 106 NAME.FIRST Frank NAME.LAST Zappa set mykeylist {ID 106} {NAME {{FIRST Frank} {LAST Zappa}}} You retrieve the same way. keylget mykeylist NAME.LAST Zappa
I definitely prefer keyedlist over dict when dealing with nest structures.
Separate fields are separate entries in an array (a data structure which has been in the standard Tcl distribution for quite some time).
set personal_data($name,birthday) "..." set personal_data($name,telnum) "..." ...
AIN (Array In Namespace) is a package for this kind of data structure, providing commands for e.g. copying an array and generating names for arrays.
dicts became available in Tcl 8.5.
set personal_data [dict create name [dict create FIRST "Frank" LAST "Zappa"] ID 106] dict set personal_data telnum $phone dict get personal_data name FIRST ; # => Frank ...
A tagged list is a list where the first element is a "type name" and subsequent entries contain data. Each field has a separate position in the list, but the meaning of a list element may depend on the "type name"; in C terms, the data type of a tagged list is a union of structs. [So, where does one find an implementation of this data structure? Well, data is code shows some things that can be done with it. If that is too hand-on, then you might prefer to use something like struct::tree instead.]
The comparison on this page is in a sense unfair to tagged lists, since it focuses on table-style data structures. While tagged lists can be used to implement such things, their primary use-case is rather for tree-like data structures. For an application of this, see A little XML parser.
The nice thing about this wiki is that it is relatively simple to add more content. If the existing comparison seeems unfair, feel free to add examples that demonstrate the additional functionality. It appears, to this reader, that the page is not intent on being an advocate for one thing over another, but instead providing the means to describe a data structure's benefits and detriments, so that someone can more easily determine when the appropriate times are to make use of them.
When you have a bunch of related data, collect it as variables in a particular namespace. Separate data sets can have separate namespaces.
anonymous: What do you mean by namespaces here? I can think of two uses: One use is simply using namespaces as a way to group a bunch of variables, and the other would include procs, to create something that acts more like an object. Perhaps both approaches need entries?
A key–value list is a list where every even index item is a "key" (arbitrary identifier string) and the following (odd index) item is the value associated with that key.
ID = 106 NAME: FIRST = Frank LAST = Zappa
The most convenient way of accessing information in a key–value list is to unroll it into an array. (In practice, key–value lists spend a lot of time unrolled into arrays.) Typical code might look like
proc update_KVL {KVL some more args} { array set A $KVL # Do the update return [array get A] }
It is often convenient to use a key–value list for storing "the settings". Consider
proc do_something {settings_KVL args} { # Store defaults into array A array set A $settings_KVL; # Override with specific settings # Use data in A when doing something }
jcw 2003-05-12: FWIW, there's a little "ihash" package in critlib coded in C (and I have a minimal pure Tcl script for doing the same somewhere), which makes key-value lists (which I used to call interleaved lists) in the same performance order as arrays, because it maintains a hash table alongside. This is transparent, but you have to be careful to avoid shimmering or things will slow down as the hash table gets rebuilt way too often. Probably also some more info, search for ihash on the wiki.
Of course it is possible to combine these solutions - to get the best (or worst) out of them.
Ken: Are there other ways to create C-like structure besides TclX keyed lists; for example like the "record" below indicated by Brett? Regarding the solution you've given, can you kindly explain how to get it working cause when i tried the code, it failed to recognize the package. And when i tried to use TclX keyed lists, tclsh8.4 fail to recognize the commands. How to get TclX working under Tcl? And is TclX part of Tcl? if yes, why is it separate? Sorry new to this language.
Lars H: Tclx is a package. You need to be certain to install Tclx as a package, then code a package require Tclx to load it. But dicts are probably a better choice than keyed lists nowadays.''
Ken: Another question: How do i create an array of structures? Is it there any way of doing it besides manually naming it directly, for example:
set ArrayStructure(0.value) 1 set ArrayStructure(1.value) 2
Lars H: Depends on how you implement your "structures". Maybe C struct is Tcl! will give you some ideas? But if you're new to Tcl then you're likely to be thinking in too C-ish terms about these matters. Perhaps a list of structs would suffice as good? Also take a look at the struct package.
RS 2002-07-05: In the Tcl chatroom we discussed the potential of nested lists for data storage, given the new "multi-dimensional" lindex and lset starting from 8.4. Steve Landers wished for structures so list elements could be addressed with meaningful strings rather than positional indices only. Here's a nice minimal solution that I came up with:
proc struct {name elements} {interp alias {} $name {} lsearch $elements} #... used to define a "struct" (map from names to positions): struct staff-> {firstname lastname birthdate phone} #... used to abstract such details in coding accesses: lset database $employeeID [staff-> phone] 123-4567890
Letting the struct name end in "->" is cute sugar. Note however that after it (and not before it) there must be whitespace.
jcw: Nother way out would be to use "staff:"
RS: Yep - see Implementing enumerated types for a two-way map that generates both "staff:" and "staff@" for the inverse direction.
Brett Schwarz: humm, I just found this page. I actually put a package in tcllib that does what RS is talking about (essentially). It is called 'record', and is located under the 'struct' module in tcllib. It is only in CVS now (as of Jan 2003), but should go into next tcllib release. Here is how you would use it:
package require struct namespace import struct::* # define records record phones { home work cell } record addr { name street {country USA} {record phones phlist} } # instantiate addr inst1 # assign members inst1 configure -name Brett -street main # get a values (returns list) inst1 cget -name -street # you also use the dot notation set name [inst1.name] inst1.street = Main # the phlist is a nested record, and is accesses like this: inst1 configure -phlist.home 425-555-1212 inst1.phlist.work = 206-555-1212 # following would return a list of the phones (home work cell) set phones [inst1.phlist]
Note the 'country' member. I assigned it a default value of "USA". Anyways, you get the picture. See the docs in CVS for more info...
George Peter Staplin: I couldn't resist writing an example of doing that with my megapkg structure extension.
load ./structure.so proc phone {} { set p [structure] # Initialize the phone record $p -home "" -work "" -cell "" return $p } proc addr {} { set addr [structure] $addr -name "" -street "" -country USA -phoneNumbers [phone] return $addr } set addr [addr] # This prevents setting new keys by accident lock.structure.creation $addr $addr -name "John Doe" -street "Logan" [$addr -phoneNumbers] -home 555-1234 -work 555-7812 puts "The person's name is [$addr -name]. He lives on [$addr -street]. His work number is [[$addr -phoneNumbers] -work]."
The - in a key is just some added sugar. Any data can be used for a key (even binary data).
escargo 2005-08-30: Are there any extensions or packages for implementing a trie in Tcl? (I didn't say tries because that's too easy to hit when searching.) A hashing Trie in Tcl
What other complex data structures are available? The tcllib distribution has graph, sets, matrix, pool, prioqueue, queue, record, skiplist, stack, and tree.