Version 35 of Complex data structures

Updated 2003-01-18 02:09:04

e-vandals versus the wiki moved...


Title: Complex data structures

Arjen Markus Quite often the question arises: how to model complex data structures? As Tcl allows various solutions, it seemed to me a good idea to start a new page on this subject.

escargo 17 Jan 2003 - Just for the sake of illustration, could you give some examples of these complex data structures? Are we trying to talk about AVL trees, red-black trees, tries, DAGs, petri nets, bags, or what all else?


Possible solutions:

  • Use one of the object-oriented extensions
  • Use TclX keyed lists
  • Use arrays (associative arrays allow indices that are strings, rather than mere numbers or tuples of numbers)
  • Use tagged lists
  • Use namespaces

Pros and cons of the object-oriented extensions:

P.M.


Pros and cons of TclX keyed lists:

  • Pro: a keyed list can handle nested data structures. For example, the keyed list "{ID 106} {NAME {{FIRST Frank} {LAST Zappa}}}" represents the following hierarchical structure:
     ID = 106
     NAME:
        FIRST = Frank
        LAST = Zappa
  • Pro: keyed lists are first-class data items. They can be passed in and out of procs like any other list or string.
  • Pro: it is possible to query the existence and value of a key in a single command, without the speed drawback of catch.
  • Con: the access syntax is more verbose than for arrays.
  • Con: access to nested elements requires multiple commands.

Pros and cons of arrays:

  • Pro: by one name, you gather all the pieces of information
     set personal_data($name,birthday) "..."
     set personal_data($name,telnum) "..."
     ...
  • Pro: easy access to all the data pieces.
  • Pro: implemented internally by hash tables, hence access is fast.
  • Con: arrays are not "first-class" variables in Tcl, you will have to use upvar or array get and set commands to transfer them between procedures.
  • Con: essentially only one level of structuring. Otherwise you would have to use structured indices to store the values.

Pros and cons of tagged lists:

  • Pro: tagged lists {NAME ... {BIRTHDAY ...} {TELNUM ...} ...} are simple to use, they are first-class variables (they are simply a form of list variables). They can have any level of substructure.
  • Con: replacing values deep inside such a structure can be a nuisance. (But this may force you to use constructor and accessor procs to do this - nice modular design).
  • Con: because of the need for linear or binary searching, tagged lists will be slower than arrays for large data structures.

Pros and cons of namespaces:

  • Pro: separate variables into independent groups, all variables are first-class. No special syntax required

.

  • Pro: implemented internally with hash tables, for O(1) access.
  • Con: if you need many namespaces it can be difficult to maintain the application.
  • Con: variables will exist independent of any procedure, so they have a non-local scope. Again, maintenance problems will arise.

What do you mean by namespaces here? I can think of two uses: One simply as a way to group a bunch of variabl es, and the other with included procs to act more like an object. Perhaps both approaches need entries?


Of course it is possible to combine these solutions - to get the best (or worst) out of them.


RS 2002-07-05 - In the Tcl chatroom we discussed the potential of nested lists for data storage, given the new "multi-dimensional" lindex and lset starting from 8.4. Steve Landers wished for structures so list elements could be addressed with meaningful strings rather than positional indices only. Here's a nice minimal solution that I came up with:

 proc struct {name elements} {interp alias {} $name {} lsearch $elements}

 #... used to define a "struct" (map from names to positions):
 struct staff-> {firstname lastname birthdate phone}

 #... used to abstract such details in coding accesses:
 lset database $employeeID [staff-> phone] 123-4567890

Letting the struct name end in "->" is cute sugar. Note however that after it (and not before it) there must be whitespace.

Nother way out would be to use "staff:" -jcw

RS Yep - see Implementing enumerated types for a two-way map that generates both "staff:" and "staff@" for the inverse direction.

Brett Schwarz - humm, I just found this page. I actually put a package in tcllib that does what RS is talking about (essentially). It is called 'record', and is located under the 'struct' module in tcllib. It is only in CVS now (as of Jan 2003), but should go into next tcllib release. Here is how you would use it:

 package require struct

 namespace import struct::*

 # define records
 record phones {
    home
    work
    cell
 }

 record addr {
   name
   street
   {country USA}
   {record phones phlist}
 }

 # instantiate
 addr inst1
 # assign members
 inst1 configure -name Brett -street main
 # get a values (returns list)
 inst1 cget -name -street
 # you also use the dot notation
 set name [inst1.name]
 inst1.street = Main
 # the phlist is a nested record, and is accesses like this:
 inst1 configure -phlist.home 425-555-1212
 inst1.phlist.work = 206-555-1212
 # following would return a list of the phones (home work cell)
 set phones [inst1.phlist]

Note the 'country' member. I assigned it a default value of "USA". Anyways, you get the picture. See the docs in CVS for more info...