Version 9 of Trees as nested lists

Updated 2009-02-11 08:32:18 by AlexCaldwell

if 0 {Richard Suchenwirth 2003-03-18 - Trees are a fundamental graph and data structure. They consist of nodes, where each node has a content (e.g. a string) and zero or more child nodes. Each node except the "root" has exactly one parent node.

In Tcl, trees can be represented in various ways. Since 8.4, nested lists make an efficient tree representation, where access goes with lset and multi-index lindex. The following routine traverses such a tree and returns a list of node indices that can be used to iterate with foreach and lindex to access each node in sequence.

For a silly example, consider the following directory tree:

 /
 /bin
 /usr
 /usr/bin
 /usr/local
 /usr/local/bin
 /usr/local/lib

which as a nested list, where each node is a directory, can very compactly be written as

 {"" bin {usr bin {local bin lib}}}

The list of all node indices is

 0 1 {2 0} {2 1} {2 2 0} {2 2 1} {2 2 2}

which, when iterated over with lindex, enumerates all directory basenames:

 0:
 1:bin
 2 0:usr
 2 1:bin
 2 2 0:local
 2 2 1:bin
 2 2 2:lib

and, with the additional code in absolutePath and fromRoot, can also reconstruct the absolute paths (with an anomaly in /, which comes as empty string - but that's not a bug of these algorithms, but a peculiarity that Unix-like pathnames, Tk widget pathnames, Tcl namespace names have in common):

 0:,
 1:bin,/bin
 2 0:usr,/usr
 2 1:bin,/usr/bin
 2 2 0:local,/usr/local
 2 2 1:bin,/usr/local/bin
 2 2 2:lib,/usr/local/lib

We observe that "leaves", i.e. nodes which have no children, have a nonzero as last index element, while nodes with children have a zero there. If you chop the trailing zero off, lindex gives you the subtree starting from that node.

Lars H, 13 May 2005: No, all node indices should end with a zero. The only reason it works to leave it out in this example is that all node contents are equal to their list-quoted forms. Consider the following tree (containing the Swedish monarchs of the Vasa dynasty):

  set tree {{Gustav Vasa} {{Erik XIV}} {{Johan III} Sigismund} {{Karl IX} {{Gustav II Adolf} Kristina}}}

The proper index of Erik XIV is "1 0" despite him being a leaf, because he has a space in his name. "1" is the index for the subtree containing only that leaf, but that still has list-quoting in place.

An alternative would have been to represent each node as a pair {content children}, where the children are again a list. This would however lead to a much higher nesting depth:

 {"" {{bin {}} {usr {{bin {}} {local {{bin {}} {lib {}}}}}}}}

while making the algorithms slightly simpler. As the procedures are written once, but hopefully used on many big trees, I decided for the simpler data representation. }

 proc traverse {tree {prefix ""}} {
    set res {}
    if {[llength $tree]>1} {
        lappend res [concat $prefix 0] ;# content
        set i 0
        foreach child [lrange $tree 1 end] {
            eval lappend res [traverse $child [concat $prefix [incr i]]]
        }
    } else {set res [list $prefix]} ;# leaf
    set res
 }
 proc fromRoot index {
    set res {}
    set path {}
    foreach i $index {
        if $i {lappend res [concat $path 0]}
        lappend path $i
    }
    lappend res $index
 }
 proc absolutePath {tree index} {
    set res {}
    foreach i [fromRoot $index] {
        lappend res [lindex $tree $i]
    }
    set res
 }

if 0 {Of course we want to modify such trees too - here's a first shot which inserts into a given tree, at given node ID, another (sub)tree (which might of course be just a single node) as child of the specified node. See usage examples in the test code at bottom: }

 proc addSubtree {tree index subtree} {
    if {[lindex $index end]==0} {set index [lrange $index 0 end-1]}
    set node [lindex $tree $index]
    lappend node $subtree
    lset tree $index $node
    set tree
 }

#------------ Testing:

 set testtree {"" bin {usr bin {local bin lib}}}
 puts [traverse $testtree]
 foreach i [traverse $testtree] {
    puts $i:[lindex $testtree $i],[join [absolutePath $testtree $i] /]
 }
 set testtree [addSubtree $testtree {2 0} lib]
 set testtree [addSubtree $testtree {2 3} tcl8.4]
 puts "added /usr/lib"
 foreach i [traverse $testtree] {
    puts $i:[lindex $testtree $i],[join [absolutePath $testtree $i] /]
 }

if 0 {More tree routines: Determining the parent of a node, given its index, can be done without having to look at the tree itself. We have to distinguish the cases of a non-leaf, where we first chop off the trailing 0. As parent for root an empty string is returned by the second, "one-armed" if - the empty string result should be checked after calling: it is a valid index, but one that returns the whole tree if used with lindex. }

 proc parent index {
    if {[lindex $index end]==0} {set index [lrange $index 0 end-1]}
    if {$index != ""} {lreplace $index end end 0}
 }

Pierre Coueffin - 2005-05-12 I wanted to embed a graphical representation of some nested-list type trees that my code generates into a paper I am working on. I came up with the following code, which needs BWidget to work.

 proc gui {w tree} {
    package require BWidget
    Tree $w

    foreach i [lrange [traverse $tree] 1 end] {
        set parent [absolutePath $tree [parent $i]]
        if {$parent == {{}} } {
           set parent root
        }

        set node [absolutePath $tree $i]
        set text [lindex $tree $i]
        $w insert end $parent $node -text $text -open yes
    }

    return $w
 }

 if 0 {

Then I do:

 pack [gui .tree $treedata]

and I can generate a nice postscript representation by abusing the knowledge that BWidget uses a canvas widget to draw trees on:

 set postscript [.tree.c postscript]

 }

 if 0 {I noticed one problem with the gui proc above. It assigns node names based on the text in the list item. The

problem with that is that in some applications, you may have duplicate text in different list members, so it raises an error if it tries to name a new node with duplicate text from a different list item if the nodes are at the same level in the tree. You need a unique ID for each new node. I changed the proc like this, and it seems to work better.

 }

 proc gui {w tree} {
    package require BWidget
    Tree $w -width 30 -height 35
    # the variable i here seem to be unique to each node, so I used that for the node names
    foreach i [lrange [traverse $tree] 1 end] {

        set parent [absolutePath $tree [parent $i]]

        if {$parent == {{}} } {
            set parent root
        }
        if {$parent != "root"} {
            set parent [parent $i]
        }




        set node "$i"

        set text [lindex $tree $i]
        $w insert end $parent $node -text $text -open yes
    }

    return $w
 }

if 0 {Here's some test data I used that did not work with the first proc, but does work with the above changes:

     set treedata {
     "" {{appearance} {no_distress}}
           {Viral}
        {{appearance} {toxic}}
          {Epiglottitis}
        {{appearance}  {uncomfortable}

            {{exudate}  {no} 
                {{ulcers} {no}}
                {Viral}
                {{ulcers} {yes}}
                {{Herpes_Stomatitis}}}

            {{exudate} {yes} 
                {{temp} {100.5}}
                {Mononucleosis}
                {{temp} {101}}
                  {Mononucleosis}
                {{temp} {103}}
                {Streptococcal}}}
   }

}

if 0 {


See also:


}