Graph theory in Tcl

Richard Suchenwirth 2001-11-05 - Another weekend fun project, so don't blame me for an inaccuracy or two.. Graphs as in Graph theory (see also tcllib::graph; http://www.math.fau.edu/locke/graphthe.htm for a compact page of definitions and pointers) are defined as a tuple of: a set of nodes or "vertices" (which might be visualized as points on a canvas); a set of edges (e.g. lines between two points); and an incidence map that lists, for each edge, from and to which vertex it goes (they might be identical, forming a small "loop").

In Tcl, "everything is a string", so let's see how to describe a graph as a string most ergonomically (so it can be specified as a string constant, without the array bookkeeping used in Simple tree layout. The set of edges is completely contained in the incidence map, so we don't need it (except for parallel edges, we might need a distinguishing feature). The set of nodes is also found in the incidence map - except for singleton nodes that are reached by no edge. Let's start with naming edges after their start and end node, separated by a comma (as is done in incidence maps too), so

 A,B B,C C,A

describes a triangle with three nodes (A,B,C) and three edges, completely connecting the nodes. We also allow node names (like "A") in the graph description, and for singleton nodes, they're mandatory - othwise those nodes would get lost. Here's how to derive the node set from the graph description with a temporary array:

 proc nodes g {
     foreach i $g {
         foreach {from to} [split $i ,] break
         set t($from) .
         if {$to!=""} {set t($to) .} ;# avoid singleton nodes
     }
     lsort [array names t]
 }
 proc nodes g {lsort -unique [string map {, " "} $g]} ;# BBH - needs 8.3.4
 proc order g {llength [nodes $g]} 

 # MDR This performs as the first; it omits singleton nodes.
 proc nodes g {lsort -unique [concat {*}[lmap {x} $g {set y [string map {, " "} $x]; if {[llength $y]==2} {set y}}]]}

And, as our graph description may be a hybrid of edge and node names, we supply a simple filter that keeps only the edges:

 proc edges g {
     set res {}
     foreach i $g {
        if {[string match *,* $i]} {lappend res $i}
     }
     set res
 }

A feature in graph theory is the degree of a node, i.e. the number of edges that touch ("incide with") this node, where "looping" edges like X,X are counted twice:

 proc degree {node g} {
    set res 0
    foreach edge [edges $g] {
        foreach {from to} [split $edge ,] break
        if {$from==$node} {incr res}
        if {$to==$node}   {incr res}
    }
    set res
 }

(see also Degree histograms). This is sufficient to implement the Königsberg bridges test (on an adapted simple graph, where digits stand for bridges and letters for "crossroads"):

 proc isEulerian g {
    foreach node [nodes $g] {
        if {[degree $node $g]%2} {return 0}
    }
    return 1
 }
 isEulerian {A,1 A,2 A,3 B,1 B,4 B,7 C,2 C,3 C,5 C,6 D,5 D,6 D,7}

See Math sugar on how to rewrite this into the fancier

 proc isEulerian g {requires no node in $g {[degree $node $g]%2}}

Edges connect nodes, but not necessarily all nodes are connected with all. Rather, a graph may "fall apart" into connected components which internally are linked by edges, but no path leads from one c.c. to the other. Here's how to get the disjoint node sets of connected components, in which first each node is put into a separate set, and sets are merged if an edge connects them:

 proc connectedComponents g {
    set i 0
    foreach node [nodes $g] {
        set set($node) [incr i]
        set ccs($i) $node
    }
    foreach edge [edges $g] {
        foreach {from to} [split $edge ,] break
        set s1 $set($from)
        set s2 $set($to)
        if {$s1!=$s2} {
            foreach node $ccs($s2) {
                set set($node) $s1
                lappend ccs($s1) $node
            }
            unset ccs($s2) ;# 
        }
    }
    set res {}
    foreach i [array names ccs] {lappend res $ccs($i)}
    set res
 }

Having connectedComponents, it's very easy to find out whether a graph is continuous:

 proc isContinuous g {expr {[llength [connectedComponents $g]]==1}}

A continuous graph, where any two nodes are connected by exactly one path (sequence of zero or more edges), is called a tree. We simplify the tree predicate by using the Euler characteristic that in a tree, the number of edges is the number of nodes minus one:

 proc isTree g {
    expr [isContinuous $g] && [llength [edges $g]]==[llength [nodes $g]]-1
 }

A bridge is an edge which, when removed, would cause the graph to fall apart, breaking off another connected component. All edges in a tree are bridges, and here's how to check that property for a given edge of a continuous graph, by removing it (but leaving references to the from and to nodes''), and seeing whether it's still one:

 proc isBridge {edge g} {
    set pos [lsearch $g $edge]
    foreach {from to} [split $edge ,] break
    expr {![isContinuous [lreplace $g $pos $pos $from $to]]}
 }

Studies like the famous Four-color problem operate on maps, which are 2-continuous graphs, i.e. between any two nodes, there are two or more non-crossing paths. This requires that no edge is a bridge, but the preceding test for degree<2 should respond faster:

 proc is2Continuous g {
    foreach node [nodes $g] {if {[degree $node $g]<2} {return 0}}
    foreach edge [edges $g] {if [isBridge $edge $g]   {return 0}}
    return 1
 }

Disclaimer: These conditions are not enough, they will fail on graphs "pivoting" on a vertex, like

 A,B A,C B,C B,E B,F E,F ;# two triangles touching in point B

But the task of determining the "countries" of which a 'map' is made of, was too much for this weekend - if you have something in this direction, just put it here!

A possible application for the routines above is a maze generator:

start from a rectangular grid of h*w nodes
connect neighbors with horizontal/vertical edges, altogether 2*h*w-h-w
while the number of edges is > h*w-1 (=tree), pick a random edge; if it is not a bridge, remove it
render the resulting tree as a maze

Read more on paths and cycles in undirected graphs, More graph theory on measuring distance, determining the centre, and much more ;-)

Now for some routines dealing with directed graphs, where, in contrast to general graphs, edges are "one-way streets" and would have to be explicitly mirrored (X,Y Y,X) for two-way usability. Here, edges also define predecessor-successor relations, and there may be a set of initial nodes which have no predecessor:

 proc initials dg {
    foreach edge [edges $dg] {set t($edge) .} ;# dump into array
    set res {}
    foreach node [nodes $dg] {
        if {[array names t *,$node]==""} {lappend res $node}
    }
    set res
 }

Terminal nodes ("leaves") can be equally enumerated, by just changing the array names pattern to "$node,*".

A frequent requirement in directed graphs is to enumerate all paths, which is possible if the graph contains no cycles - is "acyclic" (directed acyclic graphs, DAGs, are popular in formal syntax studies). If however the DG is cyclic, the following routine just returns an empty list (I thought of raising an error, but checking for {} is just more convenient ;-):

 proc paths dg {
    foreach edge [edges $dg] {
        foreach {from to} [split $edge ,] break
        lappend sons($from) $to
    }
    set res {}
    set todo [initials $dg]
    while {[llength $todo]} {
        set path [lpop todo]
        set last [lindex $path end]
        if [info exists sons($last)] {
            foreach son $sons($last) {
                    if {[lsearch $path $son]>=0} {return ""} ;# cycle
                lappend todo [concat $path $son]
            }
        } else {
            lappend res $path
        }
    }
    set res
 }
 proc lpop _L {
    upvar 1 $_L L
    set res [lindex $L 0]
    set L [lrange $L 1 end]
    set res
 }
 # so the check for acyclic property is just
 proc isAcyclic dg {expr {[paths $dg]!=""}}

If you want to enumerate all paths, a breadth-first search like above is sufficient. But if edges have associated costs, and the aim is to find the shortest path (with the least sum of costs) first, then various search strategies continue this story - see Searching A Star in Space.

DKF - Things get much more interesting when you don't declare the list of nodes and edges directly, but instead describe them via a function that maps from each node to the set of nodes which that node has an out-going edge to. There's a lot more to this (which forms the basis of technologies like on-the-fly deadlock- and model-checking) but it's a bit much for me to write about tonight.

RS: Like in The word-chain game?

Stephen D. Cohen added:

In the first paragraph above, should:

and an incidence map that lists for each edge from and to which edge it goes (they might be identical, forming a small "loop").

actually read:

and an incidence map that lists, for each edge, from and to which node it goes (they might be identical, forming a small "loop").

??? RS Of course - thank you for the correction!

David Cobac added : I'm writing a little graph editor. http://www.larochelle-innovation.com/tcltk/208

AM See also: Plotting a simple graph

Oh and a short expose on Colouring graphs with a light-weight introduction to the four colour problem.

AM As Tcllib has a graph module for manipulating general graphs, I used that to implement a basic algorithm for finding the shortest paths in a graph.

The calculus of opetopes in XML using Tcl and tDOM

Opetopes are combinatorial structures parametrising higher-dimensional many-in/one-out operations, and can be seen as higher-dimensional generalisations of trees.