A lightweight digraph package

NEM 23 May 2007: I've just moved house, and found myself last night needing to precompute some route plans for a group of agents embodied in a virtual environment. As I haven't even got a working phone line in the new house, I had no access to the wiki, and no copy of tcllib on my laptop. So I hacked up this lightweight directed-graph (digraph) package with some utilities for searching the space with a variety of strategies (breadth-first, depth-first, A* etc). The code is quite lightweight and fairly speedy. It makes extensive use of dicts and some other 8.5isms, but I expect a back-port to 8.4 wouldn't be too difficult.

Version 0.2: Fixed some bugs in the search and made node/edge methods take a graph variable rather than the graph itself.

Version 0.3: More fixes...

Version 0.4: And a few more. It is now possible to prevent the current node being expanded in digraph search by using the continue command. This can be useful if you want to avoid expanding the same node twice from different search paths. (Useful when using the uniform-cost strategy which guarantees in certain conditions that any node you encounter for the first time will be on the minimum route to that node).

Version 0.5: The digraph is now strict: there can only be a single edge in each direction between a pair of nodes (i.e., edges are now stored in a dict rather than a list). I may make this a switch.

Version 0.6 (2008-11-11): Nodes and edges can now store arbitrary option/value pairs. The syntax of the node and edge commands has been extended so that you can do:

 digraph node mygraph $mynode -name Foo -age 99 ...
 digraph edge mygraph $src $dest -cost 12 -colour blue ...

You can use these same commands to set the options as well (i.e., if the node already exists then it just sets any options given), and to query an option value:

 digraph node mygraph $mynode -name
 digraph edge mygraph $src $dest -age

There is an incompatibility in that the cost associated with an edge is now just an attribute like any other, and must be specified as such: -cost 12, rather than being a distinguished argument to the edge command. The layout of the graph data-structure has also now changed to accommodate these additions, so if you were serialising graphs and reading them back in then the format will have changed in this version (as it also changed in 0.5).

Also added is an implementation of Dijkstra's algorithm, which I apparently wrote some time ago. The interface for that will probably change and/or merge with the search strategies.

The -progress callbacks for various iterator methods are now called less often (a max of 100 times) in order to improve performance. Should only really make a difference for large graphs.

I'm planning more substantial changes to this package to neaten up the API some more and round out the feature-set. That will probably result in a 1.0 version with proper docs and test-suite, to arrive some time in January I hope.

# digraph.tcl --
#
#       A basic directed graph library.
#
# Copyright (c) 2007-2008 Neil Madden ([email protected]).
# License: Tcl-style

package require Tcl     8.5
package provide digraph 0.6

namespace eval digraph {
    namespace export {[a-z]*}
    namespace ensemble create

    # DATA-STRUCTURE LAYOUT
    # A graph consists of a nested dictionary data structure. At the top level,
    # it is a mapping from node names to nodes:
    #   graph   = string -> node
    # Each node is then a dictionary consisting of two elements: a set of
    # out-going nodes and a set of user-supplied attributes:
    #   node    = { edges: edge-dict, attrs: dict }
    # The edges themselves are also dictionaries, mapping destination nodes to
    # dictionaries of attributes associated with that edge:
    #   edge-dict = string -> dict
    # Thus to get the outgoing edges from a node, we can use:
    #   dict get $graph $node edges
    # To get the attributes of a node we use:
    #   dict get $graph $node attrs
    # To get the attributes associated with an edge, we use
    #   dict get $graph $node edges $destination
    # To test for existence of a node:
    #   dict exists $graph $node
    # To test for existence of an edge, use:
    #   dict exists $graph $source edges $dest
    #
    # As you can see, the data structure is very much optimised for finding the
    # outgoing edges from a node (a constant-time operation). Finding the edges
    # coming *into* a node requires a search over the entire graph, resulting in
    # time linear in the total number of edges in the graph. For these kinds of
    # operations, it is best to store two graphs -- one storing the inverse
    # edges of the other.

    # create --
    #
    #       Create an empty graph.
    #
    proc create {} { return [dict create] }

    # node graph name ?-option ?value ...?? --
    #
    #       Create a new node in the graph if it doesn't already exist. An
    #       arbitrary collection of options and values can be associated with
    #       the node, which will be stored with it. If no options are given,
    #       then this returns the dictionary of option/value pairs for that node
    #       (or an empty dictionary if it was created). If just an option is
    #       given then the value of that option is returned. Otherwise, the
    #       options given are set in the dictionary.
    #
    proc node {graphVar name args} {
        upvar 1 $graphVar graph
        if {[llength $args] == 1} {
            return [dict get $graph $name attrs [lindex $args 0]]
        } else {
            if {![dict exists $graph $name]} {
                dict set graph $name [dict create edges {} attrs {}]
            }
            # Merge options with existing dict - might be quicker to use [dict
            # merge] here.
            foreach {option value} $args {
                dict set graph $name attrs $option $value
            }
            return [dict get $graph $name attrs]
        }
    }

    proc exists {graph node} { dict exists $graph $node }

    # edge graph source dest ?-option ?value ...?? --
    #
    #       Create a directed edge (arc) from source to dest in the graph,
    #       with an optional cost weighting. As with nodes, an arbitrary
    #       dictionary of option/value pairs can be associated with each edge.
    #       The option '-cost' is reserved for storing the cost associated with
    #       an edge.
    #
    proc edge {graphVar source dest args} {
        upvar 1 $graphVar graph
        # ensure nodes exist
        node graph $source; node graph $dest
        if {[llength $args] == 1} {
            # Get one attribute
            return [dict get $graph $source edges $dest [lindex $args 0]]
        } else {
            if {[dict exists $graph $source edges $dest]} {
                set default [dict get $graph $source edges $dest]
            } else {
                set default [dict create -cost 1]
            }
            dict set graph $source edges $dest [dict merge $default $args]
            return [dict get $graph $source edges $dest]
        }
    }

    # order graph --
    #
    #       Return the number of nodes in the graph
    #
    proc order graph { dict size $graph }

    # size graph --
    #
    #       Return the number of edges in the graph
    #
    proc size graph {
        set total 0
        dict for {_ v} $graph { incr total [dict size [dict get $v edges]] }
        return $total
    }

    # degree graph node --
    #
    #      Return the number of outgoing edges from a node in the graph.
    #
    proc degree {graph node} { dict size [dict get $graph $node edges] }

    # nodes graph ?-progress cmd? nodeVar script --
    #
    #       Iterate through the nodes in the graph (in arbitrary order)
    #       calling a script for each node. If the -progress option is given
    #       then this command is called periodically with the current iteration
    #       count and the total number of nodes.
    #
    proc nodes {graph args} {
        array set opts { -progress "" }
        array set opts [lrange $args 0 end-2]
        lassign [lrange $args end-1 end] nodeVar script
        upvar 1 $nodeVar node
        set total [order $graph]
        set period [expr {max(1,$total / 100)}]
        set done 0
        foreach node [dict keys $graph] {
            if {[llength $opts(-progress)] && [incr done] % $period == 0} {
                uplevel #0 [linsert $opts(-progress) end $done $total]
            }
            Uplevel 1 $script 
        }
    }

    # edges graph ?-progess cmd? {sourceVar destVar attrVar} script --
    #
    #      Iterate through the edges in the graph (in arbitrary order)
    #      calling a script for each edge, passing in the source and
    #      destination nodes of the edge and the attributes. If the -progress
    #      option is given then this command is called with the current
    #      iteration count and the total number of edges periodically.
    #
    proc edges {graph args} {
        array set opts { -progress "" }
        array set opts [lrange $args 0 end-2]
        lassign [lrange $args end-1 end] vars script
        lassign $vars sourceVar destVar attrVar
        upvar 1 $sourceVar source $destVar dest $attrVar attrs
        # calculate total number of edges
        set total [size $graph]
        set done 0
        set period [expr {max(1,$total / 100)}]
        dict for {source dests} $graph {
            dict for {dest attrs} [dict get $dests edges] {
                if {[llength $opts(-progress)] && [incr done] % $period == 0} {
                    uplevel #0 [linsert $opts(-progress) end $done $total]
                }
                Uplevel 1 $script
            }
        }
    }

    # search graph source strategy {pathVar costVar} script --
    #
    #       Walk the entire graph structure from the source node without
    #       cycles. New nodes are searched in the order determined by the
    #       strategy command which queues new nodes in some order. For each
    #       new path encountered the script is called with the current path
    #       from the source node to that node and the total cost of that path.
    #
    proc search {graph source strategy vars script} {
        lassign $vars pathVar costVar
        upvar 1 $pathVar path $costVar cost
        set queue [list [list [list $source] 0]]
        do {
            # pop first item off queue
            set next [lindex $queue 0]
            set queue [lreplace $queue [set queue 0] 0]
            lassign $next path cost
            set node [lindex $path end]
            # visit the node
            Uplevel 1 $script
            # expand this node
            expand $graph $node $path {newPath newCost} {
                # Invoke the strategy command for the new path.
                set cmd $strategy
                lappend cmd $queue $newPath [expr {$cost+$newCost}]
                # Setting the queue and cmd vars to "" here is an optimisation
                # that ensures that the underlying queue has ref-count 1 when
                # passed to the strategy, allowing in-place modification by Tcl.
                set queue ""
                set queue [uplevel #0 $cmd][set cmd ""]
            }
        } while {[llength $queue]}
    }

    # expand graph node path {pathVar costVar} script --
    #
    #       Iterate through all new paths that are reachable from "node" via
    #       an outgoing edge.
    #
    proc expand {graph node path vars script} {
        lassign $vars pathVar costVar
        upvar 1 $pathVar newPath $costVar cost
        if {![dict exists $graph $node]} { return }
        dict for {dest attrs} [dict get $graph $node edges] {
            set cost [dict get $attrs -cost]
            if {$dest ni $path} {
                set newPath [linsert $path end $dest]
                Uplevel 1 $script
            }
        }
    }

    # Implementation of Dijkstra's shortest-paths algorithm.
    # Doesn't really fit with the rest of the search strategies currently so is
    # a stand-alone method. These should be unified (perhaps via a general fold
    # method).
    proc dijkstra {graph start} {
        set D [dict create]    ;# distance function
        set P [dict create]    ;# previous node in path
        set V [list]
        nodes $graph node { 
            lappend V $node
            dict set D $node Inf 
        }
        dict set D $start 0
        set n [order $graph]

        for {set i 0} {$i < $n} {incr i} {
            set v [MinVertex $graph $D V]
            if {[dict get $D $v] == Inf} { break } ;# unreachable
            expand $graph $v {} {path cost} {
                set node [lindex $path end]
                if {[dict get $D $node] > ([dict get $D $v] + $cost)} {
                    dict set P $node $v
                    dict set D $node [expr {[dict get $D $v]+$cost}]
                }
            }
        }
        return [list $D $P]
    }

    proc MinVertex {graph D vVar} {
        upvar 1 $vVar V
        set idx 0
        set v [lindex $V 0]
        for {set i 0} {$i < [llength $V]} {incr i} {
            set node [lindex $V $i]
            if {[dict get $D $node] < [dict get $D $v]} {
                set v $node
                set idx $i
            }
        }
        set V [lreplace $V $idx $idx]
        return $v
    }

    # readdot dotfile --
    #
    #       Read a Graphviz .dot file and construct a digraph based on the
    #       information contained in it. Currently this just forms a simple
    #       unweighted graph and ignores most information in the file.
    #
    proc readdot {dotfile} {
        set in [open $dotfile]
        set g [create]
        foreach line [split [read $in] \n] {
            if {[regexp {(.*)->(.*)[;\[]} $line -> source dest]} {
                edge g [string trim $source] [string trim $dest]
            } elseif {[regexp {(\S+);} $line -> node]} {
                node g [string trim $node]
            }
        }
        return $g
    }

    proc savedot {graph name dotfile} {
        set out [open $dotfile w]
        puts $out "digraph $name {"
        nodes $graph node {
            puts $out "    $node;"
        }
        edges $graph {src dest attrs} {
            set cost [dict get $attrs -cost]
            puts $out "    $src -> $dest \[weight=$cost\];"
        }
        puts $out "}"
        close $out
    }

    # STRATEGIES...
    proc depth-first {queue element cost} {
        # add to front of queue
        linsert $queue 0 [list $element $cost]
    }
    proc breadth-first {queue element cost} {
        # add to rear of queue
        linsert $queue end [list $element $cost]
    }
    # expand nodes with lowest cost-so-far first
    proc uniform-cost {queue element cost} {
        # find place to insert in queue
        set idx 0
        foreach elem $queue {
            lassign $elem path itemcost
            if {$itemcost >= $cost} { break }
            incr idx
        }
        linsert $queue $idx [list $element $cost]
    }
    # expand nodes with best (lowest) heuristic cost first
    proc best-first {f queue element cost} {
        set node [lindex $element end]
        set h [uplevel #0 [linsert $f end $node]]
        # find place to insert in queue
        set idx 0
        foreach item $queue {
            lassign $item path itemg itemh
            if {$itemh >= $h} { break }
            incr idx
        }
        linsert $queue $idx [list $element $cost $h]
    }
    proc a-star {heuristic queue element cost} {
        best-first [namespace code [list A* $heuristic $cost]] \
            $queue $element $cost
    }
    proc A* {h cost element} { expr {[uplevel #0 [lappend h $element]]+$cost} }
    
    # Private helper utilities.
    proc do {script _while_ cond} {
        set rc [catch { uplevel 1 $script } result]
        if {$rc == 3} { return }
        if {$rc != 0 && $rc != 4} { return -code $rc $result }
        Uplevel 1 [list while $cond $script]
    }
    proc Uplevel {level script} {
        if {[string is integer -strict $level]} { incr level }
        set rc [catch { uplevel $level $script } result]
        if {$rc == 0} { return }
        if {$rc != 1} { return -code $rc $result }
        return -code error -errorcode $::errorCode -errorinfo $::errorInfo $result
    }
}

Example

I'll try and update my pages on state space searching to use this simple package now. Until then, here is the example code that I was using to generate least-cost routes (I store the routes in an existing SQLite database):

 package require Tcl     8.5
 package require digraph 0.1
 package require sqlite3 3.3

 proc routes {graph file} {
   sqlite3 db $file
   db transaction {
     digraph nodes $graph -progress ::progress source {
       set visited [dict create]
       digraph search $graph $source {digraph least-cost} path {
         set dest [lindex $path end]
         if {[dict exists $visited $dest]} { continue } ;# don't expand again
         dict set visited $dest 1
         db eval {INSERT INTO route VALUES($source, $dest, $path)}
       }
     }
   }
   db close
 }
 proc progress {done total} {
   set percent [expr {int(double($done)/double($total)*100)}]
   puts -nonewline [format "\rProgress: |%-50.50s| %3d%% (%d/%d)" \
     [string repeat "=" [expr {$percent/2}]]> $percent \
     $done $total]
   flush stdout
 }
 set rhun [digraph readdot ~/Desktop/Rhun_map.dot]
 puts "Computing routes..."
 routes $rhun rhun_routes.db
 puts "\nDone."

Lars H: Hmm... Is this something like the real problem you're after, or just an example? (Minimum-cost path (a.k.a. shortest path) is one of the basic graph problems for which there exist low order polynomial (i.e., fast) algorithms, but searching the space of all paths isn't the right way to do it.) The digraph package itself looks nice, but a problem for many algorithms is that it only gives you quick access to edges leaving a vertex; it is often also necessary to know which edges are coming in to a vertex.

NEM: Well, for my problems I only care about outgoing edges, hence the bias in the package. I'd be interested in pointers to faster minimum-cost algorithms, but the basic approach here worked fine -- it generated all ~170000 routes I needed in under 10 minutes, which is fine for a one-off operation. I'm currently working on an implementation of Dijkstra's algorithm, if that is what you were referring to.

Lars H: OK, the must-know algorithm for Shortest Path is Dijkstra's algorithm [L1 ]. One that is more immediately lends itself to your package is the Moore–Bellman–Ford algorithm [L2 ], which can be coded as follows:

  proc MBF {G source} {
     digraph nodes $G v {set l($v) infinity}
     set l($source) 0
     for {set count [digraph size $G]} {$count>1} {incr count -1} {
        digraph edges $G {u v c} {
           if {$l($u) != "infinity" && $l($v) > $l($u) + $c} then {
              set l($v) [expr {$l($u) + $c}]
              set p($v) $u
           }
        }
     }
     set res {}
     foreach v [array names p] {
        lappend res $v [list $l($v) $p($v)]
     }
     return $res
  }

(Untested, I don't have an 8.5 immediately available.) MBF returns a dictionary with one entry for every vertex reachable from $source, whose entries are lists of two elements. The first element is the length of the shortest (cost of the cheapest) path from $source to that vertex. The second element is the previous vertex on that path. Dijkstra's algorithm is faster (particularly so for dense graphs), whereas Moore–Bellman–Ford can cope with cycles of negative weight in the graph.

NEM: Thanks for this, I'll try and integrate it tonight. (Note: in recent Tcls expr recognises "Inf" as infinity). I'll also see about implementing Floyd's algorithm or some other all-pairs shortest-path algorithm, as that is what I primarily needed this package for (I am pre-computing all routes between all pairs of locations in a virtual environment so that agents only have to do local path-finding at runtime). Feel free to update the code directly if you have good ideas (e.g., new algorithms, or if you want to adjust the representation to make it quicker to find incoming edges).

BTW, the "size" procedure should really be named "order". The size of a graph is the number of edges.

NEM: Fixed, thanks.

LV Is there any thought to adding your package to tcllib, or at least incorporating what you have learned into whatever relevant module might already be there?

NEM Tcllib already has a graph package which is roughly a superset of the functionality here. The main differences are that the graphs here are values (strings) rather than opaque "objects", and the search methods here are different to the ones in tcllib. The tcllib graph has a "walk" method which includes breadth-first and depth-first searches, and could perhaps incorporate a general search strategy queuing function, as here, but I don't know.

APE 01 August 2013 Using SQLite to store graph related data, isn't it a first step toward the NoSQL world ?

See also Heuristic Searches for another example.

RFox - 2013-08-01 12:28:43

The topic fooled me completely : http://en.wikipedia.org/wiki/Digraph_(computing%29%|%http://en.wikipedia.org/wiki/Digraph_(computing)%|%

Category Graph theory

Category Data Structure