More graph theory

Richard Suchenwirth - More graph theory in Tcl again, and some thoughts on simplicity vs. efficiency. This weekend I played with building distance maps (that hold the distance, in edges, between each pair of nodes, in a matrix), so M($x,$y) gives the length of the shortest path between nodes $x and $y, or "simulated infinity", 999999, if there is no connecting path. The map is implemented as an array in caller's scope, passed in by name and upvared - notice the fancy name *M to indicate it is not passed by value ;-

 proc distanceMap {g *M} {
    upvar 1 ${*M} M
    set nodes [nodes $g]
    foreach i $nodes {
        foreach j $nodes {
            set M($i,$j) unknown ;# set up matrix elements
        }
        set M($i,$i) 0 ;# main diagonal
    }
    foreach edge [edges $g] {
        foreach {from to} [split $edge ,] break
        set M($from,$to) 1
        set M($to,$from) 1 ;# undirected graphs, so apply for both ways
    }
    foreach i [array names M] {
        if {$M($i)=="unknown"} {
            foreach {from to} [split $i ,] break
            set dist [distance $g $from $to]
            set M($from,$to) $dist
            set M($to,$from) $dist
        }
    }
    join [map:format M $nodes] \n
 }

The shortest path is found by breadth-first search, bounded by the shortest we found so far, and cycle detection:

 proc distance {g from to} {

    set length 999999   ;# simulated infinity
    set todo $from      ;# list of things to try
    while {[llength $todo]} {
        set try [lpop todo] ;# first thing to do
        set last [lindex $try end]
        foreach node [neighbors $last $g] {
            if {$node==$to} {
                if {[llength $try]<$length} {
                    set length [llength $try]
                }
                break    ;# found a path
            } elseif {[lsearch $try $node]>=0} {
                continue ;# detected a cycle
            } elseif {[llength $try]<$length} {
                lappend todo [concat $try [list $node]]
            } ;# lappend and lpop make a FIFO queue
        }
    }
    set length
 }

# Formatting the distance matrix for easier reading:

 proc map:format {*M nodes} {
    upvar 1 ${*M} M
    set res [list [concat - $nodes max]] 
    foreach i $nodes {
        set t $i
        foreach j $nodes {
            lappend t $M($i,$j)
        }
        lappend t [max [lrange $t 1 end]]
        lappend res $t
    }
    set res
 }

Having distances, we can also find out the centre of a graph, those vertices that share the least maximal distance (radius) from all other nodes. This is most evident in a star, while in complete graphs, all nodes are central. In contrast, the diameter of a graph is just the longest maximal distance, where rad(G) <= diam(G) <= 2 rad(G). This routine returns all three in one list, since their calculation is closely intertwined:

 proc centre,radius,diameter g {
    distanceMap $g M
    set nodes [nodes $g]
    set distances {}
    foreach node $nodes {
        foreach n2 $nodes {lappend distances $M($node,$n2)}
        set maxdist($node) [max $distances]
        lappend maxdistances $maxdist($node)
    }
    set radius   [min $maxdistances]
    set diameter [max $maxdistances]
    set centre {}
    foreach node $nodes {
        if {$maxdist($node)==$radius} {
            lappend centre $node
        }
    }
    list $centre $radius $diameter
 }

It's nice that we can name a procedure like this, but it's ugly that we need to lump three functionalities together like above. I seem to have come to the limits of my simple approach: to pass the graph by value as graph description list (mostly edges), which initially is just a string like a,b b,c c,d. I still like this format for its minimality, but intermediate results need to be computed repeatedly, which is unelegant and inefficient (and when graphs get bigger, you'll need efficiency very much ;-). An alternative would be to create objects that store things like edgelist, nodelist, distance map, neighbors table... but it is clumsier. An array would not suffice, since the distance map itself is an array, which cannot be contained in another array. So what remains is to use a namespace for data storage. I like namespaces for procedure-hiding; however, variables in namespaces are just less visible than global variables, but as persistent and potentially name-clashing as globals are. Maybe a future Tcl will make this dream of simple data, simple interfaces, and efficient execution come true, sort of like byte compilation did in a first step? - But wait, Tcl isn't that bad, witness the next routines for producing an induced subgraph (given a subset of the nodes of one graph, all edges that incide with this subset) and determining whether a cycle (see Paths and cycles) in a graph is an induced cycle (having no chord, or short-cutting edge):

 proc inducedSubgraph {vset g} {
    set res {}
    foreach edge [edges $g] {
        foreach {from to} [split $edge ,] break
        if {[contains $vset $from] && [contains $vset $to]} {
            lappend res $edge
        }
    }
    set res
 }
 proc contains {list sublist} {
    if {$sublist==""} {set sublist {{}}}
    foreach element $sublist {
        if {[lsearch $list $element]==-1} {return 0}
    }
    return 1
 }

 # note how ''nodes'' can be reused for cyclic node lists: 

 proc isInducedCycle {cycle g} {
    set nodes [nodes $cycle]
    expr {[llength $nodes]>2 && [isCycle [inducedSubgraph $nodes $g]]}
 }
 proc isCycle g {
    foreach node [nodes $g] {
        if {[degree $node $g]!=2} {return 0}
    }
    expr [llength $g]>0
 }

We can also get all induced cycles of a graph by testing all subsets of the node set (re-using the truthtable enumerator from A little proving engine):

 proc subsets list {
    set res {}
    foreach case [truthtable [llength $list]] {
        set subset {}
        foreach element $list bit $case {
            if {$bit} {lappend subset $element}
        }
        lappend res $subset
    }
    lsort $res
 }
 proc inducedCycles g {
    foreach subset [subsets [nodes $g]] {
        if {[isInducedCycle $subset $g]} {
            set cycle [inducedSubgraph $subset $g]
            if {[llength $cycle]>2} {
                set t([normalizeCycle [nodes $cycle]]) .
            }
        }
    }
    lsort [array names t]
 }
 # This however sometimes runs faster for bigger graphs: 
 proc inducedCycles2 g {
    foreach cycle [cycles $g] {
        if {[isInducedCycle $cycle $g]} {
            lappend res $cycle
        }
    }
    set res
 }
 # or, in a "sugared" version (not noticeable slower - how sweet Tcl can be!): 
 proc inducedCycles3 g {
    each cycle in [cycles $g] where {isInducedCycle $cycle $g}
 }
 proc each {*i "in" list "where" cond} {
        set res {}
        upvar ${*i} i
        foreach i $list {
                if {[uplevel $cond]} {lappend res $i}
        }
        set res
 }

Finally, here's a "densitometer" for graphs, indicating how many of the edges of a complete graph on the same node set are present: 0.0 for edgeless graphs, 1.0 for complete graphs; values above 1.0 indicate multigraphs with loops or parallel edges:

 proc density g {
        set e [llength [edges $g]]
        set v [llength [nodes $g]]
        set completeEdges [expr {$v*($v-1)/2.}]
        expr {$completeEdges? $e/$completeEdges : 0.0}
 }