Bayesian Networks

During my travels I had to calculate some values given certain conditions. Appallingly I was unable to find any instance of Bayesian networks implemented in pure Tcl. The closest thing was Bayesian Spam Filtering by Donal Fellows which calculated probabilities by examining token frequencies. What I needed was a more general purpose network that could handle any arbitrary deep network. To remedy said condition and with the aid of Eliezer Yudkowsky's outstanding Bayesian Reasoning tutorial I wrote one myself.

This program implements the support-except / evidence-except algorithm (see [L1 ] et al) and thus is bounded by the limitation that the network must be a polytree (a directed acyclic graph with exactly one path from any node to another). Add nodes to it by calling [addNode] then initialize the beliefs network through [initBeliefs]. Now query things by calling the functions [evidence], [support], or [infer].

Suppose one wanted to try the example in Yudkowsky's site:

  1% of women at age forty who participate in routine screening have
  breast cancer.  80% of women with breast cancer will get positive
  mammographies.  9.6% of women without breast cancer will also get
  positive mammographies.  A woman in this age group had a positive
  mammography in a routine screening.
  What is the probability that she actually has breast cancer?

Using the code below one could build the Bayesian network like so:

 addNode bc {} {s} 0.01 "Have Breast Cancer"
 addNode s {bc} {} {0.80 0.096} "Screened Positive"

When combined these two create a two node network that look like:

  (bc) -> (s)

Node bc has a probability of 0.01, corresponding with 1% of all women. Node s has a single parent, bc. The probability that s is true given bc is 0.80; its probability of truth when bc is false is 0.096. Now one can use the network to answer the question if she actually has cancer by calling:

 puts [infer +bc +s]

which means "if s is true, what is the probability that bc is also true?" Assuming that your computer is not broken Tcl will respond with the correct answer:

 0.0776397515528

Here is a longer example of what Bayesian networks can do. In our fictional world 20% of all programmers Know Tcl (P(kt) = 0.20). 25% of all programmers also Know Python (P(kp) = 0.25). Note that it is possible for a programmer to know both Tcl and Python. A subset of these programmers are also Language Lawyers. Its conditional probability table is:

   kt | kp | P(ll)
  -----------------
    T |  T | 0.80
    F |  T | 0.75
    T |  F | 0.50
    F |  F | 0.20

What this means is "if someone knew both Tcl and Python, the odds that he is a language lawyer are 80%". If instead he knew neither language then there is only a 20% chance he is a language lawyer. Note that this conditional table is written such that the first parent node varies in truthhood before the others. Also in this world are Operating System Geeks. Its conditional probability table is:

   kt | kp | P(osg)
  -----------------
    T |  T | 0.60
    F |  T | 0.58
    T |  F | 0.30
    F |  F | 0.05

Finally there are those who are Hireable based upon their skills with operating systems. Its table is merely:

   osg | P(h)
  -----------
    T  | 0.95
    F  | 0.33

Graphically this node looks like (pardon the ASCII art):

      (kt)  (kp)
       |  \/  |
       v  /\  v
      (ll) (osg) ---> (h)

First set up the network like so:

  addNode kt {} {ll osg} 0.20 "Knows Tcl"
  addNode kp {} {ll osg} 0.25 "Knows Python"
  addNode ll {kt kp} {} {0.80 0.75 0.50 0.20} "Language Lawyer"
  addNode osg {kt kp} {h} {0.60 0.58 0.30 0.05} "OS Geek"
  addNode h {osg} {} {0.95 0.33} "Hireable"

Now make some queries. Suppose you wanted to know what percentage of all programmers are hireable. Phrase the query like this:

  puts [evidence +h]

and the Bayesian network responds with 0.46702. What if you wanted to know how many non-OS geeks know Python but not Tcl:

  puts [support ~osg +kp~kt]

The network reports the answer to be 0.42. The complete code (blah blah oll blah) below:

 # An implementation of a a Bayesian belief network by way of
 # support-except and evidence-except algorithm by Jason Tang
 # ([email protected]).  The network is limited to just polytrees.
 
 # Populates the table of calculated beliefs using values given by the
 # Bayesian network.
 proc initBeliefs {} {
     array unset ::beliefs
     foreach n $::bayes(nodes) {
         if {$::bayes($n:p) == {}} {
             # head node
             set ::beliefs(+$n) $::bayes($n:w)
             set ::beliefs(~$n) [not $::bayes($n:w)]
         } else {
             # iterate through the parents and populate beliefs with
             # given evidence
             set weightNum 0
             foreach given [permutate $::bayes($n:p)] {
                 set given [join $given {}]
                 set ::beliefs(+$n|$given) [lindex $::bayes($n:w) $weightNum]
                 set ::beliefs(~$n|$given) [not $::beliefs(+$n|$given)]
                 incr weightNum
             }
         }
     }
 }
 
 # Given a child node and some condition by the parents, calculates the
 # probability of that child existing.  $a must be of form "+X" (to
 # indicate that state X should be true) or "~X" (to indicate
 # falsehood).  $given may be any complex truthhood (e.g., "+X~Y~Z") as
 # long as all of the states are the same distance away from $a.
 proc support {a given} {
     if {$given == {}} {
         # asked for a conditional variable, but a has no parents
         return {}
     }
     if {![info exists ::beliefs($a|$given)]} {
         set x [string range $a 1 end]
         set tmp $given
         while {[regexp {\A.([^+~]+)(.*)} $tmp foo g tmp]} {
             lappend gList $g
         }
         if [elemSubset $gList $::bayes($x:p)] {
             # condition x for just $given and no other variables
             # within x's parentage
             set prob 0
             foreach xgiven [permutate $::bayes($x:p)] {
                 set xgiven [join $xgiven {}]
                 if [truthSubset $given $xgiven] {
                     set p $::beliefs(+$x|$xgiven)
                     foreach r [truthRemainder $given $xgiven] {
                         set p [expr {double($p * [evidence $r])}]
                     }
                     set prob [expr {$prob + $p}]
                 }
             }
         } else {
             # search through parents until a path is found
             set foundPath 0
             foreach p $::bayes($x:p) {
                 set prob0 [support +$p $given]
                 if {$prob0 != {}} {
                     set prob1 [support ~$p $given]
                     set prob [expr {[support +$x +$p] * $prob0 +
                                     [support +$x ~$p] * $prob1}]
                     set foundPath 1
                     break
                 }
             }
             if !$foundPath {
                 error "no path found from $x to $given"
             }            
         }
         set ::beliefs(+$x|$given) $prob
         set ::beliefs(~$x|$given) [not $prob]
     }
 #    puts "support returns for $a|$given: $::beliefs($a|$given)"
     return $::beliefs($a|$given)
 }
 
 # Calculates the probability the truthhood of some state.  $a must be
 # of form "+X" (to indicate that state X should be true) or "~X" (to
 # indicate falsehood).
 proc evidence {a} {
     if {![info exists ::beliefs($a)]} {
         set x [string range $a 1 end]
         set prob 0.0
         foreach given [permutate $::bayes($x:p)] {
             set p [support +$x [join $given {}]]
             foreach g $given {
                 set p [expr {double($p * [evidence $g])}]
             }
             set prob [expr {$prob + $p}]
         }
         set ::beliefs(+$x) $prob
         set ::beliefs(~$x) [not $prob]
     }
 #    puts "evidence returns for $a: $::beliefs($a)"
     return $::beliefs($a)
 }
 
 # Uses Bayes' theorem to infer some parent condition ($a) given the
 # truthhood of a single child state ($given).  Both $a and $given must
 # be a single state of form "+X" (to indicate that state X should be
 # true) or "~X" (to indicate falsehood).
 proc infer {a given} {
     return [expr {[support $given $a] * [evidence $a] / [evidence $given]}]
 }
 
 ######################################################################
 # Bayesian utilities
 
 # Given a list of nodes, returns a permutation of them alternating
 # between true and false.  For example, given the list "a b c" this
 # function returns:
 #   +a+b+c ~a+b+c +a~b+c ~a~b+c +a+b~c ~a+b~c +a~b~c ~a~b~c
 # meaning {a, b, c}, {not a, b, c}, {a, not b, c}, etc.
 proc permutate {nodes} {
     if {$nodes == {}} {
         error "permutated across nothing!"
     }
     if {[llength $nodes] == 1} {
         return [list [list "+$nodes"] [list "~$nodes"]]
     } else {
         set head [lindex $nodes 0]
         foreach n [permutate [lrange $nodes 1 end]] {
             lappend retlist [concat "+$head" $n] [concat "~$head" $n]
         }
         return $retlist
     }
 }
 
 # returns non-zero if every element in needle is in the haystack list
 proc elemSubset {needle haystack} {
     foreach n $needle {
         if {[lsearch -exact $haystack $n] == -1} {
             return 0
         }
     }
     return 1
 }
 
 # Returns true if the truth $needle (e.g., "+a~b") is a subset of the
 # truth $haystack (e.g., "+a~x+y+b~z")
 proc truthSubset {needle haystack} {
     while {[regexp {\A(.[^+~]+)(.*)} $haystack foo h haystack]} {
         lappend stackList $h
     }
     while {[regexp {\A(.[^+~]+)(.*)} $needle foo n needle]} {
         if {[lsearch -exact $stackList $n] == -1} {
             return 0
         }
     }
     return 1
 }
 
 # Given a truth $needle (e.g., "+a~b") and a truth $haystack (e.g.,
 # "+a~x+y+b~z") return the truth without any from $needle in list form
 # (in this case, {~x +y ~z})
 proc truthRemainder {needle haystack} {
     while {[regexp {\A(.[^+~]+)(.*)} $needle foo n needle]} {
         lappend needleList $n
     }
     set remainder ""
     while {[regexp {\A(.[^+~]+)(.*)} $haystack foo h haystack]} {
         if {[lsearch -exact $needleList $h] == -1} {
             lappend remainder $h
         }
     }
     return $remainder
 }
 
 # Takes a probability and returns its inverse.
 proc not {p} {
     return [expr {1.0 - $p}]
 }
 
 # adds a node to the network.  be aware that weights must be phrased a
 # certain way
 proc addNode {shortName parents children weights {fullName ""}} {
     if {$fullName == ""} {
         set fullName $shortName
     }
     set ::bayes($shortName) $fullName
     set ::bayes($shortName:p) $parents
     set ::bayes($shortName:c) $children
     set ::bayes($shortName:w) $weights
     lappend ::bayes(nodes) $shortName
 }

 ###################################################################### 
 # start of main script below
 
 # do the breast cancer example
 addNode bc {} {s} 0.01 "Have Breast Cancer"
 addNode s {bc} {} {0.80 0.096} "Screened Positive"
 initBeliefs
 puts "Percent who have breast cancer given a positive screening: [infer +bc +s]"
 
 # now do the Tcl/Python example
 
 # first clear away the old Bayesian network
 array unset ::bayes
 # then populate it with nodes
 addNode kt {} {ll osg} 0.20 "Knows Tcl"
 addNode kp {} {ll osg} 0.25 "Knows Python"
 addNode ll {kt kp} {} {0.80 0.75 0.50 0.20} "Language Lawyer"
 addNode osg {kt kp} {h} {0.60 0.58 0.30 0.05} "OS Geek"
 addNode h {osg} {} {0.95 0.33} "Hireable"
 # initialize the beliefs network
 initBeliefs
 # now make queries
 puts "Percent of all programmers who are hireable: [evidence +h]"
 puts "Percent hireable who know Tcl and Python: [support +h +kt+kp]"
 puts "Percent of non-OS geeks who know just Python and not Tcl: [support ~osg +kp~kt]"
 puts "Percent who know Tcl but are not language lawyers: [infer +kt ~ll]"

Return to Jason Tang