During my travels I had to calculate some values given certain conditions. Appallingly I was unable to find any instance of Bayesian networks implemented in pure Tcl. The closest thing was [Bayesian Spam Filtering] by [Donal Fellows] which calculated probabilites by examining token frequencies. What I needed was a more general purpose network that could handle any arbitrary deep network. To remedy said condition and with the aid of Eliezer Yudkowsky's outstanding Bayesian Reasoning tutorial ([http://yudkowsky.net/bayes/bayes.html]) I wrote one myself. This program implements the ''support-except / evidence-except'' algorithm (see [http://sern.ucalgary.ca/courses/cpsc/533/W02/Uncertainty/ProbabilisticReasoningGrp9_files/main.htm] et al) and thus is bounded by the limitation that the network must be a polytree (a directed acyclic graph with exactly one path from any node to another). Add nodes to it by calling [[addNode] then initialize the beliefs network through [[initBeliefs]. Now query things by calling the functions [[evidence], [[support], or [[infer]. Suppose one wanted to try the example in Yudkowsky's site: 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer? Using the code below one could build the Bayesian network like so: addNode bc {} {s} 0.01 "Have Breast Cancer" addNode s {bc} {} {0.80 0.096} "Screened Positive" When combined these two create a two node network that look like: (bc) -> (s) Node '''bc''' has a probability of 0.01, corresponding with 1% of all women. Node '''s''' has a single parent, '''bc'''. The probability that '''s''' is true given '''bc''' is 0.80; its probability of truth when '''bc''' is false is 0.096. Now one can use the network to answer the question if she actually has cancer by calling: puts [infer +bc +s] which means "if '''s''' is true, what is the probability that '''bc''' is also true?" Assuming that your computer is not broken Tcl will respond with the correct answer: 0.0776397515528 Here is a longer example of what Bayesian networks can do. In our fictional world 20% of all programmers '''Know Tcl''' (P(kt) = 0.20). 25% of all programmers also '''Know Python''' (P(kp) = 0.25). Note that it is possible for a programmer to know both Tcl and Python. A subset of these programmers are also '''Language Lawyers'''. Its ''conditional probability table'' is: kt | kp | P(ll) ----------------- T | T | 0.80 F | T | 0.75 T | F | 0.50 F | F | 0.20 What this means is "if someone knew both Tcl and Python, the odds that he is a language lawyer are 80%". If instead he knew neither language then there is only a 20% chance he is a language lawyer. Note that this conditional table is written such that the first parent node varies in truthhood before the others. Also in this world are '''Operating System Geeks'''. Its conditional probability table is: kt | kp | P(osg) ----------------- T | T | 0.60 F | T | 0.58 T | F | 0.30 F | F | 0.05 Finally there are those who are '''Hireable''' based upon their skills with operating systems. Its table is merely: osg | P(h) ----------- T | 0.95 F | 0.33 Graphically this node looks like (pardon the ASCII art): (kt) (kp) | \/ | v /\ v (ll) (osg) ---> (h) First set up the network like so: addNode kt {} {ll osg} 0.20 "Knows Tcl" addNode kp {} {ll osg} 0.25 "Knows Python" addNode ll {kt kp} {} {0.80 0.75 0.50 0.20} "Language Lawyer" addNode osg {kt kp} {h} {0.60 0.58 0.30 0.05} "OS Geek" addNode h {osg} {} {0.95 0.33} "Hireable" Now make some queries. Suppose you wanted to know what percentage of all programmers are hireable. Phrase the query like this: puts [evidence +h] and the Bayesian network responds with 0.46702. What if you wanted to know how many non-OS geeks know Python but not Tcl: puts [support ~osg +kp~kt] The network reports the answer to be 0.42. The complete code (blah blah [oll] blah) below: # An implementation of a a Bayesian belief network by way of # support-except and evidence-except algorithm by Jason Tang # (tang@jtang.org). The network is limited to just polytrees. # Populates the table of calculated beliefs using values given by the # Bayesian network. proc initBeliefs {} { array unset ::beliefs foreach n $::bayes(nodes) { if {$::bayes($n:p) == {}} { # head node set ::beliefs(+$n) $::bayes($n:w) set ::beliefs(~$n) [not $::bayes($n:w)] } else { # iterate through the parents and populate beliefs with # given evidence set weightNum 0 foreach given [permutate $::bayes($n:p)] { set given [join $given {}] set ::beliefs(+$n|$given) [lindex $::bayes($n:w) $weightNum] set ::beliefs(~$n|$given) [not $::beliefs(+$n|$given)] incr weightNum } } } } # Given a child node and some condition by the parents, calculates the # probability of that child existing. $a must be of form "+X" (to # indicate that state X should be true) or "~X" (to indicate # falsehood). $given may be any complex truthhood (e.g., "+X~Y~Z") as # long as all of the states are the same distance away from $a. proc support {a given} { if {$given == {}} { # asked for a conditional variable, but a has no parents return {} } if {![info exists ::beliefs($a|$given)]} { set x [string range $a 1 end] set tmp $given while {[regexp {\A.([^+~]+)(.*)} $tmp foo g tmp]} { lappend gList $g } if [elemSubset $gList $::bayes($x:p)] { # condition x for just $given and no other variables # within x's parentage set prob 0 foreach xgiven [permutate $::bayes($x:p)] { set xgiven [join $xgiven {}] if [truthSubset $given $xgiven] { set p $::beliefs(+$x|$xgiven) foreach r [truthRemainder $given $xgiven] { set p [expr {double($p * [evidence $r])}] } set prob [expr {$prob + $p}] } } } else { # search through parents until a path is found set foundPath 0 foreach p $::bayes($x:p) { set prob0 [support +$p $given] if {$prob0 != {}} { set prob1 [support ~$p $given] set prob [expr {[support +$x +$p] * $prob0 + [support +$x ~$p] * $prob1}] set foundPath 1 break } } if !$foundPath { error "no path found from $x to $given" } } set ::beliefs(+$x|$given) $prob set ::beliefs(~$x|$given) [not $prob] } # puts "support returns for $a|$given: $::beliefs($a|$given)" return $::beliefs($a|$given) } # Calculates the probability the truthhood of some state. $a must be # of form "+X" (to indicate that state X should be true) or "~X" (to # indicate falsehood). proc evidence {a} { if {![info exists ::beliefs($a)]} { set x [string range $a 1 end] set prob 0.0 foreach given [permutate $::bayes($x:p)] { set p [support +$x [join $given {}]] foreach g $given { set p [expr {double($p * [evidence $g])}] } set prob [expr {$prob + $p}] } set ::beliefs(+$x) $prob set ::beliefs(~$x) [not $prob] } # puts "evidence returns for $a: $::beliefs($a)" return $::beliefs($a) } # Uses Bayes' theorem to infer some parent condition ($a) given the # truthhood of a single child state ($given). Both $a and $given must # be a single state of form "+X" (to indicate that state X should be # true) or "~X" (to indicate falsehood). proc infer {a given} { return [expr {[support $given $a] * [evidence $a] / [evidence $given]}] } ###################################################################### # Bayesian utilities # Given a list of nodes, returns a permutation of them alternating # between true and false. For example, given the list "a b c" this # function returns: # +a+b+c ~a+b+c +a~b+c ~a~b+c +a+b~c ~a+b~c +a~b~c ~a~b~c # meaning {a, b, c}, {not a, b, c}, {a, not b, c}, etc. proc permutate {nodes} { if {$nodes == {}} { error "permutated across nothing!" } if {[llength $nodes] == 1} { return [list [list "+$nodes"] [list "~$nodes"]] } else { set head [lindex $nodes 0] foreach n [permutate [lrange $nodes 1 end]] { lappend retlist [concat "+$head" $n] [concat "~$head" $n] } return $retlist } } # returns non-zero if every element in needle is in the haystack list proc elemSubset {needle haystack} { foreach n $needle { if {[lsearch -exact $haystack $n] == -1} { return 0 } } return 1 } # Returns true if the truth $needle (e.g., "+a~b") is a subset of the # truth $haystack (e.g., "+a~x+y+b~z") proc truthSubset {needle haystack} { while {[regexp {\A(.[^+~]+)(.*)} $haystack foo h haystack]} { lappend stackList $h } while {[regexp {\A(.[^+~]+)(.*)} $needle foo n needle]} { if {[lsearch -exact $stackList $n] == -1} { return 0 } } return 1 } # Given a truth $needle (e.g., "+a~b") and a truth $haystack (e.g., # "+a~x+y+b~z") return the truth without any from $needle in list form # (in this case, {~x +y ~z}) proc truthRemainder {needle haystack} { while {[regexp {\A(.[^+~]+)(.*)} $needle foo n needle]} { lappend needleList $n } set remainder "" while {[regexp {\A(.[^+~]+)(.*)} $haystack foo h haystack]} { if {[lsearch -exact $needleList $h] == -1} { lappend remainder $h } } return $remainder } # Takes a probability and returns its inverse. proc not {p} { return [expr {1.0 - $p}] } # adds a node to the network. be aware that weights must be phrased a # certain way proc addNode {shortName parents children weights {fullName ""}} { if {$fullName == ""} { set fullName $shortName } set ::bayes($shortName) $fullName set ::bayes($shortName:p) $parents set ::bayes($shortName:c) $children set ::bayes($shortName:w) $weights lappend ::bayes(nodes) $shortName } ###################################################################### # start of main script below # do the breast cancer example addNode bc {} {s} 0.01 "Have Breast Cancer" addNode s {bc} {} {0.80 0.096} "Screened Positive" initBeliefs puts "Percent who have breast cancer given a positive screening: [infer +bc +s]" # now do the Tcl/Python example # first clear away the old Bayesian network array unset ::bayes # then populate it with nodes addNode kt {} {ll osg} 0.20 "Knows Tcl" addNode kp {} {ll osg} 0.25 "Knows Python" addNode ll {kt kp} {} {0.80 0.75 0.50 0.20} "Language Lawyer" addNode osg {kt kp} {h} {0.60 0.58 0.30 0.05} "OS Geek" addNode h {osg} {} {0.95 0.33} "Hireable" # initialize the beliefs network initBeliefs # now make queries puts "Percent of all programmers who are hireable: [evidence +h]" puts "Percent hireable who know Tcl and Python: [support +h +kt+kp]" puts "Percent of non-OS geeks who know just Python and not Tcl: [support ~osg +kp~kt]" puts "Percent who know Tcl but are not language lawyers: [infer +kt ~ll]" ---- Return to [Jason Tang] ---- [EKB] See also [Tcl Interface to the SMILE Bayes Network Library] ---- [Category Mathematics] | [Artificial Intelligence with Tcl]