[NEM] ''4 September 2006'': For some reason today I got thinking about [decision trees] and games of 20 questions. Naturally, my thoughts turned to Tcl, and before I realised it, I'd created a little decision tree which can learn new decisions. Unlike the complicated inductive reasoning of [Playing C4.5], this one just asks the user for a new question when it can't correctly guess the classification. The code is drop-dead simple, but makes a nice code example. I use a quick and cheap fake of [algebraic types] to express the decision trees, for clarity: proc cons {name args} { proc $name $args { info level 0 } } proc func {name cons args} { set params [lrange $args 0 end-1] set body [lindex $args end] proc $name:$cons $params $body proc $name arg [format { uplevel 1 %s:$arg } $name] } The ''cons'' procedure creates data constructors, while ''func'' creates procedures that do a simple form of pattern matching. We can use these to create our decision tree data type: cons Guess ans cons Question q yes no A decision tree is thus either a Guess at what the answer is, or a question with sub-trees for yes or no answers. We can now write a function, ''ask'' that takes a decision tree and asks the user questions until it finds an answer or a gap in its knowledge. Every time we ask a question, if the answer is "y" then we move on to the yes-branch of the tree and ask again, otherwise we move on to the no-branch. We also build up a new decision tree as we go, allowing ''ask'' to plug in new sub-trees for the chosen branch. This is a classic technique from [functional programming], and forms the basis of our simple learning technique: if we get a novel answer then ''ask'' can simply return a new Question that incorporates the new answer, and this will then be incorporated into a new tree as the procedure unwinds. func ask Question q yes no { if {[query $q] eq "y"} { Question $q [ask $yes] $no } else { Question $q $yes [ask $no] } } func ask Guess guess { if {[query "It is $guess?"] eq "y"} { puts "Wooh!" Guess $guess } else { learn $guess } } proc learn old { set new [query "I give up. What is it?"] set feat [query "What feature does it have that $old doesn't?"] Question "Does it have $feat?" [Guess $new] [Guess $old] } The second version of ''ask'' is where the learning occurs. If the user picks something that we already know about, then we simply return the same guess, and this will build up the exact same decision tree as the recursive calls to ask unwind. However, if the user says something new, then we can learn a new Question for this answer and return that in place of our original Guess. The new tree will be identical to the old one accept at this point. ''learn'' is the procedure that ask the user to name the new item and what makes it different to our guess. ''learn'' transforms a Guess into a new Question (with two new Guesses). All we need now is our ''query'' procedure, and everything is complete: proc query str { puts -nonewline "$str " flush stdout gets stdin } Let's test it out: % ask [Question "Does it have wings?" [Guess "a bat"] [Guess "an ant"]] Does it have wings? y Is it a bat? n I give up. What is it? a bird What feature does it have that a bat doesn't? feathers Question {Does it have wings?} {Question {Does it have feathers?} {Guess {a bird}} {Guess {a bat}}} {Guess {an ant}} Our original decision tree has been extended with a new question in place of the old "a bat" guess, just as we wanted. We now have a way of supervised decision tree learning! We can repeat this process over and over to incrementally build up a complete tree: proc repeat tree { while 1 { set tree [ask $tree] if {[query "\nAgain?"] ne "y"} { puts [show $tree] break } } } This keeps building up the tree by repeated calls to ''ask'' until we say we are done. It then pretty-prints the resulting tree for us, using this ''show'' pretty-printer: func show Guess g { return $g } func show Question q y n { set str "$q\n" indent { append str [output "Yes: [show $y]\n"] append str [output "No : [show $n]"] } return $str } set indent 0 proc indent code { incr ::indent 2 uplevel 1 $code incr ::indent -2 } proc output str { return [string repeat " " $::indent]$str } Let's build up our initial tree again: % repeat [Question "Does it have wings?" [Guess "a bat"] [Guess "an ant"]] Does it have wings? y Is it a bat? n I give up. What is it? a bird What feature does it have that a bat doesn't? feathers Again? y Does it have wings? n Is it an ant? n I give up. What is it? a giraffe What feature does it have that an ant doesn't? a long neck Again? y Does it have wings? n Does it have a long neck? n Is it an ant? n I give up. What is it? an elephant What feature does it have that an ant doesn't? a trunk Again? n Does it have wings? Yes: Does it have feathers? Yes: a bird No : a bat No : Does it have a long neck? Yes: a giraffe No : Does it have a trunk? Yes: an elephant No : an ant Enjoy! [EMJ] Now all you need is persistence of the tree, and the ability to allow corrections if the next user disagrees, and ... (see [http://www.fourmilab.ch/documents/univac/animal.html]). [NEM] Tree persistence is easy (with [using]): proc save {tree file} { using fd [open $file w] { puts $fd $tree } } proc load {file} { using fd [open $file] { return [read $fd] } }