## a little learning decision tree

NEM 4 September 2006: For some reason today I got thinking about decision trees and games of 20 questions. Naturally, my thoughts turned to Tcl, and before I realised it, I'd created a little decision tree which can learn new decisions. Unlike the complicated inductive reasoning of Playing C4.5, this one just asks the user for a new question when it can't correctly guess the classification. The code is drop-dead simple, but makes a nice code example. I use a quick and cheap fake of algebraic types to express the decision trees, for clarity:

```proc cons {name args} { proc \$name \$args { info level 0 } }
proc func {name cons args} {
set params [lrange \$args 0 end-1]
set body   [lindex \$args end]
proc \$name:\$cons \$params \$body
proc \$name arg [format {
uplevel 1 %s:\$arg
} \$name]
}```

The cons procedure creates data constructors, while func creates procedures that do a simple form of pattern matching. We can use these to create our decision tree data type:

```cons Guess ans
cons Question q yes no```

```func ask Question q yes no {
if {[query \$q] eq "y"} {
} else {
}
}
if {[query "It is \$guess?"] eq "y"} {
puts "Wooh!"
Guess \$guess
} else {
learn \$guess
}
}
proc learn old {
set new [query "I give up. What is it?"]
set feat [query "What feature does it have that \$old doesn't?"]
Question "Does it have \$feat?" [Guess \$new] [Guess \$old]
}```

The second version of ask is where the learning occurs. If the user picks something that we already know about, then we simply return the same guess, and this will build up the exact same decision tree as the recursive calls to ask unwind. However, if the user says something new, then we can learn a new Question for this answer and return that in place of our original Guess. The new tree will be identical to the old one except at this point. learn is the procedure that ask the user to name the new item and what makes it different to our guess. learn transforms a Guess into a new Question (with two new Guesses). All we need now is our query procedure, and everything is complete:

```proc query str {
puts -nonewline "\$str "
flush stdout
gets stdin
}```

Let's test it out:

``` % ask [Question "Does it have wings?" [Guess "a bat"] [Guess "an ant"]]
Does it have wings? y
Is it a bat? n
I give up. What is it? a bird
What feature does it have that a bat doesn't? feathers
Question {Does it have wings?} {Question {Does it have feathers?} {Guess {a bird}} {Guess {a bat}}} {Guess {an ant}}```

Our original decision tree has been extended with a new question in place of the old "a bat" guess, just as we wanted. We now have a way of supervised decision tree learning! We can repeat this process over and over to incrementally build up a complete tree:

```proc repeat tree {
while 1 {
if {[query "\nAgain?"] ne "y"} {
puts [show \$tree]
break
}
}
}```

This keeps building up the tree by repeated calls to ask until we say we are done. It then pretty-prints the resulting tree for us, using this show pretty-printer:

```func show Guess g { return \$g }
func show Question q y n {
set str "\$q\n"
indent {
append str [output "Yes: [show \$y]\n"]
append str [output "No : [show \$n]"]
}
return \$str
}
set indent 0
proc indent code {
incr ::indent 2
uplevel 1 \$code
incr ::indent -2
}
proc output str {
return [string repeat " " \$::indent]\$str
}```

Let's build up our initial tree again:

```% repeat [Question "Does it have wings?" [Guess "a bat"] [Guess "an ant"]]
Does it have wings? y
Is it a bat? n
I give up. What is it? a bird
What feature does it have that a bat doesn't? feathers

Again? y
Does it have wings? n
Is it an ant? n
I give up. What is it? a giraffe
What feature does it have that an ant doesn't? a long neck

Again? y
Does it have wings? n
Does it have a long neck? n
Is it an ant? n
I give up. What is it? an elephant
What feature does it have that an ant doesn't? a trunk

Again? n
Does it have wings?
Yes: Does it have feathers?
Yes: a bird
No : a bat
No : Does it have a long neck?
Yes: a giraffe
No : Does it have a trunk?
Yes: an elephant
No : an ant```

Enjoy!

EMJ Now all you need is persistence of the tree, and the ability to allow corrections if the next user disagrees, and ... (see [L1 ]).

NEM Tree persistence is easy (with using):

```proc save {tree file} {
using fd [open \$file w] { puts \$fd \$tree }
}