a little learning decision tree

NEM 4 September 2006: For some reason today I got thinking about decision trees and games of 20 questions. Naturally, my thoughts turned to Tcl, and before I realised it, I'd created a little decision tree which can learn new decisions. Unlike the complicated inductive reasoning of Playing C4.5, this one just asks the user for a new question when it can't correctly guess the classification. The code is drop-dead simple, but makes a nice code example. I use a quick and cheap fake of algebraic types to express the decision trees, for clarity:

proc cons {name args} { proc $name $args { info level 0 } }
proc func {name cons args} {
    set params [lrange $args 0 end-1]
    set body   [lindex $args end]
    proc $name:$cons $params $body
    proc $name arg [format {
        uplevel 1 %s:$arg
    } $name]
}

The cons procedure creates data constructors, while func creates procedures that do a simple form of pattern matching. We can use these to create our decision tree data type:

cons Guess ans
cons Question q yes no

A decision tree is thus either a Guess at what the answer is, or a question with sub-trees for yes or no answers. We can now write a function, ask that takes a decision tree and asks the user questions until it finds an answer or a gap in its knowledge. Every time we ask a question, if the answer is "y" then we move on to the yes-branch of the tree and ask again, otherwise we move on to the no-branch. We also build up a new decision tree as we go, allowing ask to plug in new sub-trees for the chosen branch. This is a classic technique from functional programming, and forms the basis of our simple learning technique: if we get a novel answer then ask can simply return a new Question that incorporates the new answer, and this will then be incorporated into a new tree as the procedure unwinds.

func ask Question q yes no {
    if {[query $q] eq "y"} {
        Question $q [ask $yes] $no
    } else {
        Question $q $yes [ask $no]
    }
}
func ask Guess guess {
    if {[query "It is $guess?"] eq "y"} {
        puts "Wooh!"
        Guess $guess
    } else {
        learn $guess
    }
}
proc learn old {
    set new [query "I give up. What is it?"]
    set feat [query "What feature does it have that $old doesn't?"]
    Question "Does it have $feat?" [Guess $new] [Guess $old]
}

The second version of ask is where the learning occurs. If the user picks something that we already know about, then we simply return the same guess, and this will build up the exact same decision tree as the recursive calls to ask unwind. However, if the user says something new, then we can learn a new Question for this answer and return that in place of our original Guess. The new tree will be identical to the old one except at this point. learn is the procedure that ask the user to name the new item and what makes it different to our guess. learn transforms a Guess into a new Question (with two new Guesses). All we need now is our query procedure, and everything is complete:

proc query str {
    puts -nonewline "$str "
    flush stdout
    gets stdin
}

Let's test it out:

 % ask [Question "Does it have wings?" [Guess "a bat"] [Guess "an ant"]]
 Does it have wings? y
 Is it a bat? n
 I give up. What is it? a bird
 What feature does it have that a bat doesn't? feathers
 Question {Does it have wings?} {Question {Does it have feathers?} {Guess {a bird}} {Guess {a bat}}} {Guess {an ant}}

Our original decision tree has been extended with a new question in place of the old "a bat" guess, just as we wanted. We now have a way of supervised decision tree learning! We can repeat this process over and over to incrementally build up a complete tree:

proc repeat tree {
    while 1 {
        set tree [ask $tree]
        if {[query "\nAgain?"] ne "y"} {
            puts [show $tree]
            break
        }
    }
}

This keeps building up the tree by repeated calls to ask until we say we are done. It then pretty-prints the resulting tree for us, using this show pretty-printer:

func show Guess g { return $g }
func show Question q y n {
    set str "$q\n"
    indent {
        append str [output "Yes: [show $y]\n"]
        append str [output "No : [show $n]"]
    }
    return $str
}
set indent 0
proc indent code {
    incr ::indent 2
    uplevel 1 $code
    incr ::indent -2
}
proc output str {
    return [string repeat " " $::indent]$str
}

Let's build up our initial tree again:

% repeat [Question "Does it have wings?" [Guess "a bat"] [Guess "an ant"]]
Does it have wings? y
Is it a bat? n
I give up. What is it? a bird
What feature does it have that a bat doesn't? feathers

Again? y
Does it have wings? n
Is it an ant? n
I give up. What is it? a giraffe
What feature does it have that an ant doesn't? a long neck

Again? y
Does it have wings? n
Does it have a long neck? n
Is it an ant? n
I give up. What is it? an elephant
What feature does it have that an ant doesn't? a trunk

Again? n
Does it have wings?
  Yes: Does it have feathers?
    Yes: a bird
    No : a bat
  No : Does it have a long neck?
    Yes: a giraffe
    No : Does it have a trunk?
      Yes: an elephant
      No : an ant

Enjoy!

EMJ Now all you need is persistence of the tree, and the ability to allow corrections if the next user disagrees, and ... (see [L1 ]).

NEM Tree persistence is easy (with using):

proc save {tree file} {
    using fd [open $file w] { puts $fd $tree }
}
proc load {file} {
    using fd [open $file] { return [read $fd] }
}

Alex Caldwell This is very interesting to me. I used your Decision trees code in this Tcl/Tk medical decision/diagnosis project called MedMapper Medical Decision Suppport. I have a starpack demo you can download at [L2 ] I had to build up this big nested list by hand representing the decision tree. It was rather tedious to build the nested list by hand. The list/tree gets mapped to a Tk canvas using Canvas Buttons in 3-D for a visual interactive representation . With this new code you have developed, do you think it could be taught the type of information in the Medmapper chf (congestive heart failure) algorithm?