Version 3 of unification

Updated 2006-08-23 19:34:42 by NEM

Unification is used in Logic programming to find one or more instantiations of the free variables in a set of logical formulas that satisfies those formulas. See [L1 ] for a brief overview.

NEM The core unification algorithm is actually quite simple, and is a classic piece of code to know. Here I'll present a general unification algorithm that works over many different types of data structures.

In order to perform unification of two values, we need to know something about their structure. For instance, to unify two lists we would traverse each in order attempting to unify each corresponding element. However, to unify two dicts we would traverse each and compare corresponding key/value pairs. This information about the structure of values is type information. As Tcl has no built-in notion of types, we have to explicitly pass this information in. The core of our unification algorithm looks as follows:

 proc unify {type a b {env ""}} {
     if {[var? $a]} {
         unify-var $type $a $b $env
     } elseif {[var? $b]} {
         unify-var $type $b $a $env
     } else {
         invoke $type $a $b $env
     }
 }

This basic algorithm takes a type-function (we'll descibe in a moment), two values to be unified, and an optional initial environment dictionary, env. This dict will be used to hold possible variable bindings as we traverse the values. The basic unifier checks to see if either of the values is a variable, and if so attempts to unify that variable with the other value in the context of the environment. This is done by the unify-var procedure:

 # Variables are single words that start with a question mark, e.g. ?name
 proc var? item { regexp {^\?\S*$} $item }
 proc unify-var {type var other env} {
     if {[dict exists $env $var]} {
         unify $type [dict get $env $var] $other $env
     } elseif {[dict exists $env $other]} {
         unify $type $var [dict get $env $other] $env
     } elseif {$var eq "?"} {
         return $env
     } else {
         occurs-check $var $other
         return [dict set env $var $other]
     }
 }

Again, this code is fairly straight-forward. If the environment already contains a mapping for this variable, then we lookup the value and retry unification. If not, then there are a number of options. Firstly, we can check if the other value is also a variable and is already bound in the environment (second branch of the if). If so, then we again lookup the value and retry unification. If neither value is bound, then we create a new binding for the variable in the environment with the other value. We also allow a special wild-card variable "?", which matches everything without forming a binding. This is useful when we don't care about some part of a structure.

The only missing piece of this variable unification is the "occurs-check" call. When we create a new binding in the environment, we need to make sure that the variable we are binding is not itself present in the value we are binding it to. As a value cannot contain itself, this would form an inconsistent binding, and so unification should fail. For instance, trying to unify ?foo with {a b ?foo} should fail as it would otherwise create an impossible structure. This occurs-check can be expensive, so Prolog and other logic programming languages tend to leave it out. This is what we will do here, too. If you were writing a theorem prover or other application where soundness was more important than efficiency, then you should put this occurs-check back in.

 proc occurs-check {var other} {}

Our basic framework is pretty much complete. We just need a couple of extra utilities before we move on to individual type-functions. "invoke" is a simple helper to invoke a type function. It takes care of expanding the function into individual words and invoking at the global scope. "fail" is a special exception type to indicate that unification has failed. We do not handle this exception in unification, but let it propagate to callers. A Prolog implementation could catch this error and trigger back-tracking (i.e., trying a different rule).

 proc invoke {cmd args} { uplevel #0 $cmd $args }
 proc fail {} { return -code error "unification failure" }

We could use a custom return code (e.g., 3245 which is "fail" on a telephone keypad) if we wished to avoid the overhead of an error (which builds a stack-trace as it propagates), but we'll keep things simple for now.

We now must deal with describing the structure of individual types. Each type is represented by a function that takes two values of that type and an environment, and performs unification over those values. We will name our type-functions with an initial capital letter to distinguish them from the commands that manipulate those types. Let's start with the simplest type, Tcl's strings:

 proc String {a b env} {
     if {$a ne $b} { fail }
     return $env
 }

This simply compares two strings for equality. If they are not equal then we fail, otherwise we return the environment as it is. We can now test our unification function on simple strings:

 % unify String hello world
 unification failure
 % unify String hello hello
 % unify String hello ?s
 ?s hello
 % unify String ?s hello
 ?s hello

Great! Everything seems to be working. We can define a type for numbers (both integers and reals) just as easily:

 proc Num {a b env} {
     if {$a != $b} { fail }
     return $env
 }

But what about lists and dictionaries? These are slightly more complicated, as they can contain elements of different types -- they are polymorphic data structures. In order to fully describe the type of these structures, we can further parameterise the type-functions with the types of their elements. We assume, for simplicity, that lists are homogenous; all elements are of the same type:

 proc List {elemtype as bs env} {
     if {[llength $as] != [llength $bs]} { fail }
     foreach a $as b $bs {
         set env [unify $elemtype $a $b $env]
     }
     return $env
 }

Here we run down each element of the lists and call "unify" recursively to see if they match. We pass "unify" the expected element type, which is an argument to the List function. So where does this element type come from? We simply pass in a compound type description when we call unify:

 % unify {List Num} {1 2 3} {1 ?b 3}
 ?b 2

And now we can start to see how powerful unification is. We can also define a type of Tuples which are heterogenous lists, but we need to specify the type of each element:

 proc Tuple {types as bs env} {
     if {[llength $types] != [llength $as] ||
         [llength $as]    != [llength $bs]} { fail }
     foreach type $types a $as b $bs {
         set env [unify $type $a $b $env]
     }
     return $env
 }

We can then define dictionaries in a similar manner:

 proc Dict {types as bs env} {
     if {[dict size $types] != [dict size $as] ||
         [dict size $as]    != [dict size $bs]} { fail }
     dict for {key type} $types {
         if {![dict exists $as $key] || ![dict exists $bs $key]} { fail }
         set env [unify $type [dict get $as $key] \
                              [dict get $bs $key] $env]
     }
     return $env
 }

Here we parameterise our Dict type with a dictionary describing the types of each key/value mapping. We then check that each value has precisely the same mappings and try to unify each individual. More flexible schemes are possible; for instance, we could allow each dictionary to contain extra keys which are not in our type mapping -- structural sub-typing, but I'll leave that as an exercise.

We can now perform unification over very complex structures:

 # Simple aliases:
 proc def {name = args} {
     interp alias {} $name {} {expand}$args
 }
 # Create a type for people:
 def Person = Dict {
     name       String
     age        Num
     birthday   {List {Num String}}
 }
 set neil {
     age        25
     name       "Neil Madden"
     birthday   {27 October}
 }
 array set ans [unify Person $neil {
     name     ?name 
     birthday {? ?month} 
     age      ?
 }]
 puts "$ans(?name)'s birthday is in $ans(?month)"

As you can see, unification forms the basis of some powerful structural querying, which we can use in a little database with unification.


[ Category Concept ]