Version 3 of A lambda calculus interpreter with arithmetic

Updated 2006-09-10 18:54:10 by NEM

NEM 10 Sept 2006: The lambda calculus is one of the most elegant, and earliest, models of computing around. Its simplicity makes it ideal as a language to implement a little interpreter, while it is equivalent in power to a universal Turing machine. The original lambda calculus consists of just three syntactic constructs:

Variables: x, y, z etc

Abstraction: (λ<var>.<expr>)

Application: (<expr> <expr>)

Abstraction creates a lambda term, which denotes a single-parameter anonymous function (similar to a proc with no name and only one parameter). Application supplies an argument to a lambda term. Variables in the body of a lambda term are replaced by the argument to that lambda. We will omit the details of how evaluation takes place (Beta-reduction and Alpha-conversion), and instead use a slightly different evaluation strategy that works for what we want. We will also add primitive support for integers and arithmetic into our interpreter. You can actually encode integers using just lambda abstractions (via Church numerals) and basic arithmetic, but the encoding is very inefficient and a bit clumsy to use.

Writing an interpreter reveals a lot about how (computer) languages are given meaning. A sentence in a language doesn't have implicit meaning on its own. The meaning comes from the interpretation of the sentence. Thus, there are always two parts to defining a language:

  • Syntax describes the valid sentences of the language and how they can be constructed;
  • Semantics describes what those sentences are supposed to mean.

A third part is defining pragmatics, which refers to details of how such a language should be processed (e.g., what effects should be caused when interpreting something like Tcl's [puts $chan $msg]). We will define the syntax of our language shortly. The semantics will be given by our interpreter, which maps terms from our language onto equivalent Tcl code. Thus the meaning of a term in our language is given by the meaning of the equivalent Tcl term. The pragmatics is similarly delegated -- how a given term is processed is given by how the equivalent Tcl term is processed. This is a simple way to accomplish our goal, but in a real language you may want to define the semantics/pragmatics independently of the implementation language.

Another important step is that we don't try and define the complete meaning of every possible term in the language. Instead, we define the meaning of the basic constructs in our language and allow these constructs to contain symbols/variables. The meaning of these symbols is application-dependent. Our interpreter is parameterised with an environment argument. This environment contains a mapping from user-defined symbols/variables to their definitions in terms of the constructs of the language. The meaning of any given sentence can thus be established by applying our evaluation function to try and remove any variable references from the sentence and reducing it to a simplified form whose meaning can be determined. Typed functional programming languages (e.g., Haskell or ML) often have features to expand the abstract syntax of the language (through datatypes), whereas object oriented languages often strive for a simpler uniform syntax and rely heavily on environments (objects are environments). Of course, FP languages also make heavy use of environments, and OO languages do have syntax, so that's quite a crude characterisation.

In order to define our interpreter succinctly, we will reuse the simple algebraic types that I used in a little learning decision tree (slightly altered):

 proc datatype {name = args} { uplevel 1 [linsert $args 0 |] } 
 proc | {name args} { proc $name $args { info level 0 } }
 proc func {name cons args} {
     set params [lrange $args 0 end-1]
     set body   [lindex $args end]
     set chead  [lindex $cons 0]
     set ctail  [lrange $cons 1 end]
     proc $name:$chead [concat $ctail $params] $body
     proc $name {con args} [format {
         uplevel 1 %s:$con $args
     } $name]
 }

With this in place, we can now define our datatype of terms in language we wish to interpret. This is the abstract syntax of our language:

 datatype Term = Lambda String Term        ;# abstraction: param body
               | App Term Term             ;# application: lambda arg
               | Var String                ;# variable reference
               | IntLit Integer            ;# integer literals
               | Add Term Term             ;# addition
               | Sub Term Term             ;# subtraction
               | Mul Term Term             ;# multiplication
               | Div Term Term             ;# division
               | Closure Env String Term   ;# closures...

Concrete syntax could be defined by writing a parser that transforms strings into the abstract syntax. Notice that our abstract syntax forms a tree datastructure as some of the constructors can contain sub-terms. This is known as an abstract syntax tree (AST).

The first three of these constructors are all that is required for the traditional lambda calculus. We have added integer literals and some primitive arithmetic operations. We've also added a constructor for closures, which form part of our evaluation strategy. Basically, instead of performing alpha-conversion and beta-reduction, we instead capture the environment (i.e. the mapping from variable names to values) when we evaluate a lambda abstraction, and then we restore and extend this environment when we apply an abstraction. This is an environment-passing interpreter, and achieves the same effect as normal lambda calculus evaluation iff we ensure that we are never passed unbound variable references. To do this, we employ call-by-value semantics, where we make sure that all arguments are evaluated before being passed to a lambda. Anyway, let's see what the core of our interpreter looks like -- our "Eval" function:

 # Eval :: Term -> Env -> Integer
 func Eval [Lambda param body] env  { Closure $env $param $body }
 func Eval [App f x] env            { Apply [Eval $f $env] [Eval $x $env] }
 func Eval [Var name] env           { dict get $env $name }
 func Eval [IntLit i] env           { return $i }
 func Eval [Add a b] env            { + [Eval $a $env] [Eval $b $env] }
 func Eval [Sub a b] env            { - [Eval $a $env] [Eval $b $env] }
 func Eval [Mul a b] env            { * [Eval $a $env] [Eval $b $env] }
 func Eval [Div a b] env            { / [Eval $a $env] [Eval $b $env] }
 func Eval [Closure e v b] env      { Closure $e $v $b }

The advantages of our datatype machinery should be fairly clear from this example. If we look at the cases, the evaluation strategy becomes clear. Evaluating a lambda abstraction forms a closure by capturing the current environment (which is just a dictionary, passed as the second argument to Eval). Evaluating an application calls a helper function Apply, after first evaluating both of its arguments. We always need to evaluate the $f argument here, in order to resolve the lambda term that is being applied. However, we could leave the argument unevaluated -- this is call-by-name evaluation order(*). This would run the risk of name clashes in our interpreter, however, as free variables might be accidentally captured. Evaluating a variable reference simply returns the value of the variable in the environment. Integers and closures evaluate to themselves (they are in normal form), and our arithmetic operations perform the appropriate calculation. This covers all of the basic evaluation. Let's look at the other main function in our interpreter: Apply. This is the function that supplies an argument to a lambda abstraction. The only valid term here is a closure:

 func Apply [Closure env param body] arg {
     Eval $body [dict set env $param $arg]
 }

Here we see how our use of closures makes things simple once again. All we do to apply a lambda function is to grab the environment from the closure, extend it by binding the argument given to the parameter name, and finally evaluate the body in this new environment.

We now have the core of our interpreter implemented, consisting of the somewhat magical pair of Eval/Apply so beloved of SICP [L1 ]. The simplicity and power of this pair of mutually recursive functions is truly one of the gems of computer science.

All that remains is to define our arithmetic helper functions and our interpreter is complete. This is a simple one-liner in Tcl:

 foreach op {+ - * /} { proc $op {a b} "expr {\$a $op \$b}" }

We can now start interpreting some expressions. Firstly, let's try some simple arithmetic:

 # 2 + 3
 % Eval [Add [IntLit 2] [IntLit 3]] {}
 5
 % Eval [Mul [IntLit 2] [IntLit 3]] {}
 6

OK, that seems to work. Now, let's create a function that squares it's argument:

 % set square [Lambda x [Mul [Var x] [Var x]]]
 Lambda x {Mul {Var x} {Var x}}
 % Eval [App $square [IntLit 4]] {}
 16
 % Eval [App $square [IntLit 20]] {}
 400

OK, let's get a bit more complicated -- we will create a curried function that takes a number and then returns a function that will add the original number to its argument. In other words [adder 1] will return a function that adds 1 to its argument:

 % set adder [Lambda x [Lambda y [Add [Var x] [Var y]]]]
 Lambda x {Lambda y {Add {Var x} {Var y}}}
 % Eval [App [App $adder [IntLit 1]] [IntLit 12]] {}
 13
 % set add1 [Eval [App $adder [IntLit 1]] {}]
 Closure {x 1} y {Add {Var x} {Var y}}
 % Eval [App $add1 [IntLit 12]] {}
 13

This is really quite sophisticated behaviour, and our interpreter is only a dozen or so lines!

(*) Actually, our evaluation order isn't quite call-by-value, and the variant isn't call-by-name, but I'll leave the actual definitions for another time. For the curious, there are several good books on the lambda calculus, and many functional programming text books cover it. I recommend L. C. Paulson's "ML for the Working Programmer" which develops a nice lambda calculus interpreter supporting various evaluation strategies, as well as a wonderful tactical theorem prover for the predicate logic.


[ Category Example | Category Language | Functional Programming ]