Monads

NEM 16 Jan 2007: The functional programming language Haskell introduced an extremely useful concept into the world of programming: the monad. The name monad doesn't explain much of their use, and their origins in abstract mathematics can make them seem rather intimidating at first. However, they can be extremely useful in places. This page will attempt to demonstrate why, with some Tcl code showing how to use them.

Part of the problem with explaining monads is that they are useful for a variety of different things, and so there are lots of ways of explaining them, each of which only tells part of the story. Google for "monad tutorial" produces a host of attempts to explain this tricky concept [L1 ]. Rather than add another comprehensive tutorial to this lot, I will instead just dive into the Tcl code and examples, hoping that some of the uses soon become clear.

A monad is specified as an abstract datatype, supporting two basic operations. The first injects a value into the monad (you can consider this a kind of constructor), and the second applies a function to an element "contained" in a monad and returns a new instance of the monad based on the function result. In Haskell, these operations are called return and >>= respectively. We can describe these in Tcl with the following declaration:

 package require Tcl 8.5
 namespace eval Monad {
     namespace export yield >>=
     namespace ensemble create
 
     # yield :: a -> m a
     # injects a value into monad m
     proc yield a   { error "abstract method" }
     # >>= :: m a -> (a -> mb) -> m b
     # applies a function to a value (type a) in the monad m, and returns a new
     # instance of the monad (type b).
     proc >>= {m f} { error "abstract method" }
 }

These two operations turn out to be very useful. For instance, we can generalize list comprehensions to support any data structure that supports these operations, as we can note that a list comprehension of this form:

 [ exp | x <- xs, y <- ys, ...]

can be translated into code of the following form:

 xs >>= (\x -> ys >>= (\y -> ... (return exp)))

where (\x -> ...) is Haskell's notation for a lambda expression. This is exactly what Haskell's do notation does. So, if we take the example:

 xs = [1,2,3]
 ys = ['a','b']
 [ (x,y) | x <- xs, y <- ys ]

which produces a list of all combinations of the two lists, we can convert this into code using monads as:

 xs >>= (\x ->
 ys >>= (\y ->
    return (x,y)))

In Tcl, we might write this as:

 proc test {m xs ys} {
     $m >>= $xs [func x {
     $m >>= $ys [func y {
         $m yield ($x,$y)
     }] }]
 }

Where we specify the particular monad as the argument, m (Haskell can infer which monad is intended at compile time). We can test this out by developing a List monad in Tcl, which will allow us to exactly simulate list comprehensions:

 namespace eval List {
     namespace path ::Monad
     namespace export >>= yield
     namespace ensemble create
 
     # yield :: a -> [a]
     proc yield a   { list $a }
     # >>= :: [a] -> (a -> [b]) -> [b]
     proc >>= {m f} {
         set ret [list]
         foreach item $m { lappend ret {*}[invoke $f $item] }
         return $ret
     }
 }

This is fairly straight-forward. The yield operation simply wraps the value in a list. The >>= operation uses the standard map function to apply the function to every element of the list (producing another list) and then concatenates all the results. Using this, we should be able to use our test procedure, but first we need to define a few helper procedures:

 # Invoke a command with the given arguments
 proc invoke {cmd args} { uplevel #0 $cmd $args }
 # Create a function closure, that captures the current values of visible
 # variables at creation time.
 proc func {params body {ns ::}} {
     set params [linsert $params 0 __env__]
     set env    [capture 1 $params]
     set body   [list dict with __env__ $body]
     return [list ::apply [list $params $body $ns] $env]
 }
 # Capture variable environment of $level
 proc capture {{level 1} {params ""}} {
     if {[string is integer $level]} { incr level }
     set env [dict create]
     foreach name [uplevel $level { info vars }] {
         if {[lsearch -index 0 $params $name] >= 0} { continue }
         upvar $level $name value
         catch { dict set env $name $value }
     }
     return $env
 }

These should be fairly familiar by now, just some basic functional programming items. With this in place we can now run our test:

 set a [list 1 2 3]
 set b [list a b]
 puts "List: [test List $a $b]"

And like magic we get the result:

 List: (1,a) (1,b) (2,a) (2,b) (3,a) (3,b)

We can combine these computation to create more complex comprehensions. For instance, we can square the list numbers before feeding it into test:

 proc msquare {m x} { $m yield [expr {$x*$x}] }
 puts "List2: [test List [List >>= $a {msquare List}] $b]"

Which produces:

 List2: (1,a) (1,b) (4,a) (4,b) (9,a) (9,b)

At these point we seem to have a simple way of performing list comprehension, albeit with a rather clumsy syntax (the performance isn't too bad though). However, we can now use these comprehensions with data structures other than lists. For instance, another useful monad is one which can be used to represent optional values, in a similar manner to null values/pointers in other languages. The type of such data can be represented by two data constructors: either the value is present, or it is not:

 # Simple datatype constructors
 proc con {name args} { proc $name $args { info level 0 } }
 # data Maybe a = Just a | Nothing
 con Just a
 con Nothing

So, how do we combine computations that may not produce an answer? Well, one way is to say that if any part of a computation is unknown, then the computation as a whole should be unknown. We can formalise this by creating a monad for the Maybe type:

 namespace eval Maybe {
     namespace path ::Monad
     namespace export >>= yield
     namespace ensemble create
 
     proc yield a   { Just $a }
     proc >>= {m f} {
         switch [lindex $m 0] {
             Nothing     { Nothing }
             Just        { invoke $f [lindex $m 1] }
         }
     }
 }

Here we can see the sequencing power of >>=: if the result of $m is Nothing then we simply return Nothing, otherwise we extract the value from the Just constructor and pass it to the function. We can test this use our previous procedure:

 puts [test Maybe [Nothing] [Nothing]] ;# produces Nothing
 puts [test Maybe [Just 1] [Nothing]] ;# produces Nothing
 puts [test Maybe [Nothing] [Just a]] ;# produces Nothing
 puts [test Maybe [Just 1] [Just a]] ;# produces Just (1,a)

This is indeed producing the correct behaviour: if any information is unknown, then the whole computation is. Notice here that we are using our original list comprehension with an entirely different type than lists! We can even perform our more complex test in the Maybe monad:

 test Maybe [Maybe >>= [Just 3] {msquare Maybe}] [Just b]

which produces the expected Just (9,b).

At this point we can take some time to pretty up the syntax a little bit. We will adopt a notation similar to Haskell's do notation, but with any variable binding extracted to the start (much like a let expression). This allows to avoid any parsing of the body. We arrange for the body of our do notation to be evaluated in the namespace of the corresponding monad. This allows us to use the short name yield inside the body without having to explicitly name the monad -- which is shorter but also keeps the body independent of any particular monad.

 proc do {m args} {
     set body [lindex $args end]
     foreach {var <- gen} [lrange $args 0 end-1] {
         if {[llength $var] > 1} {
             set body [format { lassign $__arg__ %s; %s } $var $body]
             set var __arg__
         }
         set fun [format {[func %s %s %s]} $var [list $body] $m]
         set body "$m >>= [list $gen] $fun"
     }
     uplevel 1 $body
 }

This means that we can write our test rather more clearly:

 proc test {m a b} {
     do $m x <- $a y <- $b { yield ($x,$y) }
 }
 test List {1 2 3} {a b}
 test Maybe [Just a] [Nothing]

We can also now start to write some of the more complex examples of list comprehensions:

 # Just the even elements
 do List x <- {1 2 3 4 5} { if {$x%2 == 0} { yield $x } }

And we also allow to break up multiple parts of a list at once:

 set income {
     {Bob    15000}
     {Jane   40000}
     {Frank  38000}
 }
 do List {n i} <- $income {
     if {$i > 30000} { yield $n }
 }

And indeed, we can define some basic functional programming classics, but generalized over arbitrary monads:

 proc mmap {m f xs} { do $m x <- $xs { yield [invoke $f $x] } }
 proc mfilter {m f xs} {
     do $m x <- $xs { if {[invoke $f $x]} { yield $x } else fail }
 }
 proc def {name = cmd args} { interp alias {} $name {} $cmd {*}$args }
 def map = mmap List
 def filter = mfilter List

These are similar to what you can do with foreach, but don't require accumulator variables, and can work with other monads than lists. Our mmap function can "lift" any function to work over a particular monad. For instance:

 proc double x { expr {$x*2} }
 def ldouble = mmap List double
 def mdouble = mmap Maybe double
 ldouble {1 2 3 4} ;# gives {2 4 6 8}
 mdouble [Just 12] ;# gives Just 24
 mdouble [Nothing] ;# gives Nothing

But there is one subtlety here. In the examples above I have neglected to write the code to handle the case where the condition in the body is not met. Instead, I have relied on the fact that if returns the empty string if it doesn't evaluate anything, which is also an empty list. However, what would happen if we were using the Maybe monad?

 do Maybe x <- [Just 1] { if {$x%2 == 0} { yield $x } }

What we actually get here is the empty string. But this is not correct -- we should get Nothing if there is no result. Clearly, we need to extend our monad interface with a way of reporting a failure. We can do this by adding a "fail" operation, and appropriate definitions for our example monads. As this is an error condition, we allow the operation to take an optional string reporting why it failed. By default, this operation simply produces an error:

 namespace eval Monad {
     namespace export fail
     proc fail {{reason ""}} { error "fail: $reason" }
 }
 namespace eval Maybe {
     namespace export fail
     proc fail {{reason ""}} { Nothing }
 }
 namespace eval List {
     namespace export fail
     proc fail {{reason ""}} { list }
 }

Here, we override the default for our monads: Maybe returns Nothing, and List returns the empty list, which seem sensible failure values. We can now rewrite our example as:

 do Maybe x <- [Just 1] { if {$x%2 == 0} { yield $x } else fail }

and it produces Nothing, as we expect.

Outside of failure cases, we might wish to represent some default element of a monad in other cases. We might also want a way of combining elements of a monad without performing any kind of computation, for instance concatenating two lists. Haskell makes these operations available as mzero and mplus respectively, in the MonadPlus class, and we can have the same easily enough:

 namespace eval MonadPlus {
     namespace export mzero mplus
     namespace ensemble create
 
     # mzero :: m a
     proc mzero {}    { error "abstract method" }
     # mplus :: m a -> m a -> m a
     proc mplus {a b} { error "abstract method" }
 }

The instances for our monads are straightforward:

 namespace eval List {
     namespace path {::Monad ::MonadPlus}
     namespace export mzero mplus
 
     proc mzero {}    { list }
     proc mplus {a b} { lappend a {*}$b }
 }
 namespace eval Maybe {
     namespace path {::Monad ::MonadPlus}
     namespace export mzero mplus
 
     proc mzero {}    { Nothing }
     proc mplus {a b} {
         switch [lindex $a 0] {
             Nothing  { return $b }
             Just     { return $a }
         }
     }
 }

The only interesting thing here is the definition of Maybe::mplus. What we've implemented here is a choice operation: it returns whichever of its arguments succeeds, if either does.

Hopefully this introduction has provided a glimpse of the power of monads, and demonstrated that they can be of practical use in Tcl. A larger example of their use is at Parser Combinators, where we use a list monad to perform backtracking in a parser (the code there is currently based on an older TOOT version of monads that is slow and complex, I intend to update some time). Comments and questions welcome.


More Example Monads

A monad for binary trees:

 con Empty
 con Leaf a
 con Branch l r
 namespace eval Tree {
     namespace path {::Monad ::MonadPlus}
     namespace export >>= yield fail mzero mplus
     namespace ensemble create
 
     proc yield x   { Leaf $x }
     proc >>= {m f} {
         switch [lindex $m 0] {
             Empty   { Empty }
             Leaf    { invoke $f [lindex $m 1] }
             Branch  { Branch [>>= [lindex $m 1] $f] [>>= [lindex $m 2] $f] }
         }
     }
     proc fail s    { Empty }
     proc mzero {}  { Empty }
     proc mplus {a b} { Branch $a $b }
 }

Now we have binary tree comprehensions free of charge!

 set tree [Branch [Branch [Leaf 5] [Empty]] [Branch [Leaf 9] [Leaf 12]]]
 do Tree x <- $tree { yield [expr {$x*$x}] }
 mmap Tree ::tcl::mathfunc::sqrt $tree