Version 15 of Tail call optimization

Purpose: Discuss a possible optimization where the Tcl bytecode compiler would eliminate tail calls.

Background: Functional programming.

The usual programming style in Tcl is imperative, where all the programing primitives are thought of as commands. Programs maintain their state by assigning to variables, and they can assign to the same variable multiple times. In Tcl, the classical way to compute a factorial, for instance, would be given by the following script:

    proc fact0 { n } {
        set result 1.
        while { $n > 1 } {
            set result [expr { $result * $n }]
            set n [expr { $n - 1 }]
        }
        return $result
    }

This style is not the only style available, nor necessarily even the best. The functional programming community considers variable assignments harmful, and prefers that variables be bound only as the arguments to functions. This preference leads to a style of programming where loops are replaced by recursion, and all the control structure is applicative, that is, based on function calls. The naive implementation of factorial in such a system is shown in the following script:

    proc fact1 { n } {
        if { $n <= 1 } {
            return 1.
        } else {
            return [expr { $n * [fact1 [expr { $n - 1 }]] }]
        }
    }

(This Wiki has a discussion on the required steps towards functional programming in Tcl.)

Tail recursion.

Note that the fact1 procedure consumes a large quantity of stack. If n is 10, say, it goes ten recursive calls deep. If multiple levels of loop are eliminated in this way, or if something requires many thousand iterations, the recursion quickly becomes unmanageable.

There is an alternative that preserves the best of both approaches. If a procedure can accept extra parameters that maintain its state, it can be written in a hybrid-iterative or tail-recursive style. The key to this style is:

If a procedure invokes itself recursively, it immediately returns the value of the recursive call without further processing.

One factorial procedure, written in the tail-recursive style, is shown below:

    proc fact2 { n { result 1. } } {
        if { $n <= 1 } {
            return $result
        } else {
            return [fact2 [expr { $n - 1 }] [expr { $n * $result }]]
        }
    }

Tail call elimination.

Why is tail recursion important? The reason is that there's a simple, mechanical optimization that can turn it back into iteration. The recursive call can be replaced with code that:

Replaces the arguments to the function with the values from the recursive call.
Jumps back to the start of the function.

While Tcl has no go to construct, the code for this optimization can be approximated with a while loop, as in the fact2 procedure below.

    proc fact2-rewritten { n { result 1 } } {
        while { 1 } {
            if { $n <= 1 } {
                return $result
            } else {
                set k [expr { $n * $result }]
                set n [expr { $n - 1 }]
            }
        }
    }

The Tcl bytecode engine has no support for this kind of optimization today, but maybe it should. Programmers accustomed to functional languages such as FP and Haskell (see Playing Haskell), and those accustomed to Scheme, would be surprised that it does not.

In fact, most implementations of these languages optimize not only tail recursion, but all tail calls. If a function returns the value of another function, the call is replaced with code that adjusts the stack and jumps to the start of the other function. In this way, mutually recursive functions can also get reduced to iteration.

See also MS's bytecode engine ideas

MS: a silent implementation of tail-jump-optimisation in the engine is probably possible. However, it would mess up the stack - and errorInfo.

Maybe it should be used at the programmer�s bidding only; maybe something like

    proc fact2 { n { result 1. } } {
        if { $n <= 1 } {
            return $result
        } else {
            return -tail [fact2 [expr { $n - 1 }] [expr { $n * $result }]]
        }
    }

KBK: I'd rather see it be a global option for the bytecode compiler, so that the programmer can set up as a "debug" or "release" configuration. Something like:

  if { [info exists ::tcl_platform(debug)] 
       && $::tcl_platform(debug)] } {
    set ::tcl::compiler(tailCallOptimize) 0
  } else {
    set ::tcl::compiler(tailCallOptimize) 1
  }

That way, running in the "debug" environment gets detailed ::errorInfo, while running in the "release" environment gets fast code. Sort of like -g vs -O on most C compilers.

AK: This modification reaches further than messing up errorInfo. uplevel and upvar are deeply affected. They will either not work at all, or have different semantics. A suboptimal (interim) solution would be to eliminate tail-calls on the C-stack but keep the Tcl Callframes so that uplevel, etc. can still work correctly.

Arjen Markus It is possible to realise tail recursion in pure Tcl (version 8.0 and onwards). The major drawbacks of the approach are:

Clumsy programming (you need to use variables in a context outside the recursive procedure
Slow execution (I have included some figures to give an impression)
All the caveats mentioned above about upvar and uplevel continue to apply.

But the advantage is that you can do endless recursion - the very reason tail recursion is so useful!

Here is my small example - a stupid little counter, but nevertheless ...

   #
   # Simple recursive calculation
   #
   proc rcount { c } {
      if { $c != 0 } {
         incr c -1
         return [expr {1+[rcount $c]}]
      } else {
         return 0
      }
   }

   set counter [rcount 10]
   puts $counter

   #
   # Simulate tail-recursive calculation
   #
   proc tail_rcount_priv { c } {
      if { $c != 0 } {
         incr c -1
         incr ::tail_rcount_c
         after 0 [list tail_rcount_priv $c]
      } else {
         set ::tail_rcount_continue 0
      }
   }


   proc tail_rcount { c } {
      set ::tail_rcount_continue 1
      set ::tail_rcount_c        0

      after 0 [list tail_rcount_priv $c]

      vwait ::tail_rcount_continue

      return $::tail_rcount_c
   }

   set counter [tail_rcount 10]
   puts $counter

   puts [time {set counter [rcount 100]}      100]
   puts [time {set counter [rcount  10]}     1000]
   puts [time {set counter [rcount   2]}     5000]
   puts [time {set counter [tail_rcount 100]} 100]
   puts [time {set counter [tail_rcount 1000]} 10]
   puts [time {set counter [tail_rcount 10000]} 1]

The result (at least, one possible, on my home PC, 350 MHz, Tcl8.3.4):

   10
   10
   1600 microseconds per iteration
   500 microseconds per iteration
   78 microseconds per iteration
   89000 microseconds per iteration
   841000 microseconds per iteration
   8450000 microseconds per iteration

Conclusion:

Tail recursion using after is possible but expensive (about 50 times!)
Tail recursion is linearly dependent on the number of steps (for ordinary recursion the time does not increase linearly, but grows less fast)

- - - I realised that the large execution time is probably due to the granularity of the timer (hence the round figures), more than anything else, so a delay of 0 ms is actually a delay of 1 ms most of the time!

(I ran the above script on a UNIX workstation and got very different ratios, a clear indication that granularity is the problem here!)

Update: using after idle improves the performance numbers as well (at least in tclsh)!

Trampolines jenglish

Another technique for implementing tail-calls is the trampoline: instead of returning a value, procedures return a continuation (i.e., what to do next). Then an outer loop -- the trampoline -- repeatedly evaluates the current continuation.

Implementing a trampoline is easy in Tcl:

    proc trampoline {args} {
        set cmd $args
        while {1} {
            set cmd [eval $cmd]
        }
    }

Add a bit of syntactic sugar to indicate tail-calls:

    proc trTailcall {args} {
        return -code return $args
    }

and finally we need a way to break out of the trampoline:

    proc trResult {result} {
        return -code return [list return $result]
    }

For an example, here's the standard (tail-recursive) version of the Fibonacci function in Tcl:

    proc fib1 {n} {     ; # See [SICP] p. 39
        proc fib-iter {a b count} {
            if {$count <= 0} {
                return $b
            } else {
                fib-iter [expr {$a+$b}] $a [expr {$count-1}]
            }
        }
        fib-iter 1.0 0.0 $n
    }

But this version runs out of stack space for large n:

    % fib1 1000
    too many nested calls to Tcl_EvalObj (infinite loop?)

Rewriting it to use a trampoline works for (almost) arbitrarily large n

    proc fib2 {n} {
        proc fib-iter {a b count} {
            if {$count <= 0} {
                trResult $b
            } else {
                trTailcall fib-iter [expr {$a+$b}] $a [expr {$count-1}]
            }
        }
        trampoline fib-iter 1.0 0.0 $n
    }

    % fib2 1000
    4.34665576869e+208

although we eventually run into floating point limitations (on my machine this happens at n=1476).

Arjen Markus This is a very nice solution too, nicer in that the evaluation occurs in a local context, rather than global.

Arjen Markus I copied these wise words about tail recursion from a response by Donal Fellows on the c.l.t.:

While many algorithms can have tail-recursion elimination applied to them, not all can be handled like this. Typically, you'd expect the optimisation to be applicable exactly when a function directly returns the result of another function call (especially in the case where it is a recursive or co-recursive1 call.) The problem comes with when you start wanting to do some operations to the result before passing it back; optimising these sorts of things can be much harder. As an example of what I mean, the natural definition of the factorial function2 (in Tcl) is:

  proc fact {n} {
     expr {$n<2 ? $n : $n*[fact [expr {$n-1}]]}
  }

This is not tail-recursive since you need to keep the value of $n about until after the recursion ends. Of course, this is fixable using an accumulator-argument scheme, but in general that's difficult to do:

  proc fact {n {accum 1}} {
     expr {$n<2 ? $accum : [fact [expr {$n-1}] [expr {$accum*$n}]]}
  }

Indeed, for some algorithms (depth-first tree traversal, for example) you need a stack *somewhere* and it may well be easiest to maintain the stack through recursive calls (it usually depends what else is going on at the same time.)

Trampolining (I *like* that code3) becomes unwieldy as the amount of information stored on the stack increases (I've got co-recursive tree-traversal code that uses upvar to access large arrays in parent contexts as part of an elaborate code generator, which would be very hard to make tail-recursive.) However, if you can bound your recursion depth neatly (pretty easy in practise with non-worst-case trees because of the sheer quantity of data even fairly shallow trees can hold) then a recursive algorithm may well be the best way to solve a problem.

Donal.

1: A calls B, and B calls A.

2: Fibonacci and GCD functions lend themselves to this treatment as well.

3: If we had first-class stack-frame objects, this would be even cooler!

Another place where tail calls would be really useful, and in support of an existing tcl idiom, is to implement major command/minor commands. In this case, the major command does nothing (modulo error checking) but a tail call to the minor command, and the fact that the stack frame of the major command is lost is a feature, not a bug. It's sort of the inverse of a curry (a chutney?). Something like:

  proc majorCommand {minorCommand args} {
    # check for valid $minorCommand
    return [eval $minorCommand $args]
  }

And: Suppose that creating a namespace also created a major command of the same name that redispatched the minor command to the namespace. Then namespaces and commands could be unified. The nice thing about this is that while commands are closed, namespaces are not(*). This would allow the set of minor commands of a major command to be extended, which would definitely be nice. And once commands and namespaces are unified, a tcl OO-system seems like it would practically fall out.

(*) If it were possible to close a namespace (perhaps a new kind of trace?), class-based OO systems could be implemented on top of this. With open namespaces, it's slot-based OO only. Scott Gargash