Version 5 of ::tcl::unsupported::tailcall

Experimental command provided by NRE, available in HEAD and onwards from the release of Tcl8.6a2.

The command provides proper tailcalls, with the following semantics:

a proc/lambda invokes [tailcall foo [bar] $soom]
bar is invoked and the value of soom fetched, as if [list foo [bar] $soom] had been called
a command named foo is looked up in the current context
the current proc/lambda replaces itself in the call stack with a call to the just found command

That is: tailcall foo [bar] $soom is very similar to return [uplevel 1 [list foo [bar] $soom]]] with two exceptions:

foo is looked up in the current context, not in the caller's
the stack frame is really gone, not just virtually. This has positive effects on memory, and a possibly confusing effect on stack traces.

NEM Many thanks to MS for his hard work making this a reality! For some background, see Tail call optimization.

NEM As a test/demo of how to use this facility, here is a simple benchmark using the factorial function:

package require Tcl 8.6a1
namespace import ::tcl::mathop::*
interp alias {} tailcall {} tcl::unsupported::tailcall

# Naive recursive factorial function
proc fac n {
    if {$n <= 1} {
        return 1
    } else {
        * $n [fac [- $n 1]]
    }
}

# Tail-recursive factorial
proc fac-tr {n {k 1}} {
    if {$n <= 1} {
        return $k
    } else {
        tailcall fac-tr [- $n 1] [* $n $k]
    }
}

# Iterative factorial
proc fac-i n {
    for {set k 1} {$n > 1} {incr n -1} {
        set k [expr {$n*$k}]
    }
    return $k
}

proc test {} {
    set fmt "%-10s ..%-12.12s %s"
    puts [format $fmt "Implementation" "Result" "Time"]
    foreach n {1 5 10 100 500 1000 2500 5000 10000} {
        puts "\nfac $n:"
        foreach impl {fac fac-i fac-tr} {
            if {[catch { $impl $n } result]} { 
                set result "n/a"
                set time "n/a"
            } else {
                set time [time [list $impl $n] 10]
            }
            puts [format $fmt $impl $result $time]
        }
    }
}
test

Putting this in a table, we get (timings taken on Windows XP: dual processor, dual core 2.66GHz machine):

N	fac Time	fac-i Time	fac-tr Time
1	0.0	0.0	0.0
5	0.0	0.0	0.0
10	0.0	0.0	0.0
100	0.0	0.0	1600.0
500	3100.0	3100.0	6200.0
1000	n/a	11000.0	17200.0
5000	n/a	48500.0	64000.0
10000	n/a	661000.0	n/a

Timings are very rough (Windows issue, I think); I'll rerun on a unix at some point. As we can see, the tail-recursive version is slightly slower than the iterative version, and unlike the naive version, manages to not blow the stack. However, at N=10000 it does abort trying to realloc 12944 bytes. Maybe a memory leak?

Category Concept	Category Control Structure	Category Functional Programming