memoizing

I came up with this little proc while lamenting the confusing way that cache functions need to be called in mason [L1 ] - because perl has no way (that I'm aware of) (anon can be done with 'die') for a subroutine to cause its caller to return, cache functions must be called as

return if $m->cache_self();

RLH 2006-01-14: Perl has had memoization for a while now: [L2 ]

In Tcl this can be done much more elegantly. Just call this memoize proc at the beginning of another proc that is expensive to run and it will save the return value so it doesn't need to be recomputed. This makes use of info level to examining the stack as well as [return -code] to cause the calling proc to return.

proc memoize {args} {
    global memo
    set cmd [info level -1]
    set d [info level]
    if {$d > 2} {
        set u2 [info level -2]
        if {[lindex $u2 0] eq {memoize}} {
            return
        }
    }
    if {[info exists memo($cmd)]} {
        set val $memo($cmd)
    } else {
        set val [eval $cmd]
        set memo($cmd) $val
    }
    return -code return $val
}

A classic use of this is the recursive fibonacci function:

proc fibonacci x {
    if {$x < 3} {return 1}
    expr {[fibonacci [expr {$x - 1}]] + [fibonacci [expr {$x - 2}]]}
}

Because this recomputes all lower values for every number, the performance is O(2^n)

proc fibonacci x {
    memoize
    if {$x < 3} {return 1}
    expr {[fibonacci [expr {$x - 1}]] + [fibonacci [expr {$x - 2}]]}
}

By saving values that have already been computed by simply calling memoize, the performance becomes O(n)

RS: See also Result caching - but this solution here appears more elegant (though a bit brain-twisting) to me. My only proposal to make it more simple is to inline once-used variables:

proc memoize {} {
    global memo
    set cmd [info level -1]
    if {[info level] > 2 && [lindex [info level -2] 0] eq {memoize}} return
    if {![info exists memo($cmd)]} {set memo($cmd) [eval $cmd]}
    return -code return $memo($cmd)
}
proc fib x {expr {$x < 3 ? 1 : [fib [expr {$x-1}]] + [fib [expr {$x-2}]]}}
proc fibm x {memoize; expr {$x < 3 ? 1 : [fibm [expr {$x-1}]] + [fibm [expr {$x-2}]]}}
% fib 20
6765
% fibm 20
6765
% time {fib 32}
3451830 microseconds per iteration
% array unset memo
% time {fibm 32}
339 microseconds per iteration

But maybe I'm a bit too much on the FP trip that variables are evil :)

male 2004-23-01: "FP trip" and evil variables? - RS FP: Functional programming. At least in some FP circles, variables that "can vary", that are reassigned values, are considered as harmful as goto in procedural languages is. Some FPers take great Joy in the like-named Forth-related language where you don't even have (named) arguments to functions - "everything's on the stack".

NEM: Am I right in assuming that this memoize function will only work with functions in the strict sense - in other words, if your procedure relies on (or generates) side-effects then the cache will not be valid? Generally, not using side-effects is a good thing, but many built-in Tcl commands produce side-effects, and almost all Tk commands do.

RS: Right - memoizing caches the result of f(x,y,...) for later calls with the same arguments, and returns the same result. So, e.g., gets stdin should better not be memoized :)

While looking at speeding up the code I got side tracked with Memoizing - Is it a good idea.

Strick 2004-02-25: Here is how I've been memoizing. It's actually two slightly different ways, one called "memo" used at call time, and one called "memoproc" used at proc definition time. (: Both tweak with your syntax a bit, in ways you can't do in most languages. :)

# First I define "memo", which is used by inserting it in front of the command to be memoized when it is called:

# memoize a function call
proc memo args {
    if {[info exists ::MEMO($args)]} {
        set ::MEMO($args)
    } else {
        set ::MEMO($args) [uplevel 1 $args]
    }
}

# Then based on the idea in that code, I define "memoproc", which replaces the word "proc" when a function is defined. The function *must not* use "return" -- so get out your K Combinator and write functional functions!

# auto-memoize a function -- it should not use return
proc memoproc {name argv body} {
    set b "set _k_ \[list [list $name]\]; "
    foreach pair $argv {
        append b "lappend _k_ \$[list [lindex $pair 0]]; "
    }

    append b "
    if {\[info exists ::MEMO(\$_k_)\]} {
        set ::MEMO(\$_k_)
    } else {
        set ::MEMO(\$_k_) \[ $body \]
    }
    "

    proc $name $argv $b
}

# Here's a returnless functional fibonacci to play with, based on the one above:

proc fibonacci x {
    if {$x <= 1} {
        expr 1
    } else {
        expr {[fibonacci [expr $x - 1]] + [fibonacci [expr $x - 2]]}
    }
}

# And here it is named "m-fibonacci", used memoproc:

memoproc m-fibonacci {x} {
    if {$x <= 1} {
        expr 1
    } else {
        expr {[fibonacci [expr $x - 1]] + [fibonacci [expr $x - 2]]}
    }
}

# Now try it straight:

foreach n {1 2 3 4 5 6 7 8} {
    puts $n...[fibonacci $n]
    puts $n...[time "fibonacci $n" 10]
}

# And with memo, inserting 'memo' before the command:

foreach n {1 2 3 4 5 6 7 8} {
    puts $n...[memo fibonacci $n]
    puts $n...[time "memo fibonacci $n" 10]
}

# And with memoproc, calling "m-fibonacci' instead:

foreach n {1 2 3 4 5 6 7 8} {
    puts $n...[m-fibonacci $n]
    puts $n...[time "m-fibonacci $n" 10]
}

Setok: For the interested, I've had a Cacheable class for XOTcl available which does the same thing on an object level. Because of the XOTcl filter mechanism you can just dynamically attach it to any class you want. Found it pretty useful ;-)

Zarutian 2005-04-18: Probably a little related but here goes: is there any way to determine if a procedure is pure function? aka has no side-effects?

One way is to check if a procedure contains an command that has an side effect. Which would be more efficent to list all core commands that have side effects or all core commands that dont? (This could probably be better of in an separate page perhaps called pure function?)

RHS: One problem is that every command in Tcl can be redefined at any time. Even if you manage to completely analyze everything called from within a proc and make sure there are no side effects (which would be amazingly difficult, I think), there's nothing preventing one of the commands it uses from being defined before the next time it's called.

I think an easier option would be to be able to define a command as a "pure function", and then make it clear that things will not work "as expected" if that isn't true.

CMcC: adds a wrinkle to RS' version (prompted by Lurch on the chat) such that setting an element memo() to some number will delete the memo array in its entirety if its size exceeds $memo() - this means the memo size is limited, fairly cheaply, to some arbitrary maximum:

proc memoize {} {
    global memo
    if {[info exists memo()]
        && ([array size memo] > $memo())
    } {
        # limit size of memo array
        set max $memo()
        unset memo
        set memo() $max
        return
    }
 
    set cmd [info level -1]
    if {[info level] > 2 && [lindex [info level -2] 0] eq "memoize"} return
    if { ![info exists memo($cmd)]} {set memo($cmd) [eval $cmd]}
    return -code return $memo($cmd)
}
set memo() 100000        ;# set an arbitrary (large) size limit on memoizing

NEM 2008-05-26: I was just playing around with some code, and came up with this simple memoization procedure. It works without needing to change either the source-code of the procedure or of callers:

proc memoize f {
    variable $f [dict create]
    rename $f _cache_$f
    interp alias {} $f {} remember $f
}
proc remember {f args} {
    upvar #0 $f cache
    if {![dict exists $cache $args]} {
        dict set cache $args [uplevel 1 [linsert $args 0 _cache_$f]]
    }
    dict get $cache $args
}

If we then define the simple Fibonacci function, and speed it up:

proc func {name params body} { proc $name $params [list expr $body] }
func fib n {
    $n < 2 ? $n
           : [fib [expr {$n-1}]] +
             [fib [expr {$n-2}]]
}
puts "Slow: [fib 30] in [time { fib 30 } 10]"
memoize fib
puts "Fast: [fib 30] in [time { fib 30 } 10]"

aspect: I don't know if this is a universally good idea, but I tend to put the memovar in the caller's namespace instead of making it a global. This puts a bit more responsibility on the caller, but gives them some control over automatic cleanup when a namespace is deleted. If you use a package namespace you need to be a bit careful of memory leaks. To get the caller's namespace, use uplevel 1 namespace current. If the caller wants more control, they can be given a -memovar option. Allowing the caller to specify a proc-local might be a good thing, but that's approaching too much magic ...

memoizing

See Also