Result caching

Difference between version 15 and 16 - Previous - Next
[Richard Suchenwirth] 2002-07-19 - If you repeatedly do the same time-consuming calculations (e.g. string [similarity], or especially [the Q function]), it may be a good idea to keep the previous results in a cache and, for every call of the same command after the first, just retrieve the cached value.

This implementation uses an array in caller's scope (he might make it global, or not) whose name is explicitly given, indexed by the command. This way, you could have multiple caches in parallel, and selectively unset some when needed.
Note that variable values are not substituted in a braced command - to tie the cached result to the variable values, use [list] to group the command:
 set result [cached c [list myLongCommand $i $j]]
----
 proc cached {cacheName command} {
    upvar 1 $cacheName cache
    if {![info exists cache($command)]} {
        set cache($command) [uplevel 1 $command]
    } else {
        set cache($command)
    }
 }
 # ---------------------------------------- testing:
 % time {cached c {stringSimilarity "Tool Command Language" "Tool Command Languages"}}
 422000 microseconds per iteration
 % time {cached c {stringSimilarity "Tool Command Language" "Tool Command Languages"}}
 0 microseconds per iteration
 % array get c
 {stringSimilarity "Tool Command Language" "Tool Command Languages"} 0.953488372093

The second call saves you more than 0.4 seconds...(on second try with cleared cache, it was only 63 ms, though).
Note however that caching results makes only sense if the command always returns the same value - caching commands like
 gets $fp
 expr rand()
 clock seconds
is certainly a bad idea... And side effects are of course not produced, as the command isn't executed after the first time.

''[DKF] notes:'' The above timings aren't very good (there's granularity problems) but on Solaris8 (on a not-very-fast processor) I get:

 % time {cached c {stringSimilarity "Tool Command Language" "Tool Command Languages"}}
 97151 microseconds per iteration
 % time {cached c {stringSimilarity "Tool Command Language" "Tool Command Languages"}} 200
 44 microseconds per iteration
----
''[Philip Greenspun] calls this concept "memoization"...'' A heavyweight XOTcl version is on [Cacheable class]. See also [memoizing].
----
[NaviServer] has the commands '''ns_cache_eval'''[http://naviserver.sourceforge.net/n/naviserver/files/ns_cache.html] and '''ns_memoize'''[http://naviserver.sourceforge.net/n/naviserver/files/ns_memoize.html], which are thread friendly. The cache-data is global, shared by all threads; the interlocking is transparent. Also, if one thread is computing ''expensive_function'' because the cache is empty/stale, and then another thread also checks the cache, the second thread will suspend until the first thread completes the cache update. Timeouts can be specified.
----
See [Arrays as cached functions] where you can call $f($x,$y)... or [A size-bounded cache].
----To avoid caching results when passing in arguments as variables:
To capture the values of variables referenced by the command:

======
proc cached {cacheName command} {
    set index [uplevel 1 "[list subst {$command}"]]
    upvar 1 $cacheName cache
    if {![info exists cache($index)]} {
        set cache($index) [uplevel 1 $command]
    } else {
        set cache($index)
    } }
======

So we can then call: set a 1
 cache c {expr {$a + 1}}
 set a 2
 cache c {expr {$a + 1}}
======
set a 1
cache c {expr {$a + 1}}
set a 2
cache c {expr {$a + 1}}
======

And not get erroneous results.
[PYK] 2020-01-18:  First, I've replaced

======
"subst {$command}"
======

, which is incorrect, with 

======
[list subst $command]
======

, which is correct.  See [Tcl Quoting], [list], and [eval] for explanations.

Second, even with that correction made, this routine is problematic.  `[subst]`
is a poor stand-in for either a proper Tcl parser or a parser of the
expressions that `[expr]` understands.  It's going to make
[substitution%|%substitutions] that weren't intended, and fail to make
substitutions that were intended.  Also, the implementation results in each
substitution being performed twice, which will have unexpected effects in some
cases.

A more correct way to go about it is to implement caching schemes that are
customized to the routines they serve.

 
----

<<categories>> Concept | Caching | Arts and crafts of Tcl-Tk programming