Accumulator Generator

See also Accumulator Generators.


The Accumulator Generator is a problem used by Paul Graham to compare the relative power of different programming languages.

The Problem

We want to write a function that generates accumulators-- a function that takes a number n, and returns a function that takes another number i and returns n incremented by i.

(That's incremented by, not plus. An accumulator has to accumulate.)

The original essay: http://www.paulgraham.com/icad.html

The implementation in other languages: http://www.paulgraham.com/accgen.html

pmarin(2009-12-18). The Rules (from http://paulgraham.com/accgensub.html )

  1. Takes, and returns functions that take, exactly one argument.
  2. Works for any numeric type-- i.e. can take both ints and floats and returns functions that can take both ints and floats. (It is not enough simply to convert all input to floats. An accumulator that has only seen integers must return integers.)
  3. Generates functions that return the sum of every number ever passed to them, not just the most recent.
  4. Returns a real function, meaning something that you can use wherever you could use a function you had defined in the ordinary way in the text of your program.
  5. Doesn't store the accumulated value or the returned functions in a way that could cause them to be inadvertently modified by other code.

My (pmarin) implementation:

package require TclOO
namespace import oo::*

proc foo n {
        set obj [object new]        
        objdefine $obj method init {_n} {my variable n; set n $_n}
        objdefine $obj method acc i {my variable n; set n [expr {$n + $i}]}
        $obj init $n
        list ::apply [list i [subst -novariables {[set obj] acc $i}]]
}
proc destroy_acc acc {
        [lindex $acc 1 1 0] destroy
}

A test:

> set acc [foo 3]
> {*}$acc 1
4
> {*}$acc 10
14
> {*}$acc 0
14

> destroy_acc $acc
> {*}$acc 1
invalid command name "::oo::Obj

Comments

I'm not happy with my implementation, It works but It is still too vebose compared with other languages.

I'm not sure if garbage collection is obligatory to solve this problem.

Feel free to insert here your own better/clever approach.


WHD: How about this?

proc mkag {name n} {
    set ::${name} $n

    proc ::${name} {i} [list incr ::${name} \$i]
}

pmarin: Sorry, I don't understand your code, Can you make a similar test like the above? Note that the accumulator must work with integer and floats.

AMG: When WHD's mkag procedure is invoked, it (1) sets a global variable and (2) creates a procedure which increments that same global variable and returns its updated value. The global variable and the created procedure are given the same name, determined by the first argument to mkag. The second argument determines the increment step. Because the procedure created by mkag uses incr, it only works with integers, not floats. To make it work with either, create an alternative to incr:

proc incr {varname {increment 1}} {
    upvar 1 $varname var
    set var [expr {$var + $increment}]
}

DKF: With 8.6, you can do this using coroutines.

coroutine accumulator apply {{} {
    set n [yield]
    while 1 {
        set n [yield [coroutine accumulator.[incr counter] apply {n {
            set i [yield [info coroutine]]
            while 1 {
                catch {set n [expr {$n + $i}]}
                set i [yield $n]
            }
        }} $n]]
    }
}}

Testing...

% set a [accumulator 3]
::accumulator.1
% $a 1
4
% $a 1
5
% $a 10
15
% $a 0
15
% $a 1.5
16.5
% $a 1.5
18.0
% set b [accumulator 1]
::accumulator.2
% $b -1
0
% $b -1
-1
% $b -1
-2
% $b -1
-3
% $b 100
97
% $a 1
19.0

The only real issue with this code is that it doesn't give good error messages when misused.


slebetman: Here's one that is portable up to when [info body] was introduced: way before Tcl 7.0 I suspect. Using self-modifying code (mechanics and implementation is ugly but the fact that it's doable is beautiful):

  proc accumulator {name n} {
    proc $name {i} "proc $name {i} \[info body $name\]+\$i;expr \$i+$n"
  }

Usage...

% accumulator acc 3
% acc 1
4
% acc 10
14
% acc 0
14

The reason I say the mechanics is ugly is because at this point:

% info body acc
proc acc {i} [info body acc]+$i;expr $i+3+1+10+0

the proc body keeps on growing...


NEM suggests:

proc accum {name n} {
    interp alias {} $name {} incr $name
}

slebetman: I think that should be:

proc accum {name n} {
    interp alias {} $name {} incr ::$name
    set ::$name $n
}

but it still doesn't work properly (as specified by the essay):

% accum x 10
10
% x 4
14
% x 3.7
expected integer but got "3.7"

AMG: Here's one written using sproc:

sproc accumulator {{increment 1}} {value 0} {
    set value [expr {$value + $increment}]
}
set p [accumulator]
$p                    ;# returns 1
$p                    ;# returns 2
$p 0.5                ;# returns 2.5
set q [accumulator value 20]
$q                    ;# returns 21
$q                    ;# returns 22
$q 0.5                ;# returns 22.5

As far as I can tell, the only one of Paul Graham's rules that it breaks is the first one, for two reasons. One, the increment argument is optional (defaults to 1). To be strict, replace "{{increment 1}}" with simply "increment". You can write it "{increment}" if you want. Two, the initial value argument is both optional and must be prefixed by the word "value". This can be easily resolved by making a wrapper procedure.

sproc Accumulator {increment} {value 0} {
    set value [expr {$value + $increment}]
}
proc accumulator {value} {
    Accumulator value $value
}
set p [accumulator]   ;# wrong # args: should be "accumulator value"
set q [accumulator 20]
$q                    ;# wrong # args: should be "::::Accumulator0 increment"
$q 0.5                ;# returns 20.5

Now it's compliant. :^)


DKF: A more nifty coroutine solution. This splits out the non-obvious bits into the coro and coloop procedures, which respectively make coroutines (without a helper procedure) and provide the looping body without horrible, non-obvious information flow.

proc coro {name arguments body args} {
    coroutine $name apply [list $arguments $body] {*}$args
}
proc coloop {var body} {
    set val [info coroutine]
    upvar 1 $var v
    while 1 {
        set v [yield $val]
        set val [uplevel 1 $body]
    }
}

coro accumulator {} {
    coloop n {
        coro accumulator.[incr counter] n {
            coloop i {
                set n [expr {$n + $i}]
            }
        } $n
    }
}

This code is a little longer than the previous solutions, but it's crystal clear what it is doing.


slebetman: In the Accumulator Generators page Paul Graham rejected proc based solutions because a proc in tcl is basically a global thing. The point he was trying to make was that languages without closure suck. So my first reaction is to look up the Closure page and write something based on that. Not only did someone else beat me to it but the solution also did not meet Paul's requirements because the emulated closure is stored in a namespace which, like procs, is a global thing. We can't even use an interp because the interp will have to be named and that name is, again, a global thing.

But I refuse to believe that tcl can't do what Paul wants. So here's an implementation that meets all his requirements for this particular task yet does not use a real closure. What it uses instead is something a bit more limited: static variables in lambda expressions:

# implementation of static vars in lambdas:
proc static {var val} {
    uplevel 1 [list lappend \0STATIC_VARS $var]
    uplevel 1 [list set $var $val]
}

proc apply_with_static {lambda args} {
    upvar 1 $lambda \0LAMBDA
    foreach \0VAR [lindex [set \0LAMBDA] 0] \0VAL $args {
        set [set \0VAR] [set \0VAL]
    }

    set \0RETURN [eval [lindex [set \0LAMBDA] 1]]

    foreach \0STATIC [set \0STATIC_VARS] {
        lset \0LAMBDA 1 [regsub -linestop -lineanchor -- \
            "static\[ \t\]+[set \0STATIC]\[ \t\]+.*?(;.*)?$" \
            [lindex [set \0LAMBDA] 1] \
            "static [set \0STATIC] [set [set \0STATIC]]\\1"
        ]
    }
    set \0RETURN
}

Static vars are implemented using self-modifying code (modified by a regular expression no less!). [apply_with_static] is a limited emulation of apply for this proof of concept. But it can be extended to be compatible with the real [apply] if necessary. So here's the implementation of the accumulator generator using [apply_with_static]:

proc foo {n} {return [list {x} [string map [list _N_ $n] {
    static y _N_
    set y [expr {$x+$y}]
}]]}

set acc [foo 1]
apply_with_static acc 2
apply_with_static acc 2.5
puts [apply_with_static acc 0]; # outputs 5.5

# Example of automatic garbage collection:
proc acc_test {} {
    set acc2 [foo 0]
    apply_with_static acc2 1
    apply_with_static acc2 2
    apply_with_static acc2 3
    return [apply_with_static acc2 0]
    # at this point acc2 goes out of scope
    # and is automatically garbage collected
}
acc_test; # returns 6

Since the lambda expression is just a string/list it is automatically garbage collected if and when it goes out of scope. Which is exactly the behavior required by Paul. Alas, he no longer accept submissions to his web site.

AMG: As written, this fails another one of Paul's criteria. [foo] does not return "a real function," since it can't be called in the same way as "functions you had defined in the ordinary way." This can be (crudely) remedied using [unknown] or [namespace unknown]:

proc unknown {args} {
    uplevel 1 apply_with_static $args
}
# or:
namespace unknown apply_with_static

Now the example works like this:

set acc [foo 1]
acc 2
acc 2.5
puts [acc 0]; # outputs 5.5
proc acc_test {} {
    set acc2 [foo 0]
    acc2 1
    acc2 2
    acc2 3
    acc2 0
}
acc_test; # returns 6

Your approach is very clever, but it differs from the proc-based approach in a crucial way, which may or may not be desirable: There's no way to have multiple references to a single closure object. Attempting to create a new reference actually creates a new copy. I will demonstrate:

set acc [foo 1]
acc 2; # returns 3
set acc2 $acc
acc -3; # returns 0
acc 0; # returns 0
acc2 0; # returns 3

If the closure is instead a proc (which stores data in its default arguments, in a global variable, in its coroutine execution frame, or in someplace more exotic), the acc and acc2 variables would contain only the name of that proc, so they would both be references to the same object, not separate copies of (different versions of) the object.

By the way, your regular expression doesn't work for me, since the .* gobbled up the newline and everything after. Adding -linestop fixed the problem. Also I replaced your // with # so that the example can be pasted into tclsh.

Paul rejects proc-based solutions because of vulnerability to name collisions ("the functions returned by foo have to be named, and there is no sure way to avoid name clashes" [L1 ]), which would break the inadvertent modification of code or data rule.

There's two ways to interpret this, depending on the meaning of the word "inadvertent." If inadvertent includes malicious, closure objects must be opaque or at least read-only, so that there's no way they can be picked apart and modified, e.g. by using [lset] on the variable storing the closure. Due to EIAS, Tcl cannot simultaneously satisfy this interpretation and the other rules. But any dictionary will tell you that inadvertent does not include malicious [L2 ], and it's a fool's errand to design a language in which it is impossible to write incorrect code, especially intentionally incorrect code. So let's rule that out and instead focus on accidental modification, which happens only due to name collisions. Or it could happen due to bugs in the closure mechanism, which I also rule out for the purposes of this discussion.

Since it is possible to programmatically pick unique names, I assume Paul's gripe isn't that the generated procs might be given names equal to those of existing procs, but rather that fixed-name procs defined later (perhaps interactively) might be given names equal to those of generated procs. (It's a fixable bug for autogenerated names to clobber each other.) Namespaces (both simple prefixes and proper ::-style namespaces) mitigate this problem, but the vulnerability still exists in principle, if not in practice. (Language comparisons frequently exhibit bias by neglecting practical considerations.)

The vulnerability would still exist even without proc names being global: If a proc name could be scoped to a stack frame, it could still be clobbered by a proc defined later in that same stack frame. Really, I think Paul's complaint is that an internal, autogenerated, temporary name is visible and vulnerable to all (or parts) of the program outside of the closure mechanism; I also think that this constraint is not expressed in Paul's rules. To satisfy Paul, the generated closure must itself be anonymous, accessible only through indirection.

All that being said, I don't think satisfying Paul Graham is a worthwhile goal, especially since he's not going to update his accumulator article anyway. Rather we should be interested in making practical closures, even if "practical" means having having to deal with rough edges such as non-anonymous procs. Think about it: we already use specialized closure-like (code+data) objects implemented as procs/commands given autogenerated names typically stored into variables, and we've managed so far. Examples: channels, interps, TclOO objects made with [new], images, fonts. Also even though some commands that generate code+data objects require a name argument, often the script autogenerates the names because the names themselves aren't important. Examples: Tk widgets can be given random names and accessed indirectly through variables, Snit autogenerates [namespace] names, creating (in a loop) many coroutines with the same body.

In short: I don't mind if the internal code and data of the closure mechanism are visible to the rest of the script. After all, Tcl is the language of introspection.

I'm more concerned about lack of automatic garbage collection for closures. Here, "closure" refers to all code+data objects implemented as auto-named procs/commands which are accessed through indirection. See above for a list of examples. Even though I have no idea how this could possibly be implemented, I'd like for the closures to be deleted when they cease to be reachable, meaning that all data structures (usually simple scalars) containing the name (reference) go out of scope. [trace add variable X unset] doesn't do the job, since it keeps track of variables, not values, and values have a way of migrating from variable to variable.

But I'm not terribly concerned. Again, we've managed so far. Manual, explicit deletion has its charms.


wdb with Tcl 8.5, the command apply treats strings as true closures, thus use it as result for accumulator generators:

% proc gen-accu n {list x "expr {\$x + $n}"}
% set accu4 [gen-accu 4]
x {expr {$x + 4}}
% apply $accu4 10
14
% 

AMG: How can such a closure-string update its internal data? Keep doing [apply $accu4 10] all you want; it'll always return 14.

slebetman: As mentioned by AMG, apply does not create a closure. There is a general misunderstanding on the internet that confuses closures with anonymous functions. Apply merely creates anonymous functions. But since tcl has a strict value semantics it is not easy in principle to implement closures which by its very nature has reference semantics.

A closure is not an anonymous function. It is instead a kind of localized global variable where all functions defined in the same scope see the variable as a global variable but all other functions defined outside that shared scope can't see the variable as a global variable.

Here's an example of what a closure really is:

# if tcl really had closures:
apply {{} {
  set x 0 ;# this will be captured as a closure by all procs and
           # lambda expressions defined in here

  proc incrX {} {global x; incr x}
  proc decrX {} {global x; incr x -1}
  proc valueX {} {global x; return $x}
}}
incrX
incrX
puts [valueX] ;# prints 2
decrX
puts [valueX] ;# prints 1
puts $x ;# error because x is not defined

As illustrated by the above examples, neither incrX, decrX nor valueX are anonymous functions. But they are still closures. The closure bit does not refer to weather a function is anonymous or not. Instead it refers to the fact that the function captures (closes over) a locally scoped variable at the point where the function is defined so that it is able to treat it as a global variable.

DKF: Not quite. A closure variable is not global, it is context-defined. The code to set up the closure-defined variable would look more like this (though with the same behaviour as above otherwise):

apply {{} {
    contextvariable x
    set x 0
    proc incrX {} {incr x}
    proc decrX {} {incr x -1}
    proc valueX {} {return $x}
}}

The two real problems are defining how to perform introspection on context variables, and defining what happens when you use eval with these things about. By comparison, making proc pick up the current set of context variables at the time it is called should be fairly easy. (I've no idea about apply, unless we can make current contexts named entities; that would help with introspection, perhaps...)

slebetman: Which is what I defined closures as: "private" "global" variables that are not visible to other procs defined outside the shared scope. Since tcl does not import globals by default like other languages it makes sense that something like [global] be used to import the closure - just like how globals currently work. If people feel uncomfortable with using [global] to do this then perhaps another command could be implemented [upscope] perhaps?

apply {{} {
  set x 0 ;

  proc incrX {} {upscope x; incr x}
  proc decrX {} {upscope x; incr x -1}
  proc valueX {} {upscope x; return $x}
}}

PHP (which shares tcl's don't-import-globals-by-defalut semantics) solves this differently by doing something like this:

# proc name args body ?closures?

apply {{} {
  set x 0 ;

  proc incrX {} {incr x} x
  proc decrX {} {incr x -1} x
  proc valueX {} {return $x} x
}}

personally, I'd prefer overriding the [global] command to enable it to import closures as well as globals since a global varaible is merely a closure of the global scope.

IMHO the best way to support closures is to perform early byte-compilation rather than lazy. So [procs] are quite easy. As for lambda expressions passed to [apply], it means that we need to check each word as it is consumed by the interpreter to see if it could potentially be a lambda expression. That sounds messy and slow to me and doesn't seem particularly robust.

Of course, we could just define a command to take away the ambiguity:

# force early byte-compilation of lambda expressions
# so we can capture closures:
apply [closure {} {
  set x 0
  set ::f [closure {} {incr x}]
}]
apply $f

Technically, the semantics of closures are exactly the same as any variable we import from foreign scope with globals and upvars. Only that instead of the calling scope we link the variable to the scope in which the string representing the proc's/lambda expression's body is defined. So whatever means we currently use to introspect globals and upvars can be used as inspiration of how to do it for closures.