accumulate and collect

Inlining Loops

by slebetman

Very often, when constructing data structures (especially lists) in a loop, I've had to write something like this:

set result {}
foreach x $input {
    lappend result [process $x]
}

that is, declare a variable before the loop and manually appending results to that variable inside the loop. I thought, there must be a better way to do this. Well, there is map and filter from functional programming. They'll handle the common cases. But sometimes I really need the extra power of foreach. Wouldn't it be nice if foreach behave like map and filter?

So I started thinking about writing a custom control structure. But rather than just another looping construct like foreach or map I wanted a control structure that can be applied to any and all loops. This is what I ended up with:

set result [accumulate {
    foreach x $input {
        collect [process $x]
    }
}]

I'm calling this accumulate and collect. The accumulate function simply evals the string passed to it constructing a list from values collected by the collect function. Basically, this is an encapsulation of the set..lappend idiom above.

The accumulate function is nestable such that:

accumulate foreach x {1 2 3} {
    collect [accumulate foreach y {a b c} {
        collect "$x$y"
    }]
}

would return:

{1a 1b 1c} {2a 2b 2c} {3a 3b 3c}

This is great. It allows me to treat loops like functions that return lists. And I don't have to declare pesky temporary variables!

Then I realized something:

A Fancy List Generator

This is another problem I keep facing. I've long wished that tcl had a list generator where I don't need to backslash escape all the time. I hate having to do:

set foo [list \
    a 1 \
    b 2 \
    c 3 \
]

Apart form all the '\' looking ugly, it is also error prone. I often forget to add a '\' and tcl complains about 'c' being an invalid command name.

Of course I could use {} to generate lists:

set foo {
    a 1
    b 2
    c 3
}

but this doesn't work if I need to perform variable substitution. And doing it with "" is not only hard to read but much more error prone than list!

While playing with accumulate..collect I realized that it is just a fancy list generator. Is this the list generator that I've been wanting for so long? Let us try it out:

set foo [accumulate {
    collect a; collect 1;
    collect b; collect 2;
    collect $argv;         # Yes, variable substitutions work.
    collect [glob *];      # It works! And look, comments!
}]

Wow, like so many things in tcl this came as a complete surprise. I've been waiting for something like this for so long! But the syntax looks a bit cumbersome. Let's see if we can remedy that:

interp alias {} List {} accumulate
interp alias {} : {} collect

set foo [List {
    : a
    : b
    : c
    : $argv
    : [glob *]
    : [List {   #nested!
        : 1
        : 2
        : 3
        # and even plays well with [list]:
        : [list x y z]
    }]
}]

I love it!

Implementation

So, without further ado, here's the implementation of accumulate and collect:

set accumulator {}
proc accumulate {args} {
    if {[llength $args] == 1} {
        set args [lindex $args 0]
    }

    lappend ::accumulator {}
    set code [catch {uplevel 1 $args} result]
    switch -- $code {
        0 {}
        3 break
        4 continue
        default {
            set ::accumulator [lrange $::accumulator 0 end-1]
            return -code $code $result
        }
    }
    set ret [lindex $::accumulator end]
    set ::accumulator [lrange $::accumulator 0 end-1]
    return $ret
}

proc collect {value} {
    set acc [lindex $::accumulator end]
    lappend acc $value
    lset ::accumulator end $acc
}

AMG: Here's a version implemented using coroutines.

proc accumulate {args} {
    if {[llength $args] == 1} {
        set args [lindex $args 0]
    }
    set coro [info coroutine]-accum
    set accumulator [list [coroutine $coro eval $args]]
    while {[llength [info commands $coro]]} {
        lappend accumulator [$coro]
    }
    lrange $accumulator 0 end-1
}

proc collect {value} {
    yield $value
}

I don't like the names "accumulate" and "collect" since they're virtually the same word. It's not clear that one is providing data to the other, that one is gathering data produced by the other.

I can't decide if the method I used to generate unique coroutine names is genius or madness. I guess it can be both! A fixed name won't cut it for nested use of accumulate/collect, such as in the "{1a 1b 1c} {2a 2b 2c} {3a 3b 3c}" example.

The script body is executed in the global stack frame, not in the caller's stack frame. I don't think this is possible to fix.

I frequently need the functionality provided by your code, and I have implemented it in various ways. Examples follow:

One example is [wibble::applytemplate], which builds and executes a script by combining a command with a template. [wibble::template] calls applytemplate using "dict append response content" as the command and the on-disk file as the template. The result is that the subst'ed template is appended in chunks to the content value in the response dict, which is eventually pushed to the client's Web browser. Custom code can pass different commands to applytemplate to make it do other things with the subst'ed template, for example puts'ing to stdout. [apply] is very useful in this context.

Another example is [splitscript] on the coroutine page, which executes a script in a safe interp that has almost no commands nor variables (only [lappend*], [unknown*], and ${script*}). Since almost nothing is defined, every line invokes [unknown*], which uses [lappend*] to add its $args to ${script*}. After execution completes, [splitscript] fetches the value of ${script*}, deletes the interpreter, and returns ${script*}'s value. This converts a Tcl-like script into a list of commands, chewing through backslash substitution, comments, whitespace, semicolons, and so on. For extra fun, [splitscript] can be modified to expose some commands like [foreach], [proc], and [set] for the input script to exploit. If the input script does something in a loop, that something will be emitted repeatedly, perhaps with variations due to embedded variable substitutions. See Config file using slave interp for more.

A third, non-Tcl example is writing to stdout inside sh subshells. The following pretty-prints a directory listing, grouping directories before non-directories. Both echo and printf contribute to sed's input stream.

(
    for x in *; do
        if [ -d "$x" ]; then
            echo "         $x/"
        fi
    done
    for x in *; do
        if [ ! -d "$x" ]; then
            printf "%8d %s\n" `stat -c "%s %n" "$x"`
        fi
    done
) |
sed = |
sed 'N; s/^/     /; s/ *\(.\{3,\}\)\n/\1. /'

slebetman: Funny you should mention templates because that is what I use it for. I use plain old subst for all my templating needs thereby allowing me to leverage the full power of tcl in my templates. For syntax sugaring, I've developed a suite of procs to make life easier. In fact, I am currently in the middle of writing up a page for the wiki (not on here yet) detailing idioms I use when using subst for templating.

Here's what my templates tend to look like:

<html>
<body>
    <table>
        [void {flipflop reset}]
        [build foreach {x y} $data {collect {
            <tr class="[if [flipflop] {give odd} {give even}]">
                <td> $x </td>
                <td> $y </td>
            </tr>
         }}]
    </table>
</body>
</html>

note: in the code above [build] is a shortcut for [join [accumulate]]]

AMG: I assume you define [void] and [give] thusly:

proc void {script} {uplevel 1 $script; list}
proc give {value} {set value}

If this is the case, then your templatized HTML can be written as follows, using the Wibble template system (credit JCW). In my opinion it's simpler for this example, but there could be times when [accumulate] works better.

<html>
<body>
    <table>
% flipflop reset
% foreach {x y} $data {
        <tr class="[if [flipflop] {give odd} {give even}]">
            <td> $x </td>
            <td> $y </td>
        </tr>
% }
    </table>
</body>
</html>

slebetman: Yes, but my main point is to use subst as the templating system without additional parsers - just plain old tcl ;-) Why write parsers when the interpreter itself is a very capable parser. Besides, this way I can claim that my templating engine is written in C.