Going Loopy with Coroutines

Introduction

NEM 2009-08-07: An obvious use of the new coroutine feature in Tcl 8.6 is to aid in the construction of on-demand loops or iterators. The basic process is fairly straightforward, and some example code is shown dotted around this wiki. For example, to iterate over the integers in a range we can write a generator procedure as follows:

package require Tcl 8.6
proc range {n m} {
    for {set i $n} {$i <= $m} {incr i} { yield $i }
}

We can then create a coroutine to wrap this procedure and then generate successive values:

set i [coroutine it range 1 10]
while {[llength $i]} {
    puts $i
    set i [it]
}

This will print the numbers from 1..10. While fairly simple to implement, it doesn't really seem very elegant, with a bit of boilerplate for constructing and naming the coroutine, checking whether it's still valid and then fetching the next value. It's also a little clumsy to handle break or continue, as you then have to add more boilerplate to ensure the coroutine is deleted (in case of early termination), or that the next value is still fetched (in case of continue). As this is Tcl, however, we can wrap all this boilerplate into a nice reusable custom control structure that works like the usual foreach command:

proc iterate {varName generator body} {
    upvar 1 $varName item
    set it ::iterator[incr ::iterate]
    set item [coroutine $it {*}$generator]
    while {[llength $item]} {
        uplevel 1 $body
        set item [$it]
    }
}

This handles the basic boilerplate, and we can now write our loop more pleasingly as:

iterate i {range 1 10} { puts $i }

This works for the basic case, but a break will still leave the coroutine hanging around, while a continue will cause the loop to hang completely! To solve these problems, we can add a bit more error-handling code to the iterate procedure:

proc iterate {varName generator body} {
    upvar 1 $varName item
    set it ::iterator[incr ::iterate]
    set item [coroutine $it {*}$generator]
    try {
        while {[llength $item]} {
            try {
                uplevel 1 $body
            } on continue {} {
                # ignore - but ensure next value is fetched
            } on return {result options} {
                # increment the -level to remove this proc from the stack
                dict incr options -level
                return -options $options $result
            }
            # Fetch the next item
            set item [$it]
        }
    } finally {
        # Ensure the coroutine is cleaned up
        if {[llength [info commands $it]]} { rename $it "" }
    }
    return
}
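With this version, break and continue both behave as they would in a normal foreach loop. A quick check (range and iterate are repeated verbatim from above so the snippet can be run on its own):

```tcl
package require Tcl 8.6
proc range {n m} {
    for {set i $n} {$i <= $m} {incr i} { yield $i }
}
proc iterate {varName generator body} {
    upvar 1 $varName item
    set it ::iterator[incr ::iterate]
    set item [coroutine $it {*}$generator]
    try {
        while {[llength $item]} {
            try {
                uplevel 1 $body
            } on continue {} {
                # ignore - but ensure next value is fetched
            } on return {result options} {
                dict incr options -level
                return -options $options $result
            }
            set item [$it]
        }
    } finally {
        if {[llength [info commands $it]]} { rename $it "" }
    }
    return
}

set seen {}
iterate i {range 1 10} {
    if {$i == 3} continue   ;# skipped, but the next value is still fetched
    if {$i == 6} break      ;# loop ends early; the coroutine is deleted
    lappend seen $i
}
puts $seen   ;# 1 2 4 5
```

After the break, no ::iterator* command is left behind.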

So, let's now look at some other applications of this iteration construct:

Iterating over Data Structures

We can iterate over data structures, such as binary trees (using the code at the bottom of Algebraic Types to define the tree):

datatype define tree = Empty | Leaf val | Branch left right
proc traverse tree {
    datatype match $tree {
        case [Empty]      -> { return }
        case [Leaf v]     -> { yield $v }
        case [Branch l r] -> { traverse $l; traverse $r }
    }
}
set tree [Branch [Branch [Leaf 1] [Branch [Leaf 2] [Empty]]] [Leaf 3]]

iterate x [list traverse $tree] { puts "x = $x" }

Iterating over file contents

We can iterate over the lines in a file easily enough too:

proc lines file {
    set in [open $file r]
    while {[gets $in line] >= 0} { yield $line }
    close $in
}
iterate line {lines /etc/passwd} { puts "[incr i]: $line" }

Iterating over Database Result Sets

We can iterate over the rows in a tdbc result set using the nextdict method (providing roughly equivalent functionality to the built-in internal iterators that tdbc provides):

proc rows rs {
    while {[$rs nextdict row]} { yield $row }
    $rs close
}
set resultset [$db ...] ;# execute some query
iterate row [list rows $resultset] {
    puts [dict get $row name]
    puts [dict get $row age]
}

Networking

We can also use this to simplify iterating over the results of remote queries using various network protocols. For example, fetching the headers of new articles from an NNTP server:

proc headers {nntp group} {
    foreach artid [$nntp newnews $group] {
        yield [$nntp head $artid]
    }
    $nntp close
}
iterate h [list headers $conn comp.lang.tcl] {
    puts [join $h \n]
    puts ----
}

Issues

The advantages of coroutines and a custom control construct here should be clear: both the code that uses an iterator, and the code that implements it, can be very clear and simple. The general-purpose iterate control structure takes care of all the exception handling, coroutine generation, and cleanup. Indeed, it seems that careful use of coroutines can probably eliminate the need for writing several classes of custom control structure, as the basic control code can be packaged up and then reused in different situations. However, there is still a major problem with this approach: resource cleanup. While the iterate procedure can handle early termination and ensure the coroutine is destroyed, it cannot know whether the coroutine procedure itself has allocated any resources, such as new channels or database connections. For instance, if I write the following code, I will leak a channel:

iterate line {lines /some/big/file} { break }

The reason is that the lines generator opens a file channel for the target file, but is then immediately destroyed by the early termination of the iterator: it gets no notification that this is about to happen and so cannot clean up correctly. Writing the lines generator as a traditional custom control construct would solve this problem, as it could then catch the break exception and clean up as required. However, we are then back to duplicating all of the exception-handling boilerplate in each custom iterator. I'm not sure of the best way to solve this problem. One approach would be to recommend that generators do not allocate resources themselves, but instead always have them passed in:

proc lines chan { while {[gets $chan line] >= 0} { yield $line } }
set f [open /some/big/file]
try {
    iterate line [list lines $f] { puts $line }
} finally { close $f }

This is almost as ugly as the original code, but we can clean it up somewhat with a using construct:

using f [open /some/big/file] {
    iterate line [list lines $f] { puts $line }
}
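The using command is not a built-in; a minimal sketch of what is meant here might look like the following (the name and interface are assumptions, and close is hardwired as the destructor):

```tcl
# Hypothetical helper: bind a resource (here, a channel) to a variable,
# run the body at the caller's level, and close the resource no matter
# how the body exits (normally, via break, or via an error).
proc using {varName resource body} {
    upvar 1 $varName var
    set var $resource
    try {
        uplevel 1 $body
    } finally {
        close $var
    }
}
```

A more general version might take the destructor command as an extra argument instead of assuming close.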

Another approach would be to somehow notify the coroutine that it is being terminated. This could be done via a return from the yield command. For example, changing the finally clause of the iterate implementation to be:

... finally {
        # Ensure the coroutine is cleaned up
        if {[llength [info commands $it]]} { 
            #rename $it "" 
            $it return
        }
    }
...

And then changing the implementation of each generator procedure to something like:

proc lines file {
    set in [open $file]
    try { 
        while {[gets $in line] >= 0} { eval [yield $line] }
    } finally { close $in }
}

In this approach, we pass a return command back to the generator, causing it to exit. The finally block in the generator then takes care of ensuring the channel is closed. Such an approach could work, but seems fragile.
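To see the protocol in isolation, here is a tiny self-contained generator using the same eval-the-yield-result trick: resuming it with the word return makes the pending yield raise a return exception, which triggers the finally clause.

```tcl
package require Tcl 8.6
proc counter {} {
    try {
        set i 0
        # Whatever the caller passes when resuming us is evaluated as a
        # script; normally it is the empty string, which is a no-op.
        while 1 { eval [yield [incr i]] }
    } finally {
        set ::cleanedUp 1   ;# stands in for closing a channel etc.
    }
}
puts [coroutine c counter]   ;# prints 1
puts [c]                     ;# prints 2
c return                     ;# finally clause runs; coroutine is deleted
```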

A further option would be to add a command delete trace to the coroutine command and then let the generator run some sort of destructor on exit. For example:

proc eventually args {
    trace add command [info coroutine] delete [lambda args $args]
}
proc lambda {params body} { list ::apply [list $params $body] }
proc lines file {
    set in [open $file]
    eventually close $in
    while {[gets $in line] >= 0} { yield $line }
}
iterate line {lines /etc/passwd} { puts $line; break }

I think this option is the best: it's robust, simple, and easy to work with. It also seems somewhat nicer than explicit try/catch handlers for coding complex cleanup requirements: just follow each resource acquisition with an eventual destructor.
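For example, a generator that pairs up lines from two files simply registers a destructor after each open. Both channels are closed whenever the coroutine is deleted, whether it runs to completion or is torn down early (eventually and lambda are repeated from above so the sketch is self-contained; the file arguments are placeholders):

```tcl
package require Tcl 8.6
proc lambda {params body} { list ::apply [list $params $body] }
proc eventually args {
    trace add command [info coroutine] delete [lambda args $args]
}
# Pair up corresponding lines from two files. Each open is immediately
# followed by its eventual destructor; both traces fire when the
# coroutine command is deleted.
proc pairs {file1 file2} {
    set a [open $file1]; eventually close $a
    set b [open $file2]; eventually close $b
    while {[gets $a x] >= 0 && [gets $b y] >= 0} {
        yield [list $x $y]
    }
}
```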

The other issue with the construct at present is generality. It would be useful to extend the iterate procedure to handle multiple variable names, and multiple generators, much like foreach. I leave that as an exercise...

A minor stylistic point is the current syntax for generators. The generators defined so far are all meant to be called as coroutines. This means that they have to be braced or list-quoted when passed to iterate, which is a little ugly. An alternative would be to make them work more like combinators: they return a generator. To illustrate:

rename lines _lines
proc lines file { list _lines $file }
iterate line [lines /etc/passwd] { ... }

This may seem a minor stylistic point, and it is, but little touches like that can make usage much more pleasant!
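Applied to the range generator from the start of the page, the combinator style looks like this (self-contained; driven here with a plain coroutine rather than iterate):

```tcl
package require Tcl 8.6
# The actual generator body, renamed out of the way.
proc _range {n m} {
    for {set i $n} {$i <= $m} {incr i} { yield $i }
}
# Combinator style: calling "range" just returns a command prefix,
# ready to be handed to coroutine (or iterate) without extra quoting.
proc range {n m} { list _range $n $m }

set i [coroutine it {*}[range 1 3]]
set out {}
while {[llength $i]} { lappend out $i; set i [it] }
puts $out   ;# 1 2 3
```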


NEM 2009-09-04: I've created a generators package that expands on the ideas in this page, and polishes them into a decent package. Available from [L1 ] with a manual page at [L2 ]. The iterate command has now been named generator foreach and fully supports the built-in foreach command syntax, including looping over multiple generators and extracting multiple elements at each iteration. Some of the examples from the manpage:

# infinite lists
generator define nats {} { while 1 { generator yield [incr nat] } }
# ranges
generator define range {n m} {
    for {set i $n} {$i <= $m} {incr i} { generator yield $i }
}
# cleanup via "finally"
generator define lines file {
    set in [open $file]
    generator finally close $in
    while {[gets $in line] >= 0} { generator yield $line }
}
# Sum the numbers in a file
puts "sum = [generator sum [lines expenses.txt]]"
# Convert to/from lists/dicts/strings
generator foreach ch [generator from string "Hello, World!"] { puts $ch }

# Square numbers
proc square x { expr {$x**2} }
puts [generator takeList 20 [generator map square [nats]]]

# Factorial
proc fac n { generator product [range 1 $n] }
# Fibonacci numbers via iteration
proc fst pair { lindex $pair 0 }
proc snd pair { lindex $pair 1 }
proc nextFib prev { list [snd $prev] [expr [join $prev +]] }
proc fibs {} { generator map fst [generator iterate nextFib {0 1}] }