Iterator Protocol

Iterator Protocol is a description of one method for implementing iterators in Tcl.

Description

This page presents a common iterator protocol. It is particularly useful with coroutines, but can be implemented in other ways as well. The protocol is simple: an iterator is a command that is called repeatedly until the command no longer exists. The last value returned by the command before it ceases to exist is not one of the iterated values: the iterator cannot know whether it is finished until the last time it is called, so it must return one more time after the valid values are exhausted. for ... in ... is a drop-in replacement for for that uses this protocol and provides an expanded syntax for looping over iterators.

This example implements the iterator protocol:

proc iterproc {} {
    while {some condition} {
        return [
            something useful
        ]
    }
    rename iterproc {}
    return
}

set procname [namespace which iterproc]
while 1  {
    set val [iterproc]
    if {[namespace which [namespace current]::iterproc] ne $procname} {
        break
    }
    do stuff with $val
}
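
As a concrete, runnable illustration of the template above, here is a minimal sketch. The command name countdown and the global variable remaining are hypothetical names chosen for this example only:

```tcl
# Hypothetical state for the example: how many values are left to hand out.
set remaining 3

proc countdown {} {
    global remaining
    if {$remaining > 0} {
        set value $remaining
        incr remaining -1
        return $value
    }
    # No values left: delete the command to signal the end of the iteration.
    # The empty string returned here is not one of the iterated values.
    rename countdown {}
    return
}

set name [namespace which countdown]
set seen {}
while 1 {
    set val [countdown]
    # The command is gone, so $val is only the end-of-iteration signal.
    if {[namespace which countdown] ne $name} {
        break
    }
    lappend seen $val
}
puts $seen ;# prints: 3 2 1
```

Note that the caller checks for the existence of the command after each call and before using the value, which is what allows the iterator to stay lazy.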

For a coroutine, a little more plumbing is usually added:

proc iterproc {} {
    #the first yield returns an out-of-band value to give the caller a chance
    #to settle the name of the proc
    yield [info coroutine]
    while {some condition} {
        yield [something useful]
    }
    return
}

set procname [coroutine myiterator iterproc]
while 1  {
    set val [myiterator]
    if {[namespace which [namespace current]::myiterator] ne $procname} {
        break
    }
    do stuff with $val
}
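
A runnable sketch of the coroutine variant might look like the following. The names squares and sq, and the limit argument, are assumptions made for this example; it requires Tcl 8.6 for coroutine:

```tcl
proc squares {limit} {
    # The first yield hands the coroutine's fully-qualified name to the caller.
    yield [info coroutine]
    for {set i 1} {$i <= $limit} {incr i} {
        yield [expr {$i * $i}]
    }
    # Returning ends the coroutine, which also deletes its command.
    return
}

# coroutine returns the value of the first yield, i.e. the command name.
set name [coroutine sq squares 3]
set seen {}
while 1 {
    set val [sq]
    if {[namespace which $name] ne $name} {
        break
    }
    lappend seen $val
}
puts $seen ;# prints: 1 4 9
```

Here nothing has to delete the command explicitly: a coroutine command disappears on its own when the coroutine body returns.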

Making the last value a signal rather than a valid value is similar to the protocol for file operations, where an empty string signals some file condition, and eof provides more detail about that condition. Without this mechanism, an iterator would have to pre-compute the next value and be prepared to return it at the next invocation, which defeats the desirable feature of lazy computation, where the iterator doesn't compute the next value until asked for it.


PYK 2016-08-30:

Coroutines in the ycl collection that use this protocol have also recently begun to sprout interfaces in which the first argument of each call to the coroutine is the word next, which requests the next value. This opens the door for communication with an iterator through other words it may choose to respond to. For example, ycl chan iter provides prepend as a way to push data back onto the front of the channel. So that things remain composable, even iterators which only support one action have grown a next interface, and I think this development is a good thing.
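
To make the idea of a next interface concrete, here is a minimal sketch. It is not the ycl implementation: listiter and its handling of prepend are hypothetical, written only to show how a coroutine can receive action words through yieldto (Tcl 8.6):

```tcl
proc listiter {items} {
    # yieldto suspends the coroutine, delivers a value to the caller, and
    # returns the full argument list of the next call to the coroutine.
    set args [yieldto return -level 0 [info coroutine]]
    while 1 {
        set rest [lassign $args action]
        switch -- $action {
            next {
                if {![llength $items]} {
                    # Exhausted: returning deletes the coroutine command.
                    return
                }
                set items [lassign $items value]
                set args [yieldto return -level 0 $value]
            }
            prepend {
                # Push the given values back onto the front of the sequence.
                set items [list {*}$rest {*}$items]
                set args [yieldto return -level 0 {}]
            }
            default {
                error "unknown action: $action"
            }
        }
    }
}

set name [coroutine it listiter {b c}]
it prepend a
set seen {}
while 1 {
    set val [it next]
    if {[namespace which $name] ne $name} {
        break
    }
    lappend seen $val
}
puts $seen ;# prints: a b c
```

Because every call begins with an action word, a consumer that only knows about next can drive any such iterator, while richer consumers can use the extra words where they are available.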

PYK 2016-10-08: In order to accommodate these new iterator command interfaces, tools such as ycl iter async cat have been modified to accept iterator command prefixes instead of simple iterator command names.


mpr 2016-09-01:

I wrote a little thing like this: Controlling iteration in the loop body. I am using it to parse arguments to a command, as explained on the page.


PYK 2017-08-15: I have recently switched from the protocol presented on this page to one in which the iterator returns with -code break to signal the end of the iteration, a technique also seen in Controlling iteration in the loop body, by mpr.

One advantage of this strategy is that it isn't necessary to delete the iterator command, which may then go on to return another sequence of values. In the caller the code looks something like this:

while 1 {
    set value [someiter]
}

The reader must understand that someiter also functions as break, but it's concise and convenient. To call the iterator when not in a loop:

try {
    set value [someiter]
} on break {} {
    respond to end of iteration
}
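
The break-based variant described above can be sketched as follows. The command letters and the global variable pos are hypothetical names for this example; the point is only that the command survives the end of one iteration and can be iterated again:

```tcl
# Hypothetical state for the example: index of the next value to return.
set pos 0

proc letters {} {
    global pos
    set data {a b c}
    if {$pos >= [llength $data]} {
        # Rewind so that the same command can produce the sequence again,
        # then signal the end of this iteration to the caller's loop.
        set pos 0
        return -code break
    }
    set value [lindex $data $pos]
    incr pos
    return $value
}

set first {}
while 1 {
    # When letters returns with -code break, the loop ends here.
    lappend first [letters]
}

set second {}
while 1 {
    lappend second [letters]
}
puts $first ;# prints: a b c
puts $second ;# prints: a b c
```

The caller no longer needs an existence check after each call, at the cost of the loop exit being hidden inside the call to the iterator.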