**Introduction**

[NEM] 2009-08-07: An obvious use of the new [coroutine] feature in Tcl 8.6 is to aid in the construction of on-demand loops or ''iterators''. The basic process is fairly straightforward, and some example code is shown dotted around this wiki. For example, to iterate over the integers in a range we can write a ''generator'' procedure as follows:

======
package require Tcl 8.6

proc range {n m} {
    for {set i $n} {$i <= $m} {incr i} { yield $i }
}
======

We can then create a coroutine to wrap this procedure and generate successive values:

======
set i [coroutine it range 1 10]
while {[llength $i]} {
    puts $i
    set i [it]
}
======

This will print the numbers from 1..10. While fairly simple to implement, it doesn't really seem very elegant: there is a bit of boilerplate for constructing and naming the coroutine, checking whether it is still valid, and then fetching the next value. It's also a little clumsy to handle [break] or [continue], as you then have to add more boilerplate to ensure the coroutine is deleted (in case of early termination), or that the next value is still fetched (in case of continue). As this is Tcl, however, we can wrap all this boilerplate into a nice reusable custom [control structure] that works like the usual [foreach] command:

======
proc iterate {varName generator body} {
    upvar 1 $varName item
    set it ::iterator[incr ::iterate]
    set item [coroutine $it {*}$generator]
    while {[llength $item]} {
        uplevel 1 $body
        set item [$it]
    }
}
======

This handles the basic boilerplate, and we can now write our loop more pleasingly as:

======
iterate i {range 1 10} { puts $i }
======

This works for the basic case, but a [break] will still leave the coroutine hanging around, while a [continue] will cause the loop to hang completely!
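To make the [break] problem concrete, here is a short demonstration (it repeats the ''range'' and ''iterate'' definitions from above so that it can be run on its own): after breaking out of the loop early, the coroutine command is still registered.

======
package require Tcl 8.6

# Definitions repeated from above, for a self-contained demonstration
proc range {n m} {
    for {set i $n} {$i <= $m} {incr i} { yield $i }
}
proc iterate {varName generator body} {
    upvar 1 $varName item
    set it ::iterator[incr ::iterate]
    set item [coroutine $it {*}$generator]
    while {[llength $item]} {
        uplevel 1 $body
        set item [$it]
    }
}

iterate i {range 1 10} {
    if {$i == 3} break     ;# early exit: iterate's while loop stops here
}
puts [info commands ::iterator*]   ;# prints ::iterator1 - still alive
======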
To solve these problems, we can add a bit more error-handling code to the iterate procedure:

======
proc iterate {varName generator body} {
    upvar 1 $varName item
    set it ::iterator[incr ::iterate]
    set item [coroutine $it {*}$generator]
    try {
        while {[llength $item]} {
            try {
                uplevel 1 $body
            } on continue {} {
                # ignore - but ensure next value is fetched
            } on return {result options} {
                # increment the -level to remove this proc from the stack
                dict incr options -level
                return -options $options $result
            }
            # Fetch the next item
            set item [$it]
        }
    } finally {
        # Ensure the coroutine is cleaned up
        if {[llength [info commands $it]]} { rename $it "" }
    }
    return
}
======

So, let's now look at some other applications of this iteration construct:

**Iterating over Data Structures**

We can iterate over data structures, such as binary trees (using the code at the bottom of [Algebraic Types] to define the tree):

======
datatype define tree = Empty | Leaf val | Branch left right

proc traverse tree {
    datatype match $tree {
        case [Empty]      -> { return }
        case [Leaf v]     -> { yield $v }
        case [Branch l r] -> { traverse $l; traverse $r }
    }
}

set tree [Branch [Branch [Leaf 1] [Branch [Leaf 2] [Empty]]] [Leaf 3]]
iterate x [list traverse $tree] { puts "x = $x" }
======

**Iterating over file contents**

We can iterate over the lines in a file easily enough too:

======
proc lines file {
    set in [open $file r]
    while {[gets $in line] >= 0} { yield $line }
    close $in
}

iterate line {lines /etc/passwd} {
    puts "[incr i]: $line"
}
======

**Iterating over Database Result Sets**

We can iterate over the rows in a [tdbc] result set using the ''nextdict'' method (providing roughly equivalent functionality to the built-in internal iterators that tdbc provides):

======
proc rows rs {
    while {[$rs nextdict row]} { yield $row }
    $rs close
}

set resultset [$db ...]
;# execute some query
iterate row [list rows $resultset] {
    puts [dict get $row name]
    puts [dict get $row age]
}
======

**Networking**

We can also use this to simplify iterating over the results of remote queries using various network protocols. For example, fetch a list of articles from an [nntp] server:

======
proc headers {nntp group} {
    foreach artid [$nntp newnews $group] {
        yield [$nntp head $artid]
    }
    $nntp close
}

iterate h [list headers $conn comp.lang.tcl] {
    puts [join $h \n]
    puts ----
}
======

**Issues**

The advantages of coroutines and a custom control construct here should be clear: both the code that uses an iterator and the code that implements it can be very clear and simple. The general-purpose ''iterate'' control structure takes care of all the exception handling, coroutine generation, and cleanup. Indeed, it seems that careful use of coroutines can probably eliminate the need for writing several classes of custom control structure, as the basic control code can be packaged up and then reused in different situations.

However, there is still a major problem with this approach: resource cleanup. While the ''iterate'' procedure can handle early termination and ensure the coroutine is destroyed, it cannot know whether the coroutine procedure itself has allocated any resources, such as opening new channels or database connections. For instance, if I write the following code, I will leak a channel:

======
iterate line {lines /some/big/file} { break }
======

The reason is that the ''lines'' generator opens a file channel for the target file, but is then immediately destroyed by the early termination of the iterator: it gets no notification that this is about to happen and so cannot clean up correctly. Writing the ''lines'' generator as a traditional custom control construct would solve this problem, as it can then catch the [break] exception and clean up as required.
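For comparison, such a traditional version of ''lines'' might look like the following sketch (the name ''foreach-line'' is made up for illustration): a [break] in the body simply terminates the internal [while] loop, so the ''finally'' clause always gets a chance to close the channel.

======
# Sketch: "lines" as a traditional control construct rather than a
# generator. The proc name foreach-line is illustrative only.
proc foreach-line {varName file body} {
    upvar 1 $varName line
    set in [open $file r]
    try {
        while {[gets $in line] >= 0} {
            uplevel 1 $body    ;# break/continue propagate to this while
        }
    } finally {
        close $in              ;# runs even on break, return, or error
    }
}

# Demonstration with a throwaway file: break no longer leaks a channel
set f [open iterdemo.txt w]
puts $f "first\nsecond"
close $f
set before [llength [chan names]]
foreach-line l iterdemo.txt { break }
puts [expr {[llength [chan names]] == $before}]   ;# prints 1: no leak
file delete iterdemo.txt
======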
However, we are then back to duplicating all of the exception-handling boilerplate in each custom iterator. I'm not sure of the best way to solve this problem. One approach would be to recommend that generators do not allocate resources, but instead always have them passed in:

======
proc lines chan {
    while {[gets $chan line] >= 0} { yield $line }
}

set f [open /some/big/file]
try {
    iterate line [list lines $f] { puts $line }
} finally {
    close $f
}
======

This is almost as ugly as the original code, but we can clean it up somewhat with [using]:

======
using f [open /some/big/file] {
    iterate line [list lines $f] { puts $line }
}
======

Another approach would be to somehow notify the coroutine that it is being terminated. This could be done via a return from the [yield] command. For example, changing the ''finally'' clause of the iterate implementation to be:

======
...
} finally {
    # Ensure the coroutine is cleaned up
    if {[llength [info commands $it]]} {
        #rename $it ""
        $it return
    }
}
...
======

And then changing the implementation of each generator procedure to something like:

======
proc lines file {
    set in [open $file]
    try {
        while {[gets $in line] >= 0} {
            eval [yield $line]
        }
    } finally {
        close $in
    }
}
======

In this approach, we pass a ''return'' command back to the generator, causing it to exit. The ''finally'' block in the generator then takes care of ensuring the channel is closed. Such an approach could work, but seems fragile. A further alternative would be to add a command delete trace to the coroutine command and then let the generator run some sort of destructor on exit.
For example:

======
proc eventually args {
    trace add command [info coroutine] delete [lambda args $args]
}
proc lambda {params body} { list ::apply [list $params $body] }

proc lines file {
    set in [open $file]
    eventually close $in
    while {[gets $in line] >= 0} { yield $line }
}

iterate line {lines /etc/passwd} { puts $line; break }
======

I think this option looks the best to me: it's robust, simple, and easy to work with. It also seems somewhat nicer than explicit [try]/[catch] for coding complex cleanup requirements -- just follow each resource acquisition with an eventual destructor.

The other issue with the construct at present is generality. It would be useful to extend the ''iterate'' procedure to handle multiple variable names, and multiple generators, much like [foreach]. I leave that as an exercise...

A minor stylistic point is the current syntax for generators. The generators specified so far are all meant to be called as coroutines. This means that they have to be braced or [list]-quoted when passed to ''iterate'', which is a little ugly. An alternative would be to make them work more like combinators: they return a generator. To illustrate:

======
rename lines _lines
proc lines file { list _lines $file }

iterate line [lines /etc/passwd] { ... }
======

This may seem a minor stylistic point, and it is, but little touches like that can make usage much more pleasant!

----

[NEM] 2009-09-04: I've created a ''[generator]s'' package that expands on the ideas in this page, and polishes them into a decent package. Available from [http://www.cs.nott.ac.uk/~nem/tcl] with a manual page at [http://www.cs.nott.ac.uk/~nem/tcl/generator.html]. The ''iterate'' command has now been named ''generator foreach'' and fully supports the built-in [foreach] command syntax, including looping over multiple generators and extracting multiple elements at each iteration.
Some of the examples from the manpage:

======
# Infinite lists
generator define nats {} {
    while 1 { generator yield [incr nat] }
}

# Ranges
generator define range {n m} {
    for {set i $n} {$i <= $m} {incr i} { generator yield $i }
}

# Cleanup via "finally"
generator define lines file {
    set in [open $file]
    generator finally close $in
    while {[gets $in line] >= 0} { generator yield $line }
}

# Sum the numbers in a file
puts "sum = [generator sum [lines expenses.txt]]"

# Convert to/from lists/dicts/strings
generator foreach ch [generator from string "Hello, World!"] {
    puts $ch
}

# Square numbers
proc square x { expr {$x**2} }
puts [generator takeList 20 [generator map square [nats]]]

# Factorial
proc fac n { generator product [range 1 $n] }

# Fibonacci numbers via iteration
proc fst pair { lindex $pair 0 }
proc snd pair { lindex $pair 1 }
proc nextFib prev { list [snd $prev] [expr {[fst $prev] + [snd $prev]}] }
proc fibs {} { generator map fst [generator iterate nextFib {0 1}] }
======

<<categories>> Control Structure | Data Structure