'''Iterators''' ** See Also ** [Iterator Protocol]: [a generic collection traversal interface]: [NEM] 2006-01-16: another take on how to enumerate the elements of a collection. ** Description ** CLU's iterators correspond to what [Python] and other languages call [generators]. CLU iterators were modeled on Alphard's generators [http://www.pmg.lcs.mit.edu/CLU.html]. IPL-v generators and [Lisp] mapping functions are related to Alphard's generators. Can someone explain the idea of iterator/generator here? It seems to me that the idea is that rather than explicitly listing an entire range of values (that a variable can take within a loop, etc.) one lists the beginning and ending value of a range plus the incremental factor and the language generates the value as needed. How is this different than using a for loop though? An iterator is ''richer'' than a mere index. An iterator carries the ''container'' reference and the element's position with it. You can use 'em without for loops. The iterator controls access to the element. -- [Todd Coram] [RS]: So would the "loop variable" in [[`[foreach]`] come closer to the iterator concept? [Todd Coram]: Yes, but iterators are most useful when you have different ''container'' types (or classes). [foreach] iterates over lists. A useful iterator interface deals with strings, lists, trees, vectors, etc. You need OO, ''interfaces'' or ''inherent'' polymorphism to take full advantage of iterators. Since Tcl primarily deals with lists and arrays, there is no great need for such polymorphic operations. Right? Now, [VFS] gives us a (loose) approximate of stream iterators which are very useful abstractions... Also, iterators/generators control what comes ''next'' in an iteration. [foreach] is predetermined: it has already been decided what element you will get on each iteration. If my container wants to return random values as I step through it, it can determine to do so at anytime. -- [Todd Coram] Here is a contrived example of an [Itcl based iterator]. [Eric Boudaillier]: I have begin to write a C extension that deals with iterator (or generator ?). This extension defines the '''iterate''' command, which works like [foreach], but takes the description of the ''thing'' to iterate instead of list. This description is a list containing the type of the iterator and the arguments. At this time, four types are defined: '''numeric''', '''list''', '''string''' and '''channel'''. Here is a little example: Common Tcl construction Same using [iterate] command for {set i 0} {$i<=$max} {incr i} { iterate i [list numeric 0 $max] { ... ... } } while {[gets $chan line] != -1} { iterate line [list channel $chan] { ... ... } } foreach c [split $string {}] { iterate c [list string $string] { ... ... } } foreach e [lreverse $list] { iterate e [list list $list end 0] { ... ... } } [[`iterate`] provides a cleaner syntax, and can be faster and consume less memory. And also, you can combine different type of iterator. Moreover, some iterator can be infinite: ====== # returns the number of lines in $chan. proc wc-l {chan} { iterate count [list numeric 1 {}] line [list channel $chan] {} return $count } ====== ---- [RS]: This looks like a great unification to me - however, if the list-bundling of the second argument could be given up, it would be even more pleasing to write and read: ====== iterate i -numeric "0 $max" {...} iterate line -channel $chan {...} iterate c -string $string {...} iterate e -list $list {...} iterate rev -list $list {end 0} {...} iterate count -numeric 0 {} line -channel $chan {} ====== Granted that the last argument is the body, a variable number of control args could be easily processed. The dash before the type indicator might set it off more clearly. However, I'm not sure whether all iterations possible with [for] or [foreach] can be replaced with this construct. -- [AvL]: surely not: [foreach] will be "nicer" for the easy cases, and [for] is much stronger: it allows you to modify the iteration-variable from the body - neither foreach, nor the proposed iterators would allow this. Python unfortunately lacks such a construct. ---- [Todd Coram]: By supplying a ''type'' to the iteration, I think you lose a great deal of power that typeless iteration gives you. C++ style iteration is what make generic programming such a nice capability. Typelessness allows me to define generic functions that don't care about type. How about an iterator command that generates an iterator for the supplied type? ====== set it1 [iterator -string "hello world how are you?"] set it2 [iterator -chan $chan1] set it3 [iterator -list {{hello world} {how are you}} ====== The generator iterator would be a proc: So, ====== proc wc-w {it} { set count 1 while {[$it] != {}} { incr count } return $count } wc-w $it1 # -- returns 5 wc-w $it2 # returns the number of words in the file wc-w $it3 # -- returns 2 ====== someone: And the iterator command would be a perfect usage of [Closures]! [Todd Coram]: See [Iterator using Closures] for an example of the above... ---- [Eric Boudaillier]: Iterator are not typed: just see the description list of iterators as a constructor. At the [C] level, I have a structure like [Tcl_ObjType]: ======c typedef struct { char *name; Tcl_IteratorCreateProc *createProc; Tcl_IteratorFreeProc *freeProc; Tcl_IteratorGetProc *getProc; Tcl_IteratorNextProc *nextProc; Tcl_IteratorIsFiniteProc *finiteProc; } Tcl_IteratorType; ====== [[`iterator`] just looks for the iterator type, and then call the iterator type functions. This implementation is simple, and make iterator lifetime only in [[iterator]] command. I have tried other implementation with a Tcl_Obj internal representation, but I did'nt know how to handle some cases: ====== set it1 [iterator string "Hello, World"] set it2 [iterator channel $chan] set it1copy $it1 set it2copy $it2 iterate c $it1 line $it2 { } # where is $it1copy ? at the start of the string ? # at least, $it2copy is at the same position of $it2. ====== Also, [[`iterator`] don't know the args to pass to Tcl_IteratorCreateProc, so this is difficult to break the description list, which can also take options: ====== iterate v [list numeric 0 9 -loop true] {e1 e2} [list list $list] { ... } ====== But of course, all these lists can be created by a command. ---- Iterators are big in [Ruby]. [escargo]: Generators are fundamental in [Icon Programming Language%|%Icon]. Generators need not be finite; they will return a new value if they are resumed after suspending and returning a value. ---- It would seem to me that the construct is maybe nice or fun in certain ways, but when I measure it against imo important criteria: * do I need to type less * is it more insightfull * does it add strong language possibilities not easily or at all achievable otherwise * is it more efficient * is there some esteatical or alternatively some computer structure related reason for it then I don't think it would score all too high neccessarily, which of course maybe is less important than that is sais something important in the real world, though I would say that is dangerous territory. About on of the examples/remarks, the ''foreach'' tcl command is not pre-determined, and fun enough to play around with: ====== set l {1 2 3} foreach i $l {puts "$i $l"; if {[llength $l] < 5} {lappend l 5}; set l [lreplace $l 3 3 [expr [lindex $l 3]+1] ]} ====== Gives: ======none 1 1 2 3 2 1 2 3 6 3 1 2 3 7 5 ====== I guess a informatics professor wouldn't like this... [NEM]: Not sure who wrote the above, but it is worth pointing out that the foreach command ''is'' pre-determined in that it captures the values of $l when it starts and doesn't reflect any subsequent changes to that variable. This should be the same for any Tcl command as $-substitution gets done before the command is called, and Tcl is pass-by-value rather than pass-by-reference. You can, of course, update l inside the loop as it is a mutable variable, but note that you still only get 3 iterations through foreach, regardless of the new value of l. ---- [NEM] 2005-09-20: Iterators (or [streams]) are certainly nice things to have. When I first saw [EB]'s suggestion for an iterate command, I thought it was a nice higher-order function, where the [[list numeric 0 $max]] etc were actual command fragments that generate the next iteration. However, this doesn't seem to be the case. So, here is a version of the iterate command that does just use normal Tcl commands for the iterators. Each iterator is a command that accepts subcommands (i.e. like an object or an [ensemble]). The subcommands for this interface are "empty" (returns true if there are no more elements) and "next" which returns the next value ''and a new iterator''. This last requirement may sound a little strange, but it allows to have purely-functional, ''persistent'' iterators that can be re-used as many times as you need. To demonstrate, here is the iterate command: ====== proc iterate {varname stream body} { upvar 1 $varname var while {![eval $stream empty]} { lassign [eval $stream next] var stream uplevel 1 $body } return $stream } ====== This is straight-forward enough. Now let's define some iterators. First, a simple one for lists: ====== proc literator {list method} { switch $method { empty { expr {[llength $list] == 0} } next { list [head $list] [list literator [tail $list]] } } } proc head list { lindex $list 0 } proc tail list { lrange $list 1 end } ====== Now, we can try this out: ====== set l {literator {1 2 3 4 5}} iterate x $l { puts "x = $x" } ====== This produces the expected output and returns an iterator to the empty list. Notice that this all looks very much like an ordinary stateful iterator. However, we can still re-use the original iterator or any of the intermediate ones, as there are no side-effects. In particular, you could pass the same iterator to a bunch of different procs without them interfering with each other. However, we don't have to limit ourselves to just side-effect free iterators. For instance, here I can wrap a channel as an iterator: ====== proc citerator {chan method} { switch $method { empty { eof $chan } next { list [gets $chan] [list citerator $chan] } close { close $chan } } } ====== And I can use that to iterate through lines of a file: ====== set chan [list citerator [open $file]] iterate line $chan { puts "line = $line" } eval $chan close ====== That iterator is, of course, not persistent -- once you have used an iterator it is not generally possible to restore it to its previous state (although you might be able to try with [tell] and [seek]). Finally, here is an iterator which can generate infinite sequences, based on the "iterate" function of Haskell (which, along with monads, inspired this implementation): ====== proc fiterator {func init method} { switch $method { empty { return 0 } next { list $init [list fiterator $func [eval $func $init]] } } } ====== Which I will use with a "take" command which itself returns an iterator to the first n items in another iterator: ====== proc take {n iter method} { switch $method { empty { expr {$n == 0} } next { lassign [eval $iter next] next iter list $next [list take [expr {$n-1}] $iter] } } } ====== And some uses: ====== proc succ x { incr x } set nats [list fiterator succ 1] # First 10 natural numbers: iterate n [list take 10 $nats] { puts "n = $n" } ====== Well, that's just some food for thought. You don't need state or closures to implement iterators, but a little bit of sugar goes a long way. In addition, stateless iterators are persistent pure values so you can use them over and over again. Also, as pure values, you don't have to worry about destroying them as they are reference counted by Tcl like every other value (and like other [TOOT] types, which are also very similar to these iterator type-commands). Add a bit of [memoizing] and you get some pretty useful lazy structures. A next step might be to define some useful iterator ''combinators'' which can combine different iterators in interesting ways, e.g. a lazy append can be defined: ====== proc iter_append {it1 it2 method} { switch $method { empty { expr {[eval $it1 empty] && [eval $it2 empty]} } next { if {[eval $it1 empty]} { return [eval $it2 next] } else { lassign [eval $it1 next] next it1 return [list $next [list iter_append $it1 $it2]] } } } } set a [list literator {1 2 3}] set b [list literator {4 5 6}] set both [list iter_append $a $b] iterate x $both { puts "x = $x" } ====== Which prints the values 1 to 6, but only calculates them as needed. I'll leave other combinators (such as other list operators) as an exercise. For the really adventurous, look into Haskell's List monad, which should be fairly easily implemented on top of this, giving a new framework for non-deterministic computations. That would probably simplify and speed up [Parser Combinators] (also based on monads) quite a bit. [NEM] 2005-09-20: A conversation with [Joe English] on the [Tcl chatroom] revealed that the implementation of iterators I describe above is actually pretty near to the implementation of ''anamorphisms'' or unfolds, as described in the paper ''Functional Programming with Bananas, Lenses, Envelopes, and Barbed Wire'' (Meijer, Fokkinga, Paterson [http://research.microsoft.com/~emeijer/Papers/fpca91.pdf]). Basically, where a [fold] (or ''catamorphism'', to give it its posh name) takes a list and applies a binary operator to successive pairs of the list (thus reducing, or folding, the list to a single value), an unfold takes a value and a function to generate a next value and produces a list (or sequence, more generally). To demonstrate here is an unfold function implemented in Tcl: ====== proc unfold {done func next seed} { if {[apply $done $seed]} { return [list] } else { return [list [apply $func $seed] \ [list unfold $done $func $next [apply $next $seed]]] } } ====== Here, the unfold function actually takes 4 arguments: a "done" function that tests the seed value to see if it is the last in the sequence (equivalent to our "empty" method in the above code); an arbitrary function which is applied to seed to generate the actual next element; a function to generate the next seed; and finally the initial seed itself. This is a generalisation of our iterator code, as it distinguishes between seeds used for iteration, and the actual elements of the resulting sequence. The first part of the code checks the "done" function to see if this is the last element, and if so returns an empty list. If not, then we apply the function to the seed to create this element ([[apply $func $seed]]) and then call ourselves recursively to generate the next element, using the "next" function to generate the next seed ([[unfold $done $func $next [[apply $next $seed]]]]). However, rather than immediately execute this recursive call, we instead quote it (using [list]) and return the two bundled up together; creating a lazy list in effect (Haskell does this by default). We then need a new "take" function that forces the first n elements of the lazy list to be evaluated and returned, which works by grabbing the head of the list and then evaluating the tail of the list: ====== proc take {n iter} { for {set i 0} {$i < $n} {incr i} { lappend ret [lindex $iter 0] set iter [eval [lindex $iter 1]] } return $ret } ====== This delayed evaluation of elements of the list also has the nice property of converting our recursive definition of unfold into a simple iteration without having to do anything too fancy. We can now use this unfold to produce our original iterators again in a new form. First, we need one last helper that applies a function with proper list quoting: ====== proc apply {func args} { uplevel #0 $func $args } ====== Now we can define our iterators. Starting with the natural numbers: ====== proc false x { return 0 } ;# never done, i.e. infinite sequence proc id x { return $x } ;# identity function, i.e. seed and element are identical proc succ x { incr x } ;# next seed is increment of previous set nats [unfold false id succ 1] take 10 $nats ;# Displays 1 .. 10 ====== Here is a version that returns all the lines from a channel as a lazy list: ====== proc lines chan { unfold eof gets id $chan } # e.g. take 20 [lines [open $file]] ====== Great stuff! <> Concept