'''Iterators'''


** See Also **

   [Iterator Protocol]:   

   [a generic collection traversal interface]:   [NEM] 2006-01-16: another take on how to enumerate the elements of a collection.


** Description **

CLU's iterators correspond to what [Python] and other
languages call [generators].  CLU iterators were modeled
on Alphard's generators
[http://www.pmg.lcs.mit.edu/CLU.html].

IPL-v generators and [Lisp] mapping functions are related
to Alphard's generators.

Can someone explain the idea of iterator/generator here?

It seems to me that the idea is that rather than explicitly listing an entire
range of values (that a variable can take within a loop, etc.) one lists the
beginning and ending value of a range plus the incremental factor and the
language generates the value as needed.  How is this different than using a for
loop though?

An iterator is ''richer'' than a mere index. An iterator carries the
''container'' reference and the element's position with it. You can use 'em
without for loops. The iterator controls access to the element.  -- [Todd
Coram]

[RS]: So would the "loop variable" in [[`[foreach]`] come closer to the
iterator concept?

[Todd Coram]: Yes, but iterators are most useful when you have different
''container'' types (or classes). [foreach] iterates over lists. A useful
iterator interface deals with strings, lists, trees,  vectors, etc. You need
OO, ''interfaces'' or ''inherent'' polymorphism to take full advantage of
iterators. Since Tcl primarily deals with lists and arrays, there is no great
need for such polymorphic operations. Right? Now, [VFS] gives us a (loose)
approximate of stream iterators which are very useful abstractions...

Also, iterators/generators control what comes ''next'' in an iteration.
[foreach] is predetermined: it has already been decided what element you will
get on each iteration. If my container wants to return random values as I step
through it, it can determine to do so at anytime. -- [Todd Coram]

Here is a contrived example of an [Itcl based iterator].

[Eric Boudaillier]: I have begin to write a C extension that deals with
iterator (or generator ?).  This extension defines the '''iterate''' command,
which works like [foreach], but takes the description of the ''thing'' to
iterate instead of list. This description is a list containing the type of the
iterator and the arguments. At this time, four types are defined:
'''numeric''', '''list''', '''string''' and '''channel'''.  Here is a little
example:

     Common Tcl construction                     Same using [iterate] command

  for {set i 0} {$i<=$max} {incr i} {        iterate i [list numeric 0 $max] {
     ...                                        ...
  }                                          }

  while {[gets $chan line] != -1} {          iterate line [list channel $chan] {
     ...                                        ...
  }                                          }

  foreach c [split $string {}] {             iterate c [list string $string] {
     ...                                        ...
  }                                          }

  foreach e [lreverse $list] {               iterate e [list list $list end 0] {
     ...                                        ...
  }                                          }

[[`iterate`] provides a cleaner syntax, and can be faster and consume less
memory. And also, you can combine different type of iterator.  Moreover, some
iterator can be infinite:

======
# returns the number of lines in $chan.
proc wc-l {chan} {
    iterate count [list numeric 1 {}] line [list channel $chan] {}
    return $count
}
======

----

[RS]: This looks like a great unification to me - however, if the list-bundling
of the second argument could be given up, it would be even more pleasing to
write and read:

======
iterate i    -numeric "0 $max"   {...}
iterate line -channel $chan      {...}
iterate c    -string $string     {...}
iterate e    -list $list         {...}
iterate rev  -list $list {end 0} {...}
iterate count -numeric 0 {} line -channel $chan {}
======

Granted that the last argument is the body, a variable number of control args
could be easily processed. The dash before the type indicator might set it off
more clearly. However, I'm not sure whether all iterations possible with [for]
or [foreach] can be replaced with this construct. -- [AvL]: surely not:
[foreach] will be "nicer" for the easy cases, and [for] is much stronger: it
allows you to modify the iteration-variable from the body - neither foreach,
nor the proposed iterators would allow this. Python unfortunately lacks such a
construct. 

----

[Todd Coram]: 

By supplying a ''type'' to the iteration, I think you lose a great deal of
power that typeless iteration gives you. C++ style iteration is what make
generic programming such a nice capability. Typelessness allows me to define
generic functions that don't care about type.  How about an iterator command
that generates an iterator for the supplied type?

======
set it1 [iterator -string "hello world how are you?"]
set it2 [iterator -chan $chan1]
set it3 [iterator -list {{hello world} {how are you}}
======

The generator iterator would be a proc: So,

======
proc wc-w {it} {
    set count 1
    while {[$it] != {}} {
        incr count
    }
    return $count
}

wc-w $it1 # -- returns 5
wc-w $it2 # returns the number of words in the file
wc-w $it3 # -- returns 2 
======

someone: And the iterator command would be a perfect usage of [Closures]!


[Todd Coram]: See [Iterator using Closures] for an example of the above...

----

[Eric Boudaillier]: Iterator are not typed: just see the description list of
iterators as a constructor. At the [C] level, I have a structure like
[Tcl_ObjType]:

======c
typedef struct {
    char                     *name;
    Tcl_IteratorCreateProc   *createProc;
    Tcl_IteratorFreeProc     *freeProc;
    Tcl_IteratorGetProc      *getProc;
    Tcl_IteratorNextProc     *nextProc;
    Tcl_IteratorIsFiniteProc *finiteProc;
} Tcl_IteratorType;
======

[[`iterator`] just looks for the iterator type, and then call the iterator type
functions. This implementation is simple, and make iterator lifetime only in
[[iterator]] command.  I have tried other implementation with a Tcl_Obj
internal representation, but I did'nt know how to handle some cases:

======
set it1  [iterator string "Hello, World"]
set it2  [iterator channel $chan]
set it1copy $it1
set it2copy $it2
iterate c $it1 line $it2 {
}
# where is $it1copy ? at the start of the string ?
# at least, $it2copy is at the same position of $it2.
======

Also, [[`iterator`] don't know the args to pass to Tcl_IteratorCreateProc, so
this is difficult to break the description list, which can also take options:

======
iterate v [list numeric 0 9 -loop true] {e1 e2} [list list $list] {
    ...
}
======

But of course, all these lists can be created by a command.

----

Iterators are big in [Ruby].

[escargo]: Generators are fundamental in [Icon Programming Language%|%Icon].
Generators need not be finite; they will return a new value if they are resumed
after suspending and returning a value.

----

It would seem to me that the construct is maybe nice or fun in certain ways,
but when I measure it against imo important criteria:

 * do I need to type less
 * is it more insightfull
 * does it add strong language possibilities not easily or at all achievable otherwise
 * is it more efficient
 * is there some esteatical or alternatively some computer structure related reason for it

then I don't think it would score all too high neccessarily, which of course
maybe is less important than that is sais something important in the real
world, though I would say that is dangerous territory.

About on of the examples/remarks, the ''foreach'' tcl command is not
pre-determined, and fun enough to play around with:

======
set l {1 2 3}
foreach i $l {puts "$i  $l"; if {[llength $l] < 5} {lappend l 5}; set l [lreplace $l 3 3 [expr [lindex $l 3]+1] ]}
======

Gives:

======none
1  1 2 3
2  1 2 3 6
3  1 2 3 7 5
======

I guess a informatics professor wouldn't like this...

[NEM]: Not sure who wrote the above, but it is worth pointing out that the
foreach command ''is'' pre-determined in that it captures the values of $l when
it starts and doesn't reflect any subsequent changes to that variable. This
should be the same for any Tcl command as $-substitution gets done before the
command is called, and Tcl is pass-by-value rather than pass-by-reference. You
can, of course, update l inside the loop as it is a mutable variable, but note
that you still only get 3 iterations through foreach, regardless of the new
value of l.

----

[NEM] 2005-09-20: Iterators (or [streams]) are certainly nice things to have.
When I first saw [EB]'s suggestion for an iterate command, I thought it was a
nice higher-order function, where the [[list numeric 0 $max]] etc were actual
command fragments that generate the next iteration. However, this doesn't seem
to be the case. So, here is a version of the iterate command that does just use
normal Tcl commands for the iterators. Each iterator is a command that accepts
subcommands (i.e. like an object or an [ensemble]). The subcommands for this
interface are "empty" (returns true if there are no more elements) and "next"
which returns the next value ''and a new iterator''. This last requirement may
sound a little strange, but it allows to have purely-functional, ''persistent''
iterators that can be re-used as many times as you need. To demonstrate, here
is the iterate command:

======
proc iterate {varname stream body} {
    upvar 1 $varname var
    while {![eval $stream empty]} {
        lassign [eval $stream next] var stream
        uplevel 1 $body
    }
    return $stream
}
======

This is straight-forward enough. Now let's define some iterators. First, a
simple one for lists:

======
proc literator {list method} {
    switch $method {
        empty   { expr {[llength $list] == 0} }
        next    { list [head $list] [list literator [tail $list]] }
    }
}
proc head list { lindex $list 0 }
proc tail list { lrange $list 1 end }
======

Now, we can try this out:

======
set l {literator {1 2 3 4 5}}
iterate x $l { puts "x = $x" }
======

This produces the expected output and returns an iterator to the empty list.
Notice that this all looks very much like an ordinary stateful iterator.
However, we can still re-use the original iterator or any of the intermediate
ones, as there are no side-effects. In particular, you could pass the same
iterator to a bunch of different procs without them interfering with each
other. However, we don't have to limit ourselves to just side-effect free
iterators. For instance, here I can wrap a channel as an iterator:

======
proc citerator {chan method} {
    switch $method {
        empty   { eof $chan }
        next    { list [gets $chan] [list citerator $chan] }
        close   { close $chan }
    }
}
======

And I can use that to iterate through lines of a file:

======
set chan [list citerator [open $file]]
iterate line $chan { puts "line = $line" }
eval $chan close
======

That iterator is, of course, not persistent -- once you have used an iterator
it is not generally possible to restore it to its previous state (although you
might be able to try with [tell] and [seek]). Finally, here is an iterator
which can generate infinite sequences, based on the "iterate" function of
Haskell (which, along with monads, inspired this implementation):

======
proc fiterator {func init method} {
    switch $method {
        empty   { return 0 }
        next    { list $init [list fiterator $func [eval $func $init]] }
    }
}
======

Which I will use with a "take" command which itself returns an iterator to the
first n items in another iterator:

======
proc take {n iter method} {
    switch $method {
        empty   { expr {$n == 0} }
        next    { lassign [eval $iter next] next iter
                            list $next [list take [expr {$n-1}] $iter] }
    }
}
======

And some uses:

======
proc succ x { incr x }
set nats [list fiterator succ 1]
# First 10 natural numbers:
iterate n [list take 10 $nats] { puts "n = $n" }
======

Well, that's just some food for thought. You don't need state or closures to
implement iterators, but a little bit of sugar goes a long way. In addition,
stateless iterators are persistent pure values so you can use them over and
over again. Also, as pure values, you don't have to worry about destroying them
as they are reference counted by Tcl like every other value (and like other
[TOOT] types, which are also very similar to these iterator type-commands). Add
a bit of [memoizing] and you get some pretty useful lazy structures. A next
step might be to define some useful iterator ''combinators'' which can combine
different iterators in interesting ways, e.g. a lazy append can be defined:

======
proc iter_append {it1 it2 method} {
    switch $method {
        empty   { expr {[eval $it1 empty] && [eval $it2 empty]} }
        next    { 
            if {[eval $it1 empty]} {
                return  [eval $it2 next]
            } else {
                lassign [eval $it1 next] next it1
                return [list $next [list iter_append $it1 $it2]]
            }
        }
    }
}
set a [list literator {1 2 3}]
set b [list literator {4 5 6}]
set both [list iter_append $a $b]
iterate x $both { puts "x = $x" }
======

Which prints the values 1 to 6, but only calculates them as needed. I'll leave
other combinators (such as other list operators) as an exercise. For the really
adventurous, look into Haskell's List monad, which should be fairly easily
implemented on top of this, giving a new framework for non-deterministic
computations. That would probably simplify and speed up [Parser Combinators]
(also based on monads) quite a bit.

[NEM] 2005-09-20:  A conversation with [Joe English] on the [Tcl chatroom]
revealed that the implementation of iterators I describe above is actually
pretty near to the implementation of ''anamorphisms'' or unfolds, as described
in the paper ''Functional Programming with Bananas, Lenses, Envelopes, and
Barbed Wire'' (Meijer, Fokkinga, Paterson
[http://research.microsoft.com/~emeijer/Papers/fpca91.pdf]).  Basically, where
a [fold] (or ''catamorphism'', to give it its posh name) takes a list and
applies a binary operator to successive pairs of the list (thus reducing, or
folding, the list to a single value), an unfold takes a value and a function to
generate a next value and produces a list (or sequence, more generally). To
demonstrate here is an unfold function implemented in Tcl:

======
proc unfold {done func next seed} {
    if {[apply $done $seed]} {
        return [list]
    } else {
        return [list [apply $func $seed] \
                     [list unfold $done $func $next [apply $next $seed]]]
    }
}
======

Here, the unfold function actually takes 4 arguments: a "done" function that
tests the seed value to see if it is the last in the sequence (equivalent to
our "empty" method in the above code); an arbitrary function which is applied
to seed to generate the actual next element; a function to generate the next
seed; and finally the initial seed itself. This is a generalisation of our
iterator code, as it distinguishes between seeds used for iteration, and the
actual elements of the resulting sequence. The first part of the code checks
the "done" function to see if this is the last element, and if so returns an
empty list. If not, then we apply the function to the seed to create this
element ([[apply $func $seed]]) and then call ourselves recursively to generate
the next element, using the "next" function to generate the next seed ([[unfold
$done $func $next [[apply $next $seed]]]]). However, rather than immediately
execute this recursive call, we instead quote it (using [list]) and return the
two bundled up together; creating a lazy list in effect (Haskell does this by
default). We then need a new "take" function that forces the first n elements
of the lazy list to be evaluated and returned, which works by grabbing the head
of the list and then evaluating the tail of the list:

======
proc take {n iter} {
    for {set i 0} {$i < $n} {incr i} {
        lappend ret [lindex $iter 0]
        set iter [eval [lindex $iter 1]]
    }
    return $ret
}
======

This delayed evaluation of elements of the list also has the nice property of
converting our recursive definition of unfold into a simple iteration without
having to do anything too fancy. We can now use this unfold to produce our
original iterators again in a new form. First, we need one last helper that
applies a function with proper list quoting:

======
proc apply {func args} {
    uplevel #0 $func $args
}
======

Now we can define our iterators. Starting with the natural numbers:

======
proc false x { return 0  } ;# never done, i.e. infinite sequence
proc id    x { return $x } ;# identity function, i.e. seed and element are identical
proc succ  x { incr x    } ;# next seed is increment of previous
set  nats [unfold false id succ 1]
take 10 $nats ;# Displays 1 .. 10
======

Here is a version that returns all the lines from a channel as a lazy list:

======
proc lines chan { unfold eof gets id $chan }
# e.g. take 20 [lines [open $file]]
======

Great stuff!

<<categories>> Concept