like python generator

iu2 2007-02-01 Inspired by What kinds of variable names can be used in Tcl and Looking at LISP's SERIES extension I tried to write a generator with Python's syntax.

The generator creates values upon demand. For example, given (in python)

 >>> def inc1():
        c = 0
        while 1:
                c += 1
                yield c

                yield c
 >>> gen = inc1()
 >>> gen.next()
 1
 >>> gen.next()
 2
 >>> gen.next()
 3
 >>> gen.next()
 4
 >>> 
 >>> 

More information can be found in generators and Python's doc at [L1 ] and the itertools library [L2 ]

The syntax is simple and powerful. The mechanism involves saving the context after the yield command. Saving context in tcl? --- using threads....

So it occurred to me that if I use threads I can imitate Python's generator. This is the method: I write a generator proc just like in Python, using yield. Then I call a special function generate that converts my proc to a functioning generator by launching a thread, and establishing communication with the calling thread upon yield occurrences.

This is the generate proc. iu2 addition Important: I used vwait as a means of synchronization between the threads, which is not robust, and I can think of a scenario of a deadlock. generate should be implemented by condition variables, instead, or even better, message queues (but I don't think tcl supports that...)

  proc generate {var proc args} {
    # creating a thread
    uplevel #0 set thread$var [thread::create {thread::wait}]
    upvar #0 thread$var th
    thread::send $th "set th [thread::id]" ;# notifying about caller thread
    tsv::set gen $var 0
    uplevel #0 set wait$var 0
    upvar #0 wait$var wait

    # change the desired proc so that 'yield' causes a generation of a value and a pause
    set b [info body $proc]
    regsub -all -linestop {\yyield\y\s+(.+)} [info body $proc] [format {tsv::set gen %s [subst \1]; thread::send -async $::th {set ::%s 1}; vwait sleep} $var wait$var] b
    append b "\nthread::send -async \$::th {set ::wait$var 2}"
    thread::send $th [list proc $proc [info args $proc] $b]

    # create a proc for making the generator advance - communication with the thread
    proc $var {} [format {
      if {$::%s == 0} {thread::send -async %s "%s %s"} else {
      thread::send -async %s {set sleep 1}}
      vwait ::%s
      if {$::%s == 2} {error "Generator exausted"}
      return [tsv::get gen %s]} wait$var $th $proc $args $th wait$var wait$var $var]
  }

Lars H: There's quite a bit of quoting hell in this generate. I'd suggest instead of the dynamically created $var proc having one "generator_get" which takes the necessary arguments as arguments and in generate create the $var command as an interp alias to this generator_get. yield should probably be a proper command as well.

An example of a generator function:

  proc list3 {a b c} {
    for {set x 0} {$x < $a} {incr x} {
      for {set y 0} {$y < $b} {incr y} {
        for {set z 0} {$z < $c} {incr z} {
          yield [list $x $y $z]
        }
      }
    }
  }

Now I convert list3 to a generator

 generate next list3 2 2 3

And here is the usage and result for next

  for {set i 0} {$i < 20} {incr i} {
    if {[catch {
      puts [next]
    }]} break
  }

result

 0 0 0
 0 0 1
 0 0 2
 0 1 0
 0 1 1
 0 1 2
 1 0 0
 1 0 1
 1 0 2
 1 1 0
 1 1 1
 1 1 2

Another example

  proc plus1 {} {
    set i 0
    while 1 {
      yield [incr i]
    }
  }

And the usage:

  generate p1 plus1
  while 1 {
    gets stdin s
    catch {eval $s}
    puts [p1]
  }

The catch {eval $s} is to be able to write exit and leave the program.

One more:

  proc manyyields {} {
    yield 1
    yield {Hello world!}
    yield {1 2 3 4 5}
  }

  generate my manyyields
  while 1 {
    puts [my]
  }

And the result

 1
 Hello world!
 1 2 3 4 5 
 Generator exausted
    while executing
 "error "Generator exausted""
    (procedure "my" line 5)
    invoked from within
 "my"
    ("while" body line 2)
    invoked from within
 "while 1 {
   puts [my]
 }"

schlenk Why not use simple slave interpreters to save the context (even namespaces or dynamically rewritten procs with default arguments might be enough)? No need to involve threads.

iu2 I simply couldn't figure out how to do this with the yield syntax. I will be glad to learn how to do it without threads... I can point out two advantages with the current method:

  • Using yield is an extremely simple and intuitive way to describe generators
  • The thread implementation is relatively short - 24 lines (might be a little longer, replacing the vwait commands - I added a comment above that vwait is not a safe way to synchronize the threads...)

There are of course disadvantages, like not having control over context switches (the exact time when the next item fetch will complete), to name one...

NEM I don't think it's possible to save the execution context of a slave interpreter without using threads, as there is no method to capture the current continuation. You can suspend a thread at the appropriate point, however. Generators/co-routines are neat. The usual Tcl way of accomplishing the same is to instead pass a script/callback to the generating function:

 proc list3 {a b c yield} {
   for {set x 0} {$x < $a} {incr x} {
     for {set y 0} {$y < $b} {incr y} {
       for {set z 0} {$z < $c} {incr z} {
         invoke $yield [list $x $y $z]
       }
     }
   }
 }
 proc invoke {cmd args} { uplevel #0 $cmd $args }
 list3 2 2 3 puts

Apart from the difference in structure, the Python method works better if you want to combine elements from several generators at once. For instance:

 while 1 {
     set el1 [generator1]
     set el2 [generator2]
     puts [list $el1 $el2]
 }

Streams are useful for this case. See also a generic collection traversal interface.

iu2 The C library functions setjmp and longjmp can be used to save the context in C. I wonder if they could be incorporated into tcl commands....

DKF: No. Please don't go there. Any code that uses longjmp() has the Wizard Factor turned up to 11. (Tcl's memory management is designed under the assumption that nothing does a longjmp()...)


Lars H: I don't know much about threads, but using [thread::send -async] for the communication seems odd to me. Conceptually, a generator thread should just run its little generating script and then exit (pausing only if the mode of communicating the data requires waiting for the previous item to have been read), so maybe this is a use case for threads where one should not call [thread::wait]? (The thread documentation suggests that such cases are rare, so it might be of interest for that reason alone.)

iu2 Thanks for your comments, Lars H. You're right about the quoting hell and the tips you suggested are good. There is one thing, however, I don't know how to implement, and I will be glad to hear suggestions: How to do the "pausing only if the mode of communicating the data requires waiting for the previous item to have been read". There are no message queues in tcl threads' api, and the clumsy waiting scheme and async calls where just that - a sort of implementing a message queue. A chan maybe? BTW, the {thread::wait} is unnecessary, thread::create will wait anyway when no script is supplied...

Lars H: As I read the docs, [thread::send] puts things in the event queue, so that will probably serve as message queue as long as there is a clear client-server relationship (which however isn't the case here). As for synchronization without entering the event loop, a mutex-condition-combination looks like it should do the trick. The complete script for the generator thread has the structure

  thread::mutex lock $mutex
  # <Loop for generating values and yielding them>
  thread::mutex unlock $mutex

To yield a value $val means to

  tsv::set generators $genid $val
  thread::cond wait $cond $mutex

When the consumer thread want to get a value it goes

  thread::mutex lock $mutex
  if {[thread::exists $genid]} then {
      tsv::get generators $genid val
      thread::mutex unlock $mutex
      thread::cond notify $cond
  } else {
      # The generator thread has finished; 
      thread::mutex unlock $mutex
      set val "%%EndOfData" ; # or whatever
  }

This is probably not completely robust (I think there should be some initialization synchronization to ensure the generator thread grabs the mutex first), but it demonstrates the main point.

NEM: See also A Thread-Safe Message Queue.