exit in slave interpreter

Handling the [exit] command in a slave interpreter can be a bit tricky. To work through the complications, consider the following silly Tk example, Take 1:

# FILE: clone.tcl
package require Tk
proc clone {} {
    set slave [interp create]
    load {} Tk $slave
    $slave alias clone clone
    $slave eval {
        pack [button .clone -text clone -command clone]
        pack [button .exit -text exit -command [list exit 0]]
    }
}
# hide master's root window
wm withdraw .
# Create the original clone
clone

Evaluate clone.tcl with wish and you will see a window with two buttons, clone and exit. Click clone and another identical window, a clone, will appear. The impression is an application which can clone itself, and each of those clones can clone itself, and so on...

But, click exit in any of the clones and they all disappear. That's because [exit] is an application exit command, not just an interpreter exit command. If that's what was intended, great, but in this case we wanted each clone's exit button to destroy only that clone. We want the whole application to [exit] only after all the clones have been destroyed. Take 2:

# FILE: clone.tcl
package require Tk
proc slaveExit {slave {code 0}} {
    interp delete $slave
    if {[llength [interp slaves]] == 0} {exit $code}
}
proc clone {} {
    set slave [interp create]
    load {} Tk $slave
    $slave alias clone clone
    $slave alias exit slaveExit $slave
    $slave eval {
        bindtags . [linsert [bindtags .] end cloneMainWindow]
        bind cloneMainWindow <Destroy> [list exit 0]
        pack [button .clone -text clone -command clone]
        pack [button .exit -text exit -command [list exit 0]]
    }
}
wm withdraw .
clone

Now, [exit] in each clone is aliased to [slaveExit] in the master. [slaveExit] destroys the clone's interpreter. If there are no more slave interpreters, there are no more clones, so the application exits. Note also the explicit binding on the <Destroy> event on the clone's root window. Without that binding, the window manager could destroy the clone's root window without an [exit] from the clone's interpreter. The clone's interpreter would never be destroyed, and the application would never exit. Take 2 works fine, for this simple example.

Now let's try to extend the simple example into a more generally useful cloning facility. Let clone.tcl accept another Tk script as an argument, and add a cloning button to it. Take 3:

# FILE: app0.tcl
pack [button .exit -text exit -command [list exit 0]]

# FILE: clone.tcl
package require Tk
proc slaveExit {slave {code 0}} {
    interp delete $slave
    if {[llength [interp slaves]] == 0} {exit $code}
}
proc clone {argv0 argv} {
    set slave [interp create]
    $slave eval [list set argv0 $argv0]
    $slave eval [list set argv $argv]
    $slave eval {set argc [llength $argv]}
    load {} Tk $slave
    $slave alias clone clone $argv0 $argv
    $slave alias exit slaveExit $slave
    $slave eval {
        bindtags . [linsert [bindtags .] end cloneMainWindow]
        bind cloneMainWindow <Destroy> [list exit 0]
        frame .clone
        pack [button .clone.btn -text clone -command clone] -in .clone
        pack .clone -side top
    }
    $slave eval [list source $argv0]
}
if {!$argc} {
    return -code error "\nusage: clone.tcl appScript ?arg ...?"
}
wm withdraw .
clone [lindex $argv 0] [lrange $argv 1 end]

Now, wish clone.tcl app0.tcl gives the same thing as before, but the cloning facility has been factored out of the application being cloned. With that factorization, we can examine how well this cloning facility handles several different application scripts.

First consider how the cloning facility handles this app:

# FILE: app1.tcl
# Use ownership of a socket to allow only one copy of this app
proc no-op args {}
if {[catch {socket -server no-op 50000} sock]} {exit 1}
pack [button .exit -text exit -command [list exit 0]]

This app has been written so that only one copy of it may run on a host machine. When launched with wish app1.tcl, it works fine. It makes little sense to wrap such an app with our app cloner, but let's see what happens. wish clone.tcl app1.tcl starts up one copy of the app with the clone button attached. Then, when clone is clicked, up pops an error dialog with the stack trace:

 attempt to call eval in deleted interpreter
     while executing
 "list exit 0"
     invoked from within
 "button .exit -text exit -command [list exit 0]"
     invoked from within
 "pack [button .exit -text exit -command [list exit 0]]
 "
     (file "app1.tcl" line 5)
     invoked from within
 "source app1.tcl"
     invoked from within
 "$slave eval [list source $argv0]"
     (procedure "clone" line 16)
     invoked from within
 "clone"
     invoked from within
 ".clone.btn invoke"
     ("uplevel" body line 1)
     invoked from within
 "uplevel #0 [list $w invoke]"
     (procedure "tkButtonUp" line 7)
     invoked from within
 "tkButtonUp .clone.btn
 "
     (command bound to event)

So what happened? The app was written to be evaluated directly by wish. In that case, no commands following the evaluation of [exit] will be evaluated. That's not true in the slave interpreters of our app cloner. [exit] is an alias to [slaveExit] in the master interpreter, and when that completes, evaluation of app1.tcl tries to continue. It fails immediately because the interpreter no longer exists. The same problem will arise in any app which depends on no commands following [exit] being evaluated.

How can this be fixed? We want [exit] to be the last command evaluated. We might try to have our alias for [exit] throw an error, so evaluation of the next command would not occur; the stack would unwind instead. There are two problems with that solution. First, the stack would eventually unwind to our [$slave eval] where we would need to catch the error. Placing a [catch] around our [$slave eval], though, would mask other errors we might be interested in. The second problem is more serious. Although we can have [slaveExit] throw an error, that will only unwind the stack until the first enclosing [catch], possibly still in the deleted interpreter. Then evaluation will continue with the next command, again causing the same error we're trying to solve.

So, throwing an error is wrong. Not throwing an error is also wrong. The surprising solution is to not delete the interpreter after all, at least not yet! Take 4:

# FILE: clone.tcl
package require Tcl 8          ;# We use [namespace] now.
package require Tk
proc slaveExit {slave {code 0}} {
    interp delete $slave
    if {[llength [interp slaves]] == 0} {exit $code}
}

# Return list of commands in $slave in namespace $ns
proc commandsIn {slave {ns {}}} {
    set cmds [$slave eval [list info commands ${ns}::*]]
    foreach child [$slave eval [list namespace children $ns]] {
       set cmds [concat $cmds [commandsIn $slave $child]]
    }
    return $cmds
}

proc slaveDisable {slave {code 0}} {
    # Prevent re-entrancy problems.  Don't want deletion callbacks of
    # commands we redefine below (notably [.]) to recursively call us.
    $slave eval {proc exit args {}}
    # Convert all commands in $slave to no-ops.
    foreach cmd [commandsIn $slave] {
        if {[string match ::proc $cmd]} {continue}
        $slave eval [list proc $cmd args [list]]
    }
    $slave eval [list proc ::proc args [list]]
    # Delete $slave later.  Make sure $slave is not deleted
    # within the current [update], if any.
    after idle [list after 1 [list slaveExit $slave $code]]
}
proc clone {argv0 argv} {
    set slave [interp create]
    $slave eval [list set argv0 $argv0]
    $slave eval [list set argv $argv]
    $slave eval {set argc [llength $argv]}
    load {} Tk $slave
    $slave alias clone clone $argv0 $argv
    $slave alias exit slaveDisable $slave
    $slave eval {
        bindtags . [linsert [bindtags .] end cloneMainWindow]
        bind cloneMainWindow <Destroy> [list exit 0]
        frame .clone
        pack [button .clone.btn -text clone -command clone] -in .clone
        pack .clone -side top
    }
    $slave eval [list source $argv0]
}
if {!$argc} {
    return -code error "\nusage: clone.tcl appScript ?arg ...?"
}
wm withdraw .
clone [lindex $argv 0] [lrange $argv 1 end]

The command [exit] in $slave is now aliased to [slaveDisable]. [slaveDisable] does not delete $slave. It registers an idle callback to delete $slave later. Then it returns control to $slave. Since we are not able to force $slave to abort its evaluation after evaluating [exit], we do the next best thing. $slave can continue to evaluate commands, but we have redefined all of its commands to be no-ops! $slave will evaluate no-ops until the current callback from the event loop completes. Then when the event loop regains control, it will (eventually) call the idle callback which deletes $slave. Now, wish clone.tcl app1.tcl works just as we would like it to.

Problem solved? Not quite. As we just described, $slave will continue to evaluate no-ops until the current callback completes. What if our redefinition of all commands to no-ops has the side effect of creating an infinite loop? Consider another app:

more to come... DGP


Donald Porter