Version 10 of Script termination: results and control

Updated 2013-12-10 23:38:28 by PeterLewerin

PL: I'm just starting up this page. It's intended to explain the details of terminating scripts, as regards termination categories (ok, error, return, break, continue), result values (valid data vs error messages), and handling exceptional termination.

Basic terminology

In Tcl documentation, terms like return and exception are sometimes used more or less interchangeably for when control is transferred from a command to another command in the calling chain that leads up to the first command. The term exceptional return is used for every kind of return except normal return. The following definitions aren't Tcl-official but do make sense and at least don't contradict the official terminology:

termination: any way to end a script, such as by reaching the end, executing a terminating command, or having an error occur.
terminating command: any one of the commands return, error, throw, unknown (typically not invoked directly), break, and continue (see below).
return: specifically, to terminate a script by executing a terminating command, also in the phrase "return control to ...".
(raise an) exception: to terminate a script with a return code equal to 1 (because this is the definition that comes closest to the definition used in computer science and in other exception-enabled programming languages).
exception-raising command: any one of the commands error, throw, and unknown (typically not invoked directly); return when invoked as return -code error.

Kinds of script termination

Kind     | Numeric code | Symbolic code | Returns to         | Result value treated by convention as... | AKA
Normal   | 0            | ok            | Caller             | Valid data                               | Normal return
Error    | 1            | error         | Handler            | Error message                            | Error return, error exception, exception
Return   | 2            | return        | Caller's caller    | Valid data (see below)                   | Return exception?
Break    | 3            | break         | Special, see below | No convention exists                     | Break exception
Continue | 4            | continue      | Special, see below | No convention exists                     | Continue exception
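
The numeric codes in the table can be observed with catch, which intercepts any kind of termination and returns its numeric code. The following is only a minimal sketch; the five sample scripts and the variable names scripts, codes and result are chosen for illustration.

```tcl
# Each script terminates in a different way; catch reports the numeric code
# and stores the result value in the variable named as its second argument.
set scripts {
    {expr {1 + 1}}
    {error "something went wrong"}
    {return "early"}
    {break}
    {continue}
}
set codes {}
foreach script $scripts {
    lappend codes [catch $script result]
}
puts $codes ;# -> 0 1 2 3 4
```

Because catch handles every termination itself, the break and continue scripts never reach the surrounding foreach.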

"By convention" means just that: while most if not all Tcl programs handle the result values in this manner, there is nothing to actually enforce this usage or to stop you from having an error message at normal termination or valid data at error termination. It would just be a case of obfuscated Tcl.

How a script is terminated

Consider a script with three commands (the comment after each command shows the command's result value):

commandA ;# -> resultA
commandB ;# -> resultB
commandC ;# -> resultC

By reaching the end

The evaluation goes like this: commandA is invoked, its side-effects (if any) happen, and its result (resultA) is produced and discarded (because it isn't assigned to anything). The same thing happens to commandB and commandC, but as the script terminates after commandC, resultC isn't discarded: instead it becomes the result of the evaluation of the script. This is a normal termination.
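
As a small illustration of the above (a sketch only, with three set/expr commands standing in for commandA, commandB and commandC), eval returns the result of the last command in the script it evaluates:

```tcl
# The results of the first two commands are discarded; the result of the
# last command becomes the result of the whole script.
set script {
    set a 1
    set b 2
    expr {$a + $b}
}
set result [eval $script]
puts $result ;# -> 3
```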

By terminating command

Now, if commandB is a terminating command, evaluation goes differently. Commands commandA and commandB are invoked and their side-effects happen, but commandC doesn't get invoked at all and its side-effects don't happen. The result of commandB becomes the result of the evaluation of the script instead. This can be any one of the five kinds of termination.
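
A sketch of this case, using return as the terminating command in the middle of a three-command script. The variable log is just an illustrative way to record which commands actually ran.

```tcl
set log {}
set code [catch {
    lappend log A          ;# commandA: runs normally
    return "from B"        ;# commandB: a terminating command
    lappend log C          ;# commandC: never invoked
} result]
puts [list $code $result $log] ;# -> 2 {from B} A
```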

By error

If instead an error occurs during execution of commandB, the situation is similar to the last paragraph. Commands commandA and commandB are invoked, but only commandA is completed successfully. The side-effect of commandB may or may not happen, or happen partially. The intended result value of commandB isn't produced. Instead, the error produces an error message which becomes the result of the evaluation of the script. This is always error termination.
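
The same kind of sketch for an error in the middle command. Here expr {1/0} plays the part of commandB, and log is again an illustrative variable that records which commands completed.

```tcl
set log {}
set code [catch {
    lappend log A          ;# commandA: completes
    expr {1/0}             ;# commandB: fails, no result value is produced
    lappend log C          ;# commandC: never invoked
} result]
puts [list $code $log] ;# -> 1 A
puts $result           ;# -> divide by zero
```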

The call stack

Now, I've talked about the script having a result value, but where does it go? Every script is evaluated in a stack frame, a level in a multi-layered data structure, the call stack, which is managed by the Tcl interpreter and mostly invisible to the programmer. The executing command always resides in the most recently added frame in the call stack. If that command calls another command, a new stack frame is added to the call stack, and the called command can be executed in that frame. When a command terminates by normal termination, the stack is unwound, i.e. the command's frame is removed and the command in the previous stack frame becomes active again ("control returns to the caller"). As part of the procedure to switch stack frames, the interpreter transfers the terminating command's result value to the caller's frame where it might be stored in a variable (if not, it is discarded).
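
Although the call stack is mostly invisible, info level reports the depth of the current frame. In this sketch the proc names outer and inner are hypothetical; it only shows that a called command runs one level deeper than its caller and that its result value is handed back to the caller's frame.

```tcl
proc inner {} {
    info level                  ;# depth of inner's own frame
}
proc outer {} {
    list [info level] [inner]   ;# own depth, then the depth seen by inner
}
lassign [outer] outerLevel innerLevel
puts [expr {$innerLevel - $outerLevel}] ;# -> 1
```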

Up or down?

A generic stack is usually visualized as growing upwards from an initially empty base. The call stack, however, is traditionally implemented growing downwards in memory (hence the names of the commands uplevel and upvar, which operate on previous stack frames), and always having one starting frame (the "top-level" in Tcl documentation). I've compromised in the following by depicting stacks as growing horizontally from left to right.

Normal termination

If commandA, with the variables a, b, and c, executes set b [commandB], and commandB produces the result value "abc", the call stack might look something like these three steps (text in parentheses is a stack frame with command name and variable names; the >> symbol shows the caller relationship; the text after -> is the return code in parentheses and the result value in double quotes):

(NB this is provisional notation, I intend to improve it later.)

1. ( commandA a b c )

2. ( commandA a b c ) >> ( commandB -> (0) "abc" )

3. ( commandA a b = "abc" c )

The command return by itself, and also if invoked like return -code ok, causes normal termination.
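
The three steps above might be written as follows. This is a sketch: the proc bodies are made up for illustration, and only the names commandA, commandB and the variable b come from the text.

```tcl
proc commandB {} {
    return "abc"            ;# normal termination, code 0
}
proc commandA {} {
    set b [commandB]        ;# the result value lands in commandA's frame
    return $b
}
puts [commandA]             ;# -> abc
puts [catch {commandB} r]   ;# -> 0
```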

Error termination

In error termination, the stack is unwound until a handler is found. This is either the try command or the catch command being executed in a frame of the call stack. The handler gets the return code and the result value from the termination.

If commandA calls try, which calls commandB, which calls commandC, which terminates with an error, the call stack might look something like this (the stack frame for try has two fictional variables, _C and _R: they are just a way to show the handler's capability to record the return code and the result value):

1. ( commandA )

2. ( commandA ) >> ( try _C _R )

3. ( commandA ) >> ( try _C _R ) >> ( commandB )

4. ( commandA ) >> ( try _C _R ) >> ( commandB ) >> ( commandC -> (1) "xyz" )

5. ( commandA ) >> ( try _C = 1 _R = "xyz" )

If neither commandC nor commandB had terminated with an error, the call stack might have looked like this:

1. ( commandA )

2. ( commandA ) >> ( try _C _R )

3. ( commandA ) >> ( try _C _R ) >> ( commandB )

4. ( commandA ) >> ( try _C _R ) >> ( commandB ) >> ( commandC -> (0) "abc" )

5. ( commandA ) >> ( try _C _R ) >> ( commandB -> (0) "def" )

6. ( commandA ) >> ( try _C = 0 _R = "def" )

The return code makes it possible for try to know if the result value is valid data or an error message: in the previous example, "def" is data, while in the example before that, "xyz" is an error message.

The commands throw and error cause error termination, and so does return if invoked like return -code error, and unknown if it can't find a command to execute. Many other commands terminate with an error if they can't complete their tasks, such as expr {1/0} or open {this file doesn't exist I promise} "r".
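
A sketch of the first example above, assuming Tcl 8.6 or later (where try is available). The proc bodies are invented for illustration; the handler returns the return code and the result value as a list so that they can be inspected.

```tcl
proc commandC {} {
    error "xyz"                     ;# error termination, code 1
}
proc commandB {} {
    commandC
    return "def"                    ;# not reached when commandC fails
}
proc commandA {} {
    try {
        commandB
    } on error {message options} {
        # The handler receives the result value and the return options.
        return [list [dict get $options -code] $message]
    } on ok {value} {
        return [list 0 $value]
    }
}
puts [commandA] ;# -> 1 xyz
```

If commandC is changed to return normally, the on ok branch runs instead and commandA produces 0 def.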

Return termination

A return termination unwinds exactly two stack frames and leaves its result value in the third ("control returns to the caller's caller"). If commandA calls set a [commandB], and commandB calls commandC, which executes return -code return theResult, the stack looks like this:

1. ( commandA a )

2. ( commandA a ) >> ( commandB )

3. ( commandA a ) >> ( commandB ) >> ( commandC -> (2) "theResult" )

4. ( commandA a = "theResult" )

There is a twist, though. The termination reaches the caller's caller as a normal termination. Conceptually, it's like this:

1. ( commandA a )

2. ( commandA a ) >> ( commandB )

3. ( commandA a ) >> ( commandB ) >> ( commandC -> (2) "theResult" )

4. ( commandA a ) >> ( commandB -> (0) "theResult" )

5. ( commandA a = "theResult" )

But note that there is no actual corresponding return command in commandB! The termination is converted from return to normal by the interpreter while unwinding the stack.

The only way to cause a return termination is to invoke the command return -code return.
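
The return termination described in this section can be tried out like this. It is a sketch with invented proc bodies; the "never reached" line is there only to show that commandB is cut short.

```tcl
proc commandC {} {
    return -code return "theResult"
}
proc commandB {} {
    commandC
    return "never reached"      ;# skipped: commandC terminates commandB too
}
proc commandA {} {
    set a [commandB]
    return $a
}
puts [commandA]             ;# -> theResult
puts [catch {commandC} r]   ;# -> 2
puts [catch {commandB} r]   ;# -> 0
```

The two catch calls show the conversion: commandC terminates with code 2, but by the time control leaves commandB the code is 0.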

Break termination

A break termination unwinds one stack frame and expects to find a handler in the exposed frame: specifically, a handler that was called as part of a command implementing a loop construct. If no such handler is found, the termination is converted into an error termination whose result value is the message: invoked "break" outside of a loop.
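
The conversion into an error can be seen by letting a break termination escape from a proc that has no loop around it. This is a sketch; the proc name outside is hypothetical.

```tcl
proc outside {} {
    break                   ;# no loop in this frame to handle it
}
set code [catch {outside} message]
puts $code      ;# -> 1
puts $message   ;# -> invoked "break" outside of a loop
```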

TODO: examine break termination in for, foreach, lmap, while

A command handling a break termination should probably conform to the convention established by the behavior of the break command. The break command can't produce a result value, though it might be possible to set that principle aside. The break command typically first terminates the body of a looping command and then causes a normal termination of the looping command. A construct like 'loop script finally postscript' that handles a break by terminating script but still evaluating postscript before terminating the whole command might not be unreasonable.

(The break command can also be used in a stackless context, such as in a Tk event binding. This is outside the scope of this page.)

A break termination is caused by invoking the break command or by invoking the command return -code break.
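
Both ways of causing a break termination produce code 3, and both end an enclosing loop. In this sketch the proc name stop and the variable seen are hypothetical.

```tcl
proc stop {} {
    return -code break      ;# reaches the caller as a break termination
}
set seen {}
foreach x {1 2 3 4} {
    if {$x == 3} {
        stop                ;# ends the foreach just as break would
    }
    lappend seen $x
}
puts $seen                  ;# -> 1 2
puts [catch {break} r]      ;# -> 3
puts [catch {stop} r]       ;# -> 3
```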

Continue termination

(I'll be back soon -- PL)

proc caller {} {
    set result [callee]
}
control { script } ;# -> result of control might be result of script