Script termination: results and control

A former Tcler writes on 2013-12-09: This page attempts to explain the (not too low-level) details of terminating scripts, as regards termination modes (ok, error, return, break, continue), result values (valid data vs error messages), and the effects on flow control (call stack manipulation and looping construct manipulation).

Basic terminology

In Tcl documentation, terms like return and exception are sometimes used more or less interchangeably for when control is transferred from a command to another command in the calling chain that leads up to the first command. The term exceptional return is used for every kind of return except normal return. The following definitions aren't Tcl-official but do make sense and at least don't contradict the official terminology:

termination
any way to end a script, such as by reaching the end, executing a terminating command, or having an error occur.
terminating command
any one of the commands return, error, throw, unknown (typically not invoked directly), break, and continue (see below). Does not include commands that raise an exception when they can't complete their task, such as expr {1/0}, even though they do terminate a script when this happens (unknown is a borderline case).
termination mode
the "protocol" by which the interpreter treats the termination (designated by the return code).
return
specifically, to terminate a script by executing a terminating command, also in the phrase "return control to ...".
(raise an) exception
to terminate a script in error termination mode, either by executing a terminating command or any command that fails for some reason (because this is the definition that comes closest to the one used in computer science and in other exception-enabled programming languages).
exception-raising command
any one of the commands error, throw, and unknown (typically not invoked directly); return when invoked as return -code error.

Modes of script termination

Mode      | Numeric code | Symbolic code | Returns to          | Result value treated by convention as... | AKA
Normal    | 0            | ok            | Caller              | Valid data                               | Normal return
Error     | 1            | error         | Innermost handler   | Error message                            | Error return, error exception, exception
Return    | 2            | return        | Nth caller's caller | Valid data (see below)                   | Return exception?
Break     | 3            | break         | Immediate handler   | No convention exists                     | Break exception
Continue  | 4            | continue      | Immediate handler   | No convention exists                     | Continue exception
(unnamed) | ≥5           | (none)        | Innermost handler   | No convention exists                     |
  • "Numeric code" and "symbolic code" refer to the value of the return code associated with the mode of termination in question. The numeric code is usable in most circumstances where the return code is handled: the symbolic code is primarily useful as an argument to the return command and as a keyword in a handler clause in the try command.
  • "By convention" means just that: while most if not all Tcl programs deal with result values in this manner, there is nothing to actually enforce this usage or to stop you from having an error message in normal termination mode or valid data in error termination mode. It would just be a case of obfuscated Tcl.
  • The unnamed sixth mode is legal and functional but barely documented. There is no stated protocol or convention for dealing with this mode. tkcon at least treats it like error termination. (AMG: It is occasionally used for custom control structures, as in Wibble and WebSocket.) I won't discuss it further in this essay.
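The numeric codes in the table can be observed directly, because catch returns the return code of the script it evaluates. The following is a minimal sketch, assuming Tcl 8.5 or later (for the -level option); the six sample scripts are only illustrations, one per mode:

```tcl
# Evaluate one sample script per termination mode and collect the
# return code that catch reports for each of them.
set codes {}
foreach script {
    {set x 1}
    {error "oops"}
    {return "r"}
    {break}
    {continue}
    {return -code 5 -level 0 "custom"}
} {
    lappend codes [catch $script result]
}
puts $codes   ;# 0 1 2 3 4 5
```

Note that catch intercepts every mode, so neither the break nor the return sample affects the surrounding foreach.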

That's a lot of modes!

Maybe, but mostly it looks that way because Tcl makes the modes of termination explicit (and available for introspection and manipulation in a program). Most imperative languages actually do have normal, error, break, and continue termination in some form, and a few even keep the old-style goto termination mode around (but nowadays they typically don't let it go outside a function or sub or whatever their equivalent of a Tcl command is). A small minority of languages offer further modes, like (iteration-)redo in Perl and Ruby and (iteration-)restart in Ruby. Typically, in these languages a break statement (for example) is simply a fixed part of a language-defined looping construct. Few if any languages offer what Tcl does: access to the semantic function (i.e. the break termination) of a break statement.

So what?

With this access, you can easily (i.e. without writing your own parser/interpreter) define commands, say foo and bar, such that 1) bar causes a break termination and 2) foo can handle it in some way. The foo command can then evaluate scripts that have bar as a terminating command and differentiate it from other modes of termination.
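One possible shape for such a pair of commands is sketched below. The bodies of foo and bar are assumptions made for illustration (the text only names them), and the sketch uses catch so that it also runs on Tcl 8.5:

```tcl
# bar: a terminating command that causes a break termination
# in the script that invokes it.
proc bar {} {
    return -code break
}

# foo: evaluates a script in the caller's level and reports how
# the script was terminated.
proc foo {script} {
    set code [catch {uplevel 1 $script} result]
    if {$code == 0} {
        return "normal termination, result: $result"
    } elseif {$code == 3} {
        return "break termination"
    } else {
        return "other termination, code $code: $result"
    }
}

puts [foo {set x 1; bar; set x 2}]   ;# break termination
puts $x                              ;# 1
```

A production version of foo would probably pass errors on to its caller instead of turning them into a string; that is left out here to keep the sketch short.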

How a script is terminated

Consider a script with three commands (the comment after each command shows the command's result value):

commandA ;# -> resultA
commandB ;# -> resultB
commandC ;# -> resultC

By reaching the end

The evaluation goes like this: commandA is invoked, its side-effects (if any) happen, and its result (resultA) is produced and discarded (because it isn't assigned to anything). The same thing happens to commandB and commandC, but as the script terminates after commandC, resultC isn't discarded: instead it becomes the result of the evaluation of the script. This is a case of normal termination.
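As a small illustration, with set standing in for the three commands (an assumption; any commands would do), catch shows both the return code and the result of the script:

```tcl
set script {
    set a 1
    set b 2
    set c 3
}
# The results of the first two commands are discarded; the result
# of the last one becomes the result of the script.
set code [catch $script result]
puts "code=$code result=$result"   ;# code=0 result=3
```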

By terminating command

Now, if commandB is a terminating command, evaluation goes differently. Commands commandA and commandB are invoked and their side-effects happen, but commandC doesn't get invoked at all and its side-effect doesn't happen. The result of commandB becomes the result of the evaluation of the script instead. This can be any one of the different modes of termination.
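A sketch of this case, assuming return as the terminating command in the second position and a log variable (hypothetical) to make the side-effects visible:

```tcl
set log {}
set script {
    lappend log A
    return "from B"
    lappend log C
}
set code [catch $script result]
# The third command is never invoked, so only A was logged.
puts "code=$code result=$result log=$log"   ;# code=2 result=from B log=A
```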

By error

If instead an error occurs during execution of commandB, the situation is similar to the last paragraph. Commands commandA and commandB are invoked, but only commandA is completed successfully. The side-effect of commandB may or may not happen, or happen partially. The intended result value of commandB isn't produced. Instead, the error produces an error message which becomes the result of the evaluation of the script. This is always error termination.
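The same sketch with a failing command in the second position; expr {1/0} is used here because the page itself mentions it as a command that fails, and the log variable is again a hypothetical helper:

```tcl
set log {}
set script {
    lappend log A
    expr {1/0}
    lappend log C
}
set code [catch $script result]
# The script's result is now the error message, not data.
puts "code=$code result=$result log=$log"   ;# code=1 result=divide by zero log=A
```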

The call stack

(Simplification alert! The workings of the call stack are a little more complex than this discussion is willing to acknowledge.)

Now, I've talked about the script having a result value, but where does it go? Every script is evaluated in a level in a multi-layered data structure, the call stack, which is managed by the Tcl interpreter and mostly invisible to the programmer (see also stack frame). The executing command always resides in the most recently added level in the call stack. If that command calls another command, a new level is added to the call stack, and the called command can be executed in that level. When a command terminates by normal termination, the stack is unwound, i.e. the command's level is removed and the command in the previous level becomes active again ("control returns to the caller"). As part of the procedure to switch stack levels, the interpreter transfers the terminating command's result value to the caller's level where it might be stored in a variable (if not, it is discarded).
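The levels can be made visible with info level, which reports the number of the level the current command executes in. In this sketch the procedure names outer and inner are hypothetical:

```tcl
proc inner {} {
    return [info level]
}
proc outer {} {
    # Calling inner adds a level; when inner terminates, its result
    # is transferred back to this level.
    return [list [info level] [inner]]
}
lassign [outer] outerLevel innerLevel
puts "outer ran at level $outerLevel, inner at level $innerLevel"
```

Whatever the starting level is, inner always runs exactly one level deeper than outer.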

Up or down?

A generic stack is usually visualized as growing upwards from an initially empty base. The call stack, however, is traditionally implemented growing downwards in memory (hence the names of the commands uplevel and upvar, which operate on previous stack levels), and always having one starting level (the "top-level" in Tcl documentation). I've compromised in the following by depicting stacks as growing horizontally from left to right, starting with the first command called.

Normal termination

If commandA, with the variables a, b, and c, executes set b [commandB], and commandB produces the result value "abc", the call stack might look something like these three steps (text in parentheses is a stack frame with command name and variable names; the >> symbol shows the caller relationship; the text after -> is the return code in parentheses and the result value in double quotes):

1. ( commandA a b c )

2. ( commandA a b c ) >> ( commandB -> (0) "abc" )

3. ( commandA a b = "abc" c )

The command return by itself, and also if invoked like return -code return -level 0, causes normal termination. (The -level option value states where the specified return action is supposed to happen: 0 means in this level, 1 means in the previous level, etc.) By the way, return -code ok -level 0 value does not return: it's just a fancy way of exposing its value argument (i.e. set foo [return -code ok -level 0 bar] is equivalent to set foo bar).
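The three steps above, and the -level 0 remark, might be written out like this. The procedure bodies are assumptions; only the names and the result value "abc" come from the text:

```tcl
proc commandB {} {
    return "abc"
}
proc commandA {} {
    set a 1
    set b [commandB]   ;# the result of commandB ends up in b
    set c 3
    return [list $a $b $c]
}
puts [commandA]   ;# 1 abc 3

# With -level 0 nothing is returned from; the command just
# produces its argument as its result.
set foo [return -code ok -level 0 bar]
puts $foo         ;# bar
```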

Error termination

In error termination, the stack is unwound until a handler is found. This is either the try command or the catch command being executed in a call stack level. The handler gets the return code and the result value from the termination.

If commandA calls try, which calls commandB, which calls commandC, which terminates with an error, the call stack might look something like this (the stack frame for try has two fictional variables, _C and _R: they are just a way to show the handler's capability to record the return code and the result value):

1. ( commandA )

2. ( commandA ) >> ( try _C _R )

3. ( commandA ) >> ( try _C _R ) >> ( commandB )

4. ( commandA ) >> ( try _C _R ) >> ( commandB ) >> ( commandC -> (1) "xyz" )

5. ( commandA ) >> ( try _C = 1 _R = "xyz" )

If neither commandC nor commandB had terminated with an error, the call stack might have looked like this:

1. ( commandA )

2. ( commandA ) >> ( try _C _R )

3. ( commandA ) >> ( try _C _R ) >> ( commandB )

4. ( commandA ) >> ( try _C _R ) >> ( commandB ) >> ( commandC -> (0) "abc" )

5. ( commandA ) >> ( try _C _R ) >> ( commandB -> (0) "def" )

6. ( commandA ) >> ( try _C = 0 _R = "def" )

The return code makes it possible for try to know if the result value is valid data or an error message: in the previous example, "def" is data, while in the example before that, "xyz" is an error message.
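Both stack traces can be reproduced with a sketch like the following, which assumes Tcl 8.6 (for try). The fail parameter is a hypothetical switch added only to select between the two scenarios:

```tcl
proc commandC {fail} {
    if {$fail} {
        error "xyz"
    }
    return "abc"
}
proc commandB {fail} {
    commandC $fail
    return "def"
}
proc commandA {fail} {
    # The handler clauses tell data and error message apart by
    # the return code, not by looking at the value.
    try {
        commandB $fail
    } on ok {result} {
        return [list 0 $result]
    } on error {result} {
        return [list 1 $result]
    }
}
puts [commandA 1]   ;# 1 xyz
puts [commandA 0]   ;# 0 def
```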

(As an aside, read this page about the semipredicate problem to see why it's a really good idea to have a separate error termination mode.)

The commands throw and error cause error termination, and so does return if invoked like return -code error, and unknown if it can't find a command to execute. Many other commands terminate with an error if they can't complete their tasks, such as expr {1/0} or open {this file doesn't exist I promise} "r".
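A sketch that runs several of these through catch, assuming Tcl 8.6 (for throw). The procedure failing and the command name noSuchCommandForThisDemo are hypothetical. Note that return -code error has to be wrapped in a procedure here: with its default -level of 1 it is the procedure, not the return command itself, that terminates with an error:

```tcl
proc failing {} {
    return -code error "error via return"
}

set codes {}
foreach script {
    {error "plain error"}
    {throw {DEMO EXAMPLE} "thrown error"}
    {failing}
    {expr {1/0}}
    {noSuchCommandForThisDemo}
} {
    lappend codes [catch $script message]
}
puts $codes   ;# 1 1 1 1 1
```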

Return termination

A return termination unwinds exactly as many stack levels as the number given in the -level option (default is 1), and then returns (with normal termination by default, but see below) from the level it ends up in ("control returns to the nth caller's caller"). If commandA calls set a [commandB], and commandB calls commandC, which executes return -code return theResult, the stack looks like this:

1. ( commandA a )

2. ( commandA a ) >> ( commandB )

3. ( commandA a ) >> ( commandB ) >> ( commandC -> (2) "theResult" )

4. ( commandA a = "theResult" )

Since the termination mode by default ends up as normal when the dust settles, this mode inherits the convention that the result value is valid data.
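The four steps above correspond to a sketch like this one; the procedure bodies beyond the commands quoted in the text are assumptions:

```tcl
proc commandC {} {
    return -code return "theResult"
}
proc commandB {} {
    commandC
    # Not reached: the return termination from commandC also
    # terminates commandB.
    return "never reached"
}
proc commandA {} {
    set a [commandB]
    return $a
}
puts [commandA]   ;# theResult
```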

Return termination used like this might be useful for instance in a dispatching command, i.e. a command that assesses a situation and decides which one of a number of sub-commands available would be most appropriate to carry out the task in question: that command will, at completion, return directly to the command that called the dispatching command.

A return termination can be caused by invoking the command return -code return. As hinted above, any invocation of return with a -level option value n, where n is greater than zero, will in fact cause a transitional return termination in the first n stack levels, after which termination proceeds in the mode designated by the original return call.
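A sketch of the -level option at work, assuming Tcl 8.5 or later. All four procedure names are hypothetical. In both cases the helper unwinds two levels, so the code after the helper call is never reached; what differs is the mode that takes effect at the end:

```tcl
proc helper {} {
    return -level 2 "from helper"
}
proc caller {} {
    helper
    return "never reached"
}
puts [caller]   ;# from helper

proc failHelper {} {
    return -level 2 -code error "bad input"
}
proc checked {} {
    failHelper
    return "never reached"
}
set code [catch {checked} message]
puts "code=$code message=$message"   ;# code=1 message=bad input
```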

Break termination

A break termination expects to terminate a script and end up inside a command implementing a loop construct. If such a construct isn't found, an exception is raised with the error message 'invoked "break" outside of a loop'.
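The failure case is easy to provoke by letting a break termination reach the end of a procedure body. The procedure name noLoop is hypothetical:

```tcl
proc noLoop {} {
    break
}
set code [catch {noLoop} message]
puts "code=$code message=$message"   ;# code=1 message=invoked "break" outside of a loop
```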

TODO: examine break termination in for, foreach, lmap, while, dict for, dict map

In Tcl 8.5, break termination was extrapolated from the break command with the intent to make it easier to write new control structures. A command handling a break termination should probably conform to the convention created by the behavior of the break command for reasons relating to the principle of least astonishment. Still, some variations might be allowed without contradicting convention:

arguments
The break command doesn't take arguments, but a break-like command could.
result value
The break command doesn't return a result value, but a break-like command might.
flow control
The break command typically first terminates the body of a looping command and then causes a normal termination of the looping command. A construct like 'loop script finally postscript' that handles a break termination by terminating script but letting postscript be evaluated before terminating the whole command might not be unreasonable. (Note that since this handling is implemented in the enclosing command, it will function the same way with the break command as with any break-like command.)
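One way such a loop command could be written is sketched below, assuming Tcl 8.5 or later. This is not an existing Tcl command; the name loop, its argument layout, and the decision to pass errors and returns on to the caller are all assumptions made for illustration:

```tcl
proc loop {script finally postscript} {
    if {$finally ne "finally"} {
        error "usage: loop script finally postscript"
    }
    while {1} {
        set code [catch {uplevel 1 $script} result opts]
        if {$code == 3} {
            # Break termination: stop iterating, but still fall
            # through to the postscript below.
            break
        } elseif {$code != 0 && $code != 4} {
            # Error, return, or custom code: pass it on to the caller.
            dict incr opts -level
            return -options $opts $result
        }
        # Normal or continue termination: go on to the next iteration.
    }
    uplevel 1 $postscript
}

set i 0
set log {}
loop {
    incr i
    if {$i > 3} { break }
    lappend log $i
} finally {
    lappend log done
}
puts $log   ;# 1 2 3 done
```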

(The break command can also be used in a stackless context, such as in a Tk event binding. This is outside the scope of this page.)

A break termination is caused by invoking the break command or by invoking the command return -code break -level 0. If you write a new breaking command, you would probably call return -code break (possibly with a result argument as well): then the level of the return would be set to 1, which means that it ends your command.
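A new breaking command along these lines might look as follows. The name stopIf and its condition argument are hypothetical:

```tcl
# stopIf: break out of the caller's loop when the condition,
# evaluated in the caller's level, is true.
proc stopIf {condition} {
    if {[uplevel 1 [list expr $condition]]} {
        return -code break
    }
}

set seen {}
foreach n {1 2 3 4 5} {
    stopIf {$n > 3}
    lappend seen $n
}
puts $seen   ;# 1 2 3
```

Because the default level is 1, the break termination takes effect when stopIf itself ends, which places it in the body of the foreach loop.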

Continue termination

Most of the above about break termination applies to continue termination too.

The flow control convention for continue termination is to terminate an inner script and then let the enclosing command evaluate the script again in the next iteration of the loop, unless this was the last iteration, in which case the looping command is terminated.

A continue termination is caused by invoking continue or by invoking return -code continue -level 0. If you write a new continuing command, you would probably call return -code continue (possibly with a result argument as well): then the level of the return would be set to 1, which means that it ends your command.
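The continuing counterpart can be sketched the same way. The name skipIf and its condition argument are hypothetical:

```tcl
# skipIf: skip the rest of the current iteration in the caller's
# loop when the condition, evaluated in the caller's level, is true.
proc skipIf {condition} {
    if {[uplevel 1 [list expr $condition]]} {
        return -code continue
    }
}

set odds {}
foreach n {1 2 3 4 5} {
    skipIf {$n % 2 == 0}
    lappend odds $n
}
puts $odds   ;# 1 3 5
```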