[Fabricio Rocha] - 08-Feb-2010 - Error treatment in programming always seems to be an underestimated topic, seldom discussed by or with newbies, although it is a useful thing that could naturally be taught along with the basics of a programming language. Only after some two years of studying Tcl/Tk was I able to find some information about this subject and develop a very basic and limited idea of how applications can avoid being crashed by bugs or misuse. So I would like to discuss some error-management techniques with the experienced folks, while building up a tutorial from this discussion (something highly useful for aspiring Tclers like me). And, please, treat the errors you find...

**Which error-management features are provided by Tcl?**

***unknown***

Whenever a script invokes a command or procedure that the interpreter cannot find, the Tcl interpreter calls a built-in command named [unknown]. This command looks for a definition of the procedure in places other than the interpreter's current context, and if nothing with that name is found anywhere, [unknown] stops the script's processing with an error, whose message is shown in the console if nothing handles it. Like other Tcl built-in commands, `::unknown` can be renamed and replaced by a procedure with the same name which does other things before the default actions; this is the way `unknown` is best used. Since Tcl 8.5, there is also the [namespace unknown] command, which allows the programmer to name a procedure which will be called when a command/procedure lookup fails in the scope of a specific [namespace].

***::errorCode***

A reserved global variable called [errorCode] is automatically maintained by the Tcl interpreter during the execution of a script to hold information about errors that occurred at run time, so its contents change every time an error happens.
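A quick way to watch `::errorCode` change is to provoke an error under [catch] and then read the variable. This is only a minimal sketch, assuming Tcl 8.5 or later; the variable name `msg` is an arbitrary choice for the illustration.

```tcl
# Provoke an arithmetic error and inspect what the interpreter recorded.
if {[catch {expr {1 / 0}} msg]} {
    puts "message:   $msg"                     ;# human-readable text
    puts "errorCode: $::errorCode"             ;# machine-readable list
    puts "category:  [lindex $::errorCode 0]"  ;# first element: error type
}
```

For an integer division by zero, the first element of the list should be ARITH.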
`errorCode` is a variable-length list whose first element is a string which indicates the type of error that happened; the following elements, if present, are details about the error which can be used by an error-handling procedure. As of Tcl 8.5, `::errorCode` still seems to be underused by many of the core Tcl commands. These are the values and structures which those commands generate and store in `::errorCode`, according to the official documentation [http://www.tcl.tk/man/tcl8.5/TclCmd/tclvars.htm]:

   * "ARITH" ''code msg'' - Arithmetic error. The ''code'' element can contain the strings DIVZERO, DOMAIN, OVERFLOW, IOVERFLOW or UNKNOWN. ''msg'' contains a human-readable description of the problem.
   * "CHILDKILLED" ''pid sigName msg''
   * "CHILDSUSP" ''pid sigName msg'' - These two errors are related to child processes started by the Tcl interpreter in the underlying OS; more specifically, they carry information about processes which were unexpectedly terminated or suspended. ''pid'' is the process identifier; ''sigName'' is the signal which caused the process to end or be suspended; ''msg'' is a human-readable explanation of the problem. The list of possible values for ''sigName'' is in the ''signal.h'' header file of the system's C standard library (''TODO: list them here'').
   * "CHILDSTATUS" ''pid code'' - These values are set when an external program used by a Tcl script exits with a non-zero value, which is considered an abnormal end. In such cases, the second element of `::errorCode` contains the process identifier and the third one holds the process "exit code". Note that some system utilities intended for use in pipelines exit with non-zero values as the legitimate result of their operation, so ''code'' may be a real and valid result of the child process rather than a failure.
   * "POSIX" ''errName msg'' - Many commands which depend on OS-provided functionality, like file and [socket] operations, can produce errors of this family. The possible values for ''errName'' are listed in the ''errno.h'' header file of the C standard library (''TODO: list them here''). There is some dispute about the accuracy of these error reports, mainly under Windows, which is not exactly POSIX-compliant.
   * "NONE" - This single value in a one-element `::errorCode` is set when a procedure generates an error, intentionally or not, but gives no detailed information about it.

Any procedure can set its own error values in `::errorCode` by using the "advanced" options of the `return` command, as we will see below.

***return***

The [return] command has more uses than just giving back a valid or invalid value to the script or procedure which called the procedure it is in. As often told in many Tcl tutorials and books, using a `return` command at the end of a procedure is always recommended, even if no value is actually returned (as in Pascal procedures or C "void functions"); if the `return` command is omitted, the procedure returns the result of the last command it executed. If everything went right, the procedure's completion ''code'' (which is distinct from its returned value) is 0, meaning "ok".

For some procedures, returning 1 or 0, "yes" or "no", is sufficient to tell their callers whether something was done. But quite often a procedure must return one of three or more values, and the value alone cannot tell the caller that it is actually the result of an error. Fortunately, Tcl lets the programmer use special parameters which tell the interpreter and the whole application that something went wrong and that the returned value should not be taken as valid. The values passed with these special parameters can be placed in the `::errorCode` variable, and they normally tell the interpreter to do something other than just going on with the script; the most common action is simply to halt processing and issue error messages in the console.
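The difference between a procedure's returned value and its completion code can be sketched as follows. The procedures `area` and `safeSqrt` are hypothetical names invented for this illustration, and [catch] is used here only to make the completion code visible.

```tcl
# No explicit return: the result is that of the last command executed.
proc area {w h} {
    expr {$w * $h}
}

# Failure is signalled through the completion code, not a magic value.
proc safeSqrt {x} {
    if {$x < 0} {
        return -code error "cannot take the square root of $x"
    }
    return [expr {sqrt($x)}]
}

puts [area 3 4]                    ;# value produced by expr
puts [catch {safeSqrt 9} value]    ;# completion code of a successful call
puts [catch {safeSqrt -1} value]   ;# completion code of a failed call
puts $value                        ;# on failure, the value is the message
```

The caller never has to reserve a special value such as -1 to mean "failed"; the code and the value travel separately.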
The special parameters accepted by `return` are:

'''-code''' ''code'' - This option sets the completion code with which the procedure returns, letting the programmer purposely interrupt processing and raise an error. It accepts the following values:

   * '''0 or "ok"''' - This is the default value assumed when the option is not present. It means that the value passed with the `return` command is valid and script processing can continue normally.
   * '''1 or "error"''' - Indicates that an error happened and the returned value cannot be considered valid; in fact, it should be treated as a string which explains the error. This code raises an error in the interpreter just like the [error] command, and if this error state is not properly handled, the interpreter stops processing the script and shows error messages in the console.
   * '''2 or "return"''' - Makes the calling procedure return as well, as if a `return` had been issued in the caller's context, the upper level in the procedure stack. (''Can someone provide some examples of situations where this is useful?'')
   * '''3 or "break"''' - Mainly used by the Tcl commands which provide loops; has the effect of a [break] command issued in the caller's context.
   * '''4 or "continue"''' - Also used basically by the Tcl commands themselves; has the same effect as a [continue] command in the calling context.

'''-errorcode''' ''list'' - If the `-code` option was set to 1, the `::errorCode` variable is, by default, set to NONE. This option allows the programmer to set `::errorCode` to the format and values of ''list'', thus providing more details about the error. According to the official documentation, this option is ignored if the `-code` option is set to any value other than 1 or "error".

'''-errorinfo''' ''string'' - This option is also valid only if the `-code` parameter was set to 1 or "error". When it is omitted, a stack trace listing the latest procedure calls that happened before the error is stored in the `::errorInfo` global variable. With this option, more detailed information can be included in `::errorInfo`, for example what the procedure was trying to do at the moment of the failure. If the stack trace is still of interest, the programmer can retrieve the contents of the `::errorInfo` variable, append them to a customized message, then pass the whole thing to the `-errorinfo` option of `return`.

'''-options''' ''list_of_pairs'' - These option/value pairs, with any contents, are simply added to the other pairs which `return` gives back to the procedure's caller. This is useful for passing on to the caller all the error information which the procedure itself received from a command it used.

'''-level''' ''number'' - ''number'' is the number of levels up the procedure stack at which the code defined by the `-code` option will be applied. For example, if procedure A calls B, B calls C and C ends with something like `return -code error -level 2 "Houston, we have a problem"`, the error is not raised in the context of B (which would be the default behaviour), but in the context of A. (''A good use for this, anyone?'')

***catch***

The [catch] command is the Tcl way of directly receiving the special values that [return] may have set in the case of an error. It evaluates a script (typically a call to a command or procedure) and intercepts any error or other exceptional completion code, so a failure inside that script does not abort the application. If the caught script ends abnormally, i.e. its return `-code` is other than 0, `catch` returns exactly this code; otherwise it returns 0.
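Both the value returned by `catch` and the A/B/C scenario described for `-level` can be tried out in a few lines. This is only a sketch, assuming Tcl 8.5 or later (where `-level` is available); the bodies of A and B are invented for the illustration.

```tcl
# catch returns the completion code of the script it evaluated.
puts [catch {set x 42}]        ;# normal completion
puts [catch {error "oops"}]    ;# error
puts [catch {break}]           ;# break

# C asks for the error to be applied two levels up the stack.
proc C {} { return -code error -level 2 "Houston, we have a problem" }
proc B {} {
    C
    return "B finished normally"   ;# not reached: C makes B return
}
proc A {} {
    # The error surfaces here, at the call to B.
    set code [catch {B} msg]
    return "code=$code msg=$msg"
}
puts [A]
```

Note that if B wrapped its call to C in a `catch`, it would intercept the pending return itself, since `catch` traps every completion code and not only errors.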
Two variable names can optionally be passed to `catch`: the first one receives the value passed by the called procedure's `return` command (which, in case of an error, is expected to be an explanation of the error), and the second one receives a [dict] (which can be processed like a list of pairs) holding the return options. It always contains the keys `-code` and `-level`, and when an error occurred it also contains `-errorcode`, `-errorinfo` and `-errorline`, mirroring the contents of `::errorCode` and the extra options used in the `return` command.

***error***

The [error] command triggers the error-handling measures in the application and/or the Tcl interpreter. Its optional second and third arguments initialise `::errorInfo` and `::errorCode`. Before Tcl 8.5 introduced some of the advanced options for the [return] command, `error` was the preferred way to intentionally signal an error.

***bgerror***

The [bgerror] command is called when an uncaught error reaches the Tcl/Tk event loop; it gives the application the ability to handle the error in some appropriate way. In GUI applications, it's common to report the error to the user and give them the ability to easily send the stack trace and any related information to the developer. In non-GUI applications, it's useful to write the stack trace and related information to a log file; then, the application can either keep running or shut down, as appropriate.

**How to use all this stuff?**

The infrastructure provided by Tcl allows applications to use [exception handling], in the traditional sense of "try to do this, and if something goes wrong tell me and I'll see what I can do". This contrasts with the approach of "error prediction", which, for example, performs a series of tests on the data which will be passed to a command to check its validity before the operation is performed. The two techniques are not mutually exclusive, however.
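The "try it and tell me" style can be sketched with `return -code error -errorcode` on one side and `catch` with its options dictionary on the other. Everything named here is hypothetical: `parsePercent`, `report` and the `INVALID PERCENT` error code are inventions for the illustration, not part of Tcl, and Tcl 8.5 or later is assumed.

```tcl
# Hypothetical helper: validates its input and reports bad data through
# -errorcode, so callers can tell validation failures from other errors.
proc parsePercent {text} {
    if {![string is integer -strict $text] || $text < 0 || $text > 100} {
        return -code error -errorcode [list INVALID PERCENT $text] \
            "\"$text\" is not a percentage between 0 and 100"
    }
    return $text
}

# Try the operation; if it fails, decide what to do from the error code.
proc report {text} {
    if {[catch {parsePercent $text} result opts]} {
        set code [dict get $opts -errorcode]
        switch -- [lindex $code 0] {
            INVALID { return "bad input: $result" }
            default { return "unexpected error: $result" }
        }
    }
    return "ok: $result"
}

puts [report 42]
puts [report abc]
```

Dispatching on the first element of the error code keeps the caller independent of the wording of the message, which can then be changed or translated freely.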
Tcl allows various approaches to error management, each with its pros and cons:

***Approach 1: return, catch and process the error***

1) Always use the advanced `return` options when writing procedures which can cause or face errors, or which may give back an invalid result;

2) Always use `catch` when calling commands or your own procedures which can cause or face errors as described in 1;

3) Create a procedure to be called whenever `catch` captures an error, which interprets the error codes and, based on them, shows error messages in friendly and standardized dialogs and performs operations which could minimize or solve the error.

***Approach 2: tracing ::errorCode***

Create a [trace] on `::errorCode`, and a procedure to be called every time it is modified, which interprets the codes, displays them, provides mitigation measures, etc.

''Any other? Please add what you do!''

[LV] One useful thing that I sometimes do is create log files containing information intended to help determine the state of the program at particular points. Sometimes, displaying information about the values of a number of variables is not as helpful as having that information written to a file; for instance, there are times when a [GUI] application might not have easy access to [stderr] for error traces. Writing information to a log file which is available, and perhaps even emailable, to the programmer responsible is helpful.

**Which errors shall be told to the user?**

Failures in file, channel and socket operations?

Errors caused by invalid input. It is often useful to use a distinct error code (e.g., '''INVALID''') for data validation errors, as it makes it possible for the application to distinguish between errors in the user's input and errors in the validation or execution code.

**Which errors shall NOT be told to the user?**

'''Syntax errors and programming bugs''' - They'd better be fixed. Sure, but....

[LV] Certainly they need to be fixed. However, if you hide the info from the user, how will the programmer know what the bug/error is? Unless you have a guaranteed method of getting said info to the programmer (and email doesn't count, since the user MIGHT be working off line), providing the user with sufficient information to a) know what the error is and b) know who to contact or what to do about the problem seems the best approach to me.

[Fabricio Rocha] - 12-Feb-2010 - One more reason for having a way to intercept and explain this kind of error to common users is that no test suite or test routine seems able to find all the errors that users are able to find. Of course it is not nice to show weaknesses to a final user, but this is something practically unavoidable in software. And in addition to the situations listed by [LV], we can consider that, for open source/free software, providing good information about an error is a way to c) allow a user with sufficient programming knowledge to fix the problem and possibly contribute to the software's development.

***See Also:***

   * [try]/[catch]

<>Category Debugging | Category Dev. Tools