Version 9 of null

Updated 2007-01-06 13:15:36

AMG: Everything is a string is nice and all, but for many applications it's important to have a special value that's outside the allowable domain.

If the domain of values is numbers, any non-numeric string (e.g., "") will do, so "" can be used to signify that the user didn't specify a number. C strings can't contain NUL and therefore are free to reserve NUL as a terminator or field separator. Unix filenames reserve / and NUL, so / is available to separate path components and NUL can be used with find -print0, xargs -0, and cpio -0 to separate filenames in a list. (The more common practice of separating filenames with whitespace breaks whenever whitespace is used in filenames.)
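The filename case can be sketched in a few lines of Tcl. This does not call find; the NUL-joined string only stands in for what find -print0 would emit, and the filenames are made up for illustration.

```tcl
# Made-up filenames; the first one contains a space.
set names [list "my notes.txt" "a.txt" "b.txt"]

# Stand-in for a NUL-separated listing: names joined by NUL (\0).
set nulJoined [join $names \0]

# Splitting on NUL recovers every name intact.
set fromNul [split $nulJoined \0]

# Splitting a space-joined listing breaks the name that contains a space.
set fromSpace [split [join $names " "] " "]

puts [llength $fromNul]    ;# 3
puts [llength $fromSpace]  ;# 4
```

Real find -print0 output ends each name with a NUL, so a parser reading it would also need to drop the empty element after the final terminator.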

But if the allowable domain of values is any string at all, no string can be reserved for a special purpose.
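A minimal sketch of the problem, using a hypothetical proc setOrGet and an arbitrary marker string "__none__" (both invented here). The marker is an ordinary string, so a caller who really wants to store it cannot.

```tcl
# Hypothetical proc: behaves like [set], using the string "__none__"
# as an in-band marker for "no value given".
proc setOrGet {varname {value __none__}} {
    upvar 1 $varname var
    if {$value ne "__none__"} {
        set var $value
    }
    return $var
}

set x old
setOrGet x new        ;# x is now "new"
setOrGet x __none__   ;# meant to store "__none__", but only reads x
puts $x               ;# new
```

Whatever marker is chosen, it is taken out of the domain of storable values.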

Since Tcl has nothing that is not a string, the only remaining solution is to have a separate, out-of-band way of tracking the special case. Returning to the C example, a program that needs to support NUL in the middle of a string must either encode the string using a possibly fragile quoting scheme or use a separate variable to track its length. As for the Unix filename example, if a filename needs to contain a /, it absolutely must be encoded, for instance as %2F, but then the quote character must also be encoded (%25). This is because Unix filenames have no room for an out-of-band channel. (By the way, KDE uses this encoding scheme to support / in filenames.) In Tcl, a separate variable can be used, such as a variable that's false when the user didn't specify a string.
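The quoting scheme for filenames can be sketched with [string map]. The helper names encodeName and decodeName are hypothetical, and only / and the quote character % are handled; this is an illustration of the idea, not KDE's actual implementation.

```tcl
# Hypothetical helpers; only "/" and the quote character "%" are handled.
proc encodeName {s} {
    # [string map] scans the input once, so the "%" of an inserted
    # "%2F" is never re-encoded.
    return [string map {% %25 / %2F} $s]
}
proc decodeName {s} {
    return [string map {%25 % %2F /} $s]
}

set original "a/b%2Fc"
set encoded [encodeName $original]
puts $encoded               ;# a%2Fb%252Fc
puts [decodeName $encoded]  ;# a/b%2Fc
```

Because the quote character is itself encoded, a name that already contains the text %2F survives the round trip.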

This can be very cumbersome and isn't always viable (again, when the domain is all strings). Two examples are default arguments and SQL nulls. Foolproof tracking of the former requires the proc to accept args and do its own defaulting; [llength $args] serves as the out-of-band channel. Tracking the latter may require asking the database to prepend a special character to all non-null string results; basically the first character is the out-of-band communication channel identifying the nullity of the result. A more straightforward option is to SELECT the NOTNULL of the string columns whose values could be null.
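The default-argument case might look like the sketch below. The proc name setLike is made up; the point is only that the argument count, not any argument value, says whether a value was supplied.

```tcl
# Hypothetical proc that mirrors [set]: one argument reads, two write.
proc setLike {varname args} {
    upvar 1 $varname var
    if {[llength $args] > 1} {
        return -code error "wrong # args: should be \"setLike varname ?value?\""
    }
    if {[llength $args] == 1} {
        # A value was supplied, whatever string it happens to be.
        set var [lindex $args 0]
    }
    return $var
}

set x old
setLike x ""             ;# an empty string is a real value here
puts [string length $x]  ;# 0
setLike x new
puts [setLike x]         ;# new
```

The cost is that the proc has to produce its own usage error and that [info args] no longer documents the optional parameter.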


jhh proposes a possible solution in TIP 185. Basically, {null}! is recognized by the parser as a null, which is not a string; it is distinct from all possible strings. "{null}!" is, of course, a seven-character-long string, and it's also a one-element list whose sole element is a null.

I (AMG) have several strong comments regarding the TIP:

  • I prefer to say "null" instead of "null string" because I feel that a null is not a string at all. It's the one thing that isn't a string! I guess we'll need to change our motto. :^)
  • Likewise, I'd rather not tack the null management functionality onto the [string] command.
  • I think I'd prefer a [null] command for generating nulls and testing for nullity. It's best not to use the == and != expr operators for this purpose; null isn't equal to anything, not even null.
  • We can ditch the {null}! syntax in favor of using the [null] command to generate nulls, but then [null] cannot be implemented in pure script. This might be an important concern for safe interps.
  • Automatic compatibility with "null-dumb" commands is a mistake; it's the responsibility of the script to perform this interfacing.
  • When passed a null, the Tcl_GetType() and Tcl_GetTypeFromObj() families of functions (Tcl_GetInt(), Tcl_GetIntFromObj(), and so on) should return TCL_ERROR, or NULL in the case of Tcl_GetString() and Tcl_GetStringFromObj().
  • Most commands should be "null-dumb". Only make a command handle nulls when it is clear how they should be interpreted.
  • The non-object Tcl commands can probably represent nulls as null pointers ((void*)0 or NULL). If for some reason that can't work, reserve a special address for nulls by creating a global variable.

Feel free to argue. :^)


AMG: Here's a silly and inefficient proc to help me play around with the ideas presented above:

 proc foobar {varname {value {null}!}} {
    upvar 1 $varname var
    if {![null $value]} {
       set var $value
    }
    return $var
 }

This proc should behave the same as [set].

You will notice that I used {null}! even though in my above comments I suggested removing it in favor of always using [null] to obtain nulls. But it turns out that's not feasible in the above code; it would only result in $value defaulting to the string "[null]". To get the desired behavior, I'd have to write [list varname [list value [null]]], which is far from readable. (With Tcl 9.0 Wishlist #67, it becomes (varname (value [null])), which I can live with.)

That's one black mark against my idea...

A more worrying problem is that [foobar] can't be used to set a variable to null! Why? Because the domain of $value includes all strings and null, there is (once again) no possible value outside the domain that can be used to indicate that a special condition occurred and cannot be "forged" by the caller. So what are nulls good for again?

I'm up to two black marks now. It's not looking good.

It seems nulls aren't as useful as originally hoped. (Notice the use of the passive voice.) But are they still good for something? The reason [foobar] doesn't work in the above case is that it is being driven by the script, and the script is capable of producing nulls. If its input instead came from a file or socket, it would be just fine because reading from a channel will never result in a null. Of course, at this point I'm reminded of tainting, which might be a better solution.


wdb: When switching from Lisp to Tcl, the lack of a special value such as NULL was one of the drawbacks I decided I could live with. It is the price I am willing to pay for simplicity. There is more than one case where something similar is resolved by a trade-off:

  • In the switch statement, the word default gets in the way of matching the literal value "default".
  • In proc's arg list, the word args restricts the choice of argument names.
  • In Itcl and Snit, the argument #auto or %AUTO% (respectively) restricts the choice of instance name.
  • And so on.
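A small sketch of the first trade-off, assuming (as the switch documentation describes) that default is special only when it is the last pattern. The proc names classify and classify2 are invented for illustration.

```tcl
# Hypothetical proc: "default" as the last pattern matches anything.
proc classify {word} {
    switch -exact -- $word {
        other   { return "other" }
        default { return "fallback" }
    }
}
puts [classify default]   ;# fallback, same as any unmatched word
puts [classify xyz]       ;# fallback

# Listing "default" before the last position makes it an ordinary pattern.
proc classify2 {word} {
    switch -exact -- $word {
        default { return "literal default" }
        other   { return "other" }
        default { return "fallback" }
    }
}
puts [classify2 default]  ;# literal default
puts [classify2 xyz]      ;# fallback
```

The reserved word still costs something: the literal value can only be told apart by where its pattern sits in the list.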

Extending the value range beyond strings means abandoning the EIAS (everything is a string) principle. It is possible, and sometimes even desirable, to do so. But if you do, ask yourself whether Tcl is still the right choice for you.

If you ask me: I prefer the state as is. The drawbacks are known, and as mentioned above, I can live with them.

