Version 0 of null

Updated 2007-01-06 06:21:27

AMG: "Everything is a string" is nice and all, but many applications need a special value that lies outside the range of allowable values.

If the allowable values are numbers, any non-numeric string (e.g., "") will do, so "" can signify that the user didn't specify a number. C strings can't contain NUL, which leaves NUL free to serve as a terminator or field separator. Unix filenames exclude / and NUL, so / is available to separate path components, and NUL can separate filenames in a list, as with find -print0, xargs -0, and cpio -0. (The more common practice of separating filenames with whitespace breaks whenever a filename contains whitespace.)
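As a small illustration of a reserved character doing duty as a separator, here is a Python sketch that writes and parses filename lists in the format find -print0 produces, where each name is followed by a NUL byte. The function names join_print0 and split_print0 are hypothetical, chosen only for this example.

```python
def join_print0(names):
    """Encode filenames the way `find -print0` writes them: each name is
    followed by a NUL byte."""
    for name in names:
        if b"\0" in name:
            raise ValueError("a Unix filename cannot contain NUL")
    return b"".join(name + b"\0" for name in names)

def split_print0(data):
    """Parse NUL-terminated filenames back into a list of byte strings."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("last filename is not NUL-terminated")
    return data[:-1].split(b"\0")

names = [b"plain.txt", b"with space.txt", b"new\nline.txt"]
encoded = join_print0(names)
print(split_print0(encoded))                 # the original three names, intact
print(encoded.replace(b"\0", b" ").split())  # whitespace splitting: five pieces
```

Because NUL can never occur inside a filename, split_print0 needs no quoting rules at all; splitting the same data on whitespace breaks both the name containing a space and the one containing a newline.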

But if the allowable range of values is any string at all, no string can be reserved for a special purpose.

Since Tcl has nothing that is not a string, the only remaining solution is a separate, out-of-band way of tracking the special case. Returning to the C example, if a program needs to support NUL in the middle of a string, it must either encode the string using a possibly fragile quoting scheme or track the string's length in a separate variable. As for the Unix filename example, if a filename needs to contain a /, the / absolutely must be encoded, for instance as %2F, and then the escape character itself must also be encoded (% as %25). This is because Unix filenames have no room for an out-of-band channel. (By the way, KDE uses this encoding scheme to support / in filenames.) In Tcl, a separate variable can be used, such as a variable that's false when the user didn't specify a string.
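A Python sketch of the %2F/%25 escaping described above follows. It assumes exactly the two escapes mentioned (%25 for %, %2F for /) and is not KDE's actual code; encode_component and decode_component are hypothetical names. Note that % has to be escaped first when encoding, and decoding is done in a single pass so that an encoded literal "%2F" (that is, "%252F") is not decoded twice.

```python
import re

def encode_component(name: str) -> str:
    """Make a string safe to use as a single Unix filename component."""
    # Escape '%' before '/': done the other way round, the '%' introduced
    # by "%2F" would itself be rewritten to "%25".
    return name.replace("%", "%25").replace("/", "%2F")

_ESCAPE = re.compile(r"%(25|2F)")

def decode_component(encoded: str) -> str:
    """Invert encode_component() in one left-to-right pass."""
    return _ESCAPE.sub(lambda m: "%" if m.group(1) == "25" else "/", encoded)

for original in ["a/b", "100%", "%2F", "plain"]:
    encoded = encode_component(original)
    assert "/" not in encoded
    assert decode_component(encoded) == original
    print(f"{original!r} -> {encoded!r}")
```

The round trip only works because % is treated as a quote character and escaped too; escaping / alone would make an original "%2F" indistinguishable from an encoded "/".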

This can be very cumbersome and isn't always viable (again, when the range is all strings). Two examples are default arguments and SQL nulls. Foolproof tracking of the former requires the proc to accept args and do its own defaulting; [llength $args] serves as the out-of-band channel. Tracking the latter may require asking the database to prepend a special character to every non-null string result, so that the first character becomes the out-of-band channel identifying whether the result is null. A more straightforward option is to also SELECT the NOTNULL of each string column whose values could be null, giving a separate true/false flag per column.
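Both workarounds can be sketched in Python; the function names describe and as_tcl and the table people are hypothetical. describe mirrors a proc declared with args: len(args) plays the role of [llength $args]. The SQL part uses Python's built-in sqlite3 module with an in-memory SQLite database. Since Python, unlike Tcl, has its own null (None), as_tcl deliberately collapses SQL NULL to "" to imitate a null-less interface. In SQLite, 'x' || NULL is NULL, and expr NOTNULL is accepted as a synonym for expr IS NOT NULL.

```python
import sqlite3

def describe(*args):
    """Like a Tcl proc taking {args}: the argument count records whether a
    value was supplied, separately from what the value is."""
    if len(args) > 1:
        raise TypeError("describe takes at most one argument")
    if len(args) == 0:
        return "no name given"
    return "name is " + repr(args[0])

def as_tcl(value):
    """Imitate a null-less interface: SQL NULL arrives as an empty string."""
    return "" if value is None else value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (nickname TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Bob",), ("",), (None,)])

# Workaround 1: prefix every non-null value with a marker character.
# 'x' || NULL is NULL, so only non-null values gain the prefix, and an
# empty result can only mean NULL.
prefixed = [as_tcl(v) for (v,) in conn.execute(
    "SELECT 'x' || nickname FROM people ORDER BY rowid")]
via_prefix = [v[1:] if v else None for v in prefixed]

# Workaround 2: select a NOTNULL flag alongside the column itself.
flagged = [(as_tcl(v), flag) for v, flag in conn.execute(
    "SELECT nickname, (nickname NOTNULL) FROM people ORDER BY rowid")]
via_flag = [v if flag else None for v, flag in flagged]

print(describe(), describe(""), sep="; ")
print(via_prefix, via_flag)
```

Both decodings recover the distinction between the empty nickname and the missing one, which as_tcl alone had erased.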


jhh proposes a possible solution in TIP 185 [L1 ]. Basically, {null}! is recognized by the parser as a null, which is not a string; it is distinct from all possible strings. "{null}!" is, of course, a seven-character-long string, and it's also a one-element list whose sole element is a null.

I (AMG) have several comments regarding the TIP:

  • I prefer to say "null" instead of "null string" because I feel that a null is not a string at all. It's the one thing that isn't a string! I guess we'll need to change our motto. :^)
  • Likewise, I'd rather not tack the null management functionality onto the [string] command.
  • I think I'd prefer a [null] command for generating nulls and testing for nullity. It's probably best not to use the == and != expr operators for this purpose; null isn't equal to anything, not even null.
  • We can ditch the {null}! syntax in favor of using the [null] command to generate nulls, but then [null] cannot be implemented in pure script. This might be an important concern for safe interps.
  • Automatic compatibility with "null-dumb" commands is a mistake; it's the responsibility of the script to perform this interfacing.
  • When passed a null, the Tcl_Get*() and Tcl_Get*FromObj() functions should return TCL_ERROR, or NULL in the case of Tcl_GetString*(), which returns a pointer.
  • Most commands should be "null-dumb". Only make a command handle nulls when the meaning is clear.
  • The non-object Tcl commands can probably represent nulls as null pointers ((void*)0 or NULL). If for some reason that can't work, reserve a special address for nulls by creating a global variable.
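The following Python sketch models the semantics proposed in the bullets above. Everything in it is hypothetical: NULL, null(), is_null(), values_equal(), get_int() and get_string() are illustrative names, not part of Tcl or TIP 185. A unique module-level object stands in for the reserved global address, so nullity is tested by identity rather than with ==.

```python
NULL = object()  # hypothetical: a unique object whose identity marks null

def null():
    """Produce a null, as the proposed [null] command would."""
    return NULL

def is_null(value):
    """Test for nullity by identity, never by comparing contents."""
    return value is NULL

def values_equal(a, b):
    """Equality in which null equals nothing, not even another null."""
    if is_null(a) or is_null(b):
        return False
    return a == b

def get_int(value):
    """Null-dumb getter: fails on null, the way the bullets above suggest
    Tcl_GetIntFromObj() should return TCL_ERROR."""
    if is_null(value):
        raise ValueError("expected integer but got null")
    return int(value)

def get_string(value):
    """Return None (standing in for a C NULL pointer) when given a null."""
    return None if is_null(value) else value

one_element_list = [null()]                   # a list whose sole element is a null
print(len("{null}!"), len(one_element_list))  # the literal is just a 7-char string
print(values_equal(NULL, NULL), is_null(one_element_list[0]))
```

Keeping the nullity test separate from values_equal() follows the advice not to overload == and !=: a comparison involving null is simply false, while is_null() answers the nullity question directly.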

Feel free to argue. :^)


[ Category Language ]