[AMG]: [Everything is a string] is nice and all, but for many applications it's important to have a special value that's outside the allowable domain. If the domain of values is numbers, any non-numeric string (e.g., "") will do, so "" can be used to signify that the user didn't specify a number. [C] strings can't contain '''NUL''' and therefore are free to reserve '''NUL''' as a terminator or field separator. [Unix] filenames reserve '''/''' and '''NUL''', so '''/''' is available to separate path components and '''NUL''' can be used with '''find -print0''', '''xargs -0''', and '''cpio -0''' to separate filenames in a list. (The more common practice of separating filenames with whitespace breaks whenever a filename contains whitespace.)

But if the allowable domain of values is any string at all, no string can be reserved for a special purpose. Since [Tcl] has nothing that is not a string, the only remaining solution is a separate, out-of-band way of tracking the special case. Returning to the C example, if a program needs to support '''NUL''' in the middle of a string, it must either encode the string using a possibly fragile quoting scheme or use a separate variable to track the string's length. As for the Unix filename example, if a filename needs to contain a '''/''', it absolutely must be encoded, for instance as '''%2F''', but then the quote character '''%''' must also be encoded (as '''%25'''). This is because Unix filenames have no room for an out-of-band channel. (By the way, [KDE] uses this encoding scheme to support '''/''' in filenames.)

In Tcl, a separate variable can be used, such as a variable that's false when the user didn't specify a string. This can be very cumbersome and isn't always viable (again, when the domain is all strings). Two examples are default arguments and SQL nulls. Foolproof tracking of the former requires the [proc] to accept [args] and do its own defaulting; '''[[[llength] $args]]''' serves as the out-of-band channel.
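
A minimal sketch of the default-argument case, assuming a made-up pair of procs ('''greetSentinel''' and '''greet''' are illustrative names, not part of any library). The first uses "" as an in-band sentinel and so cannot tell an omitted argument from an empty one; the second accepts [args] and lets '''[[llength $args]]''' carry the "was it given?" bit:

```tcl
# Sentinel default: the empty string doubles as "not given", so a caller
# who really passes "" is indistinguishable from one who passes nothing.
proc greetSentinel {{name ""}} {
    if {$name eq ""} {
        return "no name given"
    }
    return "hello, $name"
}

# Out-of-band version: accept args and do the defaulting by hand.
# [llength $args] says whether a value was supplied, whatever it is.
proc greet {args} {
    switch -- [llength $args] {
        0 { return "no name given" }
        1 { return "hello, [lindex $args 0]" }
        default {
            return -code error "wrong # args: should be \"greet ?name?\""
        }
    }
}

puts [greetSentinel ""]   ;# no name given  (the empty string was lost)
puts [greet]              ;# no name given
puts [greet ""]           ;# hello,         (the empty string is a real value)
```

The cost is that the proc has to produce its own "wrong # args" message, which is part of what makes this approach cumbersome.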
Tracking SQL nulls may require asking the database to prepend a special character to all non-null string results; basically the first character is the out-of-band communication channel identifying the nullity of the result. A more straightforward option is to '''SELECT''' the '''NOTNULL''' of the string columns whose values could be null.

----

[jhh] proposes a possible solution in [TIP] 185 [http://tip.tcl.tk/185]. Basically, '''{null}!''' is recognized by the parser as a null, which is ''not'' a string; it is distinct from all possible strings. '''"{null}!"''' is, of course, a seven-character-long string, and it's also a one-element list whose sole element is a null.

I ([AMG]) have several strong comments regarding the TIP:

* I prefer to say "null" instead of "null string" because I feel that a null is not a string at all. It's the one thing that isn't a string! I guess we'll need to change our motto. :^)
* Likewise, I'd rather not tack the null management functionality onto the [[[string]]] command.
* I think I'd prefer a '''[[null]]''' command for generating nulls and testing for nullity. It's best not to use the '''==''' and '''!=''' [expr] operators for this purpose; null isn't equal to anything, not even null.
* We can ditch the '''{null}!''' syntax in favor of using the '''[[null]]''' command to generate nulls, but then '''[[null]]''' cannot be implemented in pure script. This might be an important concern for [safe interps].
* Automatic compatibility with "null-dumb" commands is a mistake; it's the responsibility of the script to perform this interfacing.
* When passed a null, the '''Tcl_GetType()''' and '''Tcl_GetTypeFromObj()''' functions should return '''TCL_ERROR''' or '''NULL''' (in the case of '''Tcl_GetString()''' and '''Tcl_GetStringFromObj()''').
* Most commands should be "null-dumb". Only make a command handle nulls when it is clear how they should be interpreted.
* The non-object Tcl commands can probably represent nulls as null pointers ('''(void*)0''' or '''NULL'''). If for some reason that can't work, reserve a special address for nulls by creating a global variable.

Feel free to argue. :^)

----

[AMG]: Here's a silly and inefficient proc to help me play around with the ideas presented above:

 proc foobar {varname {value {null}!}} {
     upvar 1 $varname var
     if {![null $value]} {
         set var $value
     }
     return $var
 }

This proc should behave the same as [[[set]]].

You will notice that I used '''{null}!''' even though in my above comments I suggested removing it in favor of always using '''[[null]]''' to obtain nulls. But it turns out that's not feasible in the above code; it would only result in '''$value''' defaulting to the string '''"[[null]]"'''. To get the desired behavior, I'd have to write '''[[[list] varname [[list value [[null]]]]]]''', which is far from readable. (With [Tcl 9.0 Wishlist] #67, it becomes '''(varname (value [[null]]))''', which I can live with.) That's one black mark against my idea...

A more worrying problem is that '''[[foobar]]''' can't be used to set a variable to null! Why? Because the domain of '''$value''' ''includes'' all strings ''and'' null, there is (once again) no possible value outside the domain that can be used to indicate that a special condition occurred and that cannot be "forged" by the caller. So what are nulls good for again? I'm up to two black marks now. It's not looking good.

It seems nulls aren't as useful as originally hoped. (Notice the use of the passive voice.) But are they still good for something? The reason '''[[foobar]]''' doesn't work in the above case is that it is being driven by the script, and the script is capable of producing nulls. If its input instead came from a file or socket, it would be just fine because reading from a channel will never result in a null. Of course, at this point I'm reminded of [taint]ing, which might be a better solution.
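
The remark above about braces turning '''[[null]]''' into a literal default string can be checked in today's Tcl. Since '''[[null]]''' does not exist, this sketch assumes a hypothetical stand-in command, '''makeDefault'''; the proc names '''braced''' and '''listed''' are also invented for illustration:

```tcl
# Hypothetical stand-in for [null]; any command would do.
proc makeDefault {} {
    return "computed-default"
}

# Braced argument list: no substitution happens, so the default value is
# the literal text "[makeDefault]".
proc braced {{value [makeDefault]}} {
    return $value
}

# Argument list built with [list]: makeDefault runs once, when the proc
# is defined, and its result becomes the default value.
proc listed [list [list value [makeDefault]]] {
    return $value
}

puts [braced]   ;# [makeDefault]
puts [listed]   ;# computed-default
```

Note that the '''[[list]]''' form evaluates the command at definition time, not on each call, which is harmless for a constant like a null but matters for anything that changes.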
----

[wdb] When switching from Lisp to Tcl, the lack of a special value such as ''NULL'' was one of the drawbacks I decided I could live with. It is the price I am willing to pay for simplicity. There is more than one case where something similar is resolved by a trade-off:

* In the [switch] statement, the word [default] collides with the '''value''' "default".
* In [proc]'s arg list, the word [args] restricts the choice of argument names.
* And so on.

Extending the value range beyond strings has consequences for [eias]. It is possible, and sometimes even desirable, to extend it. If you do, ask yourself whether Tcl is still the right choice for you.

If you ask me: I prefer the ''state as is''. The drawbacks are known, and as mentioned above, I can live with them.

----

[[ [Category Language] ]]