[AMG]: [Everything is a string] is nice and all, but for many applications it's important to have a special value that's outside the allowable domain. If the domain of values is numbers, any non-numeric string (e.g., "") will do, so "" can be used to signify that the user didn't specify a number. [C] strings can't contain '''NUL''' and therefore are free to reserve '''NUL''' as a terminator or field separator. [Unix] filenames reserve '''/''' and '''NUL''', so '''/''' is available to separate path components and '''NUL''' can be used with '''find -print0''', '''xargs -0''', and '''cpio -0''' to separate filenames in a list. (The more common practice of separating filenames with whitespace breaks whenever whitespace is used in filenames.) But if the allowable domain of values is any string at all, no string can be reserved for a special purpose. Since [Tcl] has nothing that is not a string, the only remaining solution is to have a separate, out-of-band way of tracking the special case.

Returning to the C example, if a program needs to support having '''NUL''' in the middle of a string, it must either encode the string using a possibly fragile quoting scheme or use a separate variable to track its length. As for the Unix filename example, if a filename needs to contain a '''/''', it absolutely must be encoded, for instance as '''%2F''', but then the escape character '''%''' must also be encoded ('''%25'''). This is because Unix filenames have no room for an out-of-band channel. (By the way, [KDE] uses this encoding scheme to support '''/''' in filenames.) In Tcl, a separate variable can be used, such as a variable that's false when the user didn't specify a string. This can be very cumbersome and isn't always viable (again, when the domain is all strings). Two examples are default arguments and SQL nulls.
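For the default-argument case, one out-of-band channel that needs no reserved string is the argument count itself. The following is a minimal sketch, not taken from any library; the proc name '''myset''' is hypothetical. It mimics the one- and two-argument forms of [[[set]]] and can tell "no value given" apart from every possible string, including "" and '''{null}!''':

```tcl
# Hypothetical [set]-like command. Whether a value was supplied is
# decided by [llength $args], an out-of-band channel, so no string
# has to be reserved to mean "not given".
proc myset {varName args} {
    upvar 1 $varName var
    switch -- [llength $args] {
        0 {
            # No value given: read the variable (errors if it is unset).
            return $var
        }
        1 {
            # Exactly one value given, even if it is "" or "{null}!".
            set var [lindex $args 0]
            return $var
        }
        default {
            return -code error \
                "wrong # args: should be \"myset varName ?newValue?\""
        }
    }
}

set x old
myset x ""        ;# explicitly set x to the empty string
puts [list $x]    ;# prints {}
myset x hello
puts [myset x]    ;# prints hello
```

Because the count travels separately from the value, this works whatever string the caller passes; the cost is that the proc's signature no longer documents the optional argument.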
Foolproof tracking of the former requires the [proc] to accept [args] and do its own defaulting; '''[[[llength] $args]]''' serves as the out-of-band channel. Tracking the latter may require asking the database to prepend a special character to all non-null string results; the first character is then the out-of-band channel identifying whether the result is null. A more straightforward option is to additionally '''SELECT''' the '''NOTNULL''' of each string column whose value could be null, which yields a separate true/false flag.

----

[jhh] proposes a possible solution in [TIP] 185 [http://tip.tcl.tk/185]. Basically, '''{null}!''' is recognized by the parser as a null, which is ''not'' a string; it is distinct from all possible strings. '''"{null}!"''', of course, is a seven-character string, and it's also a one-element list whose sole element is a null.

I ([AMG]) have several strong comments regarding the TIP:

* I prefer to say "null" instead of "null string" because I feel that a null is not a string at all. It's the one thing that isn't a string! I guess we'll need to change our motto. :^)
* Likewise, I'd rather not tack the null management functionality onto the [[[string]]] command.
* I think I'd prefer a '''[[null]]''' command for generating nulls and testing for nullity. It's best not to use the '''==''' and '''!=''' [expr] operators for this purpose; null isn't equal to anything, not even null.
* We can ditch the '''{null}!''' syntax in favor of using the '''[[null]]''' command to generate nulls, but then '''[[null]]''' cannot be implemented in pure script. This might be an important concern for [safe interps].
* Automatic compatibility with "null-dumb" commands is a mistake; it's the responsibility of the script to perform this interfacing.
* When passed a null, the '''Tcl_GetType()''' and '''Tcl_GetTypeFromObj()''' family of functions (e.g. '''Tcl_GetInt()''' and '''Tcl_GetIntFromObj()''') should return '''TCL_ERROR''', and '''Tcl_GetString()''' and '''Tcl_GetStringFromObj()''' should return '''NULL'''.
* Most commands should be "null-dumb". Only make a command handle nulls when it is clear how they should be interpreted.
* The non-object Tcl commands can probably represent nulls as null pointers ('''(void*)0''' or '''NULL'''). If for some reason that can't work, reserve a special address for nulls by creating a global variable.

Feel free to argue. :^)

----

[AMG]: Here's a silly and inefficient proc to help me play around with the ideas presented above:

    proc foobar {varname {value {null}!}} {
        upvar 1 $varname var
        if {![null $value]} {
            set var $value
        }
        return $var
    }

This proc should behave the same as [[[set]]]. You will notice that I used '''{null}!''' even though in my comments above I suggested removing it in favor of always using '''[[null]]''' to obtain nulls. But it turns out that's not feasible in the above code: because the default is inside braces, no substitution takes place, and '''$value''' would only default to the literal string '''"[[null]]"'''. To get the desired behavior, I'd have to write '''[[[list] varname [[list value [[null]]]]]]''', which is far from readable. (With [Tcl 9.0 Wishlist] #67, it becomes '''(varname (value [[null]]))''', which I can live with.) That's one black mark against my idea...

A more worrying problem is that '''[[foobar]]''' can't be used to set a variable to null! Why? Because the domain of '''$value''' ''includes'' all strings ''and'' null, there is (once again) no value outside the domain that can signal a special condition without also being forgeable by the caller.

So what are nulls good for again? I'm up to two black marks now. It's not looking good. It seems nulls aren't as useful as originally hoped. (Notice the use of the passive voice.) But are they still good for something? The reason '''[[foobar]]''' doesn't work in the above case is that it is being driven by the script, and the script is capable of producing nulls.
If its input instead came from a file or socket, it would be just fine, because reading from a channel will never result in a null. Of course, at this point I'm reminded of [taint]ing, which might be a better solution.

----

[wdb]: When switching from Lisp to Tcl, I found the lack of a special value such as ''NULL'' to be one of the drawbacks, but I decided that I can live with it. It is a price I am willing to pay for simplicity. There is more than one case where a similar problem is resolved by a trade-off:

* In the [switch] statement, the keyword [default] affects the handling of the '''value''' "default".
* In [proc]'s argument list, the keyword [args] restricts the choice of argument names.
* In [Itcl] and [Snit], the instance-name arguments #auto and %AUTO%, respectively, restrict the choice of instance names.
* And so on.

Extending the value range beyond strings means abandoning the [eias] principle. It is possible, and sometimes even desirable, to extend it. If so, ask yourself whether Tcl is still the right choice. If you ask me, I prefer the ''status quo''. The drawbacks are known, and as mentioned above, I can live with them.

[AMG]: [switch] can select on the value "default" if "default" is not the last pattern given. [proc] can accept an argument named "args" if it's not the last one in the list (although see [Tcl 9.0 Wishlist] #77). I'm just pointing out that these "keywords" only have special meaning in combination with some other out-of-band data, which in these cases is list position. One more example is the use of '''-''' to signify an option. To disambiguate, we have '''--''' to partition the argument list into options and non-options (see ['--' in Tcl]).

Yes, it's totally true that we can live without nulls. The real problem comes when interfacing with systems that ''do'' have nulls. Tcl has no easy and safe way to represent them. Reserving a string will work most of the time, but the Tcl script becomes confused when the reserved string collides with valid data.
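A minimal illustration of such a collision, assuming a hypothetical convention in which the reserved string '''NULL''' stands for an external null (the helper name '''isNull''' is made up for this sketch):

```tcl
# Hypothetical convention: the reserved string "NULL" means "no value".
proc isNull {value} {
    return [expr {$value eq "NULL"}]
}

# Two results from an external system: a genuine null, and a
# legitimate surname that happens to be spelled "NULL".
set missingPhone NULL
set surname      NULL

puts [isNull $missingPhone]   ;# prints 1 (correct)
puts [isNull $surname]        ;# prints 1 (wrong: valid data collides)
```

Picking a stranger sentinel only makes the collision less likely, not impossible; the tagged and list-based encodings discussed further down avoid it by carrying nullness out of band.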
Such a collision may happen by accident or as part of a malicious attack, which means even nonsense strings like "ßÿÑâŖΊ" aren't safe. Everything else I said about nulls is just cute, sugary stuff we could do with them if they were added.

----

[wdb] (again): But if really necessary, it is possible to introduce typed data to Tcl. Just put each value in a list whose first element is the type and whose second element is the data:

    set typed_value1 {allowed {hello world}}
    set typed_value2 {disallowed {bye bye}}

This example uses two data types, ''allowed'' and ''disallowed''. A null value is easily constructed by choosing the type ''disallowed''.

[AMG]: This is like [jhh]'s method of prepending a special character to all non-null SQL results, except of course it's cleaner.

[NEM]: Tagged data is also how [functional programming] languages like [ML] and [Haskell] handle optional data/NULLs:

    # data Maybe a = Just a | Nothing
    proc Just data { return [list Just $data] }
    proc Nothing {} { return [list Nothing] }

    set val1 [Just "some data including Just and Nothing"]
    set val2 [Nothing]

Then you can test for missing data (NULL/Nothing) using a switch:

    switch -exact -- [lindex $data 0] {
        Just {
            set value [lindex $data 1]
            # ... do stuff with $value ...
        }
        Nothing {
            # ... handle missing data ...
        }
    }

Alternatively, in many cases you can use the (non-)existence of a variable or dictionary/array element to test for nullity, e.g. in a database-like interface:

    $db query $query row {
        if {![info exists row(name)]} {
            # name is NULL
        }
    }

[Lars H]: In the original discussion of TIP#185, the following methods were proposed for interfacing Tcl with systems that have NULL values:

1. If the external function returns a value or NULL, then have the corresponding Tcl command return a list of one element for non-NULL values or an empty list for a NULL value. In Tcl 8.5, [{*}] greatly simplifies using such commands.
2. If the external function returns a "record" where some of the entries may be NULLs, then have the corresponding Tcl command return a dictionary which only has entries for the fields with non-NULL values.

Type-tagging values using lists as shown above may also be necessary when interacting with other systems, as some indeed take different actions for data of different types (even if the values are the same). [tcom] apparently has some trouble in this area, as it provides no way to specify the type of data to pass on. In [TclAE], the types are instead explicitly specified.

What NULL proponents should take note of is that Tcl values, as a consequence of the [dodekalogue], constitute a monoid [http://en.wikipedia.org/wiki/Monoid] with the empty string as identity element and string concatenation ([cconcat], for those who require a command name) as operation. The [Everything is a string] principle says that the monoid of Tcl values is in fact a free monoid (currently the free monoid of words in the alphabet of all BMP [Unicode] code-points), and I think it is an '''extremely good''' principle, but the dodekalogue does not explicitly proclaim it. Hence one could imagine a Tcl where a NULL value exists in addition to the strings, but then it would have to be sorted out how this NULL should act under concatenation. What is passed on to A in the following commands?

    A [null][null]
    A [null]somestring
    A somestring[null]

Another problem with introducing special values like NULL is that there's no reason to believe that ''one'' special value is always going to be sufficient: once NULL is in widespread use, someone will come up with a situation where NULL has to be handled as an ordinary value, while still needing a SUPERNULL that isn't! On the whole, it is much simpler to avoid introducing any special values.

[AMG]: Existence checking can work.
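The two methods from the TIP#185 discussion that Lars H lists above, together with existence checking, can be sketched in plain Tcl 8.5 or later. This is an illustration under assumptions: the helper names '''wrapValue''' and '''describe''' are hypothetical, and the ''isNull'' flag stands in for whatever an external interface would actually report:

```tcl
# Method 1: a possibly-NULL result becomes a list of zero elements
# (NULL) or exactly one element (the value, whatever string it is).
proc wrapValue {isNull value} {
    if {$isNull} {
        return {}
    }
    return [list $value]
}

# A consumer taking an optional argument: {*} passes no argument for
# NULL and exactly one argument otherwise.
proc describe {args} {
    if {[llength $args] == 0} {
        return "NULL"
    }
    return "value <[lindex $args 0]>"
}

set a [wrapValue 0 NULL]   ;# the string "NULL", not a null
set b [wrapValue 0 ""]     ;# the empty string, not a null
set c [wrapValue 1 ""]     ;# a genuine NULL

foreach w [list $a $b $c] {
    puts [describe {*}$w]
}
# prints: value <NULL>, value <>, NULL (one per line)

# Method 2: a record is a dict that simply omits its NULL fields,
# so nullity is tested with [dict exists].
set row [dict create id 42 name ""]   ;# the email field was NULL
foreach field {id name email} {
    if {[dict exists $row $field]} {
        puts "$field = [dict get $row $field]"
    } else {
        puts "$field is NULL"
    }
}
```

The empty string and the string '''NULL''' both survive as ordinary values here, because nullness is carried by the list length or by the key's presence rather than by any reserved string.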
Maybe [sqlite] eval's two- and three-argument forms could unset the variable or array element to signify that its value is null. The script already knows all the variable names, so it shouldn't need to be explicitly told what's null. But on the other hand, maybe the array, dictionary, or whatever could be accompanied by a list of all fields whose values turned up null.

Encoding the data as a list seems clever. If [[[llength]]] is zero, the data is null. If [[[llength]]] is one, the data is the list's sole element (index 0). Use [{*}] to get at it most easily. I imagine it's possible to recursively apply this encoding to dictionaries and nested lists.

Regarding the combination of nulls and non-nulls, [jhh]'s TIP suggested that concatenating a null with a string results in a null: "Nulls propagate. A null combined with any nonnull is null. Appending a null to a string, or substituting a null into a string nulls the entire string." By this rule, '''A''' will receive null in all three cases.

My [[foobar]] example shows a case where SUPERNULL would at first glance appear to help, but of course it's a ridiculous thing to ask for, especially since it still would not allow setting a variable to SUPERNULL. What's asked for is a value outside the input domain, but no such value can exist because a variable can be set to anything (string or otherwise). Therefore the only solution is an out-of-band channel, as in [[[llength] $[args]]] indicating how many arguments were passed. With some sugar it might be possible to add a command to check whether an argument was explicitly passed or left at its default; this seems like a halfway point, because [[[llength] $[args]]] is being used internally, but to the programmer it's no different from checking for null. I don't propose such a thing; I'm just giving examples.

Lars, as you say, null would only be useful for this purpose so long as the command is intended only to interface with stuff that cannot generate its own nulls.
But this is a funky reason to advocate null; the original impetus was the desire to interface with stuff that ''does'' generate nulls.

----

[[ [Category Language] ]]