[AMG]: [Everything is a string] is nice and all, but for many applications it's important to have a special value that's outside the allowable domain. If the domain of values is numbers, any non-numeric string (e.g., "") will do, so "" can be used to signify that the user didn't specify a number. [C] strings can't contain '''NUL''' and therefore are free to reserve '''NUL''' as a terminator or field separator. [Unix] filenames reserve '''/''' and '''NUL''', so '''/''' is available to separate path components and NUL can be used with '''find -print0''', '''xargs -0''', and '''cpio -0''' to separate filenames in a list. (The more common practice of separating filenames with whitespace breaks whenever whitespace is used in filenames.)

But if the allowable domain of values is any string at all, no string can be reserved for a special purpose. Since [Tcl] has nothing that is not a string, the only remaining solution is to have a separate, out-of-band way of tracking the special case. Returning to the C example, if a program needs to support having '''NUL''' in the middle of a string, it must either encode the string using a possibly fragile quoting scheme, or it can use a separate variable to track its length. As for the Unix filename example, if a filename needs to contain a '''/''', it absolutely must be encoded, for instance as '''%2F''', but then the quote character must also be encoded ('''%25'''). This is because Unix filenames have no room for an out-of-band channel. (By the way, [KDE] uses this encoding scheme to support '''/''' in filenames.)

In Tcl, a separate variable can be used, such as a variable that's false when the user didn't specify a string. This can be very cumbersome and isn't always viable (again, when the domain is all strings). Two examples are default arguments and SQL nulls.
Foolproof tracking of the former requires the [proc] to accept [args] and do its own defaulting; '''[[[llength] $args]]''' serves as the out-of-band channel. (You can also use the trick from [ML].) Tracking the latter may require asking the database to prepend a special character to all non-null string results; basically the first character is the out-of-band communication channel identifying the nullity of the result. A more straightforward option is to '''SELECT''' the '''NOTNULL''' of the string columns whose values could be null.

----

[jhh] proposes a possible solution in [TIP] 185 [http://tip.tcl.tk/185]. Basically, '''{null}!''' is recognized by the parser as a null, which is ''not'' a string; it is distinct from all possible strings. '''"{null}!"''' is, of course, a seven-character-long string, and it's also a one-element list whose sole element is a null.

I ([AMG]) have several strong comments regarding the TIP:

   * I prefer to say "null" instead of "null string" because I feel that a null is not a string at all. It's the one thing that isn't a string! I guess we'll need to change our motto. :^)
   * Likewise, I'd rather not tack the null management functionality onto the [[[string]]] command.
   * I think I'd prefer a '''[[null]]''' command for generating nulls and testing for nullity. It's best not to use the '''==''' and '''!=''' [expr] operators for this purpose; null isn't equal to anything, not even null.
   * We can ditch the '''{null}!''' syntax in favor of using the '''[[null]]''' command to generate nulls, but then '''[[null]]''' cannot be implemented in pure script. This might be an important concern for [safe interps].
   * Automatic compatibility with "null-dumb" commands is a mistake; it's the responsibility of the script to perform this interfacing.
   * When passed a null, the '''Tcl_GetType()''' and '''Tcl_GetTypeFromObj()''' functions should return '''TCL_ERROR''' or '''NULL''' (in the case of '''Tcl_GetString()''' and '''Tcl_GetStringFromObj()''').
   * Most commands should be "null-dumb". Only make a command handle nulls when it is clear how they should be interpreted.
   * The non-object Tcl commands can probably represent nulls as null pointers ('''(void*)0''' or '''NULL'''). If for some reason that can't work, reserve a special address for nulls by creating a global variable.

Feel free to argue. :^)

----

[AMG]: Here's a silly and inefficient proc to help me play around with the ideas presented above:

    proc foobar {varname {value {null}!}} {
        upvar 1 $varname var
        if {![null $value]} {
            set var $value
        }
        return $var
    }

This proc should behave the same as [[[set]]].

You will notice that I used '''{null}!''' even though in my above comments I suggested removing it in favor of always using '''[[null]]''' to obtain nulls. But it turns out that's not feasible in the above code; it would only result in '''$value''' defaulting to the string '''"[[null]]"'''. To get the desired behavior, I'd have to write '''[[[list] varname [[list value [[null]]]]]]''', which is far from readable. (With [Tcl 9.0 Wishlist] #67, it becomes '''(varname (value [[null]]))''', which I can live with.) That's one black mark against my idea...

A more worrying problem is that '''[[foobar]]''' can't be used to set a variable to null! Why? Because the domain of '''$value''' ''includes'' all strings ''and'' null, there is (once again) no possible value outside the domain that can be used to indicate that a special condition occurred and cannot be "forged" by the caller. So what are nulls good for again? I'm up to two black marks now. It's not looking good.

It seems nulls aren't as useful as originally hoped. (Notice the use of the passive voice.) But are they still good for something?
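
For comparison, here is a minimal sketch of the out-of-band approach mentioned at the top of the page, where '''[[llength $args]]''' rather than a reserved value tells the proc whether a value was supplied. The name `foobar2` is made up for this illustration; no null support is assumed, only stock Tcl.

```tcl
# Hypothetical variant of foobar: the number of extra arguments is the
# out-of-band channel, so every string (even "" or "{null}!") is a legal value.
proc foobar2 {varname args} {
    upvar 1 $varname var
    switch -- [llength $args] {
        0 { }
        1 { set var [lindex $args 0] }
        default {
            return -code error "wrong # args: should be \"foobar2 varname ?value?\""
        }
    }
    return $var
}

foobar2 x {}          ;# sets x to the empty string
foobar2 x             ;# reads x back without changing it
foobar2 y "{null}!"   ;# just an ordinary seven-character string here
```

Because the caller cannot "forge" the argument count, this version can store any value at all; the price is that the default-argument syntax of [proc] is no longer doing the work.
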
The reason '''[[foobar]]''' can't set a variable to null is that it is being driven by the script, and the script is capable of producing nulls. If its input instead came from a file or socket, it would be just fine because reading from a channel will never result in a null. Of course, at this point I'm reminded of [taint]ing, which might be a better solution.

----

[wdb] When switching from Lisp to Tcl, the lack of some special value such as ''NULL'' was one of the drawbacks I decided I could live with. It is a price I am willing to pay for the simplicity.

There is more than one case where something similar is resolved by some trade-off:

   * In the [switch] statement, the word [default] impacts the '''value''' "default".
   * In [proc]'s arg list, the word [args] impacts the choice of argument names.
   * In [Snit] and [Itcl], the argument #auto or %AUTO% impacts the choice of instance name.
   * And so on.

Extending the value range of the string type means abandoning the [eias] principle. It is possible, and sometimes even desirable, to extend it. If so, ask yourself whether Tcl is still the right choice for you.

If you ask me: I prefer the ''state as is''. The drawbacks are known, and as mentioned above, I can live with them.

[AMG]: [switch] can select on the value "default" if "default" is not the last option given. [proc] can accept an argument named "args" if it's not the last one in the list (although see [Tcl 9.0 Wishlist] #77). I'm just pointing out that these "keywords" only have special meaning when in combination with some other out-of-band data, which in these cases is list position. One more example is the use of '''-''' to signify an option. To disambiguate, we have '''--''' to partition the argument list into options and non-options (see ['--' in Tcl]).

Yes, it's totally true we can live without nulls. The real problem comes when interfacing with systems that ''do'' have nulls. Tcl has no easy and safe way to represent them.
Reserving a string will work most of the time, but the Tcl script becomes confused when the reserved string collides with valid data. This may happen by accident or as part of a malicious attack, which means even nonsense strings like "ßÿÑâRI'" aren't safe. All the other stuff I said about nulls is just cute, sugary things we could do with them if they were added.

----

[wdb] (again): But if really necessary, it is possible to introduce typed data to Tcl. Just put each value in a two-element list whose first element is the type and whose second element is the data, as follows:

    set typed_value1 {allowed {hello world}}
    set typed_value2 {disallowed {bye bye}}

This example shows the use of two data types, ''allowed'' and ''disallowed''. A null value is then easy to construct by choosing the type ''disallowed''.

[AMG]: This is like [jhh]'s method of prepending a special character to all non-null SQL results, except of course it's cleaner.

[NEM]: Tagged data is also how [functional programming] languages like [ML] and [Haskell] handle optional data/NULLs:

    # data Maybe a = Just a | Nothing
    proc Just data { return [list Just $data] }
    proc Nothing {} { return [list Nothing] }
    set val1 [Just "some data including Just and Nothing"]
    set val2 [Nothing]

Then you can test for missing data (NULL/Nothing) using a switch:

    switch -exact -- [lindex $data 0] {
        Just {
            # do stuff with [lindex $data 1]
        }
        Nothing {
            # handle missing data
        }
    }

Alternatively, in many cases you can use the (non-)existence of a variable or dictionary/array element to test for nullity, e.g., in a database-like interface:

    $db query $query row {
        if {![info exists row(name)]} {
            # name is NULL
        }
    }

[Lars H]: In the original discussion of TIP#185, the following methods were proposed for interfacing Tcl with systems that have NULL values:

   1. If the external function returns a value or NULL, then have the corresponding Tcl command return a list of one element for non-NULL values or an empty list for a NULL value.
In Tcl 8.5, [{*}] greatly simplifies using such commands.
   2. If the external function returns a "record" where some of the entries may be NULLs, then have the corresponding Tcl command return a dictionary which only has entries for the fields with non-NULL values.

Type-tagging values using lists as shown above may also be necessary when interacting with other systems, as some indeed take different actions for data of different types (even if the values are the same). [tcom] apparently has some troubles in this area, as it does not provide for specifying the type of data to pass on. In [TclAE], the types are instead explicitly specified.

What NULL proponents should take note of is that Tcl values, as a consequence of the [dodekalogue], constitute a monoid [http://en.wikipedia.org/wiki/Monoid] with the empty string as identity element and string concatenation ([cconcat], for those who require a command name) as operation. The [Everything is a string] principle says that the monoid of Tcl values is in fact a free monoid (currently the free monoid of words in the alphabet of all BMP [Unicode] code-points), and I think it is an '''extremely good''' principle, but the dodekalogue does not explicitly proclaim it. Hence one could imagine a Tcl where, in addition to the strings, there exists a NULL value, but then it would have to be sorted out how this NULL should act under concatenation. What is passed on to A in the following commands?

    A [null][null]
    A [null]somestring
    A somestring[null]

Another problem with introducing special values like NULL is that there's no reason to believe that ''one'' special value is always going to be sufficient: once it is in widespread use, someone will come up with a situation where NULLs should be handled as ordinary values, but which at the same time needs a SUPERNULL that isn't! On the whole, it is much simpler to avoid introducing any special values.

[AMG]: Existence checking can work.
Maybe [sqlite] eval's two- and three-argument forms can unset the variable or array element to signify that its value is null. The script already knows all the variable names, so it shouldn't need to be explicitly told what's null. But on the other hand, maybe the array, dictionary, or whatever can be accompanied by a list of all fields whose values turned up null.

Encoding the data as a list seems clever. If [[[llength]]] is zero, the data is null. If [[[llength]]] is one, the data is stored in [[[lindex] 0]]. Use [{*}] to get at it most easily. I imagine it's possible to recursively apply this encoding to dictionaries and nested lists.

Regarding the combination of nulls and non-nulls, [jhh]'s TIP suggested that concatenating a null with a string results in a null. "Nulls propagate. A null combined with any nonnull is null. Appending a null to a string, or substituting a null into a string nulls the entire string." By this rule, '''A''' will receive null in all three cases.

My [[foobar]] example shows a case where SUPERNULL would at first glance appear to help, but of course it's a ridiculous thing to ask for, especially since it would still not allow setting a variable to SUPERNULL. What's asked for is a value outside the input domain, but no such value can exist because a variable can be set to anything (string or otherwise). Therefore the only solution is the out-of-band channel, as in [[[llength] $[args]]] indicating how many arguments were passed. With some sugar it might be possible to add a command to check whether an argument was explicitly passed or left at its default; this seems like a halfway point because [[[llength] $[args]]] is being used internally, but to the programmer it's no different from checking for null. I don't propose such a thing; I'm just giving examples.

Lars, as you say, null would only be useful for this purpose so long as the command is intended only to interface with stuff that cannot generate its own nulls.
But this is a funky reason to advocate null; the original impetus was the desire to interface with stuff that ''does'' generate nulls.

----

[jhh]: Let me try to clarify what I was proposing in TIP 185 [http://tip.tcl.tk/185]. (I see more comments have come in since I looked 2007-01-06 14:42GMT, so this is not up to date on the discussion.) Forgive me if I screw up this funky wiki markup language.

I agree that everything is, and should be, a string, but I regret that Tcl has no way of representing a null string, unlike most other languages, even Java and lowly Visual Basic.

There are really two proposals here, and I probably confused people by lumping them together in one TIP. The reason I did was that the two together are synergistic and compelling (at least to me). They are:

   1. Extend the meaning of a string to include a null string. I will call this TIP 185a.
   2. Extend the meaning of lists and dicts to allow the representation of unknown elements. I will call this TIP 185b.

TIP 185a might be helpful on its own. For example, SQLite's loop construct (e.g., "'''db eval {select * from accts} {} {''' ...'''}'''") could return null information without using a contrived query statement and similarly contrived decoder code, thus opening the door to the creation of ''general purpose'' packages to integrate databases.

Either proposal could exist alone, but together they allow Tcl, the preeminent system glue language, to transparently manipulate system communications without kludging them as they enter and leave. A lot of my coding time is spent on this silly matter; if the system involves two or more database engines, as is common, the time is multiplied accordingly. And ''every'' programmer is doing the same thing, over and over.

Aside from that, those of us who think Tcl is the best medium for exploring ideas and algorithms would be grateful to have this gap filled, and those unfamiliar with nulls would soon find nulls quite useful in their own right.
TIP 185 was not well received, perhaps partly due to my poor presentation of the idea; I would do it differently now, having seen the response. My recent thinking is to wait for Tcl 9, or perhaps submit TIP 185a separately. I think the most difficult issues are implementation and performance, and the handling of legacy, null-dumb commands: should they see an empty string and proceed, or should they fail? Currently I am leaning toward the latter, more conservative direction.

Most voices against the idea have argued from misunderstanding, or have been vague, so I am not yet convinced the idea has no merit. Little else has been presented that would really help the matter, aside from endless accounts of the workarounds that we all have to invent in the absence of true null handling. These are invariably presented as reasons why we don't need the feature. The sheer number and variety seem to argue just the opposite.

I think if we can recognize the lack of null handling as a real shortcoming and get people working on the problem, we can come up with a fine solution. Imagine if, back at Tcl 6, people had said "speed isn't an issue, just use C for that, then wrap it in Tcl," and did not pursue the performance issue.

At the recent conference in Naperville, Illinois, there ''were'' two papers on relational algebra packages that might be helpful, though neither package currently supports nulls. I approached Andrew Mangogna, author of ''TclRAL: A Relational Algebra for Tcl'' [http://tclral.sourceforge.net/], about adding the feature, and he explained that he is a follower of C.J.Date, who opposed the use of nulls. I explained my pragmatic argument: perhaps the largest use for Tcl is interfacing with databases, nearly all of which, including SQLite, routinely represent null data. The relational algebra packages presented might provide an interface that could help compensate for Tcl's weakness in this area. He was not interested.
Theoretical orthodoxy seemed to be more important than practical programming.

Later I talked to Jean-Claude Wippler [jcw], author of ''Vlerq + Ratcl = Easy Data Management'' [http://www.t-ide.com/tcl2005e/4_ratcl.pdf.gz] and a very practical programmer. He found the pragmatic argument more compelling and promised to look into it. I hope he was in earnest. While such a package is not a full answer to the problem, it could provide:

   * A common interface for relational databases, so database APIs could be more uniform.
   * Null handling, a sort of "standard workaround."
   * A more complete implementation of relational algebra than can be offered by most database engines, optimized for large databases and high query volumes.

A few miscellaneous points:

'''1.''' If I revise TIP 185, I will change the nomenclature to "unknown" ('''{unk}!''') instead of "null," because the word is more accurate, and because the English meaning of "null" as "nothing," instead of "unknown," was such a stumbling block for so many people. I originally chose "null" because it is used in SQL. C.J.Date (1995, ''An Introduction to Database Systems,'' 6th ed) uses "UNK" for unknown values in relational algebra, but it wasn't adopted in SQL. I have used the "null" nomenclature here, for consistency with the present discussion.

'''2.''' A null concatenated with a string is a null. Again, the "null" nomenclature is confusing. One envisions something like "The quick \000 fox", but that thing in the middle might be "brown" or "" or the full contents of the 1957 ''Encyclopedia Britannica''; we just don't know, which makes the ''whole string'' unknown, or null. This example seems to suggest the programmer might have been better off modeling the phrase as a list of words:

    set phrase [list The quick {null}! fox]

or

    set color [null]
    set phrase [list The quick $color fox]

Note that if he then '''join'''s the list, it will collapse into a null, like a black hole.
'''3.''' A '''[null]''' command can generate a null (TIP 185a), but cannot replace the '''{null}!''' syntax. Its purpose is to implement TIP 185b. Sticking '''[null]''' in the middle of a list serialization string turns it into a null (by point 2, above), instead of a serialization of a list with a null element.

'''4.''' Three-valued logic is widely used and well standardized, particularly in relational databases. Yes, a null is not equal or unequal to anything, including null: the result is never true or false, it is ''null''. However, '''null && false''' is false, etc. There is a whole set of tautologies; I think I included most of them in the TIP write-up. These will propagate sensibly through expressions, just as IEEE floating point NaNs do. I mentioned above that C.J.Date argues against the use of nulls. This is ''not'' because of problems with three-valued logic, but because of unavoidable paradoxical result sets in formal relational algebra. I reiterate my pragmatic argument to Andrew Mangogna, above.

'''5.''' One of the most common misunderstandings in previous discussions is that nulls cannot be serialized. This is an example of the synergism of TIP 185a and b: 185b enables this; merely encapsulate a transmission in a list. Now you can send ''any'' data structure representable by a string or list, containing nulls in whole or embedded in any part. It can be handled on the other end by general purpose code, without any need for a special protocol or understanding of the transmitted data structure. This data format might well be appreciated, like Tk, by a broader audience than just the Tcl community. We would be leading, instead of lagging behind, in the area of data management.

Sorry for the long exposition. Hope some of you look it over and find it useful.
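
As an aside on point 4: the three-valued rules can be approximated in present-day Tcl without touching the parser, assuming the zero-or-one-element list encoding Lars H described earlier (`{}` for unknown, `{0}` and `{1}` for known booleans). The proc names `not3`, `and3`, `or3` and `eq3` are invented for this sketch and are not part of TIP 185.

```tcl
# Three-valued logic over the list encoding: {} = unknown, {0}/{1} = known.
proc not3 {a} {
    if {[llength $a] == 0} { return {} }
    return [list [expr {![lindex $a 0]}]]
}
proc and3 {a b} {
    # A known false decides the result even if the other side is unknown.
    foreach v [list $a $b] {
        if {[llength $v] == 1 && ![lindex $v 0]} { return [list 0] }
    }
    if {[llength $a] == 0 || [llength $b] == 0} { return {} }
    return [list 1]
}
proc or3 {a b} {
    # De Morgan: a OR b == NOT (NOT a AND NOT b), also with unknowns.
    return [not3 [and3 [not3 $a] [not3 $b]]]
}
proc eq3 {a b} {
    # Comparing anything with an unknown is itself unknown.
    if {[llength $a] == 0 || [llength $b] == 0} { return {} }
    return [list [expr {[lindex $a 0] eq [lindex $b 0]}]]
}

and3 {} {0}   ;# known false, as in "null && false is false"
eq3 {} {}     ;# unknown: a null is neither equal nor unequal to a null
```

The encoding stays within [EIAS]: every result is an ordinary list, and the caller tests `[llength]` to find out whether an answer is known.
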
[NEM]: From the discussion above, it appears that your main point ''for'' including a special null non-value into Tcl is that NULLs are widespread in the world of SQL databases (your pragmatic argument). Can you describe how the alternatives outlined by others, which do not require such a radical alteration of Tcl's [EIAS] philosophy, fail to address this issue? In particular, I see at least three separate means of encoding the concept of missing data into a run-of-the-mill string:

   1. Using a list encoding: ''llength $data == 0'' for NULL, otherwise ''lindex $data 0'' is the real data.
   2. Using a tagged list encoding: Nothing vs {Just $data} (a variant of the above).
   3. Using a dict or array encoding where missing data is represented by a missing key, so that ''dict exists'' can be used to check for null/missing data.

To me, these are all much nicer options than introducing a special non-string non-value.

----

[slebetman] I'd just like to point out that Tcl is not the only language that doesn't have NULL data. Neither C nor C++ has NULL data. NULL pointers yes, nul byte yes, but not NULL data. This is not just pedantic but fundamental in C. C is like Tcl (or should it be that Tcl is modeled like C?): "everything is a number". C, freestanding in itself, doesn't recognise the nul terminator in strings as anything other than ordinary data. It is the standard library that cares, not C itself (though it is part of the C standard). Even for a NULL pointer, which C does fuss about, it is only special when used as a pointer. The NULL pointer, if used as data, will be treated by C as a regular integer. In C, NULL data is just a convention, nothing more.

NULL data is useful for telling the difference between the user entering an empty string and the user not entering anything, though. But in this case I don't see why we can't steal the C convention and signify NULL data with \0. I don't know much about SQL; am I missing something?
This is all very confusing because it starts out with a misleading C example (a nul in the middle of a "standard" C string is by definition the end of the string, hence is not in the middle), then goes on to talk about a completely different concept of NULL data in SQL. Note that when sending data to an external program, Tcl already supports C nul. There is absolutely no problem in this department, so I have no idea what the C compatibility complaint is about in this discussion:

    # Saves a file which C can read as containing 3 strings:
    set f [open test.dat w]
    puts -nonewline $f "this\0is a\0test"
    close $f

    # It's also easy to parse C strings:
    set f [open test.dat r]
    set strlist [split [read $f] \0]
    close $f

Indeed, for me, Tcl has one mechanism not found in C/C++ (at least not usable in runtime code): the nonexistent variable! For true NULL values I normally simply don't set the variable at all unless needed, then use [info exists] to tell the difference between an empty (possibly binary) string and a nonexistent string. You do need to be careful to always unset the variable at the end of the block lest you accidentally use the previously set value. There are already commands in Tcl that use this convention. [regexp] is one example, which simply doesn't set the match variables if the values don't exist.

[AMG]: I wasn't trying to be misleading when I wrote about NULs, but I see that it worked out that way despite my best intentions. Let me reiterate my very first point at the top of this page. I said that it's often useful to have a value outside the acceptable domain to signify an exceptional circumstance. My second example of this is that C strings are arrays of bytes in the range 1 through 255, so 0 (NUL) is available for this purpose, and it is defined to be a terminator (standard C string), but some systems (find/xargs/cpio) use it as a separator. (If you think about it, these are really the same use.) SQL was merely another example of this general concept.
What concept? The concept is sending control data in the same stream as regular data. This can be implemented in one of three ways:

   1. Have a separate, parallel data stream. [[llength $args]], [[info exists]], and the "contrived" SQL queries are examples.
   2. Encode with some kind of quoting. The backslash character is an example. Note how the backslash must itself be quoted; null would not have the same problem since it's outside the domain.
   3. Signal special data with a value that cannot appear in the normal data stream. '''{null}!''' can be used when the normal data stream can be any string. When the data stream is numbers, any non-numbers will do.

There's some overlap between the three.

But since I probably haven't succeeded in clearing anything up, I'll try to make this all really simple by saying that it's dumb to use null for this purpose and that I'm sorry I brought it up. Instead let's focus on interfacing with other systems that use nulls. Also consider why other systems use nulls (to represent unknowns); maybe Tcl can benefit from using that same feature internally.

Yes, it's true that [C] doesn't really have null either, but it does have the NULL pointer because it reserves one memory address for that purpose. More memory addresses can be reserved simply by creating globals or defining functions. The meaning of the NULL pointer (or any other reserved pointer) is application-defined, and it's often used to signal something exceptional, such as "data not available", "cached string representation invalid", "error", or "SQL query returned null".

By the way, I use several distinct terms that are all pronounced the same way. They mean different things. Let me clarify:

   * null: A non-string, non-value object. As [jhh] says, it is used to indicate an unknown.
   * NULL: (void*)0, the value assumed by a C pointer when it doesn't point to anything in particular.
   * NUL: (char)0, an ASCII value used by C to terminate a string.
None of this discussion is about interfacing with NUL or NULL, only with null. (Man, that sounds stupid when you say it out loud! I can see why this got confusing.)

[slebetman]: Again you're using C as an example, and again I must point out that C does not have a feature to signify a non-entity when working with binary data, only when working with text strings. So again I'm asking: what's wrong with using \0? This does not fall into either 1. or 2. but fits perfectly for 3., which is what you want. If you're only working with strings, simply return the nul character like C, if you find C's behavior acceptable. As in C, if you decide to work with binary data then the null representation is unavailable.

Regarding the other meaning of null, that is something similar to C's NULL: where C has the convention of using the NULL pointer as a signal, Tcl has traditionally used the empty string as the same signal. Some people find this unacceptable* since they don't consider the empty string as being outside the domain of valid data. I think this is the real issue, not the "interfacing" with other systems part, because Tcl can do that already since it can handle binary data transparently.

* Note: That includes me. I do personally wish that Tcl differentiated between an empty string and '''"null"''' (or "undef" in Perl). But this kind of '''null''' is different from what is described by TIP 185. It's more like the half-way point in the existence of a variable: the variable exists but the value doesn't.

[AMG]: Pointers can be used for more than just text strings. An SQL query result can be formatted as an array of pointers to values, for example '''{int* user_id, char* account_name, char* nickname, int* tel_exchange, int* tel_extension}'''. If any of these values is null for a particular row, the pointer can be NULL. A C string consisting only of \0 (NUL, as I call it) is not a null; it's an empty string.
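
The pointer-per-column layout has a close Tcl counterpart, assuming the missing-key convention that Lars H and NEM described above: a row is a dict, and a null column simply has no key. The sketch below reuses the column names from the struct; the values and the helper name `isnull` are invented for illustration.

```tcl
# One result row as a dict; a column that is null simply has no key.
# Here nickname is a known empty string, while both tel_* columns are null.
set row [dict create user_id 42 account_name amg nickname ""]

# Hypothetical helper: returns 1 when the column is null (key absent).
proc isnull {row column} {
    expr {![dict exists $row $column]}
}

foreach column {user_id account_name nickname tel_exchange tel_extension} {
    if {[isnull $row $column]} {
        puts "$column: null"
    } else {
        puts "$column: '[dict get $row $column]'"
    }
}
```

The empty nickname and the absent telephone fields stay distinguishable, which is the same distinction the C layout draws between an empty string and a NULL pointer. This needs Tcl 8.5 or later for [dict].
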
Since I've been a really poor communicator, I think I must say again that I too think the real issue is that using the empty string (or any other string) is not an acceptable method of representing nulls. This is because no matter what string is chosen, that string might possibly show up in valid, non-null data. I then spewed some gobbledegook about how this method works just fine when using limited domains like numbers or arrays of bytes 1 through 255 instead of all strings, but that was just me trying to be complete.

Interfacing with other systems still remains an issue because some systems really do try to send nulls in addition to strings*. On the Tcl side, methods 1 and 2 (above) must be used to do this safely, but usually method 3 is used even though the possibility of ambiguity exists.

(* Any array of binary data is a string, which can include any number of NULs.)

"The variable exists but the value doesn't." Yes, this pretty much sums up what I'd like to see in a true null implementation. The variable definitely does exist, but its value isn't known.

----

[LV] There's lots of theoretical discussion regarding null vs empty strings above. The best place to look in terms of practical application is interfacing with true SQL. How do extensions like [oratcl] and others currently provide the Tcl developer with the ability to distinguish a row/column entry which has an empty string as a value vs having no value set (i.e., the NULL situation)?

[AMG]: [sqlite] has a '''nullvalue''' subcommand on database connection objects which is used to set the string representation for nulls. It defaults to the empty string. [http://www.sqlite.org/tclsqlite.html#nullvalue]

----

[[ [Category Language] ]]