Dictionaries as Arrays

PWQ 2004-01-29:

From a casual look through the code, it seems at first glance that only a small amount of effort would be needed to give access to dict types using the $() array notation.

What some people desire is the following:

set fred [dict create A 1 B 2]
puts "B = $fred(B)"
puts "fred is a $fred"
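
For comparison, here is a minimal sketch of how the same reads are written with the dict command, assuming Tcl 8.5 or later where dict is built in. It also shows that the array-style read of a scalar is currently an error.

```tcl
# Assumes Tcl 8.5 or later, where dict is a built-in command.
set fred [dict create A 1 B 2]

# Element access goes through a command instead of $name(key).
puts "B = [dict get $fred B]"

# The dict is an ordinary value, so plain $ substitution works.
puts "fred is a $fred"

# Reading fred(B) fails because fred is a scalar, not an array.
set failed [catch {set fred(B)} msg]
puts "reading fred(B): $msg"
```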

While there are pros and cons on both sides, it could only be a good thing if variables, lists, arrays, and now dicts interoperate as much as possible, to limit the number of ways data is accessed.

One option would be to deprecate the use of $var which would be one way to reduce the number of different data access methods, by enforcing the use of type get notation for all.

Assuming the above is not a contender, then another simple change would be to modify Tcl_GetVar2() and cousins.

There could be another way that avoids any changes to the core, see Dictionaries as Arrays by Stubs.

At the moment, if a reference in the form of $name(element) finds that name is a scalar, then an error is thrown. It would be simple at this point to add a test to see if the scalar is a type DICTIONARY. The code can then fall through to Tcl_DictGet() (or similar).
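
The lookup order described above can be modelled at the script level. This is only a sketch: readElement is a hypothetical helper, not a Tcl API, and since a script cannot test for the internal DICTIONARY type it simply attempts dict get on any scalar. Tcl 8.5 or later is assumed.

```tcl
# Hypothetical helper (not a Tcl API): models the proposed lookup order
# for a reference of the form $name(element).
proc readElement {varName key} {
    upvar 1 $varName var
    if {[array exists var]} {
        # Existing behaviour: name is an array, read the element.
        return $var($key)
    }
    if {[info exists var]} {
        # Proposed fall-through: name is a scalar, so try it as a dict.
        return [dict get $var $key]
    }
    error "can't read \"${varName}($key)\": no such variable"
}

array set colours {sky blue}
set fred [dict create A 1 B 2]
puts [readElement colours sky]   ;# array element
puts [readElement fred B]        ;# dict key held in a scalar
```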

I suspect that in the byte compiler it is not that simple; my comment would be: too freaking bad. Any complications caused by the BC should NOT be used as arguments against evolution of the language. (MS agrees fully, but cannot recall any instance of that argument being used, and does not think the TCT would even consider it.)

If we were to accept the above is trivial and feasible, and apply proactive design methodologies (rather than the normal reactive ones the TCT uses (DKF: Most of the TCT are not paid to do core work, so they are constrained in what they can do.)), we would at this point replace the test for DICTIONARY with instead an installable subsystem that could implement more than just dictionaries, and only add a few extra commands to the tcl core in the process.

Before embarking on a discussion of the above, one should first determine the frame of reference for what the considerations should be. For example:

  • Does the change enhance the language?
  • Is there a use for implementing the change?
  • What negative impacts on the language does the change create?
  • Could the change be incorporated into further core parts and thus simplify other aspects of data access?

My evaluation of the above shows that the pros significantly outweigh the cons.

I could at this point extend the changes to include such things as array shadowing, persistent stores, but I am sure the above is contentious enough to ensure a resounding NO.

jcw 2004-01-30: Philip, my understanding of OSS in general and Tcl in particular is that it is usually not possible to get consensus up front, i.e. with proposals which can be subject to interpretation and misunderstanding. The way to get things done IMO is to build it, or work with colleagues who want to do it for/with you, then share the results, respond to questions and adjust things where you feel the points made are valid, and then stand up and make your case on why/how it should go into the core. This is how stubs and VFS went in (not that long ago, btw). I'm afraid you won't find many people to scratch your itch by filing a generic complaint - then again, I may well be wrong... I'm just trying to encourage anyone who wants to take Tcl further.

PWQ 2004-01-31: I don't mean to read like I am complaining; generally when I get things for free I don't complain. This post is one of many 'ramblings' on Tcl that I have and that I decided to share. But since you have detected an undertone in my text I will expand on it, if I were to complain, here and not in this thread.

I have made changes to the core in the past for my own purposes, but you then get trapped in the new release/incompatibility race, which I quickly tired of. So now, if I can't implement it in Tcl then it doesn't get implemented. Like the TCT, I too have finite time to contribute to OSS rather than the task I infrequently get paid for.


Setok: For what it's worth, I've argued for this for a while now and definitely agree. It could work both ways: access var as array --> convert it into one, return array as var --> convert it into key-value job. I just think this is natural and a simple way to combine facilities of the language.
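
The two conversions mentioned here can already be done by hand with array set and array get. The sketch below assumes Tcl 8.5 or later and shows the round trip; the proposal is that the language would do this implicitly.

```tcl
set d [dict create A 1 B 2]

# Value to variable collection: unpack the dict into an array.
array set a $d
puts $a(B)
incr a(B) 10

# Variable collection back to a value: array get returns a key/value
# list, which dict commands accept (element order is not guaranteed).
set d2 [array get a]
puts [dict get $d2 B]
```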


Steve Bennett 2010-10-28: Anyone who is interested in this topic should definitely experiment with Jim where lists are dictionaries are arrays. This allows things like:

Welcome to Jim version 0.64, Copyright (c) 2005-8 Salvatore Sanfilippo
. set x {a 1 b 2}
a 1 b 2
. puts $x(b)
2
. dict set x c 3
a 1 b 2 c 3
. incr x(c)
4
. parray x
x(a)  = 1
x(b)  = 2
x(c)  = 4

And even:

. foreach {k v} $x { puts $k=$v }
a=1
b=2
c=4
. set x(d) [dict create A one B two]
A one B two
. dict get $x d A
one
. parray x
x(a)  = 1
x(b)  = 2
x(c)  = 4
x(d)  = A one B two
. parray x(d)
x(d)(A)  = one
x(d)(B)  = two

RS: I like the first style, for reading accesses, much better:

set value $dic($key)           ;# 1) not possible now
set value [dict get $dic $key] ;# 2) planned in 8.5

But in the other direction, we get an asymmetry when setting a value:

set dic($key) $value           ;# 3) creates dic as an array, if it doesn't exist
puts $dic                      ;# 4) error if array, success if dict
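
The asymmetry can be reproduced in current Tcl. This is a small sketch assuming Tcl 8.5 or later; the key name colour is just an illustration.

```tcl
# Case A: dic does not exist yet, so element assignment creates an array.
unset -nocomplain dic
set dic(colour) red
puts [array exists dic]
set arrayRead [catch {set dic} msg]   ;# reading the array as a value fails
puts $msg

# Case B: dict set keeps dic a plain value that $ can read.
unset dic
dict set dic colour red
puts $dic
set dictRead [catch {set dic}]
```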

MS is not clear as to the expected behaviour of upvar and trace. Currently they are able to operate on individual elements of arrays (which are themselves variables) but not dicts (whose elements are values).
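
A short sketch of the difference MS describes, assuming Tcl 8.5 or later. The proc bump is a hypothetical helper: it can link to an array element because the element is a variable, while a dict entry can only be changed by storing a new dict value.

```tcl
# Hypothetical helper: increments whatever variable it is given by name.
proc bump {varName} {
    upvar 1 $varName v
    incr v
}

array set arr {hits 0}
bump arr(hits)          ;# an array element is itself a variable
puts $arr(hits)

set d [dict create hits 0]
# There is no variable named "hits" inside d to link to; the update
# produces a new dict value and stores it back in d.
dict incr d hits
puts [dict get $d hits]
```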

PWQ 2004-01-31 interjects: Nothing should change; an upvar is a link to another variable, which could be the name of an array variable (for want of a better word). Since Tcl_GetVar2() uses the name, not an object, it will work as expected.

MS continues: Another issue is how the proposed changes apply to nested dicts - would the $ syntax require some expansion?

PWQ: You are right, it does require expansion. The issue is how. Since we are limited to $name(...) as a syntactical item, any nesting would have to be an encoding of characters enclosed in the parentheses. As an example, using dot as a delimiter:

$dict($A.b.c) :- could imply [dict get [dict get [dict get $dict $A]  b] c] , or simply [dict get $dict $A b c]

The use of dot is suggested because $name breaks at punctuation.
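
The dot encoding can be tried out without touching the parser. The sketch below assumes Tcl 8.5 or later; dictPath is a hypothetical helper, and keys that themselves contain a dot cannot be addressed this way.

```tcl
# Hypothetical helper: resolves a path such as "A.b.c" against nested dicts
# by splitting on dots and passing the parts to dict get as a key path.
proc dictPath {dictValue path} {
    return [dict get $dictValue {*}[split $path .]]
}

set config [dict create server [dict create tls [dict create port 443]]]
puts [dictPath $config server.tls.port]
```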

MS: Semantically, dicts are much closer to lists than to arrays. A TIP to extend the $ notation to lists was rejected some time ago (just info, not arguing for or against anything).

PWQ: If I were to ask for anything, it would just be that the meaning of $ could be redefined at the script level, not that any changes be done to the core (AKA Metaprogramming).


MS also notes that this proposal poses some deeper difficulties. Consider

set A "0 1 2 3"
set B [list 0 1 2 3]
set C [dict create 0 1 2 3]

Now: $A, $B and $C are equal, as they are represented by the same string. The question is: should $ interpret the first two also as dicts, or consider them as different types?

If $B is interpreted as a dict, we would get things like

set B [list 0 1 2 3]

resulting in $B(0) (value 1) being different from [lindex $B 0] (value 0), which may or may not be what the user is expecting.
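
The two readings of the same string can be compared directly, with dict get standing in for the proposed $B(0). Tcl 8.5 or later is assumed.

```tcl
set A "0 1 2 3"
set B [list 0 1 2 3]
set C [dict create 0 1 2 3]

# All three have the same string form.
puts [expr {$A eq $B && $B eq $C}]

# Read as a list, position 0 holds 0; read as a dict, key 0 maps to 1.
puts [lindex $B 0]
puts [dict get $B 0]
```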

Another problem: strings that can be converted to a list with an even number of elements can be interpreted as dicts, if the number is odd they can't. What should Tcl then do with

set str1 "hi call me miguel"
set str2 "hi please call me miguel"
puts $str1(hi)
puts $str2(hi)
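
With dict get in place of the proposed syntax, current Tcl (8.5 or later assumed) already shows the split between the two strings:

```tcl
set str1 "hi call me miguel"          ;# 4 words: parses as a dict
set str2 "hi please call me miguel"   ;# 5 words: does not

puts [dict get $str1 hi]

# The odd-length string is rejected as a dict.
set failed [catch {dict get $str2 hi} msg]
puts $msg
```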

OTOH, introducing types in Tcl semantics (violating 'everything is a string') is most probably a no-no.

CN: actually, your C doesn't have a well-defined string representation until you use it, so it will only equal $A and $B half of the time; I actually find this amusing: just checking for equality can presumably affect the execution order of a later foreach x $C {...}! (And now, approx. three minutes later, I'm no longer so sure that I want to keep up this claim about the changing execution order: presumably this could only happen if there were two dict functions that had different preferences about the access order... and presumably this condition is currently void. Oops.)

DKF: I think there'd need to be a command to declare that a variable can be used as an array and what interpretation to put on the keys (dict keys, list indices, etc.) Having a command to set something clever in Tcl's guts wouldn't violate anyone's ground semantics...

jcw: Would it make sense to tie a var to an arbitrary tcl command? So $var(a) gets executed as cmd read var a and so on for names, size, etc?

PWQ: I would be happy with this if it can be done at the script level. This almost can be done with a var trace, but the semantics of trace dictate that the value be communicated back through the original reference, not by return.
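
A sketch of the trace approach, assuming Tcl 8.5 or later. The names store, view and viewRead are made up for the illustration. The callback's return value is ignored, so it has to write the result into the traced element, which is the limitation described above.

```tcl
# Backing store: an ordinary dict value.
set store [dict create a 1 b 2]

# The array that presents the dict through $view(key).
array set view {}

# Read-trace callback. Tcl passes the variable name, the element name
# and the operation; whatever the callback returns is discarded.
proc viewRead {name1 name2 op} {
    global store
    upvar 1 $name1 arr
    if {[dict exists $store $name2]} {
        # The value has to be written into the traced element itself.
        set arr($name2) [dict get $store $name2]
    }
}
trace add variable view read viewRead

puts $view(b)
dict set store b 20
puts $view(b)   ;# the trace refreshes the element on every read
```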


FW: I very strongly support dicts being independent of array syntax, personally. Tcl's arrays, which are a special syntax to access a certain type of value, don't even resemble dicts in anything but the shallowest ways. Arrays are just variables. They're a method of namespacing variables. Like, very early in Tcl, a method to store associative data like that, say user preferences, would often be in variables, since at the time (and even now) that's the only native value->value mapping system available. You'd have, say, $john_prefs and $mary_prefs. The arrays standardized that method into $prefs(john) and $prefs(mary), but it's still using variable names to store data. That's pretty hackish, but that's beside the point.

PWQ 2004-02-01 interjects: But is that not how you think, in associations? If the language mirrors the way you think then you are more likely to program the correct solution. Why do other languages have structures? A program is all about manipulating data; the easier it is to do that, the easier it is to write quality code.

FW continues: To make array syntax a method of accessing dicts would turn array syntax from merely a slightly special kind of variable, to an access method for a first-class data structure. Which, besides being incompatible, would be hazardous and even hypocritical, because it gives preference to dicts when, say, a special list index access syntax would also be handy. Once we're allowing a special syntax for data structure access, in a language which shuns unnecessary syntax, why don't we create special syntax for accessing list indices as well? Say, $a:1 for lindex $a 1. Or, to get more ridiculous, how about stream access? That sound good? $!stdin looks better than gets stdin, after all. Why invent a special accessor syntax in a language that was based on the idea of minimalist syntax? The whole idea is that coming up with a syntax for every little task is a slippery slope (see: Perl), and we'd be shamelessly advocating that.

PWQ 2004-02-01 interjects: Some really good points. Go back to the start: should there be a $ notation at all then? Your argument would have to be that there shouldn't. What is wrong with having $x:1 as a shorthand for lindex $x 1? What is to stop me from writing a command named ! as a shortcut for gets stdin? My comments were that since we do have a data access method called $, we should be able, if wanted, to change its meaning. In the OO camp, this level of control is mandatory. Your case is more about code readability by having a standard than about the feature itself being good/bad.

FW continues: So. Array syntax for accessing dictionaries is a feature that's non-progressive, indiscriminate, incompatible, arbitrary, literally goes against several of the basic tenets Tcl was created on, and more than likely actually evil. Introducing array syntax as an accessor for dictionaries would probably release a plague of locusts upon the land. Shame on anyone who suggests it. Repent for your sins.

PWQ 2004-02-01:

Incompatible with what? Arbitrary compared with what? There are no tenets in Tcl; everything is a string. Making a small modification to $ notation to reference dicts would be of limited value. That is why I suggested the proactive approach of installable handlers, to allow programmers (not you, me, or the TCT) to decide what meaning $xxx(a.b.c) would have in THEIR program. I believe that that desire is in fact the core tenet of Tcl. This in effect turns $ back into set, which then gives control back to the programmer at the script level.

I can redefine everything else in Tcl. From set through to if and proc , I am free to choose the meaning of the token. Everything that is except what $ does.
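
A small sketch of that difference, assuming Tcl 8.5 or later. The wrapper and the counter setCalls are made up for the illustration: set is replaced by a counting wrapper, and a $ substitution is seen to bypass it. The original set is restored at the end.

```tcl
# Keep the original command under another name, then install a wrapper.
rename set _set
_set ::setCalls 0
proc set {args} {
    incr ::setCalls
    # Run the real set in the caller's scope.
    uplevel 1 [list ::_set {*}$args]
}

set x 10        ;# goes through the wrapper
puts $x         ;# $ substitution does not call the wrapper
set y [set x]   ;# two more wrapper calls
puts "set was called $::setCalls times"

# Restore the original command.
rename set {}
rename _set set
```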

Another approach that has been suggested by others is to internally change $ to set, this again allows programmers to determine the meaning of set themselves and code in their way of thinking.

FW: (note that up to now, PWQ was responding to my one original message paragraph-by-paragraph; this is my first response - the formatting makes it look like we've been having a back-and-forth argument) You're proposing the language should have options to toggle the syntax of the language. That would, for one, require clauses for the different modes in man Tcl(n). Tcl was very much designed not to provide a bunch of different syntax options - if you want special syntax, go ahead and do some radical language modification with unknown, but that's not the interpreter's business. Changing $ to call set would be a minor benefit, because you would still have to alter the syntax for your extension to catch non-alphanumeric variables without {} grouping.

PWQ: There is nothing radical about exposing the functionality of $, since initially there was no $, or lists or arrays. Try to see the wood for the trees. Allowing programmer X to attach special meaning to $ is no more significant than allowing traces on variables. An unknown is of no use to variable references. And unknown has limited appeal since it is a global catch-all that is also used for other purposes.

FW: It is more significant than traces, because it requires new syntax. Code has not yet been allowed access to change syntax. There are plenty of cases where it'd be somewhat handy, but Tcl (or almost any other language) just doesn't go there.


schlenk: Syntax changes are the one thing getting heavy discussion. See the {*} discussion. BTW, the {*} model could be used for things PWQ wants, and some of the ideas like macro processing with it were discussed, but mostly rejected as unclean and error prone. Tcl is quite verbose and $ syntax is a major convenience shortcut. You can redefine every non-syntax construct in Tcl, but not the really small and compact syntax. If you want all the power, use set for everything and redefine its meaning. I don't think a programming language should be modeled the way you think and talk (Perl took this route AFAIK), as thoughts are highly context-dependent and only really relevant for the one person thinking them. It's a nice thing for one-man fire-and-forget projects, but a nightmare for larger projects. BTW, we have a TIP for this: rmmadwim, TIP 131, the April 1 TIP.


NEM I'm interested in the idea of redefining '$' syntax to become an actual call to set. Paul Duffin had thoughts on this with Feather. See [L1 ] for some notes which are probably of interest. There are of course some interesting problems that this presents. One immediate point is that '$' and set have different rules for delimiting the variable name (see for instance the difference between puts $foo.bar and puts [set foo.bar]). So, how would this be resolved - would $ syntax use the current rules (so puts $foo.bar becomes "puts [set foo].bar")? Or would it use the current command word delimiter rules (puts [set foo.bar])? This is perhaps a minor point. More importantly, though, is whether anyone could practically redefine the way set (and $ syntax) work. To do this in a script means you have to be extremely careful that you don't break any Tcl extensions you load. This is presumably why very few people ever redefine proc or friends, and those that do usually make sure it behaves in a compatible way (e.g. guarded proc). It would certainly be interesting to have this functionality exposed in Tcl, to see what could be achieved with it.

DKF: FWIW, $ is the same as (one-argument) set in all respects except word-delimiter rules. There's a difference there because it makes $ easier to use in the normal case (e.g. in many Tk scripts.)
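
The delimiter difference is easy to see with the foo.bar example NEM gives above. A minimal sketch in standard Tcl:

```tcl
set foo base
set foo.bar whole

# $ ends the name at the dot: value of foo followed by ".bar".
puts $foo.bar

# set takes the whole word as the variable name.
puts [set foo.bar]

# Braces give $ the same reach as set.
puts ${foo.bar}
```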

Larry Smith: If you have dictionaries with the above semantics then why do we even need arrays? If dicts can provide all the current capabilities of arrays, just get rid of arrays and bolt in dictionaries, with their associated improvements. All this will do right now is make code that currently errors into code that means something. I see no reason to continue to drag arrays around.

RS: Arrays are more powerful than dicts because they are collections of variables (a precursor of namespaces), where traces can be bound to. Also, it's interesting that the $ parser allows only [A-Za-z0-9_] for variable names, but anything except non-escaped whitespace for array keys, until the final paren is closed, e.g.

puts $arr(1,apparent($subindex),+)

On the other hand, they're limited because they're not pure values (can't be referenced by $arr, for example). In many places where we used to use arrays or lists, dicts will become the better choice. So I think, for now all of them have their place in Tcl - until someone comes up with a grand design that is simpler yet more powerful...


PWQ 2004-02-06:

Summary

After one week of discussion the essence of the discussion is as follows:

  1. There was no technical comment on the original post. That is, to change the functions Tcl_Get/Set_Var to allow access to dicts using array syntax (The actual intent of the post).
  2. There is support in general for arrays as FCV. And also to have more powerful array features.
  3. There is limited support for the ability to redefine the meaning of $, or other metaprogramming features.
  4. It is unlikely that the byte coder would complicate the change.
  5. It is not possible to use stubs to achieve the goal, as the core does not use the stubs table.
  6. A number of replies indicate that the syntax of dicts is more verbose than desired. And the $ is in fact viewed by most as a short cut for set var.
  7. Emotional rhetoric accounts for approx 50% of discussion.
  8. There was no discussion on the merits of dict items.
  9. There was no discussion on the merits of arrays as FCV.

NEM: Regarding arrays as First-Class Values, I'm not sure what is meant here. As has been pointed out, arrays are collections of variables not values. As such, I'm not sure how they could be first-class values. For instance:

array set foo { a 1 b 2 }
somefunc $foo
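
A sketch of the current behaviour and of two of the alternatives listed below, assuming Tcl 8.5 or later. The procs showValue and showArray are hypothetical stand-ins for somefunc.

```tcl
array set foo {a 1 b 2}

# Hypothetical stand-ins for somefunc.
proc showValue {v} { return "got: $v" }
proc showArray {name} {
    upvar 1 $name arr
    return [lsort [array names arr]]
}

# Passing $foo is an error today: an array has no value of its own.
set failed [catch {showValue $foo} msg]
puts $msg

# Alternative: pass a snapshot of the elements as a key/value list,
# which dict commands accept.
set snapshot [array get foo]
puts [dict get $snapshot b]

# Alternative: pass the name and link to it with upvar.
puts [showArray foo]
```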

What should the above do? Should it produce a dict of the current values of all the variables? Should it pass a first-class reference to the array (something which currently isn't possible)? Should it just pass the array name (and expect an upvar to be used)? What it currently does is error, which seems reasonable to me. dicts on the other hand are first-class values, but don't support $ syntax to access sub-elements. If Tcl was perfect then we could probably jettison arrays now that we have both namespaces and dicts, but we can't do that as arrays are too widely used. This makes it a pain trying to reuse the $() syntax. I can see where this discussion is leading, and after the great {*} debate has finally been settled, I'd hate to see another syntax war. Is there any fundamentally missing feature? Or is this mainly about programmer convenience?

PWQ: I fail to see any supporting argument for including dicts in the core. They can be done in pure Tcl as well as a C extension. If no-one on the TCT has the skill to implement a change that allows arrays to become FCV (as far as any scripting programmer is concerned), then it would be best for you to support the original suggestion so that the $ notation can be applied to dicts and then arrays can be deprecated.

I have no further interest in this page. Having failed to secure any technical information on the proposed changes, there is nothing this page has to offer other than being the basis for yet another flame war.


DKF: Answering the points listed above.

  1. There are deep differences between arrays and dictionaries that stem from the fundamental difference between a variable and a value. Once we understand these well enough, refactoring arrays so that they can be backed up by dictionaries (or any other kind of container datatype for that matter) is relatively straight-forward.
  2. All of this doesn't matter until point 1 is resolved.
  3. There is no TCT support for redefining the basic meaning or syntax of $.
  4. Maybe, maybe not. Difficult to tell at this stage. This point should be ignored though, since modifying the bytecode engine (in some ways that I don't particularly wish to enumerate here) isn't a vast deal.
  5. Stubs are a red herring. I for one would much prefer a proper API instead. ;^)
  6. Your point is...?
  7. Your point is...?
  8. Your point is...?
  9. Your point is...?