[Philip Quaife] 8 May 2004.

In musing over [A Case for Metaprogamming] and [Dictionaries as arrays],
I have had some new thoughts on a concept that advances the unification
of arrays, lists, and other types by way of the '''$''' notation.


'''Introduction'''

We access variables using the ''set'' command. We also use ''$'' as an alias for ''set'' (but with slightly different parsing rules for the variable name).

When accessing parts of a variable's value, we resort to functions specific to the type of data that the variable contains.
  e.g.: [lindex] - for lists, [string range] - for strings.

Each data type has a plethora of associated functions for manipulating a datum.

The following is a conceptual change to the handling of a tclObjType that allows a more uniform accessor method.

The accessor model is usually found in OO languages where all object properties are hidden and can only be read/written through object methods. Usually all objects support the same getter and setter method names for consistency.

In TCL we only expose a "string" as our public property. This allows us to use other representations for optimizations when strings are not the most appropriate format without affecting code that uses our object types. We see this in mathematical expressions for example.

Take the following:
   interp alias {} get {} set
   set x Hello
   set x   ;# -> Hello
   get x   ;# -> Hello

We see that the semantics of get/set are not dissimilar. While in Tcl we use the duality of ''set'' for both operations, consider for the moment that we use the ''get/set'' model. We also implement this by extending the internal tclObjType handler to have pointers for getter/setter functions.

Now take the following:

   command $var
   command $var(key)
These can be interpreted as:
   command [get var]   ;# the same as [set var]
   command [get var key]

The meaning of:
    $var(key)
currently is: ''get the variable from the hash '''var''' that has the key '''key''' and return its tclObj datum.''

The new meaning would be: ''call the tclObjType accessor function for the variable '''var''' requesting the part '''key''' ''.

When a variable has a type of ''assocArray'' (a new obj type), the key would refer to an entry in an attached associative array whose value is a Tcl ''variable'' structure.

Note there is no inherent meaning of the interpretation of ''key''. Each tclObjType will use ''key'' as appropriate.
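The idea that each type gives ''key'' its own meaning can be emulated at the script level today. The following is only a minimal sketch: the proc name ''getas'' is hypothetical, and the type is passed explicitly as an argument because a script cannot reliably ask a value for its internal type. The dict branch assumes Tcl 8.5 or later.

```tcl
# Hypothetical emulation of a per-type getter; "getas" is not a real Tcl command.
# The type is an explicit argument here (an assumption of this sketch).
proc getas {type varName args} {
    upvar 1 $varName var
    # No part requested: return the whole datum, like [set varName].
    if {[llength $args] == 0} {
        return $var
    }
    set key [lindex $args 0]
    # Each type interprets the same key in its own way.
    switch -- $type {
        string  { return [string index $var $key] }
        list    { return [lindex $var $key] }
        dict    { return [dict get $var $key] }
        default { error "no getter for type \"$type\"" }
    }
}

set s Hello
set l [list A B C]
set d [dict create 0 zero 1 one]

puts [getas string s 0]   ;# H     (character at index 0)
puts [getas list l 0]     ;# A     (element at index 0)
puts [getas dict d 0]     ;# zero  (value stored under the key "0")
puts [getas string s]     ;# Hello (the whole datum)
```

The same key, 0, selects a character, a list element, or a dictionary entry depending only on which getter handles it.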

Some subsequent examples will clarify this.

Now we see why we cannot use the duality of ''set'' for both read and write: ''get'' now takes an optional second argument, which we will call the:
   part

We think of the value within the parentheses as a request for part of the whole. With an array, for example, we request one of the elements, not all of them.

By definition, when there are no parentheses we are requesting the whole of the datum.

The ''raison d'etre'' for this change in the interpretation of the notation '''$var(index)''' is to allow a common syntax regardless of internal representation, as well as extensibility.

Rather than limiting the notation to arrays we are free to apply the notation to any tclObjType.

Note we will also have to change the implementation of ''set'' to call the setter method of the tclObjType for the variable when given a variable notation of: '''var(key)'''.
''In all likelihood we would change set to '''set var part value''' notation.''

'''Benefits'''
   1. Unified syntax.
   1. Extensibility.
   1. Less obfuscation of the meaning of the statements.


'''Contradictions'''
   1. The main problem in the implementation of this construct is that ''associative arrays'' are a kludge that has never been reworked when objifying Tcl variables.
   1. An associative array is an abstract quantity, so how do you call setter/getter functions when you don't know the type of data associated with the variable until you have accessed it?
   1. Performance. While a reference to a scalar will degenerate to a call to ''set'' which will be byte coded out of existence, and an array reference will turn into a cached hash lookup, commands such as ''lindex'' and ''string range'' will not be able to be bytecoded when called as accessors through the ''$'' notation (due to run-time type determination).

'''Pause and Think'''

Nothing stated above requires any change in syntax or causes any incompatibility with the current handling of associative arrays. The above is not an attempt to replace list handling functions or string handling functions.

Nor is the above a switch in methodologies to an Object-based notation. In fact the use of ''get/set'' is consistent with the current use of a common function regardless of type.

What it does is provide, for those people who want it, a more concise way of slicing up a variable's data.

For associative arrays, there is no change conceptually to how these retrieve and store data. There is a huge change in the core code that implements arrays and most likely in any code that makes reference to associative arrays. One hopes that the advent of the ''dict'' tclObjType for Tcl 8.5 and associated changes to arrays will make this change a practical proposition.

'''Changes in the core'''
The main change would be to add two more function pointers to the tclObjType structure. One for setting and one for getting.

The set method would only be called when requesting setting part of the data (as in one element of an associative array).

Likewise, the get function would only be called when a part is requested, as an object's string rep and the object data itself are already available to the caller.
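The core change itself would be in C, but the shape of it can be sketched in script. In this rough sketch a dictionary named ''accessors'' stands in for the two extra function pointers, and the procs ''getpart'' and ''setpart'' are hypothetical names; the type is again supplied by the caller. It assumes Tcl 8.5 or later for ''dict'' and the {*} expansion syntax.

```tcl
# Hypothetical registry: type name -> getter and setter command prefixes.
# It stands in for the two proposed function pointers in the type structure.
set ::accessors [dict create \
    list [list getter lindex     setter lset] \
    dict [list getter {dict get} setter {dict set}]]

# Read: the accessor is only consulted when a part is requested.
proc getpart {type varName args} {
    upvar 1 $varName var
    if {[llength $args] == 0} {
        return $var   ;# whole datum, no accessor call needed
    }
    set cmd [dict get $::accessors $type getter]
    return [{*}$cmd $var [lindex $args 0]]
}

# Write: always a request to set one part, so the setter is always called.
proc setpart {type varName part value} {
    upvar 1 $varName var
    set cmd [dict get $::accessors $type setter]
    {*}$cmd var $part $value
    return $value
}

set l [list A B C]
setpart list l 1 X
puts [getpart list l 1]   ;# X
puts [getpart list l]     ;# A X C

set d [dict create A 1]
setpart dict d B 2
puts [getpart dict d B]   ;# 2
```

Adding a new type in this sketch is one more entry in the registry, which is the script-level analogue of filling in the two new pointers.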

'''Examples'''

Now let's extend the ''accessor'' function to other tclObjTypes.

Try a string:
   set astring {Hello!}
   puts $astring(0)     ;# -> H
   puts $astring(end)   ;# -> !
   puts $astring(1 3)   ;# -> ell
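The string results above can be reproduced with today's commands. This is a small sketch of one possible string getter; the proc name ''strpart'' is hypothetical, and treating two indices as a range is simply the interpretation used in the example.

```tcl
# Hypothetical string getter: one index selects a character,
# two indices select a range.
proc strpart {str part} {
    switch -- [llength $part] {
        1 { return [string index $str [lindex $part 0]] }
        2 { return [string range $str [lindex $part 0] [lindex $part 1]] }
        default { error "expected one or two indices, got \"$part\"" }
    }
}

set astring {Hello!}
puts [strpart $astring 0]       ;# H
puts [strpart $astring end]     ;# !
puts [strpart $astring {1 3}]   ;# ell
```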

Try lists:
   set alist {A {B C D} E} ;# ms pointed out this is not a list yet
   set alist [list A [list B C D] E] ;# this is a list

   puts $alist(0)     ;# -> A
   puts $alist(end)   ;# -> E
   puts $alist(1 2)   ;# -> C
''[MS] notes that this is not what would happen; until some list op is done to $alist, it has a string tclObjType; so that [[puts $alist(1 2)]] -> " \{" and not "C". Had there been an intervening [[llength $alist]] the story would change.'' [PWQ]: amended above, thanks.

Try Dicts:
   set fred [dict create A 1 B 2 C 3 D 4] ;# or whatever syntax is implemented.
   $fred add New [dict create A 100 B 10]
   puts $fred(A)       ;# -> 1
   puts $fred(end)     ;# -> {}
   puts $fred(1)       ;# -> {}
   puts $fred(New A)   ;# -> 100
   puts "I think that [get $fred {New A}] is the same as $... above"
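With the ''dict'' command as shipped in Tcl 8.5, the same lookups can be approximated as follows. This is a sketch only: ''dictpart'' is a hypothetical helper, and returning an empty string for a missing key is the behaviour assumed by the example above, not what ''dict get'' does by itself (it raises an error).

```tcl
# Hypothetical dict getter: the part is a key path; missing paths yield {}.
proc dictpart {d part} {
    if {[dict exists $d {*}$part]} {
        return [dict get $d {*}$part]
    }
    return {}
}

set fred [dict create A 1 B 2 C 3 D 4]
dict set fred New [dict create A 100 B 10]

puts [dictpart $fred A]         ;# 1
puts [dictpart $fred end]       ;# empty: "end" is just a key that is absent
puts [dictpart $fred 1]         ;# empty: "1" is not a position, only a key
puts [dictpart $fred {New A}]   ;# 100
```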

Try keyed lists as structures, or binary data represented as named quantities (as in ASN.1 notation, for example).

'''Pause and Think'''

You do not have to use it.

What the above does is give the programmer the ability to determine what meaning the ''$'' notation has.

If you want you can always use [[set varname]] to access a variable rather than ''$'' notation.

'''What's happening behind the variable'''

Take for example accessors for a ''list'' type.
The notation ''$var(end)'' could be interpreted as:
   lindex $var end

While the notation ''$var(1 .. 3)''
could be interpreted as:
   lrange $var 1 3
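One way such a list getter might map the part onto existing commands is sketched below. The proc name ''listpart'' and the exact form of the range notation (three words with ".." in the middle) are assumptions made for illustration.

```tcl
# Hypothetical list getter: a single index maps to [lindex],
# "first .. last" maps to [lrange].
proc listpart {lst part} {
    if {[llength $part] == 3 && [lindex $part 1] eq ".."} {
        return [lrange $lst [lindex $part 0] [lindex $part 2]]
    }
    if {[llength $part] == 1} {
        return [lindex $lst [lindex $part 0]]
    }
    error "unsupported part \"$part\""
}

set var [list a b c d e]
puts [listpart $var end]         ;# e
puts [listpart $var {1 .. 3}]    ;# b c d
```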

This could be a call through the scripted Tcl interface, or it could be a direct call to the lower level C API function, or it could be implemented inside the accessor function.

Which approach is best probably depends on the underlying data type. Maybe this model is best reserved for creating new '''Abstract Data Types''', while core types such as dicts, lists, and arrays do not implement any accessor functions.

Note '''In the above, ''could'' means just that. The meaning of the notation is entirely determined by the ''getter'' function. Any defaults as applied to core types such as ''list'' would be dictated by the ''TCT''. It could even raise an error. The programmer can however override this default interpretation if desired. The above is an example of one such interpretation.'''

'''Var vs $var'''

Currently ''set'' takes ''varname'' rather than a tclObj as the reference to the variable to be set. It is not yet determined whether the new Tcl commands ''set/get'' should or need to take ''varname''. The principal requirement is that traces can still be triggered when a variable is accessed. Ideally, using the tclObj directly would be the more orthogonal approach.

'''Exposing accessors to the script level'''

Accessors would be of most benefit if the functionality were available as scripted procs.

This would allow programmers to change the meaning of the ''key'' inside the parentheses to create new constructs that match the application's processing of the data.

Coupled with namespaces, which allow private getter/setter functions, this permits a controlled and structured replacement strategy. That is, it need not affect the default accessor behaviour, much as namespaces allow core Tcl commands to be overridden safely.
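The namespace point can be illustrated with ordinary procs, since Tcl resolves a command name in the current namespace before falling back to the global one. In this sketch both ''get'' procs and the ''::app'' namespace are hypothetical; the private getter reinterprets the key as a 1-based position.

```tcl
# Hypothetical default getter in the global namespace: 0-based list index.
proc ::get {varName args} {
    upvar 1 $varName var
    if {[llength $args] == 0} { return $var }
    return [lindex $var [lindex $args 0]]
}

namespace eval ::app {
    # Private getter: inside ::app the key is treated as 1-based.
    proc get {varName args} {
        upvar 1 $varName var
        if {[llength $args] == 0} { return $var }
        return [lindex $var [expr {[lindex $args 0] - 1}]]
    }
    # Code in ::app picks up the private getter without naming it explicitly.
    proc first {lst} {
        return [get lst 1]
    }
}

set data [list a b c]
puts [get data 1]            ;# b  (default, 0-based)
puts [::app::first $data]    ;# a  (private, 1-based)
```

Code outside ''::app'' never sees the 1-based getter, so the override stays local to the code that asked for it.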


'''But we don't need to do it'''

We also do not need a '''virtual file system''' in TCL. I can do the same thing under Linux. I can mount an ftp server as a directory and any program can access files as though they were local to the machine.

However there are times when the VFS facility is of use, such as in starkits.

Likewise, the ability to implement accessor functions can be of benefit when the programmer requires them.

The key phrase would be:
   Is there any reason to limit functionality?

You only know they are needed when the job requirements call for them and you do not have them.

----
[NEM] This is quite interesting, and partially related to some stuff I have been thinking about recently, with a view to eventually working towards a TIP. See [Feather] and particularly read up on the ''interfaces'' stuff, as this is very related. The getter/setter methods to Tcl_ObjType you propose above would be instead in an interface. Paul came up with a generic ''container'' interface (or something like that), which was similar. The mechanism is more generalised though. One thing which needs to be cleared up in the above is the difference between ''values'' and ''variables''. [[set]] works with variables, and we have the following scheme currently:

 set var "a"  ;# Store the value "a" in the variable "var"
 set var       ;# Retrieve the value stored in the variable "var"
 set foo(a) "bar" ;# Store the value "bar" in the variable with key "a" which is part of the array variable "foo"
 set foo(a)   ;# etc

The point is, that the $a(b) syntax (and the [[set]] equivalent) currently means finding a value which is stored in a variable which is stored inside an array variable. AIUI, your proposed change is to just have normal variables (and drop arrays), so that:

 puts $foo(a)

would instead retrieve the ''value'' held in the variable foo, and then locate a sub-part in that which corresponds to the key "a". The difference is subtle, but involves one less dereference than currently:

'''Current:'''
   * find variable called "foo"
   * get array "value" from that
   * find variable called "a" in array
   * get value from that
'''Proposed:'''
   * find variable called "foo"
   * get container value from that
   * get part "a"

In the new scheme, when we have found what "a" refers to, we just return it, as it is a ''value''. Currently, what "a" corresponds to is a ''variable'' so we have to dereference it again to fetch the actual value (and trigger traces on it etc). This makes reusing the array syntax problematic, as we cannot deal with arrays and container values in the same way.
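The difference can be observed from script with a read trace. This is a minimal sketch with made-up names (''note'', ''::log''), assuming Tcl 8.5 or later: an array element is a variable and so can carry its own trace, whereas a part of a dict value is just a value and has nothing to attach a trace to.

```tcl
set ::log {}

# Trace callback: record which variable (and element) was read.
proc note {name1 name2 op} {
    lappend ::log [list $op $name1 $name2]
}

# Array: the element foo(a) is itself a variable, so it can be traced.
set foo(a) bar
trace add variable foo(a) read note
set x $foo(a)

# Dict: the entry "a" is part of a value; reading it touches no element variable.
set d [dict create a bar]
set y [dict get $d a]

puts [llength $::log]       ;# 1: only the array element read was traced
puts [lindex $::log 0]      ;# read foo a
```

A trace could still be placed on the variable ''d'' as a whole, but not on the part "a" alone.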

The above description of arrays is probably not how they actually work (I haven't checked), but hopefully it is conceptually correct. The array "value" is something which is never actually visible at the Tcl level, but is instead manipulated through operations on the variable that contains it (it is opaque, and the variable name is the handle). Tcl treats arrays specially in this regard.

[PWQ]: The main benefit of feather seems to be that it allows multiple representations of a tclObj, which helps prevent shimmering. It's a shame that there is not more documentation on the point of the feather extension. I have compiled it and run through the examples (such as tree.tcl) but don't have any feel for the actual point of it all.

----
[Category Suggestion] | [Category Data Structure]