Version 4 of A Case for Accessor Functions

Updated 2004-05-08 15:28:02

Philip Quaife 8 May 2004.

In musing over A Case for Metaprogamming and Dictionaries as arrays, I have had some new thoughts on a concept that advances the unification of arrays, lists and other types by way of the $ notation.

Introduction

We access variables using the set command. We also use $ as an alias for set (but with slightly different parsing rules for the variable name).

When it comes to accessing parts of a variable's value we resort to functions specific to the type of data that the variable contains.

  i.e. lindex for lists, string range for strings.

Each data type has a plethora of associated functions for manipulating a datum.

The following is a conceptual change to the handling of a tclObjType that allows a more uniform accessor method.

The accessor model is usually found in OO languages where all object properties are hidden and can only be read/written through object methods. Usually all objects support the same getter and setter method names for consistency.

In Tcl we only expose a "string" as our public property. This allows us to use other representations for optimizations when strings are not the most appropriate format, without affecting code that uses our object types. We see this in mathematical expressions for example.

Take the following:

   interp alias {} get {} set
   set x Hello
   set x -> Hello
   get x -> Hello

We see that the semantics of get/set are not dissimilar. While in Tcl we use the duality of set for both operations, consider for the moment that we use the get/set model. We also implement this by way of extending the internal tclObjType handler to have pointers for getter/setter functions.

Now take the following:

   command $var
   command $var(key)

These can be interpreted as:

   command [get var]       ;# the same as [set var]
   command [get var key]

The meaning of:

    $var(key)

currently is: get the variable from the hash var that has the key key and return its tclObj datum.

The new meaning would be: call the tclObjType accessor function for the variable var requesting the part key .

When a variable has a type of assocArray (new obj type) then the key would refer to an entry in an attached associative array whose value is a tcl variable structure.

Note there is no inherent meaning of the interpretation of key. Each tclObjType will use key as appropriate.

Some subsequent examples will clarify this.
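As a first, script-level approximation of the idea, the dispatch can be sketched in plain Tcl. This is only an illustration under stated assumptions: scripts cannot attach handlers to a tclObjType, so the sketch keeps a side table of type tags instead, and the names declare, typeOf, getter, stringGetter and listGetter are hypothetical. The table is keyed by variable name only, so it is meant for use at one level (here the global one).

```tcl
# Hypothetical emulation: one table maps variable names to a type tag,
# another maps type tags to the getter command for that type.
array set typeOf {}
array set getter {
    string stringGetter
    list   listGetter
}

# Each type decides for itself what "part" means.
proc stringGetter {value part} { return [string index $value $part] }
proc listGetter   {value part} { return [lindex $value $part] }

# declare varName type value: set a variable and remember its type tag.
proc declare {varName type value} {
    global typeOf
    upvar 1 $varName var
    set var $value
    set typeOf($varName) $type
}

# get varName ?part?: without a part return the whole value,
# otherwise hand the request to the getter registered for the type.
proc get {varName args} {
    global typeOf getter
    upvar 1 $varName var
    if {[llength $args] == 0} {
        return $var
    }
    set type $typeOf($varName)
    set cmd  $getter($type)
    return [$cmd $var [lindex $args 0]]
}

declare s string Hello!
declare l list {A {B C D} E}
get s      ;# Hello!
get s 0    ;# H
get l 1    ;# B C D
```

The same key (a number here) is passed through untouched; only the getter chosen for the type gives it a meaning.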

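Now we see why we cannot use the duality of set for both read and write: with a part argument, a read of the form set var key would be indistinguishable from a write of the form set var value. We have therefore extended the get proc to take a second argument, which we will call: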

   part

We think of the value within the parentheses as a request for part of the whole. Such as with an array, we request one of the elements not all of them.

By definition, when there are no parentheses we are requesting the whole of the datum.

The raison d'etre for this change in the interpretation of the notation $var(index) is to allow a common syntax regardless of internal representation, as well as extensibility.

Rather than limiting the notation to arrays we are free to apply the notation to any tclObjType.

Note we will also have to change the implementation of set to call the setter method of the tclObjType for the variable when given a variable notation of var(key). In all likelihood we would change set to the notation set var part value.
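A rough sketch of what such a setter could do, written as an ordinary proc. The name setPart and the explicit type argument are assumptions made for illustration (a script has no way to register a setter on a tclObjType), and the dict branch needs Tcl 8.5 or later.

```tcl
# setPart type varName part value
# Hypothetical stand-in for the proposed "set var part value" form.
# The type is passed explicitly because this is only a script-level sketch.
proc setPart {type varName part value} {
    upvar 1 $varName var
    switch -- $type {
        list    { lset var $part $value }
        string  { set var [string replace $var $part $part $value] }
        dict    { dict set var $part $value }
        default { error "no setter for type \"$type\"" }
    }
    return $var
}

set l {A B C}
setPart list l 1 X       ;# l is now: A X C

set s Hello
setPart string s 0 J     ;# s is now: Jello

set d [dict create A 1]
setPart dict d B 2       ;# d is now: A 1 B 2
```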

Benefits

  1. Unified syntax.
  2. Extensibility.
  3. Less obfuscation of the meaning of the statements.

Contradictions

  1. The main problem in the implementation of this construct is that associative arrays are a kludge that has never been reworked when objifying tcl variables.
  2. An associative array is an abstract quantity, so how do you call setter/getter functions when you don't know the type of data associated with the variable until you have accessed it?
  3. Performance. While a reference to a scalar will degenerate to a call to set, which will be byte coded out of existence, and an array reference will turn into a cached hash lookup, commands such as lindex and string range will not be able to be bytecoded when called as accessors through the $ notation (due to run-time type determination).

Pause and Think

Nothing stated above requires any change in syntax or causes any incompatibility with the current handling of associative arrays. The above is not an attempt to replace list handling functions or string handling functions.

Nor is the above a switch in methodologies to an object-based notation. In fact the use of get/set is consistent with the current use of a common function regardless of type.

What it does is provide, for those people that want it, a more concise way of slicing up a variable's data.

For associative arrays, there is no change conceptually to how these retrieve and store data. There is a huge change in the core code that implements arrays and most likely any code that makes reference to associative arrays. One hopes that the advent of the dict tclObjType for tcl 8.5 and associated changes to arrays will make this change a practical proposition.

Changes in the core

The main change would be to add two more function pointers to the tclObjType structure: one for setting and one for getting.

The set method would only be called when setting part of the data (as in one element of an associative array).

Likewise the get function would only be called when part of the data is requested, as an object's string rep and the object data itself are already available to the caller.

Examples

Now let's extend the accessor function to other tclObjTypes.

Try A String:

   set astring {Hello!}
   puts $astring(0) -> H
   puts $astring(end) -> !
   puts $astring(1 3) -> ell
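One possible getter that would produce the results above, as a sketch only. The name stringGetter is hypothetical, and reading a two-element key as a range is just one interpretation a string type could choose.

```tcl
# Hypothetical getter for strings: one index selects a character,
# two indices select an inclusive range.
proc stringGetter {value part} {
    switch -- [llength $part] {
        1 { return [string index $value [lindex $part 0]] }
        2 { return [string range $value [lindex $part 0] [lindex $part 1]] }
        default { error "bad string part \"$part\"" }
    }
}

stringGetter Hello! 0       ;# H
stringGetter Hello! end     ;# !
stringGetter Hello! {1 3}   ;# ell
```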

Try lists:

   set alist {A {B C D} E}
   puts  $alist(0) -> A
   puts $alist(end) -> E
   puts $alist(1 2) -> D

MS notes that this is not what would happen; until some list op is done to $alist, it has a string tclObjType, so that [puts $alist(1 2)] -> " {" and not "D". Had there been an intervening [llength $alist] the story would change.
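The two readings MS describes can be seen with today's commands, where the caller chooses the interpretation explicitly. With zero-based nested indexing, lindex with indices 1 and 2 selects the third element of the second element.

```tcl
set alist {A {B C D} E}

# Read as a string: characters 1 and 2 are a space and an open brace.
set asString [string range $alist 1 2]

# Read as a list: element 2 of element 1.
set asList [lindex $alist 1 2]

puts "string view: '$asString'"
puts "list view:   '$asList'"
```

Under the proposal the choice would be made by whichever internal type the value happens to have at the time, which is the source of the surprise.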

Try Dicts:

   set fred [dict create A 1 B 2 C 3 D 4] ;# or whatever syntax is implemented.
   $fred add New [dict create A 100 B 10]
   puts $fred(A) -> 1
   puts $fred(end) -> {}
   puts $fred(1)  -> {}
   puts $fred(New A) -> 100
   puts "I think that [get $fred {New A}] is the same as $... above"

Try Keyed lists as structures, or binary data represented as named quantities (As in ASN.1 notation for example).

Pause and Think

You do not have to use it.

What the above does is give the programmer the ability to determine what meaning the $ notation has.

If you want you can always use [set varname] to access a variable rather than $ notation.

What's happening behind the variable

Take for example accessors for a list type. The notation $var(end) could be interpreted as:

   lindex $var end

While the notation $var(1 .. 3) could be interpreted as:

      lrange $var 1 3

This could be a call through the scripted Tcl interface, or it could be a direct call to the lower level C API function, or it could be implemented inside the accessor function.

Which is the best approach probably depends on the underlying data type. Maybe this model is best reserved for creating new Abstract Data Types, and the core types such as dicts, lists and arrays do not implement any accessor functions.

Note

In the above, could means just that. The meaning of the notation is entirely determined by the getter function. Any defaults as applied to core types such as list would be dictated by the TCT. It could even raise an error. The programmer can however override this default interpretation if desired. The above is an example of one such interpretation.
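The interpretation described above for lists can be written out as a scripted getter. This is a sketch of one such interpretation, not a proposed default; the name listGetter and the use of ".." as a range marker follow the example notation and are otherwise assumptions.

```tcl
# Hypothetical getter for lists: "a .. b" is treated as a range,
# anything else is handed to lindex as an index (or nested index list).
proc listGetter {value part} {
    if {[llength $part] == 3 && [lindex $part 1] eq ".."} {
        return [lrange $value [lindex $part 0] [lindex $part 2]]
    }
    return [lindex $value $part]
}

listGetter {a b c d e} end         ;# e
listGetter {a b c d e} {1 .. 3}    ;# b c d
listGetter {A {B C D} E} {1 2}     ;# D
```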

Var vs $var

Currently set takes varname rather than tclObj as the reference to the variable to be set. It is not yet determined if the new tcl commands set/get should/need to take varname. The principal requirement would be that traces can be triggered when a variable is accessed. Ideally, the use of the tclObj directly would be the more orthogonal one.

Exposing accessors to the script level

The most benefit from having accessors would be realised if the functionality is available as scripted procs.

This would allow programmers to change the meaning of the key inside the parentheses to create new constructs that match the application processing of the data.

Coupling this with namespaces, which allow private getter/setter functions, gives a controlled and structured replacement strategy. I.e. this does not need to affect the default accessor behaviour, much as namespaces allow core tcl commands to be overridden with safety.
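Ordinary command resolution already behaves this way, which gives a feel for how a private getter might coexist with a default one. In this sketch the global get proc and the matrix namespace are both hypothetical; the "row,col" key format is an application-specific choice made up for the example.

```tcl
# Hypothetical default getter, visible everywhere.
proc get {value part} {
    return [lindex $value $part]
}

namespace eval matrix {
    # Private getter: inside this namespace a key is read as "row,col".
    proc get {value part} {
        foreach {row col} [split $part ,] break
        return [lindex $value $row $col]
    }

    # An unqualified call to get from inside the namespace finds the
    # private getter first, leaving the global default untouched.
    proc corner {m} {
        return [get $m 0,0]
    }
}

set m {{1 2} {3 4}}
get $m 1            ;# 3 4   (default interpretation)
matrix::get $m 1,0  ;# 3     (private interpretation)
matrix::corner $m   ;# 1
```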

But we don't need to do it

We also do not need a virtual file system in TCL. I can do the same thing under Linux. I can mount an ftp server as a directory and any program can access files as though they were local to the machine.

However there are times when the VFS facility is of use, such as in starkits.

Likewise, the ability to implement accessor functions can be of benefit when the programmer requires them.

The key phrase would be:

   Is there any reason to limit functionality?

You only know they are needed when the job requirements call for them and you do not have them.


NEM This is quite interesting, and partially related to some stuff I have been thinking about recently, with a view to eventually working towards a TIP. See Feather and particularly read up on the interfaces stuff, as this is very related. The getter/setter methods to Tcl_ObjType you propose above would be instead in an interface. Paul came up with a generic container interface (or something like that), which was similar. The mechanism is more generalised though. One thing which needs to be cleared up in the above is the difference between values and variables. [set] works with variables, and we have the following scheme currently:

 set var "a"  ;# Store the value "a" in the variable "var"
 set var       ;# Retrieve the value stored in the variable "var"
 set foo(a) "bar" ;# Store the value "bar" in the variable with key "a" which is part of the array variable "foo"
 set foo(a)   ;# etc

The point is, that the $a(b) syntax (and the [set] equivalent) currently means finding a value which is stored in a variable which is stored inside an array variable. AIUI, your proposed change is to just have normal variables (and drop arrays), so that:

 puts $foo(a)

would instead retrieve the value held in the variable foo, and then locate a sub-part in that which corresponds to the key "a". The difference is subtle, but involves one less dereference than currently:

Current:

  • find variable called "foo"
  • get array "value" from that
  • find variable called "a" in array
  • get value from that

Proposed:

  • find variable called "foo"
  • get container value from that
  • get part "a"

In the new scheme, when we have found what "a" refers to, we just return it, as it is a value. Currently, what "a" corresponds to is a variable so we have to dereference it again to fetch the actual value (and trigger traces on it etc). This makes reusing the array syntax problematic, as we cannot deal with arrays and container values in the same way.

The above description of arrays is probably not how they actually work (I haven't checked), but hopefully it is conceptually correct. The array "value" is something which is never actually visible at the Tcl level, but is instead manipulated through operations on the variable that contains it (it is opaque, and the variable name is the handle). Tcl treats arrays specially in this regard.
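The difference between an array element (a variable) and a part of a container value can be observed from a script. The sketch below uses a dict as the container value, which assumes Tcl 8.5 or later; the proc name note and the variable names are made up for the example.

```tcl
# An array element is a variable in its own right: it can carry a trace
# and be linked to with upvar.
set foo(a) bar
set log {}
proc note {name1 name2 op} {
    lappend ::log "$op $name1\($name2\)"
}
trace add variable foo(a) read note

set x $foo(a)            ;# fires the read trace; log gains "read foo(a)"

upvar 0 foo(a) alias     ;# link another name to the element itself
set alias baz            ;# writes through to foo(a)

# A dict entry is only part of a value held in one variable: there is
# no per-key trace and nothing to link to, only a lookup in the value.
set d [dict create a bar]
set y [dict get $d a]
```

Any reuse of the $var(key) notation for container values would have to decide what happens to per-element traces and links of this kind.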


Category Suggestion | Category Data Structure