Version 0 of A Case for Accessor Functions

Updated 2004-05-08 11:56:42

Philip Quaife 8 May 2004.

In musing over A Case for Metaprogramming and Dictionaries as arrays, I have had some new thoughts on a concept that advances the unification of arrays, lists and other types by way of the $ notation.

Introduction

We access variables using the set command. We also use $ as an alias for set (but with slightly different parsing rules for the variable name).

When it comes to accessing parts of a variable's value, we resort to functions specific to the type of data that the variable contains.

  e.g. lindex for lists, string range for strings.

Each data type has a plethora of associated functions for manipulating a datum.

The following is a conceptual change to the handling of a tclObjType that allows a more uniform accessor method.

The accessor model is usually found in OO languages, where all object properties are hidden and can only be read or written through object methods. Usually all objects support the same getter and setter method names for consistency.

In Tcl we only expose a "string" as our public property. This allows us to use other representations as optimizations when strings are not the most appropriate format, without affecting code that uses our object types. We see this in mathematical expressions, for example.

Take the following:

   interp alias {} get {} set
   set x Hello
   set x   ;# -> Hello
   get x   ;# -> Hello

We see that the semantics of get/set are not dissimilar. While in Tcl we use the duality of set for both operations, consider for the moment that we use the get/set model. We also implement this by extending the internal tclObjType handler to have pointers for getter/setter functions.

Now take the following:

   command $var
   command $var(key)

These can be interpreted as:

   command [get var]       ;# the same as [set var]
   command [get var key]

The meaning of:

    $var(key)

currently is: get the variable from the hash var that has the key key and return its tclObj datum.

The new meaning would be: call the tclObjType accessor function for the variable var, requesting the part key.

When a variable has a type of assocArray (new obj type) then the key would refer to an entry in an attached associative array whose value is a Tcl variable structure.

Note that key has no inherent meaning. Each tclObjType will interpret key as appropriate.

Some subsequent examples will clarify this.

Now we see why we cannot use the duality of set for both read and write. We have now extended the get proc to take a second argument, which we will call:

   part

We think of the value within the parentheses as a request for part of the whole. With an array, for example, we request one of the elements, not all of them.

By definition, when there are no parentheses we are requesting the whole of the datum.

The raison d'etre for this change in the interpretation of the notation $var(index) is to allow a common syntax regardless of internal representation, as well as extensibility.

Rather than limiting the notation to arrays we are free to apply the notation to any tclObjType.
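
The getter idea can be approximated at the script level today. The following is only a sketch under stated assumptions: the names declare, getter_string, getter_list and the accessorType registry are hypothetical, and the type is declared by hand because a script cannot portably ask a value for its internal type. It assumes Tcl 8.5 or later for the {*} syntax.

```tcl
# Hypothetical registry: variable name -> type name. The real proposal
# would read the type from the value itself; here it is declared by hand.
array set ::accessorType {}

proc declare {varName type} {
    set ::accessorType($varName) $type
}

# Hypothetical getters, one per type. Each receives the whole value and
# the part, and decides for itself what the part means.
proc getter_string {value part} {
    # One index selects a character, two indices select a range.
    if {[llength $part] == 1} {
        return [string index $value [lindex $part 0]]
    }
    return [string range $value [lindex $part 0] [lindex $part 1]]
}

proc getter_list {value part} {
    # Each word of the part descends one level of nesting.
    return [lindex $value {*}$part]
}

# get varName ?part?   (no part means "the whole datum")
proc get {varName args} {
    upvar 1 $varName var
    if {[llength $args] == 0} {
        return $var
    }
    set type $::accessorType($varName)
    return [getter_$type $var [lindex $args 0]]
}

set astring {Hello!}
declare astring string
set alist {A {B C D} E}
declare alist list

puts [get astring]         ;# -> Hello!
puts [get astring 0]       ;# -> H
puts [get astring {1 3}]   ;# -> ell
puts [get alist {1 2}]     ;# -> D
```

The dispatch on a per-type getter is the point of the sketch; how the type is discovered is deliberately left out.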

Note we will also have to change the implementation of set to call the setter method of the tclObjType for the variable when given a variable notation of var(key). In all likelihood we would change set to the notation set var part value.
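
A script-level sketch of the setter side, again under assumptions: setpart is a hypothetical stand-in for a three-argument set, the type is passed explicitly to keep the example short, and setter_list and setter_string are illustrative names. It assumes Tcl 8.5 or later.

```tcl
# Hypothetical setters: each returns a new whole value with the
# requested part replaced.
proc setter_list {value part newElem} {
    # lset works on a variable, so modify the local copy and return it.
    lset value {*}$part $newElem
    return $value
}

proc setter_string {value part newText} {
    # A one-word part replaces one character, a two-word part a range.
    set first [lindex $part 0]
    set last  [lindex $part end]
    return [string replace $value $first $last $newText]
}

# setpart varName type part newValue
proc setpart {varName type part newValue} {
    upvar 1 $varName var
    set var [setter_$type $var $part $newValue]
}

set alist {A {B C D} E}
setpart alist list {1 2} Z
puts $alist                 ;# -> A {B C Z} E

set astring {Hello!}
setpart astring string 0 J
puts $astring               ;# -> Jello!
```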

Benefits

  1. Unified syntax.
  2. Extensibility.
  3. Less obfuscation of the meaning of the statements.

Contradictions

  1. The main problem in the implementation of this construct is that associative arrays are a kludge that has never been reworked when objifying Tcl variables.
  2. An associative array is an abstract quantity, so how do you call setter/getter functions when you don't know the type of data associated with the variable until you have accessed it?
  3. Performance. While a reference to a scalar will degenerate to a call to set, which will be byte coded out of existence, and an array reference will turn into a cached hash lookup, commands such as lindex and string range will not be able to be bytecoded when called as accessors through the $ notation (due to run-time type determination).

Pause and Think

Nothing stated above requires any change in syntax or causes any incompatibility with the current handling of associative arrays. The above is not an attempt to replace list handling functions or string handling functions.

Nor is the above a switch in methodology to an object based notation. In fact the use of get/set is consistent with the current use of a common function regardless of type.

What it does, for those people who want it, is provide a more concise way of slicing up a variable's data.

For associative arrays, there is no conceptual change to how these retrieve and store data. There is a huge change in the core code that implements arrays, and most likely in any code that makes reference to associative arrays. One hopes that the advent of the dict tclObjType for Tcl 8.5 and the associated changes to arrays will make this change a practical proposition.

Changes in the core

The main change would be to add two more function pointers to the tclObjType structure: one for setting and one for getting.

The set method would only be called when setting part of the data (as in one element of an associative array).

Likewise, the get function would only be called when getting part of the data, as an object's string rep and the object data itself are already available to the caller.

Examples

Now let's extend the accessor function to other tclObjTypes.

Try A String:

   set astring {Hello!}
   puts $astring(0) -> H
   puts $astring(end) -> !
   puts $astring(1 3) -> ell

Try lists:

   set alist {A {B C D} E}
   puts $alist(0) -> A
   puts $alist(end) -> E
   puts $alist(1 2) -> D

Try Dicts:

   set fred [dict create A 1 B 2 C 3 D 4] ;# or whatever syntax is implemented.
   $fred add New [dict create A 100 B 10]
   puts $fred(A) -> 1
   puts $fred(end) -> {}
   puts $fred(1)  -> {}
   puts $fred(New A) -> 100
   puts "I think that [get $fred {New A}] is the same as $... above"
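
For comparison, the dict lookups above can be written with the dict command as it exists in Tcl 8.5 and later. The proc getter_dict is a hypothetical name used only for this sketch. Returning an empty string for a missing key is an assumption taken from the {} results shown above; dict get on its own raises an error for an unknown key.

```tcl
set fred [dict create A 1 B 2 C 3 D 4]
# Stands in for the "$fred add New ..." line above.
dict set fred New [dict create A 100 B 10]

# Hypothetical dict getter: each word of the part is one key on the path.
proc getter_dict {value part} {
    if {[dict exists $value {*}$part]} {
        return [dict get $value {*}$part]
    }
    # Assumption: a missing key yields an empty result instead of an error.
    return {}
}

puts [getter_dict $fred A]         ;# -> 1
puts [getter_dict $fred end]       ;# -> (empty string)
puts [getter_dict $fred {New A}]   ;# -> 100
```

Note that end and 1 are treated as ordinary keys here, which is why they find nothing: the dict getter gives the part a different meaning than the list getter does.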

Try keyed lists as structures, or binary data represented as named quantities (as in ASN.1 notation, for example).

Pause and Think

You do not have to use it.

What the above does is give the programmer the ability to determine what meaning the $ notation has.

If you want, you can always use [set varname] to access a variable rather than the $ notation.

What's happening behind the variable

Take for example accessors for a list type. The notation $var(end) could be interpreted as:

   lindex $var end

While the notation $var(1 .. 3) could be interpreted as:

   lrange $var 1 3

This could be a call through the scripted Tcl interface, or it could be a direct call to the lower level C API function, or it could be implemented inside the accessor function.
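
As a sketch of the first option, a scripted list getter could map the two notations onto lindex and lrange as follows. The name getter_list and the treatment of ".." as a range marker are assumptions made for illustration; it assumes Tcl 8.5 or later.

```tcl
# Hypothetical scripted getter for lists.
proc getter_list {value part} {
    # "first .. last" is read as a range request.
    if {[llength $part] == 3 && [lindex $part 1] eq ".."} {
        return [lrange $value [lindex $part 0] [lindex $part 2]]
    }
    # Anything else is passed to lindex as one or more indices.
    return [lindex $value {*}$part]
}

set var {a b c d e}
puts [getter_list $var end]        ;# -> e
puts [getter_list $var {1 .. 3}]   ;# -> b c d
```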

Which approach is best probably depends on the underlying data type. Maybe this model is best reserved for creating new abstract data types, and the core types such as dicts, lists and arrays do not implement any accessor functions.

Note: in the above, could means just that. The meaning of the notation is entirely determined by the getter function. Any defaults applied to core types such as list would be dictated by the TCT. The default could even be to raise an error. The programmer can, however, override this default interpretation if desired. The above is an example of one such interpretation.

Var vs $var

Currently set takes varname rather than a tclObj as the reference to the variable to be set. It is not yet determined whether the new Tcl commands set/get should or need to take varname. The principal requirement would be that traces can still be triggered when a variable is accessed. Ideally, using the tclObj directly would be the more orthogonal approach.
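
The trace requirement can be seen with the existing trace command: a read trace is attached to a variable name, so it fires whether the variable is read through set or through $. A small illustration, where noteRead and log are names made up for the example:

```tcl
set log {}

# Trace callback: records every read of the traced variable.
proc noteRead {name1 name2 op} {
    lappend ::log "$op $name1"
}

set x Hello
trace add variable x read noteRead

set a [set x]   ;# read through set
set b $x        ;# read through the $ notation

puts $log
```

A get command that received only the value, and not the name, would have no variable on which to look for such a trace.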

Exposing accessors to the script level

The greatest benefit of having accessors would be realised if the functionality is available as scripted procs.

This would allow programmers to change the meaning of the key inside the parentheses to create new constructs that match the application processing of the data.

Coupling this with namespaces, which allow private getter/setter functions, gives a controlled and structured replacement strategy. That is, this does not need to affect the default accessor behaviour, much as namespaces allow core Tcl commands to be overridden safely.
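
One way this could look, as a sketch only: a scripted get that prefers a getter defined in the caller's namespace and falls back to a global default. All names here (::get, getter_list, ::report) are hypothetical, and the one-based indexing in ::report is just an example of an application-specific meaning. It assumes Tcl 8.5 or later.

```tcl
# Default (global) list getter: plain zero-based lindex.
proc ::getter_list {value part} {
    return [lindex $value {*}$part]
}

# get type value part: look for a private getter in the caller's
# namespace first, then fall back to the global default.
proc ::get {type value part} {
    set ns [uplevel 1 {namespace current}]
    set cmd [string trimright $ns :]::getter_$type
    if {[namespace which -command $cmd] eq ""} {
        set cmd ::getter_$type
    }
    return [$cmd $value $part]
}

namespace eval ::report {
    # Private override: inside this namespace list parts are one-based.
    proc getter_list {value part} {
        return [lindex $value [expr {$part - 1}]]
    }
    proc first {lst} {
        return [::get list $lst 1]
    }
}

puts [get list {x y z} 1]        ;# -> y  (global default)
puts [::report::first {x y z}]   ;# -> x  (private getter)
```

Code outside ::report is unaffected by the override, which is the safety property described above.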

But we don't need to do it

We also do not need a virtual file system in Tcl. I can do the same thing under Linux: I can mount an ftp server as a directory and any program can access files as though they were local to the machine.

However there are times when the VFS facility is of use, such as in starkits.

Likewise, the ability to implement accessor functions can be of benefit when the programmer requires them.

The key phrase would be:

   Is there any reason to limit functionality?

You only know they are needed when the job requirements call for them and you do not have them.