Version 10 of What If: Everything is a Thing

Updated 2005-04-09 08:24:38 by suchenwi

RHS 08 April 2005

I'll start by saying that many of you will read this and immediately think "But then it's not Tcl!" That is a perfectly valid objection. However, I am not proposing changes to Tcl; I want to discuss what the benefits and problems would be if... Everything is a Thing.

As for what I mean by "Everything is a Thing": I originally meant to title this write-up "Everything is an Object". However, after some discussion in the chat room, I realized that the word Object carries too much baggage. The title phrase simply means that not everything is a string.

== Issues with Everything is a String ==

Tcl currently states that everything is a string. This leads to a number of complications, both in using the language and in implementing it.

Command/Thing Lookup

When creating new things that should be anonymous commands, such as lambdas and objects, we must either pollute the proc space or resort to various complicated workarounds. For things that require internal data, the usual approach is a handle tied to a proc of the same name. While this works, there are several conceptual reasons not to create a command name for every such thing, one of which is the possibility of name clashes.
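For reference, here is a minimal sketch of the handle-tied-to-a-proc pattern in current Tcl. `makeCounter` and the `::counter` namespace are hypothetical names chosen for illustration; the point is that every counter creates a real command, so it occupies the command namespace and has to avoid clashing with other names.

```tcl
# Hypothetical "object": internal data lives in a namespace array,
# and a command of the same name serves as the handle.
namespace eval ::counter {
    variable seq 0
}

proc makeCounter {} {
    # Generate a fresh command name; a collision with an existing
    # command is exactly the name-clash risk described above.
    set id [incr ::counter::seq]
    set name ::counter::c$id
    set ::counter::value($id) 0
    # The handle is a real command, so it pollutes the command space.
    proc $name {} [list incr ::counter::value($id)]
    return $name
}

set c [makeCounter]
$c
$c
puts [$c]                ;# prints 3
puts [info commands $c]  ;# prints ::counter::c1
```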

To avoid polluting the proc space, the string representation must contain all the information about the thing. This approach, combined with techniques such as leading word expansion, gives true anonymous commands. However, even where all the information about a thing fits in the string rep, the approach fails when that data is mutable: each holder of the string has its own copy, so a change made through one holder is never seen through another.
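A sketch of both halves of that argument in current Tcl, assuming Tcl 8.5 or later (which postdates this page) for `apply` and `dict`. The lambda is an ordinary list value, so no command is created for it; the dict "object" shows why mutable state breaks down when the value is the whole thing. The variable names are illustrative only.

```tcl
# The lambda is a plain two-element list {params body}: its string rep
# carries everything apply needs, and no named command is created.
set double {{x} {expr {$x * 2}}}
puts [apply $double 21]        ;# prints 42

# A "mutable object" stored purely as a value: every holder has its own copy.
set obj [dict create count 0]
set alias $obj                 ;# a second reference to "the same" object
dict incr obj count            ;# updates obj's value; alias is untouched
puts [dict get $obj count]     ;# prints 1
puts [dict get $alias count]   ;# prints 0: the change is not shared
```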

Arrays are another example of working around the string rep limitation. An array is essentially a hash map of anonymous variables. Because variables have no string representation, we cannot pass arrays as arguments. Instead, we have to pass in the name of the array and upvar into it.
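For comparison, this is the current idiom: pass the array's name and `upvar` into it. `sumArray` is a hypothetical example command.

```tcl
# Arrays cannot be passed by value, so the caller passes the array's
# name and the proc links a local name to the caller's array.
proc sumArray {arrName} {
    upvar 1 $arrName arr
    set total 0
    foreach key [array names arr] {
        incr total $arr($key)
    }
    return $total
}

array set tmpArr {a 1 b 2 c 3}
puts [sumArray tmpArr]   ;# prints 6; note the name, not $tmpArr
# sumArray $tmpArr would fail: an array variable cannot be read as a value.
```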

Automated Resource/Garbage Collection

The second real problem is automated resource and/or garbage collection. It only affects things that require internal data and/or are tied to a named command. Tcl's automated resource/garbage collection is based on the idea that a value can be cleaned up once nothing refers to it (i.e., its RefCount is 0). For things that are not referred to via variables (such as commands), there is no way to clean up after them, because we never know when the code is done with them. Since they are not anonymous, they can be accessed later by name, even if no variable contains a reference to them.
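A small illustration in current Tcl: unsetting the only variable that holds a command-based handle does not free the command, because it can still be reached by name. `makeThing` is a hypothetical helper. (Channels behave the same way: they stay open until explicitly closed.)

```tcl
# Hypothetical helper that creates a command-based handle.
proc makeThing {name} {
    proc $name {} { return "still here" }
    return $name
}

set h [makeThing thing1]
unset h                       ;# no variable refers to the handle any more...
puts [info commands thing1]   ;# ...but this prints thing1: nothing was freed
puts [thing1]                 ;# prints still here: reachable by name
rename thing1 {}              ;# cleanup has to be done explicitly
```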

== The Two Types of Things ==

I propose that there are two types of things. Each thing of the first type is-a value; each thing of the second type has-a value.

Is-a Value Things

Things that each is-a value are those that already exist as first-class things: lists, strings, dicts, integers, etc. For each of them, the value is the thing. Nothing about them is left out of the string representation, and each one is immutable.
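Today's is-a values can be checked directly: a value typed in as a plain string is indistinguishable from one built by the corresponding command. A minimal sketch, assuming Tcl 8.5 or later for `dict`:

```tcl
# A dict built by [dict create] and one typed as a literal string
# are the same value: the string rep holds everything.
set d1 [dict create a 1 b 2]
set d2 "a 1 b 2"
puts [expr {$d1 eq $d2}]   ;# prints 1
puts [dict get $d2 b]      ;# prints 2
```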

By immutable, we mean that the thing itself cannot change. To alter the value, we instead create a new thing holding the new value and point the variable at it. The underlying code can optimize away the cost of copying, but even then it must make a copy when multiple variables point to the same object.
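Tcl's existing copy-on-write behavior shows this in action. The comments describe current Tcl semantics: a shared object is duplicated before it is modified, which is the copy "when multiple variables point to the same object" mentioned above.

```tcl
set a {1 2 3}
set b $a          ;# a and b now share one underlying Tcl_Obj
lappend b 4       ;# the shared object is duplicated before appending
puts $a           ;# prints 1 2 3
puts $b           ;# prints 1 2 3 4
```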

These is-a things would be handled in exactly the same way as they are now.

Has-a Value Things

These are things whose natural string representation does not hold all the information about them, or that need to be mutable: arrays, lambdas, objects, file handles, etc. These are the things whose handling we will change.

For these things, we state that they cannot be converted to other types and keep their identity. This is similar to the way arrays work now. While we can ask an array for its list representation, that list is not the array; it does not contain everything the array needs to be what it is. The difference is that we would be able to refer to the array directly, using code such as:

 # Set up the array
 array set tmpArr {a 1 b 2 c 3}
 # Proposed: pass the array itself into a command, which is free to modify it
 sortArray $tmpArr

When we ask for a different representation of the thing, what we get is not the original thing. Instead, we get some other thing that contains all the information about the original that it is natural for it to contain, but it is not the original. For example:

 array set tmpArr {a 1 b 2 c 3}
 llength $tmpArr

In the above, the llength command asks for the list representation of the array. Once it has this, it calculates the length. However, the thing it calculates on is not the array itself, but a list with the keys and values that the array contains.
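In current Tcl, `llength $tmpArr` is an error, because an array cannot be read as a value. The closest equivalent of the behavior described above is to request the list form explicitly with `array get`, and the result behaves as described: a separate list, not the array.

```tcl
array set tmpArr {a 1 b 2 c 3}
# [array get] builds a new list of keys and values; it is not the array.
set asList [array get tmpArr]
puts [llength $asList]     ;# prints 6 (three keys and three values)
lappend asList d 4         ;# changing the list...
puts [array size tmpArr]   ;# ...leaves the array alone: prints 3
```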

== The Results of the Change ==

The implications of this constraint are far-reaching, and they let us skip many of the hoops we previously had to jump through.

Automated Resource/Garbage Collection

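Because a has-a thing can no longer be recreated from its string representation, the only way to reach it is through a reference to the original thing. Once its RefCount drops to 0, nothing can ever reach it again, so the same RefCount method of garbage collection can clean up after us automatically. As examples: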

  • File channels can be closed, and their resources cleaned up, once they are no longer in use
  • Objects and lambdas can have their resources cleaned up once nothing refers to them (see Command Resolution below)
  • Arrays can be passed as arguments
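Current Tcl has no reference-counted cleanup for channels, but one common approximation ties a channel's lifetime to a single variable with an unset trace. The sketch below assumes Tcl 8.6 (for `file tempfile`); `openScoped` and `writeGreeting` are hypothetical helpers. Note the limitation: the trace follows the variable, not the value, so any copy of the handle that escapes (such as the return value here) dangles once the variable goes away. Reference counting on the thing itself, as proposed, would not have that problem.

```tcl
# Hypothetical helper: open a channel into the caller's variable and
# close it automatically when that variable is unset.
proc openScoped {varName args} {
    upvar 1 $varName chan
    set chan [open {*}$args]
    # The trace lands on the caller's variable. Tcl appends name1,
    # name2 and op to the callback; the args parameter absorbs them.
    trace add variable chan unset \
        [list apply {{ch args} {catch {close $ch}}} $chan]
    return $chan
}

proc writeGreeting {path} {
    openScoped f $path w
    puts $f hello
    # No explicit close: the local f is unset when the proc returns,
    # which fires the trace and closes the channel.
    return $f
}

close [file tempfile path]      ;# create an empty temp file, keep its name
set leaked [writeGreeting $path]
puts [expr {$leaked in [chan names]}]   ;# prints 0: the channel was closed
```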

Command Resolution

Since every thing now has a definite type, we no longer need to answer the question "how did the caller intend this to be used?" Instead, we can ask "what type of thing is this?" This is a great boon for command resolution. When resolving the first word of a command, we can use the following logic:

  1. Is it a thing that can act as a Dispatcher (object or lambda)? If so, invoke it with the args. Otherwise...
  2. Ask the thing for its string representation and call the command whose name is that string.
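The two-step resolution can be sketched in current Tcl (8.5 or later, for `apply` and `{*}`). Because current Tcl values carry no such type, this sketch assumes a hypothetical convention in which a dispatcher is a three-element list tagged `lambda`; `invoke` stands in for the interpreter's own resolution step. Having to recognize the tag from the string is exactly the "how did the caller intend this?" guesswork the proposal would remove.

```tcl
# Hypothetical stand-in for command resolution.
proc invoke {thing args} {
    # Step 1: can the thing act as a dispatcher? (here: a tagged lambda)
    if {[llength $thing] == 3 && [lindex $thing 0] eq "lambda"} {
        return [apply [lrange $thing 1 end] {*}$args]
    }
    # Step 2: otherwise use its string form as a command name,
    # evaluated in the caller's scope.
    return [uplevel 1 [list $thing {*}$args]]
}

set sq {lambda {x} {expr {$x * $x}}}
puts [invoke $sq 7]              ;# prints 49
puts [invoke string length abc]  ;# prints 3
```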

== Other Notes ==

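  • Circular references are still an issue as long as RefCounting is used: things that refer to each other in a cycle never reach a RefCount of 0, so they are never cleaned up.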


Discussion

DKF: What would be the negative consequences of altering the basic model of values this way? (It is late, and I cannot quite spot them at the moment myself.)


SYStems: I am probably speaking too early, but it is also late and I didn't want to forget this opinion.

First, we need to find out the name of the topic we are discussing. Second, we need to reference and summarize the literature on that topic. I think you are racking your brain reinventing the wheel: find out this topic's name (is it type theory, object orientation, or what exactly?), then buy and read the best books written on it! This is the obvious advice.

Another point concerns the "two types of things" idea. Let's assume this scenario: Ahmed is married to Gilane, and both are humans (the same type). Say you want to call Ahmed's wife. You have two options: you can call Gilane, because you know Gilane is Ahmed's wife, or you can simply call for Madame Ahmed (Mrs. Ahmed). Three years from now, Ahmed and Gilane get a divorce, and he marries Mona. Calling for Gilane as Madame Ahmed no longer works. But maybe you are not interested in Gilane; you just want to call whoever Ahmed is married to. Ahmed can be used as a reference to his wife, but Ahmed also has a life of his own; in fact, even if Ahmed dies, Gilane can still live.

Plus, there is more to Ahmed than a wife: Ahmed also has a car and a house, all of which can be referenced through Ahmed, and all of which can live on after Ahmed dies (a related research topic might be the object life cycle).

So even if we never talk about Ahmed again, we might still want to talk about his house; only then we won't refer to it as Ahmed's house. You see, when Ahmed and Gilane got a divorce, she got the house. The house is her responsibility now, and we refer to that same house through Gilane (see relational databases: keys, candidate keys, primary keys, etc.).

So, as you see, a university grade such as "GPA 3.7" is a value: it refers to "GPA 3.7", and it also refers to how brave I was at university.

Therefore, there is no such thing as has-a value versus is-a value; the only immutable value is the most elementary one! In real life I have no clue what that is; in computers it's 1 and 0; in Tcl, it's recommended that we think of it as the string (don't dig below a string, even if you can). Everything else changes over time as the result of the transactions (see database literature) applied to it.

And A can refer to the ASCII code 65:

 % scan A %c 
 65

and it can refer to my grade in the database course. A value is in the eye of the beholder!

I hope I made some sense. I know I still need to read a lot; please recommend anything.


RS 2005-04-09: I think the is-a and has-a distinction is elsewhere called transparent vs. opaque objects. Transparent objects ("pure values") have no garbage collection issues, as Tcl already handles that, and up to a certain size, just modifying a copy is good enough. However, for a list of 100,000 items, changing it in place with lset is much more efficient. But that requires a variable, which is about the smallest "opaque" object we deal with all the time... See also TOOT.
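A minimal sketch of the point about `lset` (Tcl 8.4 or later). As long as the list is held only by its variable, the implementation can modify it in place; once a second variable shares it, the next `lset` has to copy it first.

```tcl
# Build a large list held by a single variable.
set big {}
for {set i 0} {$i < 100000} {incr i} { lappend big $i }

# The list is unshared, so lset can change element 500 in place
# instead of copying all 100,000 elements.
lset big 500 changed
puts [lindex $big 500]    ;# prints changed

# Once a second variable shares the list, lset must copy it first.
set copy $big
lset big 501 changed
puts [lindex $copy 501]   ;# prints 501: the shared copy kept the old value
```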


Category Discussion