[RHS] ''08April2005'' I'll start this out by saying that many of you will read this and immediately think ''"But, then it's not Tcl!"''. This is a perfectly valid statement. However, what I'm aiming at here is not to propose changes to Tcl, but to discuss what the benefits and problems would be if...

'' '''Everything is a Thing''' ''

Moving on to what it is I mean by ''"Everything is a Thing"''... I originally meant to title this writeup "Everything is an Object". However, after some discussions in the chat room, I realized the word Object carries too much baggage with it. What is meant by the title phrase is just that not everything is a string.

'''== Issues with Everything is a String =='''

Currently in Tcl, it is stated that everything is a string. This leads to a number of complications with the language, both in its use and in its implementation.

'''Command/Thing Lookup'''

When creating new things that should be anonymous commands, such as lambdas and objects, there is a need to either pollute the proc space, or to perform various complicated workarounds. For things that require internal data, a handle is generally used that is tied to a proc of the same name. While this is an adequate approach, there are various conceptual reasons not to create a command name for every such thing. One of these reasons is the possibility of name clashes.

When trying to avoid polluting the proc space, it is mandatory to have the string representation contain all the information about the ''thing''. This approach, combined with various techniques (such as leading word expansion, etc), leads to true anonymous commands. However, even where all the information about a ''thing'' can be placed in the string rep, the approach fails when that data is mutable.

Arrays are another example of having to work around the string rep limitation. Consider that arrays are a hashmap to a group of anonymous variables.
The fact that there is no string representation for variables means that we cannot pass arrays as arguments. Instead, we need to pass in the name of the array and upvar into it.

'''Automated Resource/Garbage Collection'''

The second real problem is that of automated resource and/or garbage collection. This problem applies only to ''things'' that require internal data and/or are tied to a named command. Tcl's automated resource/garbage collection is based on the idea that a value is available to be cleaned up once there are no variables pointing to it (i.e., its RefCount is 0). For things that are not referred to via variables (such as commands), there is no way to clean up after them, since we never know when the code is done with them. The fact that they are not anonymous means that they can be accessed later, even if there is no variable containing a reference to them.

'''== The Two Types of Things =='''

I propose that there are two types of ''things''. The first type are those such that each one '''is-a''' value. The second type are those such that each one '''has-a''' value.

'''Is-a Value Things'''

Things that satisfy the condition that each one '''is-a''' value are those that currently exist as first class ''things'': lists, strings, dicts, integers, etc. Each one is such that the value '''is''' the thing. There is nothing about them that isn't naturally stored in the string representation, and each one is immutable. By immutable, we mean that the ''thing'' itself cannot change. Instead, to alter the value we create a new ''thing'' that has the new value and point the variable at the new thing. The underlying code can make optimizations to avoid the cost of copying an object but, even then, it needs to make a copy when multiple variables point to the same object.

These '''is-a''' ''things'' would be handled in exactly the same way as they are now.
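The contrast drawn so far, between arrays that must be passed by name and '''is-a''' values that are simply copied, can be seen in current Tcl. The following is only a minimal sketch; the proc names ''incrAll'' and ''addItem'' are made up for illustration and are not part of Tcl.

```tcl
# Arrays have no string rep, so the caller passes the array's NAME
# and the proc links to the caller's variable with upvar.
proc incrAll {arrName} {
    upvar 1 $arrName arr
    foreach key [array names arr] {
        incr arr($key)
    }
}

array set counts {a 1 b 2 c 3}
incrAll counts
# counts(a) is now 2, counts(b) is 3, counts(c) is 4

# A list is an is-a value: the proc works on its own copy,
# and the caller's variable is left untouched.
proc addItem {lst item} {
    lappend lst $item
    return $lst
}

set original {x y}
set extended [addItem $original z]
# original is still {x y}; extended is {x y z}
```

Note that ''incrAll'' only works if the caller hands over a name that resolves to an array in its scope, which is exactly the indirection the writeup would like to avoid.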
'''Has-a Value Things'''

These ''things'' are those that either do not meet the requirement that the natural string representation holds all the information about the object, or that need to be mutable: arrays, lambdas, objects, file handles, etc. It is with these that we will make changes in how they are handled.

For these ''things'', we will state that they cannot be converted to other types and retain their identity. This is similar to the way arrays work now. While we can ask an array for its ''list'' representation, that list is not the array; it does not contain all the information the array needs to be what it is. The difference is that we would be able to refer to the array using code such as:

    # Set up the array
    array set tmpArr {a 1 b 2 c 3}
    # Pass the array into a command, and it is free to modify it
    sortArray $tmpArr

When we ask for a different representation of the ''thing'', what we get is not the original ''thing''. Instead, we get some other ''thing'' that contains all the information about our original ''thing'' that is natural for it to contain, but is '''not''' our original. For example:

    array set tmpArr {a 1 b 2 c 3}
    llength $tmpArr

In the above, the llength command asks for the ''list'' representation of the array. Once it has this, it calculates the length. However, the thing it calculates on is not the array itself, but a list with the keys and values that the array contains.

'''== The Results of the Change =='''

The implications of the above constraint are fairly far reaching, and allow us to ignore many of the hoops that we previously had to jump through.

'''Automated Resource/Garbage Collection'''

Now that we know only the original (underlying) ''thing'' can hold the actual ''thing'', we can apply the RefCount method of garbage collection to automate cleaning up after ourselves.
As examples:

   * File channels can be closed, and their resources cleaned up, once they're no longer being used
   * Objects and lambdas can have their resources cleaned up once nothing holds a reference to them any longer. ''(see Command Resolution below)''
   * Arrays can be passed as arguments

'''Command Resolution'''

Since we now have ''things'' that are of a given type '''only''' if they are of that type, we no longer need to answer the question of ''"How did the caller intend this to be used?"'' Instead, we can ask ''"What type of thing is this?"'' This is a great boon for command resolution. When we have a command, we can apply the following logic:

   1. Is it a ''thing'' that can act as a Dispatcher (object or lambda)? If so, invoke it with the args. ''Otherwise...''
   2. Ask the ''thing'' for its string representation and call the command whose name is that string.

'''== Other Notes =='''

   * Circular references are still an issue, as long as RefCounting is used.

----

'''Discussion'''

[DKF]: What would be the negative consequences of altering the basic model of values this way? ''(It is late, and I cannot quite spot them at the moment myself.)''

[RHS]: The only negative I was able to think of, from the programmer's point of view, is the argument ''"In Tcl, Everything Is A String. If you change that, it's not Tcl anymore"''. That is not to say, however, that there wouldn't be other issues... just that I wasn't able to come up with them.

----

[SYStems] I am probably speaking too early, but it is also late and I didn't want to forget this opinion. First, we need to find out the name of the topic we are discussing; second, we need to reference and summarize the literature on that topic. I think you are breaking your head re-inventing the wheel. Find out the name of this topic (is it type theory, object orientation, or what exactly?), then buy and read the best books written on it! This is the obvious advice.

Another point concerns the two types of things.
Let's assume this scenario: Ahmed is married to Gilane, and both are humans (same type). Say you want to call Ahmed's wife. You have two means: you can call Gilane, because you know Gilane is Ahmed's wife, or you can simply call for Madame Ahmed. Three years from now, Ahmed and Gilane get a divorce, and he marries Mona. Calling for Gilane as Madame Ahmed (or Mrs. Ahmed, please correct my English) no longer works. But maybe you are not interested in Gilane; you just want to call whoever Ahmed is married to.

Ahmed can be used as a reference to his wife, but Ahmed also has a life of his own; actually, even if Ahmed dies, Gilane can still live. Plus, there is more to Ahmed than just a wife. Ahmed also has a car and a house, all of which can be referenced through Ahmed, and all of which can live on after Ahmed dies (a related research topic might be object life cycle). So even if we never talk about Ahmed anymore, we might still want to talk about his house; only then we won't refer to it as Ahmed's house. See, when Ahmed and Gilane got a divorce, she got the house; the house is her responsibility now, and we refer to that same house through Gilane (see relational databases, keys, candidate keys, primary keys, etc.).

So as you see, a grade in uni, "GPA 3.7", is a value: it refers to "GPA 3.7" and it also refers to how brave I was in uni. Therefore, there is no such thing as has-a value versus is-a value; the only immutable value is the most elementary one! In real life, I have no clue what that is; in computers, it's 1 and 0; in Tcl, it's recommended that we think of it as the string (don't dig below a string, even if you can). Everything else changes over time as the result of the transactions (see database literature) applied to it.

And A can refer to the ASCII code

    % scan A %c
    65

and it can refer to my grade in the database course. A value is in the eye of the beholder!
I hope that made some sense. I know I still need to read a lot, so please recommend anything.

[RHS] I'm really not able to follow where you're going with the above. When you have time, would you consider rephrasing and shortening it? Thanks.

[SYStems] I wanted to point out that everything is a value that is composed of other values, and you go down until you can't anymore; in Tcl this lowest layer is called a string. Let's consider this: {Ahmed Youssef} is a value, and at the same time it's a key to an entity far more complex than this simple key. {Ahmed Youssef} is a value and has a value, both; it doesn't matter much. The thing is, within a Tcl program you cannot extract more values from {Ahmed Youssef} than what it is. {Ahmed Youssef} is a value expressed in the most elementary form Tcl handles: static text, a string.

Now let's look at a command, [expr]. expr is a key to a value (a calculator), and there is more to expr than just 4 chars, and Tcl knows it. To see the real monster behind expr you need to do something like

    info args expr
    info body expr
    # I know it won't work because expr is a built-in command, not a proc, but it can't hurt to return the C code!

This way, you will know all Tcl knows about expr. As for {Ahmed Youssef}, there just isn't any more to it! But if you want, you can decode {Ahmed Youssef} to mean whatever you want it to mean (you can then decode a string to see its binary representation, and use each bit as a flag for something special). expr is an encoding of the value that could have been returned by [info]. You are better off thinking of expr and the value that expr is as the same thing, because they are...

My favorite approach to creating anonymous functions is the one where people wrapped the proc procedure in a function that returns a unique key! The unique key can be thought of as the string representation of the proc; you will of course need the help of Tcl's [info] to be able to decipher it, though!
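The "proc wrapper that returns a unique key" approach mentioned above can be sketched as follows. This is an illustration only: the helper ''lambda'', the counter variable and the generated ''::lambda_N'' names are assumptions made for this example, not an existing Tcl command or the exact code anyone posted.

```tcl
# Hypothetical helper (not part of Tcl): define a proc under a generated
# name and return that name as the "unique key".
set ::lambdaCounter 0
proc lambda {params body} {
    set name ::lambda_[incr ::lambdaCounter]
    proc $name $params $body
    return $name
}

set double [lambda {x} {expr {$x * 2}}]
set result [$double 21]

# The key is an ordinary string, so a copy of it works just as well.
set copy [string range $double 0 end]
set sameResult [$copy 21]

# [info] recovers what the key stands for.
set params [info args $double]
set body   [info body $double]

# Nothing tracks how many copies of the key exist, so cleanup is explicit.
rename $double {}
set stillThere [llength [info commands $copy]]
```

The sketch also shows the two costs raised in the writeup: every such function occupies a name in the command space, and because any string equal to the key reaches the proc, Tcl cannot tell when it is safe to delete it.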
----

[RS] 2004-04-09: I think the is-a and has-a distinction is elsewhere called ''transparent'' vs. ''opaque'' objects. Transparent objects ("pure values") have no garbage collection issues, as Tcl does that already - and up to a certain size, just modifying a copy is quite sufficient. However, for a list of 100,000 items, changing it in place with [lset] is much more efficient. But that requires a variable, which is about the smallest "opaque" object we deal with all the time... See also [TOOT].

[RHS] Variables are always mutable though. What I'm talking about is a level lower, where the Tcl_Obj itself, effectively, does or does not represent a pure value (i.e. has or does not have a string rep). As things stand now, the case is most obvious when looking at arrays. If we had values that didn't need to be able to shimmer to other types without losing information, then we could pass arrays into procs as first class objects.

[NEM] - RHS, have you read the [Feather] paper?

[RHS] No, I hadn't read it. I'd looked at the extension, but never noticed there was a paper. I'll have to go read that. From what I've heard, the thoughts behind it are quite impressive (and probably more thought out than what I have written here :)

[NEM] Yes and no. Some of the ideas are very interesting. Some of them are less so. Your "has-a things" above are similar to Feather's opaque values. There are two types of thing that currently aren't represented as strings: those for which there is no natural string rep (e.g. C-coded commands), and those which involve state (channels, arrays, vars, objects, etc). In both cases, you need to put the "real" structure somewhere else, and then use a name (key) to look it up when needed. Now, from what I can gather, you are proposing that we should be able to store this "real" structure in the internal rep of a Tcl_Obj, so that it can be passed around without having to do name lookups.
As this is the only copy of the structure, you need to prevent it going away. So, you mark the Tcl_Obj as opaque (or some such), which prevents the internal rep from being destroyed (in a conversion). Then, any convert-to-type operations must take a copy of the Tcl_Obj. That part ''seems'' reasonable. However, you now have two Tcl_Objs which are "the same", in the sense that one is a copy of the other. Yet they are not equivalent -- only one of them has the magic extra state. So they will be indistinguishable at the script level, except that they will produce different behaviour in certain situations. This is a violation of an important principle that lies at the heart of value semantics (i.e. what distinguishes values from variables): referential transparency. This only gets worse in the cases where the hidden state is mutable (mutable opaque objects, in Feather terminology).

So, while you ''could'' make the change you propose, and with a consistent semantics, it's no longer Tcl, and it's not a change I would advise. Values are representations at the script level. Hidden state is, by definition, not represented at the script level, so it seems a mistake to try to graft support for hidden state onto the representation of values. I'd argue that we should be making more things explicit at the Tcl level, and relying ''less'' on hidden state, rather than the other way round.

[RHS] While I agree that it's better to rely less on hidden state, there are a number of things that just cannot be handled without it. Due to this, there are inconsistencies in the "everything is a string" mandate, and there are many hoops to jump through in other places. My thought is, basically, to discuss what happens if we admit that not everything is a string; what benefits and problems doing so brings with it.

[NEM]: By "hidden state", I was referring to state which is not represented ''at all'' in the value representation -- i.e. not directly, or indirectly via a name.
I was arguing that such implicit (that's probably a better term) state is always a bad idea. The current mechanism Tcl uses is to use a name (which is a string), and then use that to look up the state. The key difference is that you regain the property that two values that look the same at the Tcl level will always refer to the same "thing". The state is still hidden, but at least you know it's there. With your proposal (and in [Feather]) there are situations where two values are syntactically indistinguishable (at the script level), and yet are semantically distinct.

[RHS] Yes and no. While it would be impossible to "visually" tell the difference between them, it's feasible to extend the script level comparison methods to be able to compare opaque things. For example, an opaque Tcl_Obj could be required to supply an ''isEquals'' command, much the way all Tcl_Objs are required to register a cleanup method now.

[NEM]: So, I'd have to do something like [[someObjType isEquals $a $b]]? This doesn't solve the problem though: you are still relying on implicit state to distinguish syntactically identical values. There are many languages which do this, of course. But the fact that Tcl doesn't do this is one of the things I like most about the language.

[RHS]: Actually, that's not what I meant. What I was getting at is that you would still be able to do:

    if { $myArray == $somethingElse } { ... }

Under the hood, Tcl would look at myArray and say ''"This is not a pure value, I'll call its comparison method with somethingElse to see if they're equal"''. I.e., it would do comparisons the way it does them now, unless one (or both) of the values were opaque things. In that case, it would call the registered comparison command for one of them, handing it the other as an argument.

[NEM]: It doesn't matter whether you introduce a new operator at the script level, or change the behaviour of ==.
The result is the same: operations behave differently on entities that are syntactically indistinguishable at the script level.

----

[RS]: If you can go without [array] element [trace]s, the [dict]s available from 8.5 have all the other [array] functionality, while being pure values. And I think that "everything has a string rep" is too valuable a feature to throw overboard...

[RHS] As I admitted right at the beginning of my "paper", the "In Tcl, Everything Is A String" argument is a powerful one. However, my point was merely to discuss what admitting that that statement isn't true gains/costs us. In what situations does something that cannot be represented as a first class value (file handles, arrays, etc) actually lose something if we make it so they are considered "references" instead... so that they can be passed around, etc?

----

[Peter Newman]: [RHS], I basically agree with you. And Tcl already can do what you want. The problem is that [JO] never clearly defined what ''everything is a string'' means. And different people interpret that concept/idea in different ways.

In my opinion [JO], like Larry Wall with Perl, was looking for ways to make programming simpler - and eliminate the complexity of C. And ''everything is a string'' is the part of that that says ''let's make it so that all'' '''parameters passed between functions''' ''are passed as strings''. That eliminates the need for type casting - which is a major cause of the extra complexity of C, compared with Perl and Tcl.

But it DOESN'T mean that any function, internally, has to process stuff as strings. Nor does it mean that data has to be stored in string format. That's why in Tcl, we pass (opened) files by reference - using their file handles. And though lists are currently passed by value, they could also just as easily be passed by reference. (And IMHO, it would be better if they were.)
You could do this by giving lists ''list handles'' - and passing those to the list functions. ''Everything is a string'' simply means that the ''list handle'' would be passed as a string (just like file handles are). The list, internally, would be stored in whatever format makes for the fastest/most efficient processing.

It may be that I've misunderstood what you're saying. But as I see it, Tcl already allows everything to be a thing.

[RHS] I think you've misunderstood what the main thrust of my writeup was getting at. Currently, in saying that ''Everything Is A String'', we mandate that everything (including handles) must be able to be converted to a string, and then back again with no loss of information. For example:

    set fd [open myfile.txt r]
    set notFd [string range $fd 0 end]
    unset fd
    gets $notFd

The problem is that, by mandating this, we run into a couple of problems that can be hard (or impossible) to work around:

   * We cannot clean up the resources/memory of the internal data of the ''thing'', since it is almost impossible to tell if it can still be referenced
   * We cannot ask (at the C or Tcl layers) ''Is this thing of the type X'', since it might not be an X at the time we ask.

By saying ''"Some things just aren't strings"'', we free ourselves of that limitation:

    set fd [open myfile.txt r]
    set notFd [string range $fd 0 end]
    unset fd
    gets $notFd ;# --> error

The variable ''notFd'' is set to whatever the file channel (''fd'') considers its string representation. It is not, however, equal to the file channel... nor can it be converted to one.

[NEM] Leaving aside problems of whether automatic garbage collection of external resources (e.g. file handles) is a good thing, there are clearly two issues here: one is with the management of stateful things; the other is with typing. I agree that management of stateful entities (variables, objects, channels etc) is a weak point in Tcl currently.
However, I think the correct way to "solve" this problem (if things really are that bad) is at the level of names/variables. Names can already be passed around quite conveniently, and don't have any problems with losing their internal rep. What would be nice would be a garbage collection mechanism which allows names to be registered for management by the GC (this would be simplified by a unification of naming schemes for things like commands, vars, channels etc, which is in itself a radical proposal). [Jim] has a notion of references that works in a similar way: the mechanism is heuristic, and there are some edge cases which it might miss (AIUI), but it is at least as good as the opaque object method. The advantage of working with names is as I have stated above: it avoids having to rely on information which is not available at the script level.

The second issue, about typing, is connected with what I've just said: asking "Is this thing of type X?" also relies on a notion of type which is not explicitly represented.

----

[Category Discussion]