Getting rid of the value/command dichotomy for Tcl 9

AMG: This page was written before the advent of the apply command.


Extracted from Tcl 9.0 Wishlist...


Right now, we have variables (and arrays), and we have commands. Commands are not first-class objects, and cannot be passed around as values, other than by coming up with a name for them and passing that around.

What if these things were merged? This touches on what Feather went after, I think. Lisp s-expressions also come to mind.


LV Why not ruminate on what benefits doing this would provide to Tcl 9? For instance, one benefit appears to be moving towards making lambdas possible - which is something that's been discussed in the past as being desirable. Another thing I think I see as I read below is that it may make the major/minor (command/subcommand) support stronger in Tcl. That's another requested goal. Are there other benefits in this refactoring?

ulis 2004-03-01: I have never seen why lambdas are desirable.

MSW 2004-03-02: You don't have to name a proc for every callback you need (there's more, but that alone is obvious and helpful enough)
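To make the callback point concrete with today's Tcl: the apply command (Tcl 8.5 and later, so newer than most of this page) lets a comparison function be passed without naming a proc. This is only a sketch; the variable names byLength and sorted are made up for illustration.

```tcl
# An anonymous comparison function, stored as a plain value.
# The value is a command prefix: "apply" plus a {params body} lambda.
set byLength {apply {{a b} {expr {[string length $a] - [string length $b]}}}}

# lsort appends the two elements to compare to the command prefix.
set sorted [lsort -command $byLength {pear fig banana}]
puts $sorted   ;# fig pear banana
```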


So the notation

proc a {b c} { ... }

would be shorthand for something like

set a {lambda {b c} { ... }}

Calling this would be done as usual:

$a 1 2

Now add a touch of syntactic sugar: every first word of a command is treated as a variable name, the value of which is used to supply the code. So the line

a 1 2

is internally interpreted as

[set a] 1 2

and then evaluated as it is now. Another way of saying this, is that there's an implicit "$" at the start of each command.

Combined, this would allow the following to work:

set a {lambda {b c} { ... }}
a 1 2
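The implicit "$" can be approximated in current Tcl (8.5 or later) without touching the parser, by hooking failed command lookups. The following is a rough sketch under assumptions: the handler name implicitDollar is invented, values are taken to be {params body} lambdas as understood by apply, and existing commands still win over variables of the same name, which is not what the proposal above asks for.

```tcl
# Called when no command named $name exists in the global namespace.
# Falls back to a variable of that name in the caller's scope.
proc implicitDollar {name args} {
    upvar 1 $name value
    if {[info exists value]} {
        # Treat the variable's value as a {params body} lambda.
        return [uplevel 1 [list apply $value {*}$args]]
    }
    return -code error "invalid command name \"$name\""
}
namespace eval :: {namespace unknown implicitDollar}

set a {{b c} {expr {$b + $c}}}
puts [a 1 2]   ;# 3
```

Replacing the global unknown handler also turns off the usual extras of the default handler (auto-loading, command abbreviation in interactive shells), so this is for experimenting only.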

And if you think about it: "set" is now a variable too, with the code it takes to do what set always does.

Access in this way has further implications. One can now write

a(ha) 1 2

which would use $a(ha) as source of the command to evaluate.

Is there a way to generalize this even further? Could "a(ha)" and [list a ha] be synonyms? Would that lead to an even more generic design, which can unify minor/major and namespaces as well?

Lars H - You're quite far out here; in the last paragraph the language has probably stopped being Tcl. (This doesn't mean it is necessarily bad, but it's no longer just a change of dialect.)

KJN - also, if the command a is interpreted as $a then the language stops being Tcl. This bites if a string is eval'd as a command - if the string is created and executed in different contexts, a will be substituted with $a in the execution context and not in the creation context.

male 03.04.2003 - Start:

No, not this kind of merging commands and variables, just because it wouldn't be possible to have variables with the same name as a command! This would restrict a lot! But in general I would like to have "nameless" procs stored in variables and arrays or procs to "carry" around like proc arguments! Examples:

% set a 1
% $a
invalid command name "1"
% set a {{name} {return "Hello ${name}!"}}
% $a Martin
Hello Martin!
% set pi {{} {return [expr {4*atan(1)}];}}
% $pi
3.14159265359
% proc a {} {return "Hello world!";}
% a
Hello world!
% proc b {procA} {return [$procA];}
% b a; # calls a as procA
Hello world!

male 03.04.2003 - End!
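For comparison, here is roughly how the session above can be written in Tcl 8.5 or later, where "nameless" procs are ordinary values but need an explicit apply to be run. A sketch only; the variable name greet is chosen here to avoid clashing with the proc a.

```tcl
# A "nameless" proc is a two-element list: parameter list and body.
set greet {{name} {return "Hello ${name}!"}}
puts [apply $greet Martin]   ;# Hello Martin!

set pi {{} {expr {4*atan(1)}}}
puts [apply $pi]

# Named procs can already be carried around by name.
proc a {} {return "Hello world!"}
proc b {procA} {return [$procA]}
puts [b a]                   ;# Hello world!
```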

The idea to have lambdas is certainly a good one, but unifying the namespaces for commands and variables is a large step that will certainly break many scripts. One could live with it, but I don't see that it would improve on anything. (It would make sense in a language where every token is subjected to variable substitution after it has been parsed -- myVar is treated as if it was $myVar -- but Tcl has never been such a language. Anyone wanting a Tcl-like language of this kind should probably make a fresh start at it, as it would be a very radical language modification.) One should rather accept that Tcl has variables and commands, and that it depends on the context whether a string is looked up as one or the other.

A consequence of that would be that defining a command requires the use of another command than set. If there was an interp define:

interp define path command-name lambda

then

proc a {b c} { ... }

would rather be a shorthand for something like

interp define {} a [lambda {b c} { ... }]

(possibly modulo some differences w.r.t. namespace lookup) than a shorthand for the set a sketched above.

Also, it occurs to me that the [$a 1 2] syntax probably is where Feather gets it wrong. It is certainly very nice, but it doesn't work with the language, since in principle any string is a legal command name, including those that are string representations of lambdas. However, if one is prepared to give up on this syntax then everything becomes much simpler. It is only natural to have a command for evaluating lambdas, so one could have something like

mu $a 1 2

instead of requiring that some values should be given special treatment. When it is a command rather than the core interpreter that says something is a lambda to evaluate, then there is no longer a problem with giving lambdas readable string representations; it could simply be that of the list of the arguments to the original lambda command, or even that command as a whole!
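This is close to what the later apply command provides. As a sketch, assuming the hypothetical name mu from above and the {params body} list form that apply expects, mu can be had as a plain alias:

```tcl
# "mu" is the hypothetical evaluator named above; here it is just an
# alias for the real apply command (Tcl 8.5+).
interp alias {} mu {} apply

set a {{b c} {expr {$b + $c}}}
puts [mu $a 1 2]   ;# 3

# The lambda keeps a readable string representation.
puts $a            ;# {b c} {expr {$b + $c}}
```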

jcw - Agreed - lambda strings are tricky, if there happens to be a command with the same name (possible, not plausible).

Unifying variables and commands is a way of treating everything as a call, or everything as a value, if you prefer. It's what some languages introduce as "properties". This becomes useful when one wants to hide the choice between calculated results and lookups, for example. Or when creating "access proxies" which need to do something special before a value is available (such as fetching it from another machine). Some of that can be tricked with traces on variables/arrays, but why have three mechanisms where one might suffice?

I don't quite go along with the comment that this has stopped being Tcl. The "a 1 2" notation is just as much Tcl as it is now, the only thing that has changed is the evaluation logic. And the fact that commands, variables, and namespaces might be unified a bit (perhaps not fully, I'm stretching it for the sake of argument). Oh, and that's just half the story - how about unifying file names, package names, and namespaces? It's not far-fetched - Python did that from day one, without loss of expressive power.

Oh well, just rattling the cage a bit... :)

Lars H: It occurs to me that the way to introduce radical language modifications of this type is to introduce a new type of interpreter. Since procedure names, variables, namespaces etc. are all local to the interpreter, one can have one set of rules for these in one interpreter and a completely different set of rules in another. There would be an interp newstylecreate command for creating a slave interpreter with the new set of rules, and such new style interpreters would conversely have a command for creating a slave interpreter with the old set of rules. One could have command-line switches for specifying which style of interpreter the shell starts up with.

(I'm still not particularly suggesting that one should try to unify variables and procedures as outlined above. My point is that I think the cleanest way to do that would be to introduce a new kind of interpreter.)


TV apr 3 03: Apart from rattling the cage thoughts, there is the question of whether one can get out, and it seems to me that the above primarily is not in that range. Making a lambda function in tcl could work, which is essential, except one may be interested in multithreading transparency and namespace neatness. A procedure is a special list, but it can be read and written, albeit that one has to use the

info args procname
info body procname
info default procname arg varname

constructs. - RS: See Procs as data structures which exploits this feature.
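A small sketch of reading a procedure back as data and rebuilding it under another name, using only the introspection commands listed above. The proc names greet and greet2 are made up for the example.

```tcl
proc greet {name {greeting Hello}} {return "$greeting, $name!"}

# Rebuild the formal argument list, including default values.
set arglist {}
foreach arg [info args greet] {
    if {[info default greet $arg dflt]} {
        lappend arglist [list $arg $dflt]
    } else {
        lappend arglist [list $arg]
    }
}

# Write the procedure back under a new name.
proc greet2 $arglist [info body greet]
puts [greet2 World]   ;# Hello, World!
```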


NEM This rattling cages looks like fun! OK, so let's try and stir things up some more with some random, ill-thought-through, but (hopefully) interesting ideas. Firstly, I like the idea of unifying var and command namespaces. The idea of making [somecmd args] dereference somecmd is nice, so I can do e.g.

set add [lambda {a b} {expr {$a + $b}}]
add 1 2

which is equivalent to

$add 1 2

for whichever way you implement lambdas. Can we take this a step further though? In Lambda in Tcl, EB suggested (and I elaborated) having a lambda which has a string rep which can be invoked, looking something like:

% set add [lambda {a b} {expr {$a + $b}}]
lambdaeval {a b} {expr {$a + $b}}

Not the nicest string rep, especially if you have a large body, but most of the time you wouldn't actually be looking at it. Now, the problem with this is that it cannot be evaluated directly, via:

% $add 1 2
=> no such command "lambdaeval {a b} {expr {$a + $b}}"

So, what if we do an {*} (8.5) on the leading word?

% $add 1 2

becomes

% {*}$add 1 2

which in turn becomes,

% lambdaeval {a b} {expr {$a + $b}} 1 2

and, bingo everything works. To incorporate with the stuff in the rest of this page:

% set add [lambda {a b} {expr {$a + $b}}]
lambdaeval {a b} {expr {$a + $b}}
% add 1 2
3

So, now we have both dereferencing, and expansion of leading word in command lookups. Of course, some of you may now have noticed that the definition of lambda is wrong now, if we look at how the "add" command is resolved:

add 1 2
{*}$add 1 2
lambdaeval {a b} {expr {$a + $b}} 1 2
{*}$lambdaeval {a b} {expr {$a + $b}} 1 2

Now what? Is [lambdaeval] implemented in terms of a lambda (let the recursion begin!), or is it handled specially? Or, perhaps our string rep was wrong all along. The correct string rep for a lambda should be a list of two elements - arglist and body:

% lambda {a b} {expr {$a + $b}}
{{a b} {expr {$a + $b}}}

So, lambda is now just an alias for [list]. But this looks nothing like what the Tcl parser currently looks for in a command - i.e. a name. You could create a proc for each lambda using that rep as the name, but this is backwards - using proc to implement lambda, instead of using lambda to implement proc. So, how about this as rules for the interpreter:

add 1 2
{*}$add 1 2
{a b} {expr {$a + $b}} 1 2

So, what we do, is we dereference the command name/variable to get the command itself - a list. The interpreter then expands this list, and executes the result - first word is argument list, second is body, and any remaining are arguments. The reason for expanding the command is for currying:

set add [list {a b} {expr {$a + $b}}]
# Curry
lappend add 1
# Eval
add 2
{*}$add 2
{a b} {expr {$a + $b}} 1 2

Everything just works like this (I think). Of course, there are issues here - how to hook in the byte-compiler for all this (maybe by having a [lambda] command, distinct from a [list] command, which byte-compiles the body and argument handling and caches it in the Tcl_Obj?), and how to deal with the string rep of C-coded commands - for which the body is not available. Also, while this is somewhat neat, I'm not going to promote it as "the One True Way". I don't understand these issues enough, or any possible gains. Just trying to provoke some discussion. Oh and while we're at it, my current favourite want: dicts as installable scopes... I'll leave that for another day though ;)
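The currying trick can be tried in Tcl 8.5 or later if the expansion is written out by hand and the list starts with a real command. This sketch assumes apply as that command, which is not part of the scheme described above; the variable name sum is illustrative.

```tcl
# A command prefix: "apply" followed by a {params body} lambda.
set add [list apply {{a b} {expr {$a + $b}}}]

# Curry: fix the first argument by appending it to the prefix.
lappend add 1

# Eval: explicit expansion of the leading word.
set sum [{*}$add 2]
puts $sum   ;# 3
```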

NEM 2Mar04 - I've been thinking about this some more on the bus on the way into university this morning. What I propose above doesn't really work for lambdas. It works, if you do the following:

set add [lambda {a b} {expr {$a + $b}}]
add 1 2

But, it wouldn't work if you did this:

[lambda {a b} {expr {$a + $b}}] 1 2

Why not? Because of the explicit dereference discussed on this page, it becomes:

{*}${{a b} {expr {$a + $b}}} 1 2

and we don't have a variable/binding with the name "{{a b} {expr {$a + $b}}}". So in order to make it work, our lambdas have to be bound to some name - i.e. they aren't anonymous anymore (and thus, aren't lambdas). This situation is no better than the current one, where we just have procs. We could solve this by saying: "Try and dereference the first word, and if that fails, try and evaluate the line directly" but this is both ugly, and seems to give lambdas an implicit name (their string rep). So maybe, we should skip the dereferencing and just do the expand?

[lambda {a b} {expr {$a + $b}}] 1 2

becomes

{*}{{a b} {expr {$a + $b}}} 1 2
{a b} {expr {$a + $b}} 1 2

This takes us back to having to do:

$puts "Hello, World!"

and so on, for normal commands, which isn't so good. I'm not sure what my conclusion is here. Perhaps that this sort of thing is going to be ugly any way you cut it.

BTW, I realise this has got a bit off-topic for this page. If someone wants to move this stuff somewhere more appropriate, please do so (but leave a link here).

DKF: You could use a command-name like apply to help, so:

[lambda {a b} {expr {$a + $b}}] 1 2

is the same as:

apply {a b} {expr {$a + $b}} 1 2

What does this gain you? Well, it might make currying easy, and it also means that you can get a consistent interpretation of:

set add [lambda {a b} {expr {$a + $b}}]

which would be equivalent to:

set add {apply {a b} {expr {$a + $b}}}

Indeed, lambda would actually be surprisingly similar to:

list apply

as in this:

interp alias {} lambda {} list apply

The real magic would be in the apply command, of course, and in leading-word autoexpansion.
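Both halves can be mocked up in Tcl 8.5 or later. This is a sketch under assumptions: the real apply takes the lambda as a single {params body} argument, so lambda here is a small proc rather than the alias shown above, and leading-word autoexpansion is only simulated through an unknown handler (the name expandLeadingWord is invented).

```tcl
# lambda builds a command prefix for the real apply command.
proc lambda {params body} {
    list apply [list $params $body]
}

# If the command name is a list of two or more words, expand it and retry.
proc expandLeadingWord {cmd args} {
    if {[catch {llength $cmd} n] || $n < 2} {
        return -code error "invalid command name \"$cmd\""
    }
    uplevel 1 [list {*}$cmd {*}$args]
}
namespace eval :: {namespace unknown expandLeadingWord}

set add [lambda {a b} {expr {$a + $b}}]
puts [$add 1 2]                           ;# 3
puts [[lambda {x} {expr {$x * 2}}] 21]    ;# 42
```

Because the handler only fires after normal lookup fails, a command whose name really contains spaces still takes precedence.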

NEM: That's exactly what I do at Lambda in Tcl. I originally called my command "apply" too, but changed it to "lambdaeval" to match EB's version. Of course, you could actually make the "apply" command be called "lambda" and then use:

set add [list lambda {a b} {expr {$a + $b}}]

The leading word auto-expansion is the key here. Is there a drawback to doing this throughout the core? Obviously, commands with spaces in them wouldn't work:

{my command with spaces} $arg1 $arg2

But then, maybe that should *really* be equivalent to:

my::command::with::spaces $arg1 $arg2

as namespaces are ensembles are commands... So perhaps leading word expansion would be useful here too? The most common case is where the command is a single word - and expanding a single word gives back the single word, so not a problem here. Anybody use command names which aren't valid lists? Can you even do that?

RS Easily tested:

% lrange {"} 0 end
unmatched open quote in list
% proc {"} args {return "command names may be not-lists"}
% {"}
command names may be not-lists

NEM People reading this page may be interested in my experiments at More functional programming. There are some common functional programming ideas there - and it unifies the command and variable namespaces. All in pure Tcl too (although not very efficient).


wdb 2006-08-26 -- I understand the wish to unify variables carrying functions and variables carrying string data as it is what makes e.g. Scheme attractive for hackers. But this intention breaks one of Tcl's fundamental principles: Everything is a string.

After becoming a little bit comfortable with the basics, I do not see any dichotomy as both --- command-name expansion by the rule of first position and var-name expansion via set --- can be seen as external programs called by Tcl as it was originally intended as a "glue language" which mainly calls external applications.

Just for my programming, I understand procs and vars as follows:

  • The command-name expander uses an own table with 2 cols where col1 contains the key (proc name), and col2 contains arguments and body.
  • The variable-name expander set uses an own table with 2 cols where col1 contains the key (var name), and col2 contains its string content.
  • When entering/leaving stack frames or namespaces, these expanders do their very best to hide from me that they are external applications. But without them, Tcl would only be able to produce one-liners.

I see this as a new paradigm which differs e.g. from the symbol values in Scheme. A string can be used to ask the procedure set about its value, and the same string can be used to invoke the execution by the command-name expander. Tcl has no variables of its own, so there's no dichotomy.

NEM In what way does using the same namespace for commands and variables violate EIAS? The only thing I can think is if you try to get the value of a variable containing a C-coded command. This could be dealt with either by throwing an error as for array variables, or by coming up with some opaque string representation for C-coded commands (e.g. "<Command:0x01>"). An error is probably the best option.

wdb If a variable can carry anything other than a string, then its value is not a string. That is the violation I meant. Your suggestion to throw an error on fetching such a value does not change the fact that it is a violation. Call me a purist.

The only way I see to unify would be to change the 11 rules such that for a C command, the name of the procedure is e.g. "<Command:0x01>" (where it can reflect the true address in memory space, calculated on init of Tcl, but immutable afterwards), and {lambda args body} for user-defined procs. Then, you can say e.g. (assuming <Command:0x01> does the same as list and $::set does the same as set and $::uplevel does the same as uplevel):

% <Command:0x01> apple tree
apple tree
% # this was a "direct" call of a C-defined function
% $::set list <Command:0x01>
<Command:0x01>
% $list apple tree
apple tree
% # this was a call via variable

and

% {lambda x {[$::uplevel #0 $::set list] $x $x}} appletree
appletree appletree
% # this was a "direct" call
% $::set list2 {lambda x {[$::uplevel #0 $::set list] $x $x}}
lambda x {[$::uplevel #0 $::set list] $x $x}
% $list2 appletree
appletree appletree
% # this was a call via string-containing variable

Btw, in this example, a real procedure was constructed via $::set. The procedure proc is not necessary.

RS In your examples all commands seem to have to be $-prefixed, which doesn't improve the beauty of the language. interp alias might help here:

interp alias {} list  {} <Command:0x01>
interp alias {} list2 {} {lambda x {[uplevel #0 $::set list] $x $x}}

wdb worse:

$interp alias {} list  {} <Command:0x01>
$interp alias {} list2 {} {lambda x {[$::uplevel #0 $::set list] $x $x}}

Moreover, with this solution, the dichotomy is not removed because you cannot say, set list etc., because they are not really set but aliased. But removing dichotomy was the main intent for our try.

Frankly speaking, I am not convinced that it is possible to remove the dichotomy of Tcl without changing it radically. Moreover, I am happy with the current non-unified situation. I enjoy the other simplistic aspects of Tcl. There are other beautiful languages, e.g. Scheme, where the dichotomy is removed. I prefer Tcl's state as is.

NEM: It's much much easier to unify things by moving variables into the command namespace than attempting to move commands into the variable namespace. The command namespace doesn't support get/set operations directly, only an invoke operation. This means that you don't have to worry about what the string representation of a command is, because there is no way to get it. Variables can then be built on top as commands:

var foo 12
foo <- 15 ;# Update, other syntax would be possible of course
foo trace add write ...
incr foo ;# Not all operations have to be "methods", of course
puts "foo = [foo]"

That's a much cleaner option than allowing $-signs all over the place (we could even remove $-syntax in this scheme... :)
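A very rough mock-up of the first lines of that example in current Tcl (8.5 or later), to show that variables-as-commands need nothing new from the core. Everything here is an assumption made for illustration: the helper varAccess, the global array ::varStore used as backing store, and the fact that only the read and "<-" forms are handled.

```tcl
# var creates a command named $name that stands for a variable.
proc var {name value} {
    set ::varStore($name) $value
    interp alias {} $name {} ::varAccess $name
    return $value
}

# With no arguments the command reads the value; "<- value" updates it.
proc varAccess {name args} {
    switch -- [llength $args] {
        0 { return $::varStore($name) }
        2 {
            lassign $args op value
            if {$op eq "<-"} { return [set ::varStore($name) $value] }
        }
    }
    return -code error "usage: $name ?<- value?"
}

var foo 12
foo <- 15
puts "foo = [foo]"   ;# foo = 15
```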

RS In one way, yes - if we had no variables. But then we wouldn't have local variables either, all of them cluttering around in global namespace... The theorem still holds: "It's easy to do it different from Tcl, but hard to do it better." :^)

wdb Fascinating, nonetheless ... I must think about it.

NEM If you were changing the language to unify commands/vars in this way, it would be natural to then also allow local command environments. Add lexical scoping, and IMO you have a better Tcl.

wdb (few days later) -- yep. Possibly for Tcl 10? Use set instead of var. For reasons of heritage include a high-level defined alias $ to avoid incompatibilities to older scripts ... ok, convinced. wdb 2006-18-10 For example, see tcl9var where a raw implementation shows the variable behaviour on top-level usage.


gchung We can get rid of the value/command dichotomy by making commands values, then naturally commands and variables will live in a unified namespace. Consider how it is done in Python:

x = 1
def x(): return 2
print x, x()
=> <function x at 0x1021a9ed8> 2

Notice that the variable name x now points to a function. This doesn't happen in Tcl:

set x 1
proc x {} {return 2}
puts [list $x [x]]
=> 1 2

But if commands were values, the code above would behave more like Python. Then we can pass commands around like any other value:

# Tcl where commands are values:
proc x {} {return 2} ;# Let $x be "<function x at 0x1021a9ed8>"
set y $x
puts [list y $y [y]]
=> y {<function x at 0x1021a9ed8>} 2

Notice that $y and [y] mean different things. The former has the value substituted in, and the latter is a function call. Here, I assume a command's value is some opaque reference string.

In my version of the language, the first argument of a command list is the name of the variable that points to the command value. The interpreter looks up the variable once (I don't like having the interpreter look up recursively). Here's an illustration:

proc x {} {return 2} ;# Let $x be "<function x at 0x1021a9ed8>"
set str "Hello"
x     ;# returns 2
$x    ;# error: can't read "<function x at 0x1021a9ed8>": no such variable
str   ;# error: string is not callable
$str  ;# error: can't read "Hello": no such variable

Command Scope: Because commands are now stored in variables, they are subject to the same scoping rules as regular Tcl variables. But this causes problems because we expect our procs to be global. One "fix" is, when it comes to command lookup, the interpreter searches the outer scope when it fails to find the variable in the local scope.

Lambdas: I imagine the syntax for lambdas would look like this:

set x [lambda {} {return 2}]
x ;# returns 2

Notice that we just say x and not $x. Also notice that set+lambda and proc are pretty much equivalent. We can even define proc in terms of set and lambda:

set ::proc [lambda {name arglist body} { tailcall set $name [lambda $arglist $body] }] 

Command Ensembles and Objects: Right now, namespaces and most object systems use the command namespace. But now that commands are values, namespaces and objects are values too. If the first argument of the command list is a namespace/object, the interpreter will look into that namespace/object for the subcommand/method provided by the second argument. So the commands [string length ...] and [string::length ...] are equivalent. The value of the [string length] command is stored in the variable $::string::length.
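Current Tcl (8.5 or later) already gets part of the way there with namespace ensembles, though the subcommands remain commands rather than values. A minimal sketch, with a made-up namespace shape:

```tcl
namespace eval shape {
    namespace export area
    namespace ensemble create   ;# makes "shape" itself a command

    proc area {w h} {expr {$w * $h}}
}

# Both spellings reach the same procedure.
puts [shape area 3 4]    ;# 12
puts [shape::area 3 4]   ;# 12
```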

gchung 2012-05-29 -- See "Commands as Values in Jim" for my mock up of this in Jim.