Version 11 of everything is a list

Updated 2008-12-13 17:38:28 by LEG
or
Everything is a list of strings
or
How to get rid of the {*}

On the page Everything is a string the discussion at some point turns to the topic Is everything a list?. LES says there on April 27, 2004:

  [[..]] the more I use Tcl, the more I am convinced that everything
  is a list. And that's one of the best features in Tcl.

Since this question is worth further exploration, I set up this page. In recent explorations of Tcl I became convinced that the EIAS point of view is not a shortcoming, and that it could even be a great benefit to consider that Everything is a list of strings.

First I want to discuss the execution model of Tcl. The fact is that the Endekalogue (or Dodekalogue as of Tcl 8.5) strives to explain how Tcl works but is not sufficient for that, since it only describes syntactic conventions. It is also a fact that the *dekalogs already teach us implicitly about lists. Now, how does Tcl do its work?

  • A Tcl script is a string that is first split into a list of commandlines, which are evaluated from first to last, unless one of them alters this execution order.
  • Each commandline is split into a list of words, which are subject to substitutions.
  • The first word of the resulting list of words is looked up in the list of commands and, when found, is applied to the rest of the list of words. Or in a different terminology: the remaining words on the commandline are given as arguments to the first word, which is a command.

We already can see, by virtue of this paraphrasing, that lists are at the core of the Tcl language.
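A minimal sketch of the last two steps in today's Tcl (8.5 or later). The variable names words, cmdName and cmdArgs are only illustrative: a commandline is built as a list of words and then handed to the interpreter, which treats the first word as the command and the rest as its arguments.

```tcl
# Build a commandline as a list of words (variable names are illustrative).
set words [list string length "hello world"]

# The first word names the command, the remaining words are its arguments.
set cmdName [lindex $words 0]
set cmdArgs [lrange $words 1 end]

# Expanding the list onto a commandline runs: string length "hello world"
set result [{*}$words]
puts "$cmdName applied to [llength $cmdArgs] argument(s) gives $result"
```

Here result is 11, the length of the string hello world.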

Now to the data nature of lists in Tcl. Some selected citations from the EIAS page:

  • KJN 2004-11-04: The endekalogue does not define lists - it leaves it to commands to decide whether to interpret an argument as a list (and, implicitly, to define what is meant by a list).
  • RHS 05Nov2004: [..] you want you be able to say "this is a list" and "this is not a list". In Tcl, there's no such concept. You can only say "I want to treat this like a list" and "I want to treat this like some-other-thing-that-isn't-a-list".
  • NEM 25July05: [..] what is at the heart of this debate is not [..], but rather a recognition that the notion of a "type" of a value is extrinsic to the value itself.

In fact, it is not until a command receives its arguments as a list of strings that each one is interpreted as a string, list, integer or float.
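A small sketch of this point, using an arbitrary value held in a variable v (the names are illustrative): the same value is treated as a string, a number or a list depending only on which command receives it.

```tcl
set v 42

set asString [string length $v]   ;# string command: counts two characters
set asNumber [expr {$v + 1}]      ;# expr: treats the value as an integer
set asList   [llength $v]         ;# list command: a list with one element

puts "$asString $asNumber $asList"
```

This prints 2 43 1. Nothing about v itself says which of the three readings is the right one.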

To stay with strings and lists consider the following, slightly modified, example from shimmering:

# set x {a  b c}
a  b c
# puts $x
==> a  b c

x is set to a string, and puts expects a string. However if we:

# lappend x d
a b c d
# puts $x
==> a b c d
# set x
a b c d

then the string stored in x is converted into a list of strings, visible in the loss of the double space between the first two elements. puts expects a string, so the list in x is converted back to a string on the fly on invocation. x is not altered by this conversion, as the set x on the following line shows.


How to get rid of the {*} expansion prefix.

or: How to benefit most from the notion that Everything is a list (of strings).

List handling in Tcl has always been somewhat quirky. Up to Tcl 8.4 this showed in the need to use the eval command to unwrap lists on a commandline, which leads one straight to Quoting hell. In Tcl 8.5 the syntax of the Tcl language was changed by introducing the expansion prefix {*}, which can be put in front of any word to provide this {expand}ing. This prefix implicitly breaks with rule [10] of the Endekalogue, which states that each character is processed exactly once, and it introduced the new era of the Dodekalogue.
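A minimal comparison of the two styles, assuming an illustrative list named extra. The eval form relies on the fixed words being protected with list, which is exactly where the quoting trouble tends to start.

```tcl
set extra [list "two words" three]

# Tcl 8.4 style: eval concatenates its arguments and parses the result again.
# Wrapping the fixed part in [list] keeps those words intact.
set old [eval [list list one] $extra]

# Tcl 8.5 style: {*} expands the list in place, without a second parsing pass.
set new [list one {*}$extra]

puts [llength $old]          ;# 3
puts [expr {$old eq $new}]   ;# 1
```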

In the following I suggest considering reverting the Tcl syntax to a (however modified) Endekalogue, and changing semantics instead of syntax. Maybe as a goal for The Mystical Tcl 9.0. Back to the roots!

Let's revisit the execution model. While rule [6] (or [7] in 8.5) talks about command substitution, which is the term for function calls in the Tcl language, it does not explicitly state that every command returns exactly zero or one result. More precisely, a command returns either an empty string, or a string.

This is a gross asymmetry in the Tcl language: commands take lists of arguments but can only return strings!

What if we loosen this restriction, and state:

  • every command returns a list of words. Note that this includes the possibility of returning an empty list or a list with only one word
  • command substitution expands the invoking commandline with the list of words returned by the command
  • return returns the list of words on its commandline to the invoking stack frame

(Note that this loose formulation can be expressed very precisely in terms used by the *dekalog, without the need to mention the command return explicitly.)

While this should maintain almost complete backwards compatibility for simple scripts, it brings many benefits, some of which are enumerated later. First, some explorations.

You can emulate the proposed behaviour by prefixing any [bracket] expression with the {*} expansion prefix, and by enclosing the arguments of any return statement in a list .. construction (without the {*}) if there is more than one return value.

# proc valueOf x {return [list $x]}
# set x {a  b c}
a  b c
# valueOf $x
{a  b c}
# valueOf {*}[set x]
wrong # args: should be "valueOf x"

Of course! Any command can now return a number of values, and valueOf accepts only one. We rewrite valueOf as a variadic function, using the tautology list {*}$args = $args.

# proc valueOf args {return $args}
# valueOf {*}[set x]
a b c
# remember!
# valueOf $x
{a  b c}

This example shows already the deep implications of the approach:

  • We can switch easily between list and string representation of a variable by using either set x or $x.
  • The idiom of using set x instead of return $x will break if x contains a list: the former returns a list, the latter returns a list with one element (which is a list).

Why does this benefit us?

  • Some of us have stumbled over the impossibility of returning more than one value from a proc, e.g. a status and a value. We have worked around this either by setting variables in the caller's stack frame, which is a 'bad thing' (tm), or by returning lists, which had to be broken up by eval (or {*}).

Bryan Oakley - How can you say such a thing is impossible? 'proc foo {} {return [list a b c]}'. There, I returned a list. Nothing in the current language design forces me to eventually treat that data with eval or {*}.

aricb - There are also [return -code $status $value] and [return [dict create status $status value $value]].

  • The natural Functional Programming style of Tcl can be exploited to any extent without changing the syntax.
  • The simpler, cleaner syntax of the Tcl language.
  • Variadic functions can be exploited more naturally.
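The two replies above can be sketched in today's Tcl (8.5 or later). The procs divmod and divmodDict are hypothetical examples, not taken from the discussion: one packs its results in a list that the caller unpacks with lassign, the other returns a dict.

```tcl
# Return two values packed in a list; the caller unpacks them with lassign.
proc divmod {a b} {
    return [list [expr {$a / $b}] [expr {$a % $b}]]
}
lassign [divmod 17 5] quotient remainder

# Return named values in a dict; the caller picks out what it needs.
proc divmodDict {a b} {
    return [dict create quotient [expr {$a / $b}] remainder [expr {$a % $b}]]
}
set d [divmodDict 17 5]

puts "$quotient $remainder [dict get $d quotient] [dict get $d remainder]"
```

This prints 3 2 3 2. Neither form needs eval or {*} at the call site, although both still return a single value that the caller has to take apart.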

An example, stressing the Functional Programming style:

namespace import ::tcl::mathop::*
proc map {prefix args} {
    set r {}
    foreach e $args {lappend r [uplevel $prefix $e]}
    set r
}
if {[& [map {file exists} [map {file join / etc} passwd shadow group]]]} {
   puts "We seem to be on a unix box with shadow passwords"
}

Which seems rather more natural than:

if {[& {*}[map {file exists} {*}[map {file join / etc} passwd shadow group]]]} {
...

Hint for people not used to Functional Programming: try to read the idiom from right to left: a list of names gets converted into a list of filesystem paths gets converted into a list of flags, which are all 'and'ed together.

A typical "old style" implementation for comparison:

set files {}
foreach file {passwd shadow group} {
    lappend files [file join / etc $file]
}
# start from 1, otherwise the 'and' chain can never become true
set flag 1
foreach file $files {
    set flag [expr {$flag & [file exists $file]}]
}
if {$flag} {
   puts "We seem to be on a unix box with shadow passwords"
}
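For reference, a sketch of the same right-to-left idiom that runs in unmodified Tcl 8.5, with the {*} prefixes it currently needs. It uses string commands instead of file commands so that the result does not depend on the machine; this map variant calls the prefix with {*} rather than uplevel, which is an assumption of the sketch and not the code shown earlier.

```tcl
# Variadic map: apply a command prefix to each remaining argument.
proc map {prefix args} {
    set r {}
    foreach e $args {lappend r [{*}$prefix $e]}
    return $r
}

# Read right to left: words -> upper-case words -> lengths -> sum.
set words   [map {string toupper} apple fig kiwi]   ;# APPLE FIG KIWI
set lengths [map {string length} {*}$words]          ;# 5 3 4
set total   [::tcl::mathop::+ {*}$lengths]            ;# 12

puts $total
```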

NEM: Why not just not use a variadic function when you don't want one?

proc map {f xs} {
    set ys [list]
    foreach x $xs { lappend ys [{*}$f $x] }
    return $ys
}
if {[& [map {file exists} [map {file join / etc} {passwd shadow group}]]]} {
   ...
}

LEG 2008-12-13: Just made a slight correction to the above to make it work (missing braces).


Note: While playing with this, I found some gotchas in the use of {*}:

# proc valueOf args {return $args}
# set x {a  b c}
# set y "{*}[valueOf $x]"
{*}{a  b c}
# set y "{*}$x"
{*}a  b c

I would have expected the result to be a b c in both cases, since e.g. {*}$x should first be substituted by a b c, and then the '"' quotes stripped off. On rereading the Dodekalogue I realized that {*} expansion occurs only at the start of a word, which explains the behaviour shown.

are you saying that one gotcha is that {*} behaves as documented?

Note that the proposed semantic change to [bracket] expansion would have to deal with a sensible interpretation of substituting a list of words returned by a [bracket] expression into a string. I guess the simplest one would be to use the string representation of the list.
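Both points can be checked in Tcl 8.5 as it stands. This is only a sketch with illustrative variable names: {*} is recognised at the start of a word and nowhere else, and a list substituted into a quoted string already contributes its string representation.

```tcl
set x {a  b c}

# {*} at the start of a word: the value is split into three words.
set n [llength [list {*}$x]]

# {*} inside double quotes is ordinary text; only $x is substituted.
set y "{*}$x"

# A list substituted into a string contributes its string representation.
set s "items: [list a {b c}]"

puts $n
puts $y
puts $s
```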


The extensive use of variadic functions, as well as the introduction of (return) values for loops, would further advance the Simplification of the Tcl language and make scripts more understandable. See Simplification of the Tcl language for the respective discussion.


aricb 2008-12-10: I'm intrigued by the prospect of [return] being able to return more than one value, but it seems to me that your proposal to eliminate {*} is fundamentally flawed. You say that command substitution would expand arguments, so that $var and [set var] would behave differently. Consider this:

What if I want to return a list? In Tcl 8.5, I can do:

  return [list $value1 $value2 $value3]

but under your proposal, if I read it correctly, this would be equivalent to

  return $value1 $value2 $value3

[list], [dict create], [lreplace], etc. would no longer return lists but a bunch of scalar values. To make matters worse, what if I wanted to return two lists? The following hypothetical command:

  return [list $value1 $value2] [list $value3 $value4]

would be equivalent to

  return $value1 $value2 $value3 $value4

which is not at all correct.

You cannot allow square brackets to do argument expansion without seriously botching up the way Tcl handles lists. If you are going to allow [return] to return multiple values, you have to do it in such a way that the integrity of the values is preserved. In other words, when command substitution takes place, the command must be replaced with a number of words equal to the number of arguments to the [return] statement that terminated the execution of the command.

  proc returnOneValue {} {
      return [list value1A value1B]
  }

  proc returnTwoValues {} {
      return value1 value2
  }

  proc acceptOneArg {arg} {
      puts "arg: $arg"
  }

  proc acceptTwoArgs {arg1 arg2} {
      puts "arg1: $arg1"
      puts "arg2: $arg2"
  }

  acceptOneArg a
  > arg: a

  acceptOneArg [returnOneValue]
  > arg: value1A value1B

  acceptTwoArgs a b
  > arg1: a
  > arg2: b

  acceptTwoArgs [returnTwoValues]
  > arg1: value1
  > arg2: value2

  acceptOneArg [returnTwoValues]
  > wrong # args: should be "acceptOneArg arg"

  acceptTwoArgs [returnOneValue]
  > wrong # args: should be "acceptTwoArgs arg1 arg2"
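For comparison, the same arity checks can be approximated in today's Tcl 8.5 by packing the values in a list and expanding at the call site. This is only a sketch: the procs reuse the hypothetical names from above, but return their messages instead of printing them, and returnOneValue wraps its list once more so that it survives the expansion as a single word.

```tcl
# Today's Tcl: pack the values in a list, expand at the call site with {*}.
proc returnOneValue {} { return [list [list value1A value1B]] }
proc returnTwoValues {} { return [list value1 value2] }
proc acceptOneArg {arg} { return "arg: $arg" }
proc acceptTwoArgs {arg1 arg2} { return "arg1: $arg1, arg2: $arg2" }

set ok1 [acceptOneArg {*}[returnOneValue]]
set ok2 [acceptTwoArgs {*}[returnTwoValues]]

# Two values expanded into a one-argument proc fail the arity check.
set failed [catch {acceptOneArg {*}[returnTwoValues]} msg]

puts $ok1
puts $ok2
puts "$failed: $msg"
```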

In a Tcl interpreter where commands could return more than one value, you could do away with {*}$list, but not by replacing it with [set list]; you would need a new command which takes one argument (a list) and returns each member as a separate value: [expand $list].

It should be noted that allowing a variable number of return values would introduce a serious incompatibility regarding procs that call [return] with no arguments. In Tcl 8.5, this returns an empty string. If return were variadic, [return] with no arguments would truly return nothing. Any script that relies on argumentless [return] returning an empty string would break.
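The current behaviour is easy to verify. In this sketch (the proc name nothing is illustrative) an argumentless return yields an empty string, and the command substitution still contributes exactly one word:

```tcl
proc nothing {} { return }

set r [nothing]
set len   [string length $r]            ;# 0: the result is an empty string
set words [llength [list [nothing]]]    ;# 1: still one (empty) word

puts "$len $words"
```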

In the end, as useful as multiple return values might be, I think we are better off with the syntax and semantics that are currently in place.

LEG 2008-12-11: Good point. I see that more would have to be done than I thought: [list $value1 $value2] would have to return just one value after expansion.

aricb 2008-12-12: The point was not that [list] should have a special behavior (it should not) but that square brackets must not trigger argument expansion. The fact that you frame the discussion in terms of expansion indicates that you are not really returning multiple values; you are returning a single value (a list) which must be expanded. So you are not proposing any new behavior for the [return] command. The new behavior comes because square brackets will now trigger argument expansion. In your proposal, to return multiple values, you return a list, which gets expanded on the other end.

But as you noted in your discussion, Tcl does not specify when something should or should not be treated as a list, and values which were intended to be scalar can be misinterpreted as lists if the programmer is not careful. To prevent misinterpretations, then, the programmer will be forced to always package return values as a list. So your proposal doesn't simplify anything at all; on the contrary, it complicates the task of programming by forcing programmers to type [return [list $result]] every time rather than [return $result]. [list ...] is seven characters. {*} is three. So:

  1. arguably you are not saving anybody any effort
  2. while you might remove one wart from the language ({*}), it is a wart that occurs relatively rarely in most people's code, and you are replacing it with a wart that would be orders of magnitude more frequent (return [list $result])

If you want to be able to return multiple values, the solution is not to make brackets perform argument expansion (and redefine a host of commands that return lists); the solution is to make the [return] command variadic and specify that a command surrounded by brackets would be substituted with one word for every argument passed to the command's [return] statement. So, if you want to return one value, you type [return $value]; if you want to return two values, you type [return $value1 $value2], etc. The catch (which is also a catch in your proposal) is that commands that return no values would get substituted with zero words, whereas in today's Tcl they are substituted with one word--an empty string. This catch is such a major issue that I think we are better off with the current idiom--use {*} to expand arguments as needed, and assume by default that when a command returns a list, it intended to return a list.

(a few hours later) Sorry, the above comes across in a harsher tone than I intended it. I really do think the idea of returning multiple values is interesting, and don't mean any hard feelings, despite the fact that I don't like the thought of having square brackets cause argument expansion.

LEG 2008-12-13: No problem, aricb. These are musings about what if everything were a list, so I just try to figure out how to make it work. My first reply was also somewhat short. I'll try to expand a little more.