Quoting and function arguments

started by TV

I'm making an interactive command composer and have been wondering about general usability: preferably it shouldn't choke on special cases such as quoted names or arguments, and it should still generate command lines that can be part of a nested eval.

Bwise contains a lot of nested eval code, which is probably rightly unpopular for decent, readable, commercially documented (...) code. To me it is quite reasonable, though, and it does a lot that some day will probably be better understood as Lisp-like functional programming. (I learned Lisp on the Acorn Electron, a BBC computer derivative, from an old book I stumbled on in the electrical engineering library, having picked up the programming language's properties somewhere.) Much of it works as intended, and I promise I'll somehow get to the point of explaining the functional decomposition graphically. To me that is like decent bookkeeping, but it is also an efficient, mathematically sound way of talking to a computer about a certain problem, worthy of consideration ... Straying from the topic.

A procedure definition all tclers know:

proc name {{arg1 default_1} {arg2 default_2} ....} {
    body
    return $ret_value
}

This is like most structured programming languages with argument passing.

In Tcl, not only is everything a string, but most things are also lists, which to my mind implies that everything can be dealt with at that level, symbolically, so we can generate procedure definitions automatically.
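As a minimal sketch of generating a procedure definition from data: the helper name defproc and the example procedure greet are made up for illustration and are not part of bwise. The argument list is assembled with list and lappend, so names and defaults keep their boundaries whatever characters they contain.

```tcl
# Hypothetical helper: define a proc from a flat name/default list.
proc defproc {name defaults body} {
    set arglist {}
    foreach {arg default} $defaults {
        # Each argument spec is itself a two-element list.
        lappend arglist [list $arg $default]
    }
    proc $name $arglist $body
}

# The default "Tcl world" contains a space; list keeps it in one piece.
defproc greet {greeting Hello who {Tcl world}} {
    return "$greeting, $who!"
}

puts [greet]             ;# Hello, Tcl world!
puts [greet Hi]          ;# Hi, Tcl world!
puts [info args greet]   ;# greeting who
```

Because the generated argument list is a well-formed list, info args and info default can read it back symbolically as well.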

What easily gets in the way, and is usually a drag and at least a break in programming-language consistency (read: start getting out the pointers and the hexdumps), is allowing general definitions and then quoting them for practical use. Here I look at just that subject, preferably staying out of quoting hell: suppose we want to use function arguments with non-trivial names or defaults.

Examples:

(Theo) 9 % proc test {{{a\ a} content\ a}} {puts ${a\ a}}
(Theo) 10 % test
content a
(Theo) 11 % proc test {{{a\ a} {content\ a}}} {puts ${a\ a}}
(Theo) 12 % test
content\ a
(Theo) 13 % proc test {{{a\\a} {content\ a}}} {puts ${a\\a}}
(Theo) 14 % test
content\ a
(Theo) 15 %
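Note that in the transcript the braces keep the backslash, so the argument there is literally named a\ a. As a hedged alternative, the same kind of definition can be built with list, which does the quoting itself; the variable names argname and default below are only illustrative.

```tcl
# Name and default both contain a space; no backslashes are needed.
set argname {a a}
set default {content a}

# [list [list ...]] yields the argument list {{a a} {content a}}.
proc test [list [list $argname $default]] {
    return ${a a}
}

puts [test]             ;# content a
puts [test other]       ;# other
puts [info args test]   ;# {a a}
```

Inside the body, ${a a} is still needed, because the plain $a form stops at the first character that cannot be part of a variable name.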

Let's not forget:

THE BASIC MODEL (courtesy of John O.)

Almost all problems can be explained with three simple rules:

  1. Exactly one level of substitution and/or evaluation occurs in each pass through the Tcl interpreter, no more and no less.
  2. Each character is scanned exactly once in each pass through the interpreter.
  3. Any well-formed list is also a well-formed command; if evaluated, each element of the list will become exactly one word of the command with no further substitutions.

...


DKF: This was extracted recently from the recesses of the internet (well, the mirror of the old procplace ftp server) [L1 ] as part of a discussion about eval and list for academic citation purposes. Unfortunately, its exact provenance and other metadata are lost, but it is apparently by Brent Welch on comp.lang.tcl.

<filed in /project/tcl/doc/README.programmer>

This is a short note to describe a deep "gotcha" with TCL and
the standard way to handle it.  Up front, TCL seems pretty
straight-forward and easy to use.  However, trying out some
complex things will expose you to the gotcha, which is
referred to as "quoting hell", "unexpected evaluation",
or "just what is a TCL list?".  These problems, which many
very smart people have had, are indications that the programmer's
mental model of the TCL evaluator is incorrect.  The point
of this note is to sketch out the basic model, the gotcha,
and the right way to think (and program) around it.

THE BASIC MODEL (courtesy of John O.)

Almost all problems can be explained with three simple rules:
    1. Exactly one level of substitution and/or evaluation occurs in each
    pass through the Tcl interpreter, no more and no less.
    2. Each character is scanned exactly once in each pass through the
    interpreter.
    3. Any well-formed list is also a well-formed command;  if evaluated,
    each element of the list will become exactly one word of the command
    with no further substitutions.
For example, consider the following four one-line scripts:
    set a $b
    eval {set a $b}
    eval "set a $b"
    eval [list set a $b]
In the first script the set command passes through the interpreter
once.  It is chopped into three words, "set", "a", and the value of
variable "b".  No further substitutions are performed on the value
of b: spaces inside b are not treated as word breaks in the "set"
command, dollar-signs in the value of b don't cause variable
substitution, etc.

In the second script the "set" command passes through the interpreter
twice: once while parsing the "eval" command and again when "eval"
passes its argument back to the Tcl interpreter for evaluation.
However, the braces around the set command prevent the dollar-sign
from inducing variable substitution:  the argument to eval is
"set a $b".  So, when this command is evaluated it produces exactly
the same effect as the first script.

In the third script double quotes are used instead of braces, so
variable substitution occurs in the argument to eval, and this could
cause unwanted effects when eval evaluates its argument. For example,
if b contains the string "x y z" then the argument to eval will be
"set a x y z";  when this is evaluated as a Tcl script it results in
a "set" command with five words, which causes an error.  The problem
occurs because $b is first substituted and then re-evaluated.  This
double-evaluation can sometimes be used to produce interesting effects.
For example, if the value of $b were "$c", then the script would set
variable a to the value of variable c (i.e. indirection).

The fourth script is safe again.  While parsing the "eval" command,
command substitution occurs, which causes the result of the "list"
command to be the second word of the "eval" command.  The result of
the list command will be a proper Tcl list with three elements: "set",
"a", and the contents of variable b (all as one element).  For
example, if $b is "x y z" then the result of the "list" command will
be "set a {x y z}".  This is passed to "eval" as its argument, and
when eval re-evaluates it the "set" command will be well-formed:
by rule #3 above each element of the list becomes exactly one word
of the command.  Thus the fourth script produces the same effect as
the first and second ones.
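The four scripts can be tried directly at the top level of tclsh. This is only a sketch: the result variables r1 to r5 are illustrative names, and catch is used so that the failing third script does not abort the run.

```tcl
set b "x y z"

# 1. One pass: the value of b stays a single word.
set a $b
set r1 $a

# 2. Braces defer the substitution of $b to eval's own pass.
eval {set a $b}
set r2 $a

# 3. Double quotes substitute $b first; eval then sees "set a x y z".
set r3 [catch {eval "set a $b"} msg]

# 4. list quotes the value, so it is again exactly one word.
eval [list set a $b]
set r4 $a

# Indirection: b holds the text "$c", which the second pass substitutes.
set c "value of c"
set b {$c}
eval "set a $b"
set r5 $a

puts [list $r1 $r2 $r3 $r4 $r5]
# {x y z} {x y z} 1 {x y z} {value of c}
```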

THE GOTCHA (observations by Brent Welch)

The basic theme to the problem is that you have an arbitrary
string and want to protect it from evaluation while passing
it around through scripts and perhaps in and out of C code you write.
The short answer is that you must use the list command to
protect the string if it originates in a TCL script, or you
must use the Tcl_Merge library procedure if the string
originates in your C code.  Also, avoid double quotes
and use list instead so you can keep a grip on things.

Now, let's rewind and start with a simple example to give some context.
We want to create a TK button that has a command associated with it.
The command will just print out the label on the button, and we'll
define a procedure to create this kind of button.
There are two opportunities for evaluation here, one when the button
is created and the command string is parsed, and again later on
when the button is clicked.  Here is our TCL proc:

proc mybutton1 { parent self label } {
    if {$parent == "."} {
        set myname $parent$self
    } else {
        set myname $parent.$self
    }
    button $myname -text $label -command "puts stdout $label"
    pack append $parent $myname {left fill}
}

The intent here is that the command associated with the button is
    puts stdout $label
Now, label is only defined when creating the button, not later on when
the button is clicked.  Thus we use double-quoting to group the words
in the command and to allow substitution of $label so that the button
will print the right value.  However, this version will only work if
the value for label is a single list element.  This is because the
double quotes around
    "puts stdout $label"
allow variable substitution before grouping the words into a list.
If label had a value like "a b c", then the command string defined
for the button would be
    puts stdout a b c
and pass too many arguments to the puts procedure, which would complain.

THE SOLUTION

The right solution is to compose the command using the list operator.
list will preserve the list structure and protect the value that
was in $label so it will survive correctly until the button is clicked:

proc mybutton2 { parent self label } {
    if {$parent == "."} {
        set myname $parent$self
    } else {
        set myname $parent.$self
    }
    button $myname -text $label -command [list puts stdout $label]
    pack append $parent $myname {left fill}
}

In this case, list will "do the right thing" and massage the value
of $label so that it appears as a single list element with respect
to the invocation of puts.  The command string for the button will be:
    puts stdout {a b c}
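The difference can be seen without Tk. This sketch assumes that the stored command is later evaluated at global level, here simulated with uplevel #0, and it swaps puts stdout for string length so that the result can be inspected; cmd1 and cmd2 are illustrative names.

```tcl
set label "a b c"

# Double quotes: $label is substituted now and the string is reparsed later.
set cmd1 "string length $label"
# list: the value is quoted so that it stays one word when reparsed.
set cmd2 [list string length $label]

puts $cmd1   ;# string length a b c
puts $cmd2   ;# string length {a b c}

# Later, when the "button is clicked":
set failed [catch {uplevel #0 $cmd1} msg]   ;# too many arguments
set length [uplevel #0 $cmd2]

puts $failed   ;# 1
puts $length   ;# 5
```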

The second place you experience this problem is when composing
commands to be evaluated from inside C code.  If the example is at
all complex, you'll want to use Tcl_Merge to build up the command
string before passing it into Tcl_Eval.  Tcl_Merge takes an
argc, argv parameter set and converts it to a string while preserving
the list structure.  That is, if you pass the result to Tcl_Eval,
argv[0] will be interpreted as the command name, and argv[1]
up through argv[argc-1] will be passed as the parameters to
the command.  Note that Tcl_VarEval *does not* make this guarantee.
Instead, it behaves more like double-quotes by concatenating
all its arguments together and then reparsing to determine list structure.

ANOTHER GOTCHA

Now, let's extend this example with another feature that I've
found thorny.  Suppose I want the caller of mybutton2 to
be able to pass in more arguments that will be passed to
the button primitive.  Say they want to fiddle with the
colors of the button.  Now I can add the special parameter "args"
to the end of the parameter list.  When mybutton3 is called,
the variable args will be a list of all the remaining arguments.
The naive, and wrong, approach is:

proc mybutton3 { parent self label args } {
    if {$parent == "."} {
        set myname $parent$self
    } else {
        set myname $parent.$self
    }
    button $myname -text $label -command [list puts stdout $label] $args
    pack append $parent $myname {left fill}

}

This is wrong because button doesn't want a sublist of more arguments,
it wants many arguments.  So, how am I gonna stick
the value of $args onto my button command?  Or, said another way,
how am I going to create the proper list structure?
It is tempting to do the following:

    eval "button $myname -text $label -command [list puts stdout $label] $args"

However, this construct causes things to go through the evaluator twice,
which will lead to unexpected results.  The double
quotes will allow substitution, so, again, if $label has spaces,
then the button command will not like its argument list.
Another (ugly) try:

    eval "button \$myname -text \$label -command \[list puts stdout \$label\] $args"

Now $args is the only variable that is evaluated twice, once to remove its
outermost list structure, and the second time as individual arguments to
the button command.   I think a better approach is the following:

    eval [concat {button $myname -text $label -command [list puts stdout $label]} $args]

In this case, $args is evaluated twice, once before the call to
concat, and a second time explicitly by calling eval.  The
stuff between the curly braces is protected against substitution
on the first pass, however, (which is good), and so all
concat ends up doing is stripping off the outermost list structure
(the curly braces) from its two arguments and putting a space
between them.  Another, perhaps clearer way of writing this is:

    set cmd {button $myname -text $label -command [list puts stdout $label]}
    eval [concat $cmd $args]

Now, with this form it is fairly clear(?) that the items in the button
command and the $args list will only be evaluated one time.  Finally,
it turns out you can eliminate the explicit call to concat because
eval will do that for us if it is given multiple arguments:

    set cmd {button $myname -text $label -command [list puts stdout $label]}
    eval $cmd $args

Which leads us back to:

    eval {button $myname -text $label -command [list puts stdout $label]} $args

Note that this was written long before the introduction of the {*} syntax (Tcl 8.5), which offers a better solution to the problem:

button $myname -text $label -command [list puts stdout $label] {*}$args
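To compare the forms without Tk, here is a sketch that uses a hypothetical stand-in, fakebutton, which just returns the words it received. The variable extra plays the role of $args, and Tcl 8.5 or later is assumed for {*}.

```tcl
# Hypothetical stand-in for Tk's button: returns its words as a list.
proc fakebutton {args} { return $args }

set myname .b
set label "a b c"
set extra {-fg red -bg {light blue}}

# Naive: the extra options arrive as one single word.
set naive [fakebutton $myname -text $label -command [list puts stdout $label] $extra]

# eval with a braced command: only $extra goes through the parser twice.
set viaEval [eval {fakebutton $myname -text $label -command [list puts stdout $label]} $extra]

# {*}: $extra is expanded into separate words, with no second pass at all.
set viaExpand [fakebutton $myname -text $label -command [list puts stdout $label] {*}$extra]

puts [llength $naive]                  ;# 6
puts [llength $viaEval]                ;# 9
puts [expr {$viaEval eq $viaExpand}]   ;# 1
```

Both the eval form and the {*} form hand the command the same nine words; the {*} form simply gets there without a second trip through the parser.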

PYK: There is also a copy of this at [L2 ]