Version 31 of double substitution

Updated 2016-01-16 12:04:19 by Tcler

An understanding of double substitution is necessary to avoid injection attacks.

Tcl parses a command, performs substitutions on the words of the command, and then executes the command. Some commands in turn perform additional processing of the arguments they receive. eval, subst, and various other commands interpret their arguments as a script to be executed, and pass the script back to Tcl for another round of evaluation. expr, if, and while parse their arguments according to the grammar of expr, resulting in another round of variable and command substitution. Double substitution refers to the substitution that happens during these additional rounds of interpretation.

Double substitution can be useful, but is more often the result of inadvertently neglecting to brace the arguments to expr or the expression arguments of if and while. Static syntax analysis tools can be used to locate these occurrences in a script.
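A minimal sketch of how an unbraced expr turns data into code. The variable names (hits, userInput) are made up for illustration, and userInput stands in for any value that comes from outside the script:

```tcl
set hits 0
# Hypothetical untrusted value that happens to contain a command substitution.
set userInput {[incr hits]}

# Unbraced: Tcl substitutes $userInput, then expr parses the resulting text
# and performs command substitution on it, so incr actually runs.
set unbraced [expr $userInput + 0]

# Braced: expr substitutes $userInput itself and treats the value as data.
# The value is not a number, so expr raises an error, but incr does not run.
set failed [catch {expr {$userInput + 0}} message]

puts "hits=$hits unbraced=$unbraced failed=$failed"
```

In the unbraced call the embedded command is executed as a side effect of evaluating the expression. In the braced call the same text is only ever an operand.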

KBK: Braced expressions on if, while, and expr aren't just safer, they're also much, much faster. Unbraced ones have to be parsed at run time; braced ones can be compiled down to very tight bytecode sequences.
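One way to check the speed difference on a given interpreter is the time command. The following is only a rough sketch: the proc names are invented for this example, and the actual timings depend on the Tcl version and the machine.

```tcl
# Hypothetical helper procs that differ only in whether the expression is braced.
proc sumUnbraced {a b} {
    return [expr $a + $b]
}
proc sumBraced {a b} {
    return [expr {$a + $b}]
}

# time reports the average number of microseconds per iteration.
set slow [time {sumUnbraced 3 4} 10000]
set fast [time {sumBraced 3 4} 10000]

puts "unbraced: $slow"
puts "braced:   $fast"
```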

See Also

Brace your expr-essions
avoiding unintended double substitution
static syntax analysis
analysing script content prior to its execution

Description

Rule 2 says that a Tcl command is evaluated in two steps. First, the Tcl interpreter breaks the command into words and performs substitutions. Then the interpreter uses the first word as a command name, calls the command, and passes the rest of the words to the command as arguments.

By the time a command gets to see its arguments, Tcl has already performed its various substitutions on those arguments. Many commands in turn perform their own substitutions on their arguments and/or pass those arguments back to the interpreter to be evaluated as scripts in their own right. The quintessential example is expr, but it is not the only one. Here are some other examples:

after
argument is a script to be evaluated later
expr
In an expression, $ causes variable substitution and [ causes command substitution.
if
first argument is an expression
set
interprets ( as array element access
list
interprets \ as backslash substitution, and " and { as grouping operators. This is true for all commands that interpret their arguments as lists.
eval
concatenates its arguments, interprets the result as a single Tcl script, and executes it.
regexp and regsub
certain arguments are parsed as regular expressions
string match
interprets *, ?, [, and \
trace
argument is a script to be evaluated later
while
first argument is interpreted as an expression
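A short sketch of three of these mini-languages at work. The values and variable names are invented for illustration. In each case the command gives special meaning to characters that Tcl itself passed through untouched:

```tcl
# string match: [1] is a character class, so the pattern does not match its own text.
set name {data[1]}
set asPattern [string match $name $name]
set escaped   [string match {data\[1\]} $name]

# List commands: braces group, so the same text has a different word count
# depending on whether it is read as a list or split on whitespace.
set text {a {b c} d}
set asList  [llength $text]
set asWords [llength [split $text]]

# set: the parenthesis in the variable name selects an array element.
set varName {cell(1)}
set $varName 5
set isArray [array exists cell]

puts [list $asPattern $escaped $asList $asWords $isArray]
```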

Each of these commands can be thought of as a separate interpreter that implements its own mini-language. In the case of eval and some others, the mini-language is just Tcl again, but anywhere a command is performing some sort of interpretation on some of the characters in its arguments, there are two layers of interpretation happening: Tcl performs its substitutions on command arguments first, and then the command may perform its own substitutions. Essentially, a script is being composed and then evaluated at runtime. This is a natural part of the design of Tcl, but if done incorrectly, it can leave one vulnerable to injection attacks, so it is important to understand and be aware of when double substitution occurs. This means understanding how each command in use operates, and how it interprets its arguments. The standard rules of Tcl describe the syntax of Tcl, and each additional command documents its own parsing and interpretation behaviour.
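As a sketch of composing a script at runtime, compare building the script by string interpolation with building it using list. The variable names and the value are hypothetical, and list is only one common way to keep a value intact as a single word:

```tcl
# Hypothetical value containing a command separator.
set value {hello; set pwned 1}

# Naive composition: the value is pasted into the script text, so eval
# sees two commands and runs both of them.
set pwned 0
eval "set naive $value"
set pwnedNaive $pwned

# Composition with list: the value is quoted as one word of one command.
set pwned 0
eval [list set safe $value]
set pwnedSafe $pwned

puts [list $naive $pwnedNaive $safe $pwnedSafe]
```

With the naive form, part of the value is executed as a second command. With list, the whole value arrives in the variable unchanged.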

Arguments to expr should almost always be braced, because bracing avoids the first layer of substitution by the Tcl interpreter. The same is true for the first argument to if and while. This is mentioned on the Tcl Style Guide page and is discussed a bit on A Question of Style.

Here is an example of a common mistake:

#warning:  bad code ahead!
set myString hello
if "$myString eq {}" {puts {empty string}}

By the time if sees its first argument, it looks like this:

hello eq {}

hello was quoted for Tcl when it performed its substitutions, but it then was not quoted for expr (via if), which currently requires that literal strings be enclosed in double quotes or braces, so an error occurs.
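The failure can be observed without aborting the script by wrapping the command in catch. This is only a sketch: the exact wording of the error message varies between Tcl versions, so it is printed rather than assumed.

```tcl
set myString hello

# catch returns a non-zero code if the script raises an error
# and stores the error message in the named variable.
set code [catch {if "$myString eq {}" {puts {empty string}}} message]

puts "code:    $code"
puts "message: $message"
```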

#warning:  bad code ahead!
set myString {hello there}
if "$myString eq {}" {puts {empty string}}

In this case, expr gets the following value for its argument:

hello there eq {}

which is even more of an error because now expr can't make sense of the sequence of operands.

Note that Tcl did not substitute away the curly brackets after eq, since the Braces and Double quotes rules only apply when a brace or double quote occurs at the beginning of a word.
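A small sketch of that rule, using throwaway variable names that are not part of the examples above: braces that appear after the start of a word are ordinary characters, while braces at the start of a word group the word and are removed.

```tcl
set inner "eq {}"     ;# braces inside a double-quoted word are kept literally
set middle a{b}c      ;# braces in the middle of a bare word are kept too
set first {b}         ;# braces at the start of a word group it and are stripped

puts [list $inner $middle $first]
```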

A complex example

In the following example, there are many issues:

#warning: bad code ahead!
set myString "This is a string with \[special characters\}"
if "$myString eq {}" {puts {empty string}}

expr, via if, receives the following value,

This is a string with [special characters} eq {}

tries to parse it as an expression, and errors at This, which is not quoted, violating the syntax rules of expressions. Additionally, there is no corresponding right square bracket for the left square bracket that signals command substitution, and no corresponding left curly bracket for the right curly bracket. Even if those problems were fixed, the sequence of words is still nonsense to expr.

The best course of action is to prevent the Tcl substitutions with curly braces:

if {$myString eq {}} {puts {empty string}}

Now expr receives the following argument, which is a well-formed expression:

$myString eq {}
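To illustrate, the braced form can be run against every string used on this page. The isEmpty proc is a hypothetical wrapper written for this sketch. Since expr substitutes $myString exactly once and treats the value as data, none of the special characters cause trouble:

```tcl
# Hypothetical helper: returns 1 if the string is empty, 0 otherwise.
proc isEmpty {myString} {
    # Braced expression: Tcl passes the text through untouched and
    # expr performs the only round of substitution.
    if {$myString eq {}} {
        return 1
    }
    return 0
}

set samples [list hello {hello there} "This is a string with \[special characters\}" {}]
foreach sample $samples {
    puts [list $sample [isEmpty $sample]]
}
```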