Version 4 of Tcl Rules Redux

Updated 2015-03-01 14:40:34 by pooryorick

The Tcl rules fully specify both list and script syntax, but it isn't always clear which parts apply to which syntax. Tcl Rules Redux specifies the same language, first describing list syntax and then describing script syntax in terms of list syntax.

Lists

A list is a string containing a sequence of words separated by whitespace.
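
As a small illustration (a sketch only; the variable names are chosen here for the example), the built-in list commands split a string into words at whitespace:

```tcl
# A list is just a string; runs of whitespace separate its words.
set items "alpha  beta\tgamma"
set count [llength $items]    ;# number of words in the list
set second [lindex $items 1]  ;# second word (indices start at 0)
puts "$count words, second is $second"
```

Here the two spaces and the tab each act as a single separator, so the string holds three words.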

Backslash interpretation: \ and the subsequent character are interpreted simply as that subsequent character, which is useful for representing characters such as ", \, braces, and whitespace, that normally have special meaning. The following backslash sequences have special interpretation:

\a
Audible alert (bell) (Unicode U+000007).
\b
Backspace (Unicode U+000008).
\f
Form feed (Unicode U+00000C).
\n
Newline (Unicode U+00000A).
\r
Carriage-return (Unicode U+00000D).
\t
Tab (Unicode U+000009).
\v
Vertical tab (Unicode U+00000B).
\\
Backslash (“\”).
\ooo
The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted, in the range 000–377 (i.e., the range U+000000–U+0000FF). The parser will stop just before this range overflows, or when the maximum of three digits is reached. The upper bits of the Unicode character will be 0.
\xhh
The hexadecimal digits hh (one or two of them) give an eight-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0 (i.e., the character will be in the range U+000000–U+0000FF).
\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0 (i.e., the character will be in the range U+000000–U+00FFFF).
\Uhhhhhhhh
The hexadecimal digits hhhhhhhh (one to eight of them) give a twenty-one-bit hexadecimal value for the Unicode character that will be inserted, in the range U+000000–U+10FFFF. The parser will stop just before this range overflows, or when the maximum of eight digits is reached. The upper bits of the Unicode character will be 0.

The range U+010000–U+10FFFD is reserved for the future.
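
The following sketch shows the list parser applying backslash interpretation. It assumes a reasonably recent Tcl interpreter, and the sample string is invented for the example. The string is written in braces so that the script parser leaves the backslashes alone and the list commands see them:

```tcl
# Braces keep the backslashes literal at the script level, so it is the
# list parser that interprets them.
set raw {one\ word two \x41 \u00e9}
set n [llength $raw]        ;# 4 words
set first [lindex $raw 0]   ;# the escaped space is part of the word
set third [lindex $raw 2]   ;# \x41 is the letter A
set fourth [lindex $raw 3]  ;# \u00e9 is the character U+00E9
```

The escaped space does not separate words, so the first word is "one word".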

A word enclosed in quotes (") consists of the string between the quotes. The word is subject to backslash interpretation.
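
A minimal sketch of quoted words inside a list (the sample string is an assumption made for illustration):

```tcl
# The outer braces are script syntax; the quotes inside are list syntax.
set raw {"two words" single "tab\there"}
set n [llength $raw]       ;# 3 words
set first [lindex $raw 0]  ;# two words (the quotes are not part of the word)
set third [lindex $raw 2]  ;# \t inside the quotes becomes a tab character
```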

A word enclosed in braces ({}) consists of the string between the braces. Brace pairs occurring within the word are ignored for the purpose of finding the matching enclosing brace. The word is not subject to backslash interpretation, but a brace preceded by a backslash is ignored for the purpose of finding the matching enclosing brace.
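
The brace rules can be sketched as follows (the sample list is invented for the example):

```tcl
# Three braced words: one with a nested pair, one with a backslash
# sequence, and one with a backslash-escaped brace.
set raw {{a {b c}} {x\ty} {lone \{ brace}}
set n [llength $raw]               ;# 3 words
set nested [lindex $raw 0]         ;# a {b c}
set inner [llength $nested]        ;# 2, the nested pair is itself a word
set literal [lindex $raw 1]        ;# backslash and t stay as two characters
set escaped [lindex $raw 2]        ;# the backslash before the brace is kept
```

Because braced words are not subject to backslash interpretation, the second word is four characters long and the backslash in the third word remains in the value.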

Scripts

A script is an ordered sequence of lists, each separated from the next by a newline or a semicolon. Scripts start with the rules for lists and add the following rules:
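
For instance, in this short sketch (variable names are illustrative) a semicolon and a newline both end a command:

```tcl
# Two commands on one line, separated by a semicolon,
# followed by a third command on its own line.
set a 1; set b 2
set c [expr {$a + $b}]
puts "c is $c"
```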

One additional special backslash sequence:

\<newline><whitespace>
A backslash and subsequent newline, followed by any combination of space, tab, and newline characters, is replaced by a single space character. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it is replaced even when it occurs between braces, and the resulting space will be treated as a word separator if it is not in braces or quotes.
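
A brief sketch of both effects described above, with names invented for the example:

```tcl
# Inside braces the backslash-newline and the indentation that follows
# still collapse to a single space.
set braced {first\
        second}

# Outside braces and quotes the resulting space separates words, so the
# command simply continues on the next line.
set words [list one \
    two]
```

The first value is the string "first second", and the second is a list of two words.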

Comment: A number sign (#) at the beginning of a command and not otherwise escaped begins a comment that ends at the first newline. A newline escaped by a backslash is ignored for the purpose of finding the end of the comment.
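
The comment rule can be illustrated with a short sketch (variable names are assumptions for the example):

```tcl
set x 1
# This comment continues onto the next line because of the backslash \
set x 2
set y 3 ;# the semicolon ends the command, so # begins a new one
set words [list a # b]
```

The second set x is swallowed by the comment, so x remains 1. In the last line the number sign is not at the beginning of a command, so it is an ordinary word and the list has three words.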

Substitutions occur at the beginning, within, or at the end of a word, and do not change the boundaries of the word. A word enclosed in quotes is subject to script and variable substitution, but a word enclosed in braces is not.
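
A minimal sketch of these two points, using names invented for the example:

```tcl
# A substituted value stays a single word even if it contains spaces.
set v "a b c"
set n [llength [list $v x]]

# Quotes allow substitution; braces suppress it.
set count 5
set quoted "count=$count"
set braced {count=$count}
```

Here n is 2, quoted holds count=5, and braced holds the literal text count=$count.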

Script substitution: A string enclosed in brackets at any position in a word is interpreted as a script and is replaced by the result of the evaluation of that script, i.e. by the result of the final command in the script.
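
As a sketch (variable names are illustrative):

```tcl
# The bracketed script holds two commands; the result of the final one
# replaces the brackets.
set last [set a 1; set b 2]

# The substitution may sit in the middle of a word.
set word x[string length abc]y
```

After this runs, last is 2 and word is x3y.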

Variable substitution: $ followed by a variable name is replaced by the value of the corresponding variable. The variable name is not subject to backslash interpretation or script substitution. If enclosed in braces, the variable name is composed of all characters up to the matching right brace. Otherwise, a variable name is composed only of letters, digits, underscores, and namespace separators, and may be the empty string. Any other character marks the end of the name. In a variable name, a pair of parentheses encloses the name of a member variable within a named array. The member variable name is subject to backslash interpretation and substitutions.
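
The forms of variable substitution can be sketched like this (all variable names and values are invented for the example):

```tcl
set name world
set {odd name} 42
set color(red) ff0000
set key red

set greeting "hello $name!"  ;# the exclamation mark ends the name
set file $name.txt           ;# the period ends the name
set odd ${odd name}          ;# braces allow a name containing a space
set hex $color($key)         ;# the member name is itself substituted
```

The results are hello world!, world.txt, 42, and ff0000 respectively.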

Argument expansion: If {*} is placed in front of a word, each word within that word becomes a word in the command. Backslash interpretation and substitutions occur before argument expansion. For the purpose of braces or quotes, the character after the initial {*} is considered the beginning of the word.
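
A short sketch of argument expansion, assuming Tcl 8.5 or later (where the {*} syntax is available); the variable names are illustrative:

```tcl
set opts {-nocase -all}

# Without {*}, the substituted value is a single word.
set without [llength [list $opts x]]

# With {*}, each word of the value becomes its own word in the command.
set with [llength [list {*}$opts x]]

# The brace after {*} begins a braced word, which is then expanded.
set lit [list {*}{a b} c]
```

Here without is 2, with is 3, and lit is the three-word list a b c.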