This is the set of 13 rules that define the [Cloverfield] project language. For comparison, see Tcl's [Dodekalogue].

----

The following rules define the syntax and semantics of the '''Cloverfield''' language:

**[[1]] Commands.**

A '''Cloverfield''' script is a string containing one or more commands. Semi-colons and newlines are command separators unless quoted as described below. Close brackets are command terminators during command substitution (see below) unless quoted.

**[[2]] Evaluation.**

A command is evaluated in two steps. First, the '''Cloverfield''' interpreter breaks the command into words and performs substitutions as described below. These substitutions are performed in the same way for all commands. Second, the first word is used to locate a command procedure to carry out the command, then all of the words of the command are passed to the command procedure. The first word is implicitly prefixed by the argument expansion word modifier described in rule [[11]], and is recursively flattened until it forms an atom (i.e. does not form a valid list with more than one element). For instance, “{{{cmd a b} c d} e f} g h” is equivalent to “cmd a b c d e f g h”. The command procedure is free to interpret each of its words in any way it likes, such as an integer, variable name, list, or '''Cloverfield''' script. Different commands interpret their words differently.

**[[3]] Words.**

Words of a command are separated by white space (except for newlines, which are command separators).

**[[4]] Double quotes.**

If the first character of a word is double-quote (“"”) then the word is terminated by the next double-quote character. If semi-colons, close brackets, or white space characters (including newlines) appear between the quotes then they are treated as ordinary characters and included in the word. Command substitution, variable substitution, and backslash substitution are performed on the characters between the quotes as described below.
However, variable references are substituted with their current value at the time of substitution. The double-quotes are not retained as part of the word.

**[[5]] Braces.**

If the first character of a word is an open brace (“{”) and rule [[11]] does not apply, then the word is terminated by the matching close brace (“}”). Braces nest within the word: for each additional open brace there must be an additional close brace. However, there are instances where characters lose their significance:

   * if a character is quoted with a backslash;
   * if a character is located after an unquoted double-quote, up to a matching double-quote;
   * if a character is located after a hash character, up to the next (unescaped) newline, following rule [[10]];
   * if a character is located after a raw data word modifier, up to the matching endtag, following rule [[11]].

No substitutions are performed on the characters between the braces except for backslash-newline substitutions described below, nor do semi-colons, newlines, close brackets, or white space receive any special interpretation. The word will consist of exactly the characters between the outer braces, not including the braces themselves.

**[[6]] Parentheses.**

If the first character of a word is an open parenthesis (“(”), then the rest of the word is itself broken into subwords until a matching close parenthesis (“)”) is found, and the subwords are in turn substituted in such a way that their boundaries are preserved. The resulting word will consist of exactly the characters between the outer parentheses after substitutions, not including the parentheses themselves or the commented characters or words.

**[[7]] Command substitution.**

If a word contains an open bracket (“[[”) then '''Cloverfield''' performs command substitution. To do this it invokes the '''Cloverfield''' interpreter recursively to process the characters following the open bracket as a '''Cloverfield''' script. The script may contain any number of commands and must be terminated by a close bracket (“]]”).
The result of the script (i.e. the result of its last command) is substituted into the word in place of the brackets and all of the characters between them. There may be any number of command substitutions in a single word. Command substitution is not performed on words enclosed in braces.

**[[8]] Variable substitution and reference.**

If a word contains a dollar-sign (“$”) that is not immediately followed by an ampersand (“&”) but is followed by one of the forms described below, then '''Cloverfield''' performs variable substitution: the dollar-sign and the following characters are replaced in the word by the value of a variable. If an ampersand immediately follows the dollar-sign, then '''Cloverfield''' performs variable reference: the dollar-sign and the following characters are replaced in the word by a reference to a variable, which will be dereferenced each time its value is needed. Note that the referenced variable is the one that existed at the time of substitution. The reference remains valid even if the variable is deleted for some reason (like a local variable when a procedure returns).

Variables are specified by two parts: the name part and the index part. The name part of the variable that immediately follows the dollar-sign or dollar-sign plus ampersand may take any of the following forms:

   name:   Name is the name of a variable; the name is a sequence of one or more characters that are not word boundaries or the beginning of the index part.
   "name":   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[4]].
   {name}:   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[5]].
   (name):   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[6]].
   [[name]]:   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[7]].
   $name:   Name is the name of another variable that holds the name of the variable to use. For example “$$var” returns the value of the variable whose name is the value of variable “var”, while “$&$var” returns a reference to the same variable. This form is recursive.

The index part immediately follows the name part and may take any of the following forms:

   name{index ?index...?}:   '' ''
   name{path}:   '' ''
   name{index..index}:   Vector access semantics use the name of a variable followed by a zero-based numerical index, a path formed by a list of indices, or a range of indices, enclosed between braces. The sequence of characters is replaced by the value of the element designated by the index, the path or the range in the name variable. Typically used on lists.
   name(key):   Keyed access semantics use the name of a variable followed by a string key enclosed between parentheses. The sequence of characters is replaced by the value of the element designated by “key” in the name variable. Typically used on dictionaries.

In the absence of a recognized index part, the name designates a scalar variable. There can be several index parts that designate subparts of the variable recursively. There may be any number of variable substitutions or references in a single word.

**[[9]] Backslash substitution.**

If a backslash (“\”) appears within a word then backslash substitution occurs. In all cases but those described below the backslash is dropped and the following character is treated as an ordinary character and included in the word. This allows characters such as double quotes, close brackets, and dollar signs to be included in words without triggering special processing. The following table lists the backslash sequences that are handled specially, along with the value that replaces each sequence.

   \a:   Audible alert (bell) (0x7).
   \b:   Backspace (0x8).
   \f:   Form feed (0xc).
   \n:   Newline (0xa).
   \r:   Carriage-return (0xd).
   \t:   Tab (0x9).
   \v:   Vertical tab (0xb).
   \<newline>whiteSpace:   A single space character replaces the backslash, newline, and all spaces and tabs after the newline.
   \\:   Backslash (“\”).
   \ooo:   The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0.
   \xhh:   The hexadecimal digits hh give an eight-bit hexadecimal value for the Unicode character that will be inserted. Any number of hexadecimal digits may be present; however, all but the last two are ignored (the result is always a one-byte quantity). The upper bits of the Unicode character will be 0.
   \uhhhh:   The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted.

**[[10]] Line comments.**

If a hash character (“#”) appears at a point where '''Cloverfield''' is expecting the first character of a word, then the hash character and the characters that follow it, up through the next (unescaped) newline, are treated as a comment and ignored.

**[[11]] Word modifiers.**

If a word starts with a string that obeys rule [[5]] immediately followed by a non-whitespace character, then the leading part is a word modifier. The interpretation of the rest of the word depends on the form taken by this word modifier. Word modifiers serve varying purposes, such as modifying the behavior of the parser, or the interpretation of the word data. Recognized word modifiers are:

*** Null data. ***

If a word is preceded by the modifier “{null}” or “{nil}”, then the word is parsed as any other word, but not substituted. The word is then replaced by a special null value which is distinct from any other value, including the empty string.

*** Word comments. ***

If a word is preceded by the modifier “{#}”, then the word is parsed as any other word, but not substituted. The word is finally ignored and removed from the command being substituted. For instance, “cmd a {#}{b c} d” is equivalent to “cmd a d”.
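As an illustration of the modifier recognition rule and of the two modifiers described so far, here is a minimal Python sketch. It is not part of the specification and is not an actual '''Cloverfield''' implementation: the names `split_modifier`, `apply_modifiers` and `NULL` are hypothetical, the words are assumed to be already split, and brace matching ignores backslashes, quotes and comments (unlike the full rule [[5]]).

```python
# Hypothetical stand-in for the special null value (distinct from "").
NULL = object()

def split_modifier(word):
    """Split a word into (modifier, rest).

    A leading braced string immediately followed by a non-whitespace
    character is treated as a word modifier. Simplification: braces are
    counted naively, without the quoting exceptions of the braces rule.
    """
    if not word.startswith("{"):
        return None, word
    depth = 0
    for i, ch in enumerate(word):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                rest = word[i + 1:]
                if rest and not rest[0].isspace():
                    return word[1:i], rest
                return None, word  # plain braced word, no modifier
    return None, word  # unbalanced braces: leave the word untouched

def apply_modifiers(words):
    """Apply only the null data and word comment modifiers to split words."""
    result = []
    for word in words:
        modifier, _rest = split_modifier(word)
        if modifier == "#":
            continue              # word comment: dropped from the command
        if modifier in ("null", "nil"):
            result.append(NULL)   # null data: replaced by the null value
        else:
            result.append(word)   # anything else is left as-is in this sketch
    return result

print(apply_modifiers(["cmd", "a", "{#}{b c}", "d"]))
```

With the word comment example given above, the sketch keeps “cmd”, “a” and “d” and drops the commented word; other modifiers are deliberately left unhandled.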
*** Argument expansion. ***

If a word is preceded by the modifier “{*}”, then the word is parsed and substituted as any other word. After substitution, the word is parsed again without substitutions, and its words are added to the command being substituted. For instance, “cmd a {*}{b c} d {*}{e f}” is equivalent to “cmd a b c d e f”.

*** Raw data. ***

If a word is preceded by the modifier “{data}”, then the first non-whitespace character sequence of the word forms a tag (the rest of the line is ignored). The word is terminated by the first occurrence of this tag in the following lines. No substitution is performed on the characters between the two tags. The word is then replaced by the text data strictly enclosed between the two lines containing the start and end tags. All characters following the start tag up to and including the newline, and preceding the end tag from and including the previous newline, are ignored. For instance:

 cmd {data}ABCDEF this is ignored
 foo bar baz #{\"[[$
 this is also ignored ABCDEF a b c d

is equivalent to “cmd "foo bar baz #\{\\\"\[[\$" a b c d” (notice the lack of terminating newline). This rule allows for the inclusion of arbitrary text data (for example C code or XML data) without having to perform the escapes needed to accommodate '''Cloverfield''''s parsing rules.

*** Metadata. ***

If a word is preceded by the modifier “{meta}”, then the word is parsed and substituted as any other word. The metadata associated with the result is substituted into the word. If the “meta” string in the modifier is itself followed by a word, then this word is substituted first and associated as metadata with the word, which can be later queried as described above. It replaces any existing metadata. For instance:

   * “{meta foo}bar” gives “bar” with an associated metadata “foo”.
   * “{meta}{meta foo}bar” gives “foo”, which is the metadata associated with “bar”.
   * “{meta}{meta baz}{meta foo}bar” gives “baz”, not “foo”.

*** Delayed substitution. ***

If a word is preceded by the modifier “{delay}”, then the word is parsed as any other word, but not substituted. Substitution occurs when querying the word value for the first time, and in the context where this substitution occurs. The word whose substitution is delayed can take any form, including command or variable substitution.

*** References. ***

If a word is preceded by the modifier “{ref id}”, where “id” is an arbitrary identifier string, then the word is parsed and substituted as any other word. If no reference exists with the given identifier, a new one is created using the scoping rules described below, and is associated with the resulting word. If the reference already exists, then the word is simply ignored. References are by default word-local, which means that the identifier is resolved in the context of the current word. Identifiers starting with an ampersand (“&”) designate variables in the current calling context. References that go out of scope (for example, locally scoped references returned from a procedure) are automatically converted to globally scoped references with a unique name formed of “.ref” followed by an integer. Applications should avoid defining global references with such names.

**[[12]] Order of substitution.**

Each character is processed exactly once by the '''Cloverfield''' interpreter as part of creating the words of a command. For example, if variable substitution occurs then no further substitutions are performed on the value of the variable; the value is inserted into the word verbatim. If command substitution occurs then the nested command is processed entirely by the recursive call to the '''Cloverfield''' interpreter; no substitutions are performed before making the recursive call and no additional substitutions are performed on the result of the nested script. Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next.
Thus, a sequence like

 set y [[set x 0]][[incr x]][[incr x]]

will always set the variable y to the value 012.

**[[13]] Substitution and word boundaries.**

Substitutions do not affect the word boundaries of a command, except for word modifiers as specified in rule [[11]]. For example, during variable substitution the entire value of the variable becomes part of a single word, even if the variable's value contains spaces.

----
!!!!!!
%| [Category WalledGarden] |%
!!!!!!