This is the set of 13 rules that define the [Cloverfield] project language. For comparison, see Tcl's [Dodekalogue]. References to the language name use the placeholder ''''.

----

**Rules**

The following rules define the syntax and semantics of the '''' language:

***[[1]] Commands.***

A '''' script is a string containing one or more commands. Semi-colons and newlines are command separators unless quoted as described below. Close brackets are command terminators during command substitution (see below) unless quoted.

''[FB]: Identical to Tcl rule [[1]].''

----

***[[2]] Evaluation.***

A command is evaluated in two steps. First, the '''' interpreter breaks the command into words and performs substitutions as described below. These substitutions are performed in the same way for all commands. The first word is used to locate a command procedure to carry out the command, then all of the words of the command are passed to the command procedure. The first word is implicitly prefixed by the argument expansion word modifier described in rule [[11.3]], and is recursively flattened until it forms an atom (i.e. does not form a valid list with more than one element). For instance, `“{{{cmd a b} c d} e f} g h”` is equivalent to `“cmd a b c d e f g h”`. The command procedure is free to interpret each of its words in any way it likes, such as an integer, variable name, list, or '''' script. Different commands interpret their words differently.

''[FB]: This is Tcl rule [[2]] with auto-expansion of leading word added.''

----

***[[3]] Words.***

Words of a command are separated by white space (except for newlines, which are command separators).

----

***[[4]] Double quotes.***

If the first character of a word is double-quote (`“"”`) then the word is terminated by the next double-quote character. If semi-colons, close brackets, or white space characters (including newlines) appear between the quotes then they are treated as ordinary characters and included in the word.
Command substitution, variable substitution, and backslash substitution are performed on the characters between the quotes as described below. However, variable references are substituted with their current value at the time of substitution. The double-quotes are not retained as part of the word.

''[FB]: Identical to Tcl rules [[3]] and [[4]].''

----

***[[5]] Braces.***

If the first character of a word is an open brace (`“{”`) and rule [[11]] does not apply, then the word is terminated by the matching close brace (`“}”`). Braces nest within the word: for each additional open brace there must be an additional close brace. However, there are instances where characters lose their significance:

   * if a character is quoted with a backslash;
   * if a character is located after an unquoted double-quote, up to a matching double-quote, following rule [[4]];
   * if a character is located after a hash character, up to the next (unescaped) newline, following rule [[10]];
   * if a character is located after a raw data word modifier, up to the matching end tag, following rule [[11.4]].

No substitutions are performed on the characters between the braces except for backslash-newline substitutions described below, nor do semi-colons, newlines, close brackets, or white space receive any special interpretation. The word will consist of exactly the characters between the outer braces, not including the braces themselves.

''[FB]: This is a modified version of Tcl rule [[6]]. Brace matching rules are changed to accommodate the new line comment rule ([[10]]) and the `{data}` word modifier (rule [[11.4]]). Double quotes must also be balanced, which means that the Tcl expression `{"}` is no longer valid and must be replaced by `\"` (which is Tcl-compatible).
This also means that the parser must identify word starts correctly.''

----

***[[6]] Parentheses.***

If the first character of a word is an open parenthesis (`“(”`), then the rest of the word is itself broken into subwords until a matching close parenthesis (`“)”`) is found, and the subwords are in turn substituted in such a way that their boundaries are preserved. The resulting word will consist of exactly the characters between the outer parentheses after substitutions, not including the parentheses themselves or the commented characters or words.

''[FB]: This is a new addition. This quoting rule is intended to make `[list]` obsolete and free for other uses, for example an ensemble command such as `[string]` or `[dict]`. Parentheses combine some of the properties of double quotes and braces, as they preserve word boundaries and white space while allowing deep substitution. Moreover they make line continuations useless. Think of parentheses as word-preserving, recursive double quotes:''

======
% set a {1 2}
% set b {3 4}
% set s1 "$a $b"
1 2 3 4
% set s2 [list $a \
    $b]
{1 2} {3 4}
% set s3 ($a
    $b)
{1 2} {3 4}
======

----

***[[7]] Command substitution.***

If a word contains an open bracket (`“[[”`) then '''' performs command substitution. To do this it invokes the '''' interpreter recursively to process the characters following the open bracket as a '''' script. The script may contain any number of commands and must be terminated by a close bracket (`“]]”`). The result of the script (i.e. the result of its last command) is substituted into the word in place of the brackets and all of the characters between them. There may be any number of command substitutions in a single word. Command substitution is not performed on words enclosed in braces.
''[FB]: Identical to Tcl rule [[7]].''

----

***[[8]] Variable substitution and reference.***

If a word contains a dollar-sign (`“$”`), not followed by an ampersand (`“&”`), followed by one of the forms described below, then '''' performs variable substitution: the dollar-sign and the following characters are replaced in the word by the value of a variable.

If an ampersand immediately follows the dollar-sign, then '''' performs variable reference: the dollar-sign and the following characters are replaced in the word by a reference to a variable, which will be dereferenced each time its value is needed. Note that the referenced variable is the one that existed at the time of substitution. The reference remains valid even if the variable is deleted for some reason (like a local variable when a procedure returns).

Variables are specified by two parts: the name part and the index part.

The name part of the variable that immediately follows the dollar-sign or dollar-sign plus ampersand may take any of the following forms:

   `name`:   Name is the name of a variable; the name is a sequence of one or more characters that are a letter, digit, underscore, or namespace separators (two or more colons).
   `"name"`:   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[4]].
   `{name}`:   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[5]].
   `(name)`:   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[6]].
   `[[name]]`:   Name is the name of a variable; the name is a sequence of arbitrary characters which are parsed according to rule [[7]].
   `$name`:   Name is the name of another variable that holds the name of the variable to use. For example `“$$var”` returns the value of the variable whose name is the value of variable `“var”`, while `“$&$var”` returns a reference to the same variable.
This form is recursive.

The index part immediately follows the name part and may take any of the following forms:

   `name{index ?index...?}`:   Vector access semantics use the name of a variable followed by zero or more ''indices'' enclosed between braces. If no indices are presented, the sequence of characters is simply replaced by the value of the `“name”` variable. When presented with a single index, the sequence is replaced by the `“index”`'th element of the variable value (0 refers to the first element of the vector). If `“index”` is negative or greater than or equal to the number of elements in the variable value, then an empty string is returned. If `“index”` has the value `“end”`, it refers to the last element in the variable value, and `“end-''integer''”` refers to the last element in the variable value minus the specified integer offset. If additional `“index”` arguments are supplied, then each argument is used in turn to select an element from the previous indexing operation, allowing the script to select elements from subvalues. So `“$name{1 2 3}”` is synonymous with `“$name{1}{2}{3}”`.
   `name{index..index}`:   Vector range semantics use a pair of indices to select an inclusive range of elements in a vector. The sequence of characters is replaced by the matching subrange of elements in the variable value.
   `name(key ?key...?)`:   Keyed access semantics use the name of a variable followed by zero or more ''keys'' enclosed between parentheses. If no keys are presented, the sequence of characters is simply replaced by the value of the `“name”` variable. When presented with a single key, the sequence is replaced by the element designated by `“key”` in the variable value. If additional `“key”` arguments are supplied, then each argument is used in turn to select an element from the previous indexing operation, allowing the script to select elements from subvalues. So `“$name(a b c)”` is synonymous with `“$name(a)(b)(c)”`.
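The index forms above can be modelled outside the language. The following is only a Python sketch, not part of the specification: nested Python lists and dicts stand in for vectors and keyed values, and the helper names (`resolve_index`, `vector_get`, `vector_range`, `keyed_get`) are hypothetical. Clamping of out-of-range bounds in `vector_range` and the error raised on a missing key are assumptions, since the rule does not define them.

```python
def resolve_index(spec, length):
    """Translate an index spec ('3', 'end', 'end-2') into an integer position."""
    if spec == "end":
        return length - 1
    if spec.startswith("end-"):
        return length - 1 - int(spec[4:])
    return int(spec)


def as_list(value):
    """View a value as a vector: lists as-is, '' as empty, scalars as one element."""
    if isinstance(value, list):
        return value
    return [] if value == "" else [value]


def vector_get(value, *indices):
    """Model of $name{index ?index...?}: each index selects from the previous result."""
    for spec in indices:
        items = as_list(value)
        pos = resolve_index(str(spec), len(items))
        if pos < 0 or pos >= len(items):
            return ""  # out-of-range indices yield the empty string
        value = items[pos]
    return value


def vector_range(value, first, last):
    """Model of $name{first..last}: inclusive range (bounds are clamped: an assumption)."""
    items = as_list(value)
    lo = max(resolve_index(str(first), len(items)), 0)
    hi = min(resolve_index(str(last), len(items)), len(items) - 1)
    if hi < lo:
        return []
    return items[lo:hi + 1]


def keyed_get(value, *keys):
    """Model of $name(key ?key...?): a missing key raises KeyError (an assumption)."""
    for key in keys:
        value = value[key]
    return value


v = ["a", ["b0", ["c0", "c1", "c2", "c3"]], "z"]
nested = vector_get(v, 1, 1, 3)                            # like $v{1 2 3}-style chaining
chained = vector_get(vector_get(vector_get(v, 1), 1), 3)   # like $v{1}{1}{3}
```

In this model `nested` and `chained` select the same element, which mirrors the statement that `“$name{1 2 3}”` is synonymous with `“$name{1}{2}{3}”`.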
In the absence of a recognized index part, the name designates a scalar variable. There can be several index parts that designate subparts of the variable recursively.

There may be any number of variable substitutions or references in a single word.

''[FB] 20080310: Following [AMG]'s advice, removed `name(path)` and added `name(key ?key...?)` for symmetry. To use paths, just use the argument expansion modifier, e.g., `$name({*}$path)`. Rewrote the paragraph using the doc for '''[lindex]'''.''

''[FB]: Here Tcl rule [[8]] is extended to allow variable name forms that are more consistent with the other rules. For example, in Tcl the following code fails:''

======
% set foo{} bar
% puts $foo{}
can't read "foo": no such variable
% puts ${foo{}}
can't read "foo{": no such variable
% puts $"foo{}"
$"foo{}"
======

''That's because brace matching rules are different in Tcl rules [[6]] and [[8]]. Rule [[8]] stops at the first close brace, while rule [[6]] keeps braces balanced.''

''Moreover the double substitution `$$name` is clarified.''

''Variable name and index parts are now distinguished. The new index part allows a Tcl array-like syntax using parentheses to access keyed values such as dicts, as well as vector access semantics using braces. Internally this will use an interface model. This also means that traces must also work in depth (e.g. tracing a given dict key).''

''Last, variable references are introduced. The goal is to allow cross-references between objects, as well as mixing mutable and immutable content, and blurring the line between mutable and immutable commands. For example, in Tcl:''

======
% set d [dict create a 1 b 2]
a 1 b 2
% dict replace $d a 3 ; # Immutable, does not modify d.
a 3 b 2
% set d ; # d is unchanged.
a 1 b 2
% dict set d a 3 ; # Mutable, modifies d.
a 3 b 2
======

''Using references, we no longer need two separate sets of commands for mutable and immutable operations (the typical `[concat]`/`[lappend]` dichotomy):''

======
% set d [dict create a 1 b 2]
a 1 b 2
% dict replace $d a 3 ; # Using variable value, the operation is immutable.
a 3 b 2
% set d ; # d is unchanged.
a 1 b 2
% dict replace $&d a 3 ; # Using variable reference, the operation is mutable. Same as [dict set d a 3]
a 3 b 2
% set d
a 3 b 2
======

''Tcl already performs mutable operations on objects that are not shared (i.e. whose refcount is <= 1), and performs copy-on-write otherwise. Passing a variable reference would suspend COW.''

----

***[[9]] Backslash substitution.***

If a backslash (`“\”`) appears within a word then backslash substitution occurs. In all cases but those described below the backslash is dropped and the following character is treated as an ordinary character and included in the word. This allows characters such as double quotes, close brackets, and dollar signs to be included in words without triggering special processing. The following table lists the backslash sequences that are handled specially, along with the value that replaces each sequence.

   \a:   Audible alert (bell) (0x7).
   \b:   Backspace (0x8).
   \f:   Form feed (0xc).
   \n:   Newline (0xa).
   \r:   Carriage-return (0xd).
   \t:   Tab (0x9).
   \v:   Vertical tab (0xb).
   \<newline>whiteSpace:   A single space character replaces the backslash, newline, and all spaces and tabs after the newline.
   \\:   Backslash (`“\”`).
   \ooo:   The digits ooo (one, two, or three of them) give an octal representation of an eight-bit value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0.
   \xhh:   The hexadecimal digits hh give a hexadecimal representation of an eight-bit value for the Unicode character that will be inserted. Any number of hexadecimal digits may be present; however, all but the last two are ignored (the result is always a one-byte quantity).
The upper bits of the Unicode character will be 0.
   \uhhhh:   The hexadecimal digits hhhh (one, two, three, or four of them) give a hexadecimal representation of a sixteen-bit value for the Unicode character that will be inserted.

''[FB]: Identical to Tcl rule [[9]].''

----

***[[10]] Line comments.***

If a hash character (`“#”`) appears at a point where '''' is expecting the first character of a word, then the hash character and the characters that follow it, up through the next (unescaped) newline, are treated as a comment and ignored.

''[FB]: Differs from Tcl rule [[10]], which only recognizes comments where the first word of a command is expected.''

----

***[[11]] Word modifiers.***

If a word starts with a string that obeys rule [[5]] immediately followed by a non-whitespace character, then the leading part is a word modifier. The interpretation of the rest of the word depends on the form taken by this word modifier. Word modifiers serve varying purposes, such as modifying the behavior of the parser, or the interpretation of the word data. Recognized word modifiers are:

****[[11.1]] Null value. ****

If a word is preceded by the modifier `“{null}”` or `“{nil}”`, then the word is parsed as any other word, but not substituted. The word is then replaced by a special null value which is distinct from any other value, including the empty string.

****[[11.2]] Word comments. ****

If a word is preceded by the modifier `“{#}”`, then the word is parsed as any other word, but not substituted. The word is finally ignored and removed from the command being substituted. For instance, `“cmd a {#}{b c} d”` is equivalent to `“cmd a d”`.

****[[11.3]] Argument expansion. ****

If a word is preceded by the modifier `“{*}”`, then the word is parsed and substituted as any other word. After substitution, the word is parsed again without substitutions, and its words are added to the command being substituted. For instance, `“cmd a {*}{b c} d {*}{e f}”` is equivalent to `“cmd a b c d e f”`.
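The argument expansion of rule [[11.3]], together with the flattening of the first word from rule [[2]], can be sketched as a small word splitter. This is a simplified Python model with hypothetical function names, not the reference parser: it only understands whitespace, nested braces and the `{*}` modifier, and it ignores quotes, backslashes, comments, substitutions and every other word modifier.

```python
def read_word(text, i):
    """Read one word starting at text[i]; return (value, index after the word)."""
    if text[i] == "{":
        depth, start = 1, i + 1
        i += 1
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth:
            raise ValueError("unbalanced braces")
        return text[start:i - 1], i  # outer braces are not part of the word
    start = i
    while i < len(text) and not text[i].isspace():
        i += 1
    return text[start:i], i


def split_list(text):
    """Split text into words without interpreting any modifier."""
    words, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        word, i = read_word(text, i)
        words.append(word)
    return words


def command_words(text):
    """Split one command into words, applying {*} and flattening the first word."""
    words, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # {*} acts as a modifier only when a non-whitespace character follows it.
        expand = text.startswith("{*}", i) and text[i + 3:i + 4].strip() != ""
        if expand:
            i += 3
        word, i = read_word(text, i)
        words.extend(split_list(word) if expand else [word])
    # Rule 2: the first word is expanded repeatedly until it is an atom.
    while words:
        try:
            head = split_list(words[0])
        except ValueError:
            break  # not a valid list, so treat it as an atom
        if len(head) <= 1:
            break
        words[0:1] = head
    return words


expanded = command_words("cmd a {*}{b c} d {*}{e f}")
flattened = command_words("{{{cmd a b} c d} e f} g h")
```

Under these simplifications, `expanded` holds the seven words `cmd a b c d e f` and `flattened` holds the nine words `cmd a b c d e f g h`, matching the two examples given in the rules.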
''[FB]: Identical to Tcl rule [[5]].''

****[[11.4]] Raw data. ****

If a word is preceded by the modifier `“{data}”`, then the first non-whitespace character sequence of the word forms a tag (the rest of the line is ignored). The word is terminated by the first occurrence of this tag in the following lines. No substitution is performed on the characters between the two tags. The word is then replaced by the text data strictly enclosed between the two lines containing the start and end tags. All characters following the start tag up to and including the newline, and preceding the end tag from and including the previous newline, are ignored. For instance:

 cmd {data}ABCDEF this is ignored
 foo bar baz #{\"[[$
 this is also ignored ABCDEF a b c d

is equivalent to `“cmd "foo bar baz #\{\\\"\[[\$" a b c d”` (notice the lack of terminating newline).

This rule allows for the inclusion of arbitrary text data (for example C code or XML data) without having to perform the escapes needed to accommodate '''''s parsing rules.

''[FB]: `{data}` is an implementation of the [here document] concept.''

****[[11.5]] Metadata. ****

If a word is preceded by the modifier `“{meta}”`, then the word is parsed and substituted as any other word. The metadata associated with the result is substituted into the word.

If the `“meta”` string in the modifier is itself followed by a word, then this word is substituted first and associated as metadata with the word, which can be later queried as described above. It replaces any existing metadata. For instance:

 {meta foo}bar

gives `“bar”` with an associated metadata `“foo”`.

 {meta}{meta foo}bar

gives `“foo”`, which is the metadata associated with `“bar”`.

 {meta}{meta baz}{meta foo}bar

gives `“baz”`, not `“foo”`.

****[[11.6]] Delayed substitution. ****

If a word is preceded by the modifier `“{delay}”`, then the word is parsed as any other word, but not substituted.
Substitution occurs when querying the word value for the first time, and in the context where this substitution occurs. The word whose substitution is delayed can take any form, including command or variable substitution.

****[[11.7]] References. ****

If a word is preceded by the modifier `“{ref id}”`, where `“id”` is an arbitrary identifier string, then the word is parsed and substituted as any other word. If no reference exists with the given identifier, a new one is created using the scoping rules described below, and is associated with the resulting word. If the reference already exists, then the word is simply ignored.

References are by default word-local, which means that the identifier is resolved in the context of the current word. Identifiers starting with an ampersand (`“&”`) designate variables in the current calling context. References that go out of scope (for example, locally scoped references returned from a procedure) are automatically converted to globally scoped references with a unique name in the form `“.ref”`, where `“”` is an integer. Applications should avoid defining global references with such names.

----

***[[12]] Order of substitution.***

Each character is processed exactly once by the '''' interpreter as part of creating the words of a command. For example, if variable substitution occurs then no further substitutions are performed on the value of the variable; the value is inserted into the word verbatim. If command substitution occurs then the nested command is processed entirely by the recursive call to the '''' interpreter; no substitutions are performed before making the recursive call and no additional substitutions are performed on the result of the nested script.

Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like

 set y [set x 0][incr x][incr x]

will always set the variable y to the value, 012.
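The two properties of rule [[12]] (one pass over the characters, and strict left-to-right evaluation) can be illustrated with a toy model. The Python below is only a sketch under heavy simplification: `substitute_once`, `run_word` and the two command helpers are hypothetical names, variables are limited to the bare `$name` form, and command substitutions are represented as plain Python callables rather than parsed scripts.

```python
import re


def substitute_once(word, variables):
    """Replace each $name in a single left-to-right pass.

    The substituted value is inserted verbatim and never rescanned.
    """
    return re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], word)


def run_word(parts, variables):
    """Evaluate command substitutions in order and concatenate their results."""
    return "".join(part(variables) for part in parts)


def set_x_to_zero(variables):
    """Stands in for [set x 0]."""
    variables["x"] = "0"
    return variables["x"]


def incr_x(variables):
    """Stands in for [incr x]."""
    variables["x"] = str(int(variables["x"]) + 1)
    return variables["x"]


# The value of a contains a dollar-sign, but it is not substituted a second time.
once = substitute_once("$a-$b", {"a": "$b", "b": "1"})

# Models: set y [set x 0][incr x][incr x]
y = run_word([set_x_to_zero, incr_x, incr_x], {})
```

Here `once` keeps the literal `$b` that came from the value of `a`, and `y` is the string `012` because each substitution completes before the next one starts.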
''[FB]: Identical to Tcl rule [[11]].''

----

***[[13]] Substitution and word boundaries.***

Substitutions do not affect the word boundaries of a command, except for argument expansion as specified in rule [[11.3]]. For example, during variable substitution the entire value of the variable becomes part of a single word, even if the variable's value contains spaces.

''[FB]: Identical to Tcl rule [[12]] with rule [[5]] replaced with the matching rule [[11.3]].''

----

**Discussion**

: I don't know where the word "tridekalogue" comes from, but from the viewpoint of Greek it hurts a bit: 13 in Greek is "dekatria", not "trideka", as far as I know, so perhaps "dekatrialogue" would be better. (I don't know Greek, though, so a better authority should advise.) (An internet search reveals that "trideka" is Esperanto and means "30th", so I guess that was not the intended association...)

''[FB]: It derives from the Tridecagon (AKA Triskaidecagon) which is "a 13-sided polygon" [http://mathworld.wolfram.com/Tridecagon.html], the Dodecagon being 12-sided. I swapped the 'c' for a 'k' to respect the same convention as Tcl's [Dodekalogue].''

: Thanks a lot for the explanation! I am quite surprised. It turns out there is a difference between ancient Greek (triskaideka (and variations)) and modern (dekatria). (So the hurt was solely due to my own ignorance. Sorry for the noise.)

----

[DKF]: It seems to me that the parts of rule [[11]] would probably be better off being separate rules as they're ''very'' different in nature from each other.

''[FB]: Good suggestion. However I wanted to limit the number of rules to a small number (here 13). Moreover the many parts of rule [[11]] are not very different syntactically speaking (word modifiers share the same syntactic rules); they differ in the way they change the parsing, substitution and evaluation rules. Maybe we could use sub-rule numbers, e.g.
'''[[11.1]] Null value'''.''

----

''[escargo] 10 Mar 2008'' - There are a couple of issues with respect to the definition and treatment of white space.

First, a single nomenclature for whitespace characters should be used throughout this page. So, instead of referring to "white space" or "white space characters" or other terms, "whitespace characters" could be used uniformly.

''[FB]: This is a good point, however you should blame the original [Dodekalogue] for that ;-) (I kept the same wording in most places).''

Second, with the advent of Unicode, the definition of whitespace characters needs to be reconsidered. (See, for example, http://en.wikipedia.org/wiki/Whitespace_%28computer_science%29 for a workable definition.) Personally, I think any character for which [[string is space $char]] is 1 should be considered to be a whitespace character. Likewise, the behavior of [[string trim ''string'']] (three arguments) should be to trim all characters for which [[string is space $char]] is 1. These behaviors are not currently true for [Tcl].

''I beg to disagree. IMHO only the ASCII space chars should be considered white space (in the sense of word separators), because other whitespace chars may carry a special meaning. For example, character 160 is a non-breaking space and thus shouldn't be a word separator. However I agree that the behavior of [[string is space]] is inconsistent with the definition of white space in Tcl's Dodekalogue and the present Tridekalogue. So maybe dropping all references to white space in favor of a proper definition of word separators in rule [[3]] would address this issue. Consequently [[string is]] would need a `separator`-like class for `[ \t\f\r\v]`.''

----

**Implementation**

[AMG]: I have made a reference interpreter in [Tcl]. It's missing some features, and it hasn't been exhaustively tested, but it does implement much of what is discussed above. See [Cloverfield - Parser].

----

!!!!!!
%| [Category Language] | [Category Cloverfield] |%
!!!!!!