[Richard Suchenwirth] 2000-11-16 - Whitespace between parens, brackets, or braces doesn't matter much in real life, nor in some popular programming languages. In Tcl, it matters a great deal (see also [Is white space significant in Tcl]). The parser has no rules for "keyword" syntax; it

   * groups according to braces and brackets;
   * breaks a script into commands by semicolons and newlines;
   * breaks a command into words by whitespace.
   * All commands are treated the same: the first word is the command, the others are its arguments.

Ariel Burbaickij wrote in [the comp.lang.tcl newsgroup]:

    set a (b) [glob a*b] ;#(1)

''is not the same as''

    set a (b)[glob a*b] ;#(2)

''Would you be so kind as to explain me how Tcl parses two expressions presented above (set a variations) ?''

Certainly, and I even add

    set a(b) [glob a*b] ;#(3)

A PARSER'S MONOLOG

(1) "Aha, I have four words delimited by whitespace: set, a, (b), and something in brackets that I will evaluate first, so recurse:"

   * (1a) "Aha, I have two words, glob and a*b. The first is always the command, so I call 'glob' with the argument a*b. It returns e.g. "afoob abarb", two files that matched the specified pattern."

(..1) "Back from the recursion, I splice that result into the position where the bracketed command stood. The first word is always the command, so I call 'set' with three arguments: a, (b), and "afoob abarb" (this is one word that contains a space). 'set' raises an error, where the message says 'wrong # args: should be "set varName ?newValue?"'"

----

(2) "Aha, I have three words: set, a, and '(b)[[glob a*b]]'. The third has something in brackets, so I first evaluate that and recurse (see 1a above)."

(2..) "So my third word is now "(b)afoob abarb". I call 'set' with the two arguments a and "(b)afoob abarb"; it assigns the string value "(b)afoob abarb" to the variable a, and returns "(b)afoob abarb" to me too, just in case I need it."

----

(3) "Aha, I have three words: set, a(b), and [[glob a*b]] (continued as in (2) above..)"

(3..) "Now I call ''set'' with the arguments "a(b)" and "afoob abarb". The set command (not me!) detects the array syntax in a(b) and assigns the string value "afoob abarb" to the element b in array a, creating the array if it does not exist, or raising an error if a is a scalar variable."

"The set command and I would have shared the work if the command had been

    set a(b) $a(c) ;#(4)

In this case, I see the dollar sign and know I shall substitute a variable. The parens tell me that it's element c in array a. So I retrieve its value (say ''grill'') and take that as the third word, then call the set command with "a(b)" and "grill". Continue as in (3..) above."

"And one more variation:

    set $a(b) $a(c) ;#(5)

Perfectly valid Tcl again. For the second word, I retrieve the value of element b in array a (you may remember it is "grill" now), and for the third word, the value of element c in array a (also "grill"). So I call ''set'' with the two arguments ''grill'' and ''grill''. ''set'' (not me!) takes the first as a variable name and the second as a string value, and so assigns the string "grill" to the variable ''grill'', which it creates if it does not exist."

Puzzled? But when you learn to think like the parser (the few rules on the Tcl manpage), Tcl really flies!

Another interesting observation:

   * A ''line'' can contain ''several words'' (and mostly does, obviously)
   * A ''word'' (in the Tcl parser's sense) can contain several ''lines'' (and also very often does, for instance [proc] bodies; a ''word'' can extend over several pages of code - for instance if standing behind [namespace] eval $name...)

With these two "bi-recursive" rules, a world of software can be built from minimal concepts...

Anybody care to write down what a C(++) parser might think?

[AMG] To the best of my knowledge, C/C++ only requires whitespace between consecutive ''alphanumeric'' tokens, with the exception that newlines are significant to the preprocessor.
For the line "c++;", the monologue might be something like the following:

   * I'm at the beginning of a statement, so expect an lvalue or maybe a type definition or even the close of a previously-opened block. Oh, and be on the lookout for errors too.
   * I got the letter "c", which is probably the first letter of a variable name, or there's an outside chance it's the beginning of a line label.
   * I got a "+", which is not an alphanumeric, so the "c" I got before is the entire word. "+" isn't ":", so that "c" is a variable name, not a line label. Next expect a number, an alphanumeric variable name, or another "+". ",", ";", ")" would all be errors at this point because "+" requires two operands, and only one has been given.
   * I got another "+"; merge it with the previous character to make a "++" token signifying postincrement on the variable name already given. At this point expect any operator that doesn't require an lvalue.
   * Now I have ";", an operator which delimits statements. Let's review what I have in the statement. Token #1 is "lvalue c", token #2 is "postincrement an lvalue given in the previous token", resulting in "postincrement(lvalue c)". Spew out some assembly or other intermediate language, let the backend sort it out, and move on to the next character.

I took the liberty of merging the lexical and grammatical analyses; as far as I know, they're typically separate. Everything is first ''scanned'' into tokens (that's where my whitespace comment comes in), which are classified/identified ("postincrement" instead of "++"). Then the grammatical organization of the tokens is ''parsed'' to determine the meaning of the program, resulting in intermediate language and/or syntax error messages. Surely it's much more involved than I describe, but I'm just giving you enough information to underscore the fundamental difference between Tcl and C.
Tcl does a very simple lexical analysis (split into words according to whitespace and quoting rules), and since [everything is a string], there's no need for further classification. "Grammar" parsing is performed by the commands themselves when they inspect their arguments. C does all this work inside the language itself.

----

[[ [Arts and crafts of Tcl-Tk programming] | [Category internals] ]]
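
The five ''set'' variations from the monolog can be tried in a scratch interpreter. What follows is only a minimal sketch: `fakeglob` is a hypothetical stand-in for `glob` (it is not part of Tcl) so that the result does not depend on which files exist in the current directory, and the helper variables `rc1`, `msg1`, `r2` and `r3` are likewise made up for illustration.

```tcl
# Hypothetical stand-in for [glob a*b]; always returns the two names
# used in the monolog, regardless of the directory contents.
proc fakeglob {pattern} {
    return [list afoob abarb]
}

# (1) Four words: set, a, (b), and the substituted result.
#     set receives three arguments and raises an error, caught here.
set rc1 [catch {set a (b) [fakeglob a*b]} msg1]

# (2) Three words: the third is "(b)" glued to the substituted result.
set r2 [set a (b)[fakeglob a*b]]

# (3) Three words: set itself recognizes the array syntax in "a(b)".
#     The scalar a left over from (2) is removed first, because set
#     raises an error when a is a scalar variable.
unset a
set r3 [set a(b) [fakeglob a*b]]

# (4) The parser substitutes $a(c) before set is called.
set a(c) grill
set a(b) $a(c)

# (5) Both words are substituted, so set is called with "grill" and
#     "grill" and assigns to a variable named grill.
set $a(b) $a(c)

puts "(1) rc=$rc1 msg=$msg1"
puts "(2) $r2"
puts "(3) $r3"
puts "(4) $a(b)"
puts "(5) $grill"
```

Only the whitespace differs between (1), (2) and (3), yet the first fails, the second sets a scalar and the third sets an array element. That is the word-splitting rule at work, with ''set'' doing its own interpretation of the variable name afterwards.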