Version 24 of BNF for Tcl

Updated 2008-01-07 01:06:43 by KJN

For years, newcomers to Tcl have asked for its "BNF grammar" (see "Acronym collection"). The comp.lang.tcl newsgroup then typically hosts a more-or-less unproductive confrontation between Tcl'ers rightly asserting that the questioner doesn't really want a BNF for Tcl, and the "outsider" rightly claiming that, yes, that's exactly what he has in mind. It's time to collect (someone else can organize) the facts on this.


It seems that the first response to a request for the grammar chart should be to ask what it is for. Is it to better understand Tcl? Then the rest of this page should point people to documents that describe more clearly how Tcl works.

If, however, the purpose is to write some yacc/lex code for parsing, then perhaps the response should be to point people to code which accomplishes this. Of course, some people prefer (or are required by outside factors) not to use existing code.


It's traditional to describe many languages (ALGOL-derived ones, broadly), including C, Java, and Perl (doesn't [L1 ] look like the artifact of a serious language?), with their BNF. Languages such as Forth, Lisp, and Tcl, though, have degenerate syntaxes, designed to give just enough power to implement extensibility. All the functionality in these languages derives from the application of library elements, not syntactic expression. It's moreover typical of the latter that their apparent semantics are mutable at run-time; Lisp, Forth, and Tcl programmers can freely redefine if or while. CL thinks it important to note, though, that production code in the year 2001 relatively rarely changes such control structures.
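To make the point about mutable control structures concrete, here is a minimal sketch that replaces while with a wrapper. The names original_while and whileCalls are invented for this illustration, and the wrapper makes no attempt to handle every corner case of the built-in.

```tcl
# Keep the built-in reachable under a new (hypothetical) name.
rename while original_while

set ::whileCalls 0

# Replacement "while": count the invocation, then delegate to the
# original command in the caller's scope so variables resolve as usual.
proc while {condition body} {
    incr ::whileCalls
    uplevel 1 [list original_while $condition $body]
}

set i 0
while {$i < 3} {incr i}
puts "i=$i, while was called $::whileCalls time(s)"

# Put the built-in back.
rename while {}
rename original_while while
```

No grammar had to change for this to work: while is just a command name looked up at run time.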


Donal Fellows writes on this question, "The problem with a standard static grammar for Tcl stems from the fact that the language isn't static (technically, I believe it belongs to a different class of languages - the context-sensitive languages - that cannot be properly parsed by anything less than a Turing machine, and BNF-based grammars are just not capable of recognising such things.) An example might be a Tcl script that calls [package require Tk] and suddenly becomes extended with a whole bunch of new commands with non-trivial syntaxes like:

  button .b -text "foo bar" -command {puts boo}

Worse still, you can even imagine Tcl scripts that contain fragments from some other language (e.g. (most?) database extensions let you write SQL, and that's a completely different language.)

Still, you might be able to do something with an interesting subset of the language."

morgan mair fheal replies, "all nontrivial languages have a type 1 or 0 grammar but few language definitions give anything formally beyond a type 2 or 3 grammar

so you can say tcl contains a very simple type 2 and then explain rest of it with a type 0 grammar or verbosity or pointing to parser

the difference with c is how much you can load on type 2 grammar and how much you have to define elsewise" ...


Consider the following one-liner to see how changeable the language is:

 proc - args {uplevel 1 [lindex $args end] [lrange $args 0 end-1]}

after which Tcl commands can also be called in Reverse Polish Notation style, e.g.

 - x 5 set
 - {$x<0} {- negative puts} else {- positive puts} if

Doesn't work in procs, though... (RS)


Many times when Tcl outsiders comment on "Tcl's syntax", what they really have in mind is the syntax of particular Tcl commands. Expert Tcl users make a clear distinction between the syntax of the core, "content-free" language, and the library of implemented commands, which can and frankly do all kinds of idiosyncratic and non-uniform things. The usual implementation of clock, in fact, involves a small Lex-Yacc source.


If the real goal is to parse Tcl source, MUCH the best vehicle is Tcl itself. The tclparser extension is particularly apt for this, though there are several other implementations.
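As a rough illustration of parsing Tcl with Tcl, the sketch below splits a script into top-level commands using only the built-in info complete. It is not the tclparser extension; the name splitCommands is made up here, and the sketch assumes commands are separated by newlines (it ignores semicolons and backslash-newline continuation).

```tcl
# Split a script into its top-level commands, skipping blank lines
# and comment lines. Lines are accumulated until "info complete"
# reports that braces, brackets and quotes are balanced.
proc splitCommands {script} {
    set commands {}
    set chunk ""
    foreach line [split $script \n] {
        append chunk $line
        if {[info complete $chunk]} {
            set trimmed [string trim $chunk]
            if {$trimmed ne "" && [string index $trimmed 0] ne "#"} {
                lappend commands $trimmed
            }
            set chunk ""
        } else {
            # Still inside a brace, bracket or quote: keep collecting.
            append chunk \n
        }
    }
    return $commands
}

set script {
set x 1
proc greet {who} {
    puts "hello $who"
}
# a comment line
puts $x
}

foreach cmd [splitCommands $script] {
    puts "<$cmd>"
}
```

The multi-line proc definition comes out as a single command, which is the part a line-oriented lexer would get wrong.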


In July 1993, Terrence Monroe Brannon posted message <[email protected]>, which contained a flex lexer for a Tcl-to-C compiler he was working on at the time. He also posted some yacc code for the same work in <[email protected]>. [CL will googlify these refs when he makes time, so they're "clickable".]


[cultural conflict--Tcl'ers don't argue about grammar the way others do. Action is in library, extensibility definitions, ...]


So what is Tcl's grammar?

  • A script (program source, text) is a sequence of zero or more statements.
  • A statement is either a comment or a command list (why aren't comments just no-op commands? 'Sure seems as though that would simplify things ... -- one reason appears to be to suppress command substitution.)
  • A command list is a list of zero or more words.
  • Words are more-or-less arbitrary strings, possibly including white space, lexified with just a few special characters. The traditional reference for parsing Tcl at this level is the "Tcl.n man page" [L2 ].

[Is the description above accurate? Does it handle comments and newlines at least consistently?]

Peter Lewerin 2001-05-25:

I found this syntax summary on the web in March 2001, but couldn't find it again now (in 2002, this summary was added to the wiki: An Introduction to Tcl Scripting):

  • Script = commands separated by newlines or semicolons.
  • Command = words separated by white space.
  • $ causes variable substitution.
  • [] causes command substitution.
  • "" quotes white space and semi-colons.
  • {} quotes all special characters.
  • \ quotes next character, provides C-like substitution.
  • # for comments (must be at beginning of command).

The following hints about the parsing process are probably helpful too:

  • Word separation (grouping), and variable and command substitution, are performed in a single pass through a command.
  • As braces ({}) and double quotes ("") quote white space, their contents are regarded as a single word by the parser.
  • The first word in a command is read as the name of a built-in or defined command procedure.

RS: I would put it this way:

 {}     group (keep contents together as one word)
 ""     group and substitute (\,$,[])
 []     group(substring), substitute, eval, replace [..] with results

Peter Lewerin: but [] doesn't group in the formal sense, only in the sense that the result does not break grouping.
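The quoting and substitution rules summarised above can be tried out directly. This is only a small demonstration; the variable names are arbitrary.

```tcl
set name World

set quoted  "Hello, $name"          ;# double quotes group and substitute
set braced  {Hello, $name}          ;# braces group without substituting
set length  [string length $name]   ;# brackets are replaced by the command result
set nested  "len=[string length $name]"
set literal \$name                  ;# backslash quotes the next character

puts $quoted     ;# Hello, World
puts $braced     ;# Hello, $name
puts $length     ;# 5
puts $nested     ;# len=5
puts $literal    ;# $name
```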


KBK 04 April 2002 - The complete syntax of the core Tcl language fits on a single page. Peter Lewerin calls it the Endekalogue, and it's in the manual page labelled, 'Tcl' [L3 ].

It has the rules that Peter spells out above, presented more formally. If you understand it fully, you understand Tcl. That's the beauty of the language. Commands add their own interpretations of arguments, but the basic syntax of the language is always covered by the same eleven rules.


Semi-formal definitions for strings as lists

The way in which strings are parsed as lists is documented in the lindex manual page.

Peter Lewerin: During an April 2002 thread on news:comp.lang.tcl [L4 ], some confusion arose (mostly in my head, but anyway) on the nature of strings parsed as lists or as scripts. I'd like to suggest the following simple definitions for lists:

Well-formed list: any string s for which

    string equal "{$s}" [list $s]

returns 1. This definition excludes strings without whitespace.

    proc test-well-formed s {
        string equal "{$s}" [list $s]
    }
    test-well-formed "a b c"    ;# => 1
    test-well-formed "a { c"    ;# => 0
    test-well-formed "a b\nc"   ;# => 1

RS uses this well-formedness test:

   proc isList x {expr {![catch {llength $x}]}}
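The two tests do not agree on every input. A short comparison, repeating both procs from above so the snippet stands alone:

```tcl
proc test-well-formed s {
    string equal "{$s}" [list $s]
}
proc isList x {expr {![catch {llength $x}]}}

# A string without whitespace: a perfectly good one-element list for
# llength, but rejected by the brace-comparison definition.
puts [isList abc]               ;# 1
puts [test-well-formed abc]     ;# 0

# Both accept an ordinary list and reject an unmatched open brace.
puts [isList "a b c"]           ;# 1
puts [test-well-formed "a b c"] ;# 1
puts [isList "a \{ c"]          ;# 0
puts [test-well-formed "a \{ c"] ;# 0
```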

Canonical list: any well-formed list s for which

    string equal $s [split $s]

returns 1.

    proc test-canonical s {
        string equal $s [split $s]
    }
    test-canonical "a b c"      ;# => 1
    test-canonical "a { c"      ;# => 0
    test-canonical "a b\nc"     ;# => 0

DKF: That's a really really really funky definition of "canonical" you're using there! Try this one.

   proc test-canonical s {
      if {[catch {llength $s}]} {return 0}
      string equal $s [list {*}$s]
   }
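A few sample inputs for this version, assuming Tcl 8.5 or later (the {*} expansion syntax is not available before that). The proc is repeated so the snippet runs on its own.

```tcl
proc test-canonical s {
    # Not a list at all: cannot be canonical.
    if {[catch {llength $s}]} {return 0}
    # Canonical means rebuilding the list yields the same string.
    string equal $s [list {*}$s]
}

puts [test-canonical "a b c"]      ;# 1
puts [test-canonical "{a b} c"]    ;# 1: nested element, already in list form
puts [test-canonical "a  b  c"]    ;# 0: extra spaces are normalised away
puts [test-canonical "a b\nc"]     ;# 0: newline separator becomes a space
puts [test-canonical "{a} b"]      ;# 0: redundant braces are dropped
puts [test-canonical "a \{ c"]     ;# 0: not a list
```

Unlike the split-based definition, this one accepts lists with braced elements such as "{a b} c".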

From one perspective, "Funky Tcl extensibility" expresses one limit to the ability of conventional parsing to encompass Tcl.


13may04 jcw - Tcl has no BNF; it uses more of a macro-processing model for its language style. But I was wondering: is there a deep reason for that? Is there anything that prevents having a classical syntax parsing step, while still maintaining the "everything is a string" and "copy on write" mantra? A bit like "expr", but at the statement level, with control structures and all?

CMCc - Isn't it the case that every statement is a list, terminated by ; or \n? The simpler question is then: is BNF sufficiently powerful to represent any possible list?

DKF: The endekalogue can be written using BNF quite easily. It just doesn't tell you a vast amount about the deeper meaning of Tcl.

KJN: One way to avoid the "more-or-less unproductive confrontation" mentioned at the start of the page is to give the questioner the BNF for the endekalogue/dodekalogue.

jcw - Whoops, I think I asked the wrong question, sorry. I mean would it be possible to fit a traditional BNF-type language onto the Tcl core? Like, say: "if (a%10==0) { b = 12; print('b = ' + b, newline=0) }"? IOW, a completely different parsing step. Due to Tcl's dynamism, it could all happen at runtime.

Lars H: Isn't expr the example of doing this? It's small, but it has a very traditional BNF grammar. Some other command could be created which essentially evaluated code in language X rather than mathematical expressions. Some companion of proc could be created which took bodies in language X rather than in Tcl. And so on.

In 2006, the L language became an example of this "language X".
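A very small sketch of the "companion of proc" idea, with expr itself standing in for "language X". The command name exprproc is invented for this illustration; it is not an existing Tcl command.

```tcl
# Define a procedure whose body is an expr expression rather than a
# Tcl script. The generated body is simply "expr {expression}".
proc exprproc {name params expression} {
    uplevel 1 [list proc $name $params [list expr $expression]]
}

exprproc hypotenuse {a b} {sqrt($a*$a + $b*$b)}

puts [hypotenuse 3 4]   ;# 5.0
```

The body is handed to a different parser (here, the one inside expr) at run time, while the surrounding call is still ordinary Tcl: a command name followed by words.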

