Version 39 of Yeti

Updated 2014-04-18 20:21:49 by AMG

Yet anothEr Tcl Interpreter - Generate an itcl parser for a BNF-like grammar

  • What: Yeti
  • Where: https://bitbucket.org/smh377/yeti
  • Description: Yeti allows you to create a parser via Tcl. You specify a grammar in a form similar to the BNF (Backus-Naur Form) and define Tcl scripts that are executed whenever a rule is matched. Generates an itcl parser to recognise the BNF.
  • Depends on Tcl/itcl/tcllib.
  • Currently at version 0.4.2.
  • Updated: 2/2013
  • Contact: mailto:[email protected] (Frank Pilhofer) or see bitbucket page

NOTE: Yeti was first written with Tcl 8.3 in mind, but hasn't seen updates in a long time and is now a bit out of date. I've forked it in order to remove references to outdated packages, but may eventually do some more serious optimization work. -st3ve

Usage example: Parsing C.


CMCc: Converted a YACC grammar to Yeti. It's a really great program - well written and very capable. I'm running into problems, though: how do I use Yeti to parse ambiguous grammars? Yeti doesn't seem to have any equivalent to YACC's %precedence or associativity operators, so it's hard to get it to parse something like '(a * b + x * y)' as '((a *b) + (x * y))'

AK: I don't know what Yeti does internally, LL(1), LR(1), LALR(1), ... It might be necessary to disambiguate the grammar to deal with such constructions.

AK, Aug 15: Yeti implements a basic LALR(1) parser. Reduce/Reduce conflicts are resolved via lookahead, shift/reduce conflicts are resolved in favor of shift (seems to be).

In contrast to Yacc and Bison no way to declare operator precedence and associativity information. This information has to be explicitly encoded into the grammar.

References:

IIRC (it's a long time since I looked at the dragon book) some commonly used language constructs (in, say, C) are inherently ambiguous. An example is if/then/else. YACC uses associativity to capture the fact that an else associates with its closest preceding if. CMcC

AK - Sure ? I thought that this was a classic example of a shift/reduce conflict which is solved in favor shifting. I associated (sic) associativity more with expressions and operations in them.

AM Most books I have read about the subject say that if/then/else constructs in C-like languages require some interference with the grammar - that is, the ambiguity needs to be resolved by means outside the theory...


AMG: I'm having trouble with processing typedef in C. I'm using a grammar derived from what I have on Parsing C, though quite a bit more fleshed out. If I wait until my external_declaration handler to call [$scanner addTypeName], then things mostly work except for this fatal sequence:

typedef int int_t;
int_t variable;

If I put anything (legal) between the two lines, for instance const, it works. The problem is with the single token of lookahead. Before invoking my external_declaration handler, yeti already read int_t from the scanner. Trouble is, [$scanner addTypeName] hasn't been called yet, so the scanner calls it IDENTIFIER and not TYPE_NAME.

Seems like a fundamental design limitation to me... any ideas?

My workaround is to do like the published code does and call [$scanner addTypeName] as early as possible, which would be the init_declarator handler. But since init_declarator isn't above storage_class_specifier in the tree, it won't know from its argument if typedef was specified. Solution to that is again like the published code: set a variable the very moment typedef is encountered.

Is this the best possible solution?