Yeti

'''Y'''et anoth'''E'''r '''T'''cl '''I'''nterpreter - Generate an itcl parser for a BNF-like grammar

    * What: Yeti
    * Where (1): https://bitbucket.org/smh377/yeti ([AMG]: Unavailable, see below)
    * Where (2): https://github.com/mittelmark/yeti (Version 0.5.0, 2021, with some fixes)
    * Description: Yeti allows you to create a parser via Tcl.  You specify a grammar in a form similar to BNF (Backus-Naur Form) and define Tcl scripts that are executed whenever a rule is matched.  From this, Yeti generates an itcl parser that recognises the grammar.
    * Depends on Tcl/[itcl]/[tcllib].
    * Currently at version 0.4.2.
    * Updated: 2014-04-18
    * Contact 1: mailto:[email protected] ([st3ve])
    * Contact 2: mailto:[email protected] ([Frank Pilhofer]) or see bitbucket page
    * Contact 3: [DDG]

'''NOTE:''' Yeti was first written with Tcl 8.3 in mind, but hasn't seen updates in a long time and is now a bit out of date.  I've forked it in order to remove references to outdated packages, but may eventually do some more serious optimization work.  -[st3ve]

Usage examples: [Parsing C], [Parsing C Types], [calc].

[AMG]: The yeti distribution also includes [ylex].
**Discussion**

----
[CMCc]: Converted a YACC grammar to Yeti.  It's a really great program - well written and very capable.  I'm running into problems, though: how do I use Yeti to parse ambiguous grammars?  Yeti doesn't seem to have any equivalent to YACC's %precedence or associativity operators, so it's hard to get it to parse something like '(a * b + x * y)' as '((a * b) + (x * y))'.

[AK]: I don't know what Yeti does internally, LL(1), LR(1), LALR(1), ... It might be necessary to disambiguate the grammar to deal with such constructions.

[AK], Aug 15: Yeti implements a basic LALR(1) parser. Reduce/reduce conflicts are resolved via lookahead; shift/reduce conflicts appear to be resolved in favor of shift.

In contrast to Yacc and Bison, there is no way to declare operator precedence and associativity information. This information has to be explicitly encoded into the grammar.
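
As an illustration of what "encoded into the grammar" can mean: use one nonterminal per precedence level, for example `expr ::= expr '+' term | term`, `term ::= term '*' factor | factor`, `factor ::= IDENT | '(' expr ')'`. The left recursion gives left associativity, and the layering makes `*` bind tighter than `+`. The Python sketch below is not Yeti code and not an LALR parser. It is a hand-written recursive-descent rendering of that layered grammar (left recursion turned into loops), meant only to show the grouping such a grammar produces. The names `tokenize` and `parse` are hypothetical.

```python
import re

def tokenize(text):
    """Return identifiers and the tokens + * ( ); other characters are skipped."""
    return re.findall(r"[A-Za-z_]\w*|[+*()]", text)

def parse(text):
    """Parse text with the layered grammar and return a nested tuple tree."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr ::= expr '+' term | term   (lowest precedence, left associative)
        node = term()
        while peek() == "+":
            advance()
            node = (node, "+", term())
        return node

    def term():
        # term ::= term '*' factor | factor   (binds tighter than '+')
        node = factor()
        while peek() == "*":
            advance()
            node = (node, "*", factor())
        return node

    def factor():
        # factor ::= IDENT | '(' expr ')'
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "*", ")"):
            raise SyntaxError("unexpected token " + tok)
        return tok

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input at token " + tokens[pos])
    return tree

print(parse("a * b + x * y"))
```

With this layering, `a * b + x * y` groups as `((a * b) + (x * y))` without any precedence declarations. The same three-level rule structure should carry over to a Yeti grammar, although the exact rule syntax is not shown here.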

References:

   * [http://lambda.uta.edu/cse5317/notes/node20.html%|%SLR(1), LR(1), and LALR(1) Grammars]
   * [http://lambda.uta.edu/cse5317/notes/short.html%|%Design and Construction of Compilers]
   * [http://dickgrune.com/Books/PTAPG_1st_Edition/%|%Parsing Techniques - A Practical Guide]

IIRC (it's a long time since I looked at the dragon book) some commonly used language constructs (in, say, C) are inherently ambiguous.  An example is ''if/then/else''.  YACC uses associativity to capture the fact that an ''else'' associates with its closest preceding ''if''. [CMcC]

[AK] - Sure? I thought that this was a classic example of a shift/reduce conflict which is solved in favor of shifting. I associated (sic) associativity more with expressions and operations in them.

[AM] Most books I have read about the subject say that if/then/else constructs in C-like languages require some interference with the grammar - that is, the ambiguity needs to be resolved by means outside the theory...
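
To make the shift-preference reading concrete, here is a small Python sketch. It is a toy, not Yeti and not a table-driven parser. It assumes a hypothetical statement language, `stmt ::= 'if' cond 'then' stmt [ 'else' stmt ] | WORD`. After parsing the then-branch, it takes an `else` whenever one is the next token, which is the recursive-descent analogue of shifting `else` instead of reducing the inner `if`.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        # Any other word is treated as an atomic statement.
        return tokens[pos], pos + 1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected: if COND then STMT")
    cond = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 3)
    else_branch = None
    # Greedy choice: if 'else' is next, the innermost open 'if' consumes it.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos

tree, end = parse_stmt("if a then if b then s1 else s2".split())
print(tree)
```

The `else` ends up inside the inner `if`, and the outer `if` has no else-branch, which matches the usual C rule that an `else` belongs to the closest preceding `if`.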

[AMG]: Neat to see [AK] referencing UTA, my alma mater. :^)

----
[AMG]: I'm having trouble with processing typedef in C.  I'm using a grammar derived from what I have on [Parsing C], though quite a bit more fleshed out.  If I wait until my `external_declaration` handler to call [[$scanner addTypeName]], then things mostly work except for this fatal sequence:

======c
typedef int int_t;
int_t variable;
======

If I put anything (legal) between the two lines, for instance `const`, it works.  The problem is with the single token of lookahead.  Before invoking my `external_declaration` handler, yeti has already read `int_t` from the scanner.  Trouble is, [[$scanner addTypeName]] hasn't been called yet, so the scanner categorizes it as `IDENTIFIER` and not `TYPE_NAME`.

Seems like a fundamental design limitation to me... any ideas?

My workaround is to do what the published code does and call [[$scanner addTypeName]] as early as possible, which would be in the `init_declarator` handler.  But since `init_declarator` isn't above `storage_class_specifier` in the tree, it can't tell from its argument whether `typedef` was specified.  The solution to that is, again, the same as in the published code: set a variable the very moment `typedef` is encountered, thereby establishing a communication link from one subtree to another.
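
The timing issue can be reproduced outside Yeti with a short simulation. The Python sketch below is not Yeti's implementation. It assumes only what is described above: the parser always holds one token of lookahead, so the next token has been classified before the handler for the current construct runs. `Scanner`, `add_type_name`, and `classify_tokens` are hypothetical names, with `add_type_name` standing in for `addTypeName`. In `"late"` mode the name is registered when the declaration's `;` is processed (the `external_declaration` handler); in `"early"` mode it is registered as soon as the declarator is seen, using a flag set when `typedef` was encountered.

```python
import re

KEYWORDS = {"typedef": "TYPEDEF", "int": "INT", "const": "CONST", ";": "SEMI"}

class Scanner:
    """Classifies words on demand, consulting a mutable table of type names."""
    def __init__(self, text):
        self.words = re.findall(r"\w+|;", text)
        self.pos = 0
        self.type_names = set()

    def add_type_name(self, name):
        self.type_names.add(name)

    def next_token(self):
        if self.pos >= len(self.words):
            return ("EOF", "")
        word = self.words[self.pos]
        self.pos += 1
        if word in KEYWORDS:
            return (KEYWORDS[word], word)
        if word in self.type_names:
            return ("TYPE_NAME", word)
        return ("IDENTIFIER", word)

def classify_tokens(text, register):
    """Return tokens as classified; register is "late" or "early"."""
    scanner = Scanner(text)
    seen = []
    saw_typedef = False  # flag set the moment 'typedef' is encountered
    declared = None      # name introduced by the most recent declarator
    current = scanner.next_token()
    while current[0] != "EOF":
        # The lookahead is always scanned before any handler below runs.
        lookahead = scanner.next_token()
        seen.append(current)
        kind, word = current
        if kind == "TYPEDEF":
            saw_typedef = True
        elif kind == "IDENTIFIER":
            declared = word
            if register == "early" and saw_typedef:
                scanner.add_type_name(word)      # init_declarator-time
        elif kind == "SEMI":
            if register == "late" and saw_typedef and declared is not None:
                scanner.add_type_name(declared)  # external_declaration-time
            saw_typedef = False
            declared = None
        current = lookahead
    return seen

source = "typedef int int_t;\nint_t variable;"
print(classify_tokens(source, "late")[4])
print(classify_tokens(source, "early")[4])
```

In this model, late registration leaves the second `int_t` classified as `IDENTIFIER`, because it was scanned as lookahead while the `;` was still the current token. Early registration yields `TYPE_NAME`. Inserting `const` between the two declarations also rescues the late variant, since `const` then absorbs the lookahead slot.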

Is this the best possible solution?

----
[AMG]: Has anyone considered putting yeti and [ylex] into [tcllib]?

----
[AMG]: The [st3ve] versions of Yeti have vanished from the Internet, along with [https://bitbucket.org/smh377].  The [Frank Pilhofer] 0.4.1 version is still available at [http://www.fpx.de/fp/Software/Yeti/].  I have a copy of Yeti version 4c010893cd39, which is 0.4.2 or thereabouts, originally retrieved from [https://bitbucket.org/smh377/yeti/get/4c010893cd39.zip].  I think the last change was 29 April 2014, to ylex.tcl.  For now, I'm hosting the archive at [https://andy.junkdrome.org/tmp/yeti-4c010893cd39.tar.gz], but as I said above, this ought to move into tcllib.
----

[DDG] - 2021-10-02: I did a few fixes (version mismatches and a method fix) and uploaded my version as 0.5.0 to https://github.com/mittelmark/yeti - a tcl++ version will be added as well later. Used the package with success in a PGN notation parser recently.

<<categories>> Package | Glossary | Parsing