grammar::peg can be used to create and manipulate parsing expression grammars. Superseded by pt::peg::container
Parsing expression grammers are similar but not equivalent to Context-Free Grammars, which may be familiar from BNF syntax specifications. Surprisingly, PEGs are both faster when matching (linear time as opposed to cubic time) and more powerful (at least no language has been shown context-free but not parsing expression) than CFGs.
Documentation:
2008-03-09: I hope the following may help others who are trying to use the "grammar::peg" package. These were tested under "etcl" (Evolane Tcl/Tk Engine Version 1.0-rc26).
1% # For now, just load the 'grammar::peg^ package 2% package require grammar::peg 0.1 3% #initialize a new (abeit empty) grammar. The start symbol will be, by default, 'epsilon^. 4% grammar::peg myGrammar ::myGrammar 5% # add a nonterminal to it 6% ::myGrammar nonterminal add myDigit {/ {t 0} {t 1} {t 2}} 7% # show its serialization 8% ::myGrammar serialize grammar::pegc {myDigit {/ {t 0} {t 1} {t 2}}} {myDigit value} epsilon 9% # change the start symbol to the nonterminal 'myDigit^ 10% ::myGrammar start {/ {n myDigit}} 11% # show its serialization 12% ::myGrammar serialize grammar::pegc {myDigit {/ {t 0} {t 1} {t 2}}} {myDigit value} {/ {n myDigit}} 13% # Now we show how you could, instead, have started with a list representing 14% # the grammar. Since "myGrammar" exists, we will make "myGrammarFromList". 15% package require struct::list 1.6.1 16% # (Incidentally, Andreas Kupries states: 'struct::list' is a package in Tcllib, like grammar::peg, and 17% # guesses that grammar::peg forgot to 'package require' it.) 18% set mySerialization [list grammar::pegc \ > {myDigit {/ {t 0} {t 1} {t 2}}} {myDigit value} {/ {n myDigit}} \ > ] grammar::pegc {myDigit {/ {t 0} {t 1} {t 2}}} {myDigit value} {/ {n myDigit}} 19% grammar::peg myGrammarFromList deserialize $mySerialization ::myGrammarFromList 20% # Next we attempt to use (interpret) the grammar on an input string. 21% #Load all relevant packages (this may be overkill) 22% set pattern {grammar::me|grammar::peg}; foreach packageName [package names] {if [regexp $pattern $packageName] {puts "[catch {package require $packageName}] $packageName"}} 0 grammar::peg::interp 0 grammar::peg 0 grammar::me::tcl 0 grammar::me::cpu 0 grammar::me::util 0 grammar::me::cpu::core 0 grammar::me::cpu::gasm 23% # Initialize the interpreter. 24% grammar::peg::interp::setup ::myGrammar 25% # As of 3-9-8, that's as far as I have gotten. The documentation at http://tcllib.sourceforge.net/doc/peg_interp.html 26% # suggests using 27% # ::grammar::peg::interp::parse nextcmd errorvar astvar 28% #" which "interprets the loaded grammar and tries to match it against the stream of characters represented by the command prefix nextcmd". 29% # so I put the string "102" in "myFile.txt" and tried the following, but, as you can see, was unsuccessful. 30% set f [open c:/myFile.txt r+] file37bf6c8 31% chan configure $f -encoding cp1252 32% set offset 0 0 33% ::grammar::peg::interp::parse $f myErrorVar myAstVar invalid command name "file37bf6c8" 34% chan close $f
JBR: I've made my own attempt to create a grammar::peg example. Maybe someone who knows can make this example work. Thanks.
#!/usr/bin/env tclsh8.6 package require grammar::peg package require grammar::peg::interp proc parse-string string { coroutine next-char apply {string { yield set i 1 foreach ch [split $string {}] { yield [list $ch 1 1 $i] incr i } while 1 {yield {}} }} $string } ::grammar::peg parser parser nonterminal add Digit {/ {t 1} {t 2} {t 3} {t 4} {t 5}} parser nonterminal add Int {+ {n Digit}} parser start {n Int} parse-string 54321 grammar::peg::interp::setup parser grammar::peg::interp::parse next-char err ast
When run I get this:
wrong # args: should be "ict_match_token tok msg" while executing "ict_match_token "Expected $ch"" (procedure "MatchExpr" line 40) invoked from within "MatchExpr $e" (procedure "MatchExpr" line 236) invoked from within "MatchExpr $ru($nt)" (procedure "MatchExpr" line 75) invoked from within "MatchExpr $sub" (procedure "MatchExpr" line 159) invoked from within "MatchExpr $ru($nt)" (procedure "MatchExpr" line 75) invoked from within "MatchExpr $se" (procedure "grammar::peg::interp::parse" line 9) invoked from within "grammar::peg::interp::parse next-char err ast" (file "./ISBL" line 30)
This appears to be a bug in grammar::peg::interp I changed line 116 of peg_interp.tcl to:
ict_match_token $ch "Expected $ch"
tomas 2010-01-25: This bug is fixed from release 1.13 onwards (thanks, AK, you are my hero).
AK: The above sentence is a bit ambiguous. To clarify, the bug fix is not in Tcllib 1.13. It is in the CVS, just after the release, and will be found in the next release after 1.13. That being said, the 'Parser Tools' supercede grammar_peg and grammar_me.
Then changing my coroutine to return EOF multiple times after the input is exhausted the string parses returning:
ALL 0 4 {Int 0 4 {Digit 0 0 {{} 0 0}} {Digit 1 1 {{} 1 1}} {Digit 2 2 {{} 2 2}} {Digit 3 3 {{} 3 3}} {Digit 4 4 {{} 4 4}}}
Now to learn how to do something useful with this. The input tokens don't appear to be represented in the output AST?
AK 2011-01-21 14:40:21:
Regarding the representation of input tokens. No, they are not represented directly. The numbers in the AST tell you the character range covered by the symbol, as offsets from the beginning of the string. This allows you to extract the lexemes/token from the input string. See http://docs.activestate.com/activetcl/8.5/tcllib/pt/pt_parser_api.html for an example of the tree and its contents.
tomas 2011-01-23:
Here's a more complete example. It tries to implement the "classical" expression grammar, with (integer decimal) numbers, the four arithmetical operations.
To note:
Comments very welcome!
#!/usr/bin/tclsh # peg: playing around with peg package require grammar::peg package require grammar::me::tcl package require grammar::peg::interp grammar::peg expression # PEGs ain't BNF, but the similarities are striking # One important difference: the choices {/ ...} are _ordered_ # choices; no backtracking in * and +. See # <http://en.wikipedia.org/wiki/Parsing_expression_grammar> # for te gory details # Therefore the comments evoking BNF are just to tickle the # reader's gentle associative memory (or was it the associative's # reader gentle memory? Ah -- you know what I mean. # Sign ::= '+' | '-' expression nonterminal add Sign {/ {t +} {t -}} # Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' expression nonterminal add Digit {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} # Addop ::= '+' | '-' expression nonterminal add Addop {/ {t +} {t -}} # Mulop ::= '*' | '/' expression nonterminal add Mulop {/ {t *} {t /}} # Number ::= Sign? Digit+ expression nonterminal add Number {x {? {n Sign}} {+ {n Digit}}} # Factor ::= '(' Expression ')' | Number expression nonterminal add Factor {/ {x {t (} {n Expression} {t )}} {n Number}} # Term ::= Factor ( Mulop Factor )* expression nonterminal add Term {x {n Factor} {* {x {n Mulop} {n Factor}}}} # Expression ::= Term ( Addop Term )* expression nonterminal add Expression {x {n Term} {* {x {n Addop} {n Term}}}} # Start here: expression start {n Expression} puts [expression serialize] grammar::peg::interp::setup expression set i -1 # The expression to be parsed. No whitespaces! set src {1024+256} proc nexttok {} { global src global i return [list [string index $src [incr i]] {} $i 0] } set errs {} set ast {} grammar::peg::interp::parse nexttok errs ast puts "errs = $errs" puts "ast = $ast" proc print_nested {ast {prefix {}}} { global src puts "$prefix [lindex $ast 0]: [ string range $src [lindex $ast 1] [lindex $ast 2]]" foreach sub [lrange $ast 3 end] { print_nested $sub "$prefix " } } print_nested $ast
YOSIFOV: My example with human-readable PEG definition and without import from file:
package require starkit starkit::startup package require pt::peg::from::peg package require pt::peg::container package require pt::peg::interp set PEG { PEG myapp (Exp) IfKw <- 'i' 'f' ; ThenKw <- 't' 'h' 'e' 'n' ; ElseKw <- 'e' 'l' 's' 'e' ; OSp <- ('\t' / '\n' / ' ')? ; Sp <- ('\t' / '\n' / ' ')+ ; Block <- (Exp / 't') &';' ; Cond <- 'c' / 'o' ; If <- IfKw Sp Cond OSp ':' OSp Block ; Exp <- If ; END; } set CPEG [pt::peg::from::peg convert $PEG] pt::peg::container cont cont deserialize = $CPEG set src "if c: if o: t;" pt::peg::interp myint myint use cont puts [myint parset $src]
Has bugs, I suppose, but shows main trick :) Parses nested IF (see $src): CONDITION is 'c' or 'o', BLOCK is EXP or 't'. See more HERE