A Tcl parser in Tcl

Lars H: For a more robust (but also heavier) Tcl parser in Tcl, see parsetcl.

GPS: Yours is much more complex than mine. I won't comment on the issue of robustness. I made some changes and now version 5 passes the test you proposed at the end of this page. I also added an optional parray test, which it also passes now.

----
 #Copyright 2003 George Peter Staplin
 #You may use this under the same terms as Tcl.
 #A Mini Tcl Parser in Tcl.
 #version 5
 
 interp alias {} loop {} while 1
 
 set s {
  A [B 123] [C 456 [C2 789]]
  set x [+ 20 [* 30 200]] 
  list a b {a b} $c {$c} [d e; f g]
 }
 
 
 if 0 {
 #We need to have autoloading source the parray code.
 array set ar {}
 parray ar
 append s "\n[info body parray]"
 }
 
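 #Returns 1 when a token was stored in the variable named by tokPtr,
 #or 0 at end of input or when a brace, bracket or quote is left unterminated.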
 proc get.token {s iPtr tokPtr typePtr} {
  upvar $iPtr i
  upvar $tokPtr tok
  upvar $typePtr type
  set tok ""
  set sLen [string length $s]
  set lastChar ""
  #unknown, brace, bracket, quote, end
  set type unknown
 
  set braceCount 0
  set bracketCount 0
  set quote 0
 
  for {} {$i < $sLen} {incr i} {
   set c [string index $s $i]
    
   if {$braceCount} {
    if {("\{" == $c) && ("\\" != $lastChar)} {
     incr braceCount
    } elseif {("\}" == $c) && ("\\" != $lastChar)} {
     incr braceCount -1
    }
    if {0 == $braceCount} {
     incr i
     return 1
    }
    append tok $c
   } elseif {$bracketCount} {
    if {("\[" == $c) && ("\\" != $lastChar)} {
     incr bracketCount
    } elseif {("\]" == $c) && ("\\" != $lastChar)} {
     incr bracketCount -1
    }   
    if {0 == $bracketCount} {
     incr i
     return 1
    }
    append tok $c
   } elseif {$quote} {
    if {("\"" ==  $c) && ("\\" != $lastChar)} {
     incr i
     return 1
    }
    append tok $c
   } else {
    if {("\{" == $c) && ("\\" != $lastChar)} {
     set type brace
     incr braceCount
    } elseif {("\[" == $c) && ("\\" != $lastChar)} {
     set type bracket
     incr bracketCount
    } elseif {("\"" == $c) && ("\\" != $lastChar)} {
     set type quote
     set quote 1
    } elseif {(" " == $c) || ("\t" == $c)} {
     if {[string length $tok]} {
      return 1
     }
    } elseif {("\n" == $c) || ("\r" == $c) || (";" == $c)} {
     if {[string length $tok]} {
      return 1
     } else {
      set type end
      set tok $c
      incr i
      return 1
     }
    } else {
     append tok $c
    }
   }
   set lastChar $c
  }
 
  if {"unknown" ne $type} {
   puts stderr "incomplete command: still in state of $type"
   return 0
  }
 
  if {[string length $tok]} {
   return 1
  }
  
  return 0
 }
 
 set ::level 0
 proc parse s {
  global level
  set i 0
  set tok ""
  set type ""
 
  loop {
   set r [get.token $s i tok type]
   if {!$r} break
   #puts "TOK:$tok TYPE:$type"
 
   if {"end" == $type} {
    puts [string repeat " " $level]SEP
   } elseif {"bracket" == $type} {
    incr level
    puts [string repeat " " $level]BRACK
    parse $tok
    incr level -1
   } else {
    puts [string repeat " " $level]TOK:$tok 
   }
  }
 }
 
 proc main {} {
  parse $::s
 }
 main

Example output:

 $ tclsh85g.exe mini_tcl_parser-3.tcl
 TOK:A
  TOK:B
  TOK:123
   TOK:C
   TOK:456
    TOK:C2
    TOK:789
 TOK:set
 TOK:x
  TOK:+
  TOK:20
   TOK:*
   TOK:30
   TOK:200

Lars H (19 aug 2003): It seems a bit simplistic. Consider:

 % parse {list a b {a b} $c {$c} [d e; f g]}
 TOK:list
 TOK:a
 TOK:b
 TOK:a b
 TOK:$c
 TOK:$c
  TOK:d
  TOK:e
 TOK:f
 TOK:g

That $c and {$c} look the same may be an artifact of the way the parse procedure presents the result (type information is not shown), but the level of the f and g tokens is simply wrong.


GPS: I've fixed this problem. Thanks for pointing it out. Here's the output with version 5:

 $ tclsh85g.exe mini_tcl_parser-5.tcl
 SEP
 TOK:A
  BRACK
  TOK:B
  TOK:123
  BRACK
  TOK:C
  TOK:456
   BRACK
   TOK:C2
   TOK:789
 SEP
 TOK:set
 TOK:x
  BRACK
  TOK:+
  TOK:20
   BRACK
   TOK:*
   TOK:30
   TOK:200
 SEP
 TOK:list
 TOK:a
 TOK:b
 TOK:a b
 TOK:$c
 TOK:$c
  BRACK
  TOK:d
  TOK:e
  SEP
  TOK:f
  TOK:g
 SEP
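
In the version 5 output above, $c and {$c} still print identically, because parse discards the type that get.token reports. A minimal sketch of how the type could be made visible follows. The helper name format.token is hypothetical and not part of the parser above; it would be called from the final else branch of parse in place of the plain puts.

```tcl
# Hypothetical helper (not part of the original parser): format one token
# line so that the delimiters reported in "type" remain visible.
proc format.token {tok type level} {
    set indent [string repeat " " $level]
    switch -- $type {
        brace   { return "${indent}TOK:{$tok}" }
        quote   { return "${indent}TOK:\"$tok\"" }
        default { return "${indent}TOK:$tok" }
    }
}

# A bare word and a braced word now print differently.
puts [format.token {$c} unknown 0]   ;# prints TOK:$c
puts [format.token {$c} brace 0]     ;# prints TOK:{$c}
```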

Information for people who are interested in parsing Tcl in Tcl: XOTclIDE has its own Tcl parser (component IDETclParser), implemented in an object-oriented way in XOTcl. It is used for syntax highlighting and for the syntax checker. The parse tree is built as nested XOTcl objects, which can be operated on directly.

The XOTclIDE parser also parses the script arguments of known Tcl commands. For example, in the command (foreach variable list script), the parser knows that the last element of the command is a script and parses it as one. In this respect the parser tries to emulate the Tcl interpreter: it knows how some commands work. For example, it knows that (set a 32) defines a new variable a. Of course this does not work for unusual Tcl scripts such as

   set a {puts script}
   if {$condition} $a

But the normal case is

   if {$condition} {puts script}

For such scripts the syntax checker can prove that "puts" is a valid command.
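
The idea of knowing which argument of a command is a script can be sketched in plain Tcl (8.5 or later, for dict). This is only an illustration, not XOTclIDE's implementation: the table scriptArg and the proc script.body are hypothetical names, the table covers three commands only, and treating a command as a list is an assumption that holds only for simple, well-formed commands.

```tcl
# Hypothetical table: command name -> index of the argument that is a script.
set scriptArg [dict create foreach end while 2 if 2]

# Return the script argument of a command, or "" when the command is not in
# the table or the script is only known at run time (the word starts with $).
proc script.body cmd {
    global scriptArg
    set name [lindex $cmd 0]
    if {![dict exists $scriptArg $name]} {
        return ""
    }
    set word [lindex $cmd [dict get $scriptArg $name]]
    if {[string index $word 0] eq "\$"} {
        return ""
    }
    return $word
}

puts [script.body {if {$condition} {puts script}}]           ;# prints puts script
puts [script.body {foreach variable list {puts $variable}}]  ;# prints puts $variable
puts [script.body {if {$condition} $a}]                      ;# prints an empty line
```

A checker built this way could then parse the returned body recursively, as the parse proc earlier on this page does for bracketed words.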

The nature of Tcl makes it very easy to build a parser, but only for the "first evaluation level". At this level there are only commands and words. (The internal Tcl parser also knows special commands such as if and for, and parses their arguments (words) as scripts.)
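
As a rough illustration of that first level, the built-in info complete command can be used to split a script into commands. The sketch below is a simplification under stated assumptions: the proc name commands is hypothetical, only newlines are treated as command separators (semicolons are ignored), and the command name is taken with lindex, which works only when the command happens to be a valid list.

```tcl
# Hypothetical helper: split a script into its top-level commands.
# Lines are accumulated until [info complete] reports that all braces,
# brackets and quotes are balanced.
proc commands script {
    set result {}
    set buf ""
    foreach line [split $script \n] {
        append buf $line
        if {[info complete $buf]} {
            set cmd [string trim $buf]
            # Skip empty lines and comments.
            if {$cmd ne "" && [string index $cmd 0] ne "#"} {
                lappend result $cmd
            }
            set buf ""
        } else {
            # Command continues on the next line.
            append buf \n
        }
    }
    return $result
}

set script {
set a 1
if {$a} {
    puts hi
}
}
foreach cmd [commands $script] {
    puts "command: [lindex $cmd 0]"
}
# prints:
# command: set
# command: if
```

Note that the body of the if command is returned as a single word; going deeper requires knowing that this word is a script, which is the second level described above.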