Version 2 of A Tcl parser in Tcl

Updated 2003-08-21 09:18:23

Lars H? For a more robust (but also heavier) Tcl parser in Tcl, see parsetcl.

GPS: Yours is much more complex than mine. I won't comment on the issue of robustness. I made some changes and now version 5 passes the test you proposed at the end of this page. I also added an optional parray test, which is also passes now.


 #Copyright 2003 George Peter Staplin
 #You may use this under the same terms as Tcl.
 #A Mini Tcl Parser in Tcl.
 #version 5

 interp alias {} loop {} while 1

 set s {
  A [B 123] [C 456 [C2 789]]
  set x [+ 20 [* 30 200]] 
  list a b {a b} $c {$c} [d e; f g]
 }


 if 0 {
 #We need to have autoloading source the parray code.
 array set ar {}
 parray ar
 append s "\n[info body parray]"
 }

 #1 or 0
 proc get.token {s iPtr tokPtr typePtr} {
  upvar $iPtr i
  upvar $tokPtr tok
  upvar $typePtr type
  set tok ""
  set sLen [string length $s]
  set lastChar ""
  #unknown, brace, bracket, quote, end
  set type unknown

  set braceCount 0
  set bracketCount 0
  set quote 0

  for {} {$i < $sLen} {incr i} {
   set c [string index $s $i]

   if {$braceCount} {
    if {("\{" == $c) && ("\\" != $lastChar)} {
     incr braceCount
    } elseif {("\}" == $c) && ("\\" != $lastChar)} {
     incr braceCount -1
    }
    if {0 == $braceCount} {
     incr i
     return 1
    }
    append tok $c
   } elseif {$bracketCount} {
    if {("\[" == $c) && ("\\" != $lastChar)} {
     incr bracketCount
    } elseif {("\]" == $c) && ("\\" != $lastChar)} {
     incr bracketCount -1
    }   
    if {0 == $bracketCount} {
     incr i
     return 1
    }
    append tok $c
   } elseif {$quote} {
    if {("\"" ==  $c) && ("\\" != $lastChar)} {
     incr i
     return 1
    }
    append tok $c
   } else {
    if {("\{" == $c) && ("\\" != $lastChar)} {
     set type brace
     incr braceCount
    } elseif {("\[" == $c) && ("\\" != $lastChar)} {
     set type bracket
     incr bracketCount
    } elseif {("\"" == $c) && ("\\" != $lastChar)} {
     set type quote
     set quote 1
    } elseif {(" " == $c) || ("\t" == $c)} {
     if {[string length $tok]} {
      return 1
     }
    } elseif {("\n" == $c) || ("\r" == $c) || (";" == $c)} {
     if {[string length $tok]} {
      return 1
     } else {
      set type end
      set tok $c
      incr i
      return 1
     }
    } else {
     append tok $c
    }
   }
   set lastChar $c
  }

  if {"unknown" ne $type} {
   puts stderr "incomplete command: still in state of $type"
   return 0
  }

  if {[string length $tok]} {
   return 1
  }

  return 0
 }

 set ::level 0
 proc parse s {
  global level
  set i 0
  set tok ""
  set type ""

  loop {
   set r [get.token $s i tok type]
   if {!$r} break
   #puts "TOK:$tok TYPE:$type"

   if {"end" == $type} {
    puts [string repeat " " $level]SEP
   } elseif {"bracket" == $type} {
    incr level
    puts [string repeat " " $level]BRACK
    parse $tok
    incr level -1
   } else {
    puts [string repeat " " $level]TOK:$tok 
   }
  }
 }

 proc main {} {
  parse $::s
 }
 main

Example output:

 $ tclsh85g.exe mini_tcl_parser-3.tcl
 TOK:A
  TOK:B
  TOK:123
   TOK:C
   TOK:456
    TOK:C2
    TOK:789
 TOK:set
 TOK:x
  TOK:+
  TOK:20
   TOK:*
   TOK:30
   TOK:200

Lars H (19 aug 2003): It seems a bit simplistic. Consider:

 % parse {list a b {a b} $c {$c} [d e; f g]}
 TOK:list
 TOK:a
 TOK:b
 TOK:a b
 TOK:$c
 TOK:$c
  TOK:d
  TOK:e
 TOK:f
 TOK:g

That $c and {$c} are the same can be an artifact of the way that the parse procedure presents the result (type information not shown), but the level of the f and g tokens is simply wrong.


GPS: I've fixed this problem. Thanks for pointing it out. Here's the output with version 5:

 $ tclsh85g.exe mini_tcl_parser-5.tcl
 SEP
 TOK:A
  BRACK
  TOK:B
  TOK:123
  BRACK
  TOK:C
  TOK:456
   BRACK
   TOK:C2
   TOK:789
 SEP
 TOK:set
 TOK:x
  BRACK
  TOK:+
  TOK:20
   BRACK
   TOK:*
   TOK:30
   TOK:200
 SEP
 TOK:list
 TOK:a
 TOK:b
 TOK:a b
 TOK:$c
 TOK:$c
  BRACK
  TOK:d
  TOK:e
  SEP
  TOK:f
  TOK:g
 SEP