if 0 { [Lars H]: '''parsetcl''' is a package I've written to parse Tcl scripts. My original need for it was for cross-referencing Tcl code (to generate an index which answers the question: on which lines is this command/variable/whatever used?), but there are many other interesting applications. One is to write a command that works like a "[proc] with preprocessor": The body is parsed, then the preprocessor examines the code (probably replacing some special construction, e.g. syntactic sugar with the corresponding byte-compilable raw Tcl), the result is translated back to a Tcl script, and finally that is given as the body to the normal [proc] command. I currently (2003-08-19) haven't implemented the "back" (from parser output format to Tcl script) part of this, but the parser is strict enough that this should be possible. The output format of the parser is something which I call a "parser tree". It has the format : ''type'' ''interval'' ''text'' [[''subtree'' ...]] where the ''type'' is the type of this node in the tree, ''interval'' is the range of characters in the original script to which the node corresponds, and ''text'' is the raw text as which this code could be parsed or an empty string if that is not possible. Nodes for composite things in have one or more subtrees have subtrees for each component. The currently defined types are: Rs: Root for a parsed script. Rx: Root for a parsed expression (not implemented yet). Cd: A command (subtrees are the words). Sv: Scalar variable substitution. Sa: Array variable substitution. Sc: Command substitution. Sb: Backslash substitution. Lr: Raw ("literate") text. Lq: Raw text, in quotes. Lb: Raw text, in braces. Mr: Text of which a part is generated by substitution. Mq: Like Mr, but in quotes. Nc: A comment. Ne: A syntax error detected by the parser. Np: A "placeholder" (used internally). The elementary parser procedure available is '''parsetcl::basic_parse_script''', but '''parsetcl::simple_parse_script''' may be more convenient. Here is the code: } ## ## This is file `parsetcl.tcl', ## generated with the docstrip utility. ## ## The original source files were: ## ## parsetcl.dtx (with options: `pkg') ## ## In other words: ## *************************************** ## * This Source is not the True Source. * ## *************************************** ## The True Source is parsetcl.dtx in ## http://ftp.ctan.org/tex-archive/macros/latex/contrib/tclldoc/examples ## ## (c) 2003 Lars Hellstr\"om ## ## It is preferred that you apply the distribution and modification ## conditions of the LaTeX Project Public License (LPPL) for this file, ## but you may alternatively choose to apply BSD/Tcl-style license ## conditions (either is OK). The latest version of the LPPL is in ## http://www.latex-project.org/lppl.txt ## and version 1.2 or later is part of all distributions of LaTeX ## version 1999/12/01 or later. ## namespace eval parsetcl {} package require Tcl 8.4 package provide parsetcl 0.1 proc parsetcl::flush_whitespace {script index_var cmdsep} { upvar 1 $index_var index if {[ if {$cmdsep} then { regexp -start $index -- {\A([ \t-\r;]|\\\n)+} $script match } else { regexp -start $index -- {\A([ \t\v\f\r]|\\\n)+} $script match } ]} then { incr index [string length $match] return [string length $match] } else { return 0 } } proc parsetcl::parse_command {script index_var nested} { upvar 1 $index_var index flush_whitespace $script index 1 switch -- "[string index $script $index]$nested" {#0} - {#1} { regexp -start $index -indices -- {\A#([^\n\\]|\\.)*(\\$)?}\ $script interval incr index regsub -all -- {\\\n[ \t]*}\ [string range $script $index [lindex $interval 1]]\ { } text set index [expr {[lindex $interval 1] + 1}] return [list Nc $interval $text] } 0 - 1 - \]1 { return [list Np "" ""] } set res [list Cd [list $index ""] ""] set next [parse_word $script index $nested] while {[lindex $next 0] ne "Np"} { lappend res $next set next [parse_word $script index $nested] } lset res 1 1 [lindex $res end 1 1] return $res } proc parsetcl::basic_parse_script {script} { set index 0 set res [list Rs [list $index ""] ""] while {[lindex [set next [parse_command $script index 0]] 0] ne "Np"} { lappend res $next } incr index -1 lset res 1 1 $index return $res } proc parsetcl::parse_word {script index_var nested} { upvar 1 $index_var index switch -- [string index $script $index] \{ { parse_braced_word $script index $nested } \" { parse_quoted_word $script index $nested } "" - \; - \n { list Np "" "" } \] { if {$nested} then { list Np "" "" } else { parse_raw_word $script index $nested } } default { parse_raw_word $script index $nested } } proc parsetcl::parse_braced_word {script index_var nested} { upvar 1 $index_var index set res [list Lb [list $index ""]] set depth 1 set text "" incr index while {$depth>0} { regexp -start $index -- {\A([^{}\\]|\\[^\n])*} $script match append text $match incr index [string length $match] switch -- [string index $script $index] \{ { incr depth incr index append text \{ } \} { incr depth -1 incr index if {$depth} { append text \} } } \\ { if {[regexp -start $index -- {\A\\\n[ \t]*} $script match]}\ then { incr index [string length $match] append text { } } else { append text \\ break } } "" { break } } if {$depth>0} then { lset res 1 1 $index lappend res $text [list Ne [list "" $index] {missing close-brace}] lset res 3 1 0 [incr index] return $res } lset res 1 1 [expr {$index - 1}] lappend res $text if {[flush_whitespace $script index 0]} then {return $res} switch -- [string index $script $index] \n - \; - {} { return $res } \] { if {$nested} then {return $res} } lappend res [list Ne [list $index [expr {$index - 1}]]\ {missing space after close-brace}] return $res } proc parsetcl::parse_quoted_word {script index_var nested} { upvar 1 $index_var index set res [list Lq [list $index ""] ""] set text "" incr index while {1} { switch -- [string index $script $index] \\ { lappend res [parse_backslash $script index] append text [lindex $res end 2] } \$ { lappend res [parse_dollar $script index] lset res 0 Mq } \[ { lappend res [parse_bracket $script index] lset res 0 Mq } \" { incr index break } "" { lappend res [list Ne [list $index [expr {$index - 1}]]\ {missing close-quote}] break } default { regexp -start $index -- {[^\\$\["]*} $script match set t $index incr index [string length $match] lappend res [list Lr [list $t [expr {$index - 1}]] $match] append text $match } } lset res 1 1 [expr {$index - 1}] if {[lindex $res 0] eq "Lq"} then { lset res 2 $text if {[llength $res] == 4 && [lindex $res 3 0] eq "Lr"} then { set res [lrange $res 0 2] } } if {[flush_whitespace $script index 0]} then {return $res} switch -- [string index $script $index] \n - \; - {} { return $res } \] { if {$nested} then {return $res} } lappend res [list Ne [list $index [expr {$index - 1}]]\ {missing space after close-quote}] return $res } proc parsetcl::parse_raw_word {script index_var nested} { upvar 1 $index_var index set res [list] set type Lr set interval [list $index] set text "" while {1} { switch -- [string index $script $index] \\ { if {[string index $script [expr {$index+1}]] eq "\n"} then { break } lappend res [parse_backslash $script index] append text [lindex $res end 2] continue } \$ { lappend res [parse_dollar $script index] set type Mr continue } \[ { lappend res [parse_bracket $script index] set type Mr continue } \t - \n - \v - \f - \r - " " - \; - "" { break } if {$nested} then { if {![ regexp -start $index -- {\A[^\\$\[\]\t-\r ;]+} $script match ]} then {break} } else { regexp -start $index -- {\A[^\\$\[\t-\r ;]+} $script match } set t $index incr index [string length $match] lappend res [list Lr [list $t [expr {$index - 1}]] $match] append text $match } if {[llength $res]==1} then { set res [lindex $res 0] } else { lappend interval [expr {$index - 1}] if {$type ne "Lr"} then {set text ""} set res [linsert $res 0 $type $interval $text] } flush_whitespace $script index 0 return $res } proc parsetcl::parse_backslash {script index_var} { upvar 1 $index_var index set start $index incr index set ch [string index $script $index] set res [list Lr [list $index $index] $ch] switch -- $ch a { set res [list Sb [list $start $index] \a $res] } b { set res [list Sb [list $start $index] \b $res] } f { set res [list Sb [list $start $index] \f $res] } n { set res [list Sb [list $start $index] \n $res] } r { set res [list Sb [list $start $index] \r $res] } t { set res [list Sb [list $start $index] \t $res] } v { set res [list Sb [list $start $index] \v $res] } x { if {[regexp -start [expr {$index + 1}] -- {\A[0-9A-Fa-f]+}\ $script match]} then { scan [string range $match end-1 end] %x code incr index [string length $match] lset res 1 1 $index lset res 2 "x$match" set res [list Sb [list $start $index]\ [format %c $code] $res] } else { set res [list Sb [list $start $index] x $res] } } u { if {[regexp -start [expr {$index + 1}] -- {\A[0-9A-Fa-f]{1,4}}\ $script match]} then { scan $match %x code incr index [string length $match] lset res 1 1 $index lset res 2 "u$match" set res [list Sb [list $start $index]\ [format %c $code] $res] } else { set res [list Sb [list $start $index] u $res] } } \n { regexp -start [expr {$index + 1}] -- {\A[ \t]*} $script match incr index [string length $match] lset res 1 1 $index lset res 2 "\n$match" set res [list Sb [list $start $index] " " $res] } "" { return [list Sb [list $start $start] \\] } default { if {[regexp -start $index -- {\A[0-7]{1,3}} $script match]} then { scan $match %o code incr index [expr {[string length $match]-1}] lset res 1 1 $index lset res 2 $match set res [list Sb [list $start $index] [format %c $code] $res] } else { set res [list Sb [list $start $index] $ch $res] } } incr index return $res } proc parsetcl::parse_bracket {script index_var} { upvar 1 $index_var index set res [list Sc [list $index ""] ""] incr index while {[lindex [set next [parse_command $script index 1]] 0] ne "Np"} { lappend res $next } if {[string index $script $index] eq "\]"} then { lset res 1 1 $index incr index return $res } else { lappend res [list Ne [list $index [expr {$index-1}]]\ {missing close-bracket}] lset res 1 1 [expr {$index-1}] return $res } } set parsetcl::varname_RE {\A(\w|::)+} proc parsetcl::parse_dollar {script index_var} { upvar 1 $index_var index set res [list "" [list $index ""] ""] incr index if {[string index $script $index] eq "\{"} then { lset res 0 Sv set end [string first \} $script $index] if {$end<0} then { set end [expr {[string length $script] - 1}] lappend res [list Lb [list $index $end]\ [string range $script [expr {$index + 1}] end]]\ [list Ne [list [expr {$end+1}] $end]\ {missing close-brace for variable name}] } else { lappend res [list Lb [list $index $end]\ [string range $script [expr {$index + 1}] [expr {$end-1}]]] } lset res 1 1 $end set index [expr {$end + 1}] return $res } variable varname_RE if {![regexp -start $index -- $varname_RE $script match]} then { if {[string index $script $index] eq "("} then { set match "" } else { return [list Lr [list [lindex $res 1 0] [lindex $res 1 0]] \$] } } set t $index incr index [string length $match] lappend res [list Lr [list $t [expr {$index-1}]] $match] if {[string index $script $index] ne "("} then { lset res 0 Sv lset res 1 1 [lindex $res 3 1 1] return $res } lset res 0 Sa incr index set subres [list Lr [list $index ""] ""] lappend res "" set text "" while {1} { switch -- [string index $script $index] \\ { lappend subres [parse_backslash $script index] append text [lindex $subres end 2] } \$ { lappend subres [parse_dollar $script index] lset subres 0 Mr } \[ { lappend subres [parse_bracket $script index] lset subres 0 Mr } ) { lset subres 1 1 [expr {$index - 1}] break } "" { lappend res\ [list Ne [list $index [incr index -1]] {missing )}] lset subres 1 1 $index break } default { regexp -start $index -- {[^\\$\[)]*} $script match set t $index incr index [string length $match] lappend subres [list Lr [list $t [expr {$index - 1}]] $match] append text $match } } if {[lindex $subres 0] eq "Lr"} then {lset subres 2 $text} if {[llength $subres] == 4} then {set subres [lindex $subres 3]} lset res 1 1 $index incr index lset res 4 $subres return $res } # # The following are utility procedures: # proc parsetcl::format_tree {tree base step} { set res $base append res \{ [lrange $tree 0 1] { } if {[regexp {[\n\r]} [lindex $tree 2]]} then { append res [string range [list "[lindex $tree 2]\{"] 0 end-2] } else { append res [lrange $tree 2 2] } if {[llength $tree]<=3} then { append res \} return $res } elseif {[llength $tree] == 4 &&\ [string match {S[bv]} [lindex $tree 0]]} then { append res " " [format_tree [lindex $tree 3] "" ""] \} return $res } append res \n foreach subtree [lrange $tree 3 end] { append res [format_tree $subtree $base$step $step] \n } append res $base \} } proc parsetcl::offset_intervals {tree offset} { set res [lrange $tree 0 2] foreach i {0 1} { lset res 1 $i [expr {[lindex $res 1 $i] + $offset}] } foreach subtree [lrange $tree 3 end] { lappend res [offset_intervals $subtree $offset] } return $res } proc parsetcl::reparse_Lb_as_script {tree_var index parsed} { upvar 1 $tree_var tree set node [lindex $tree $index] switch -- [lindex $node 0] Lb - Lr - Lq { set base [expr {[lindex $node 1 0] + 1}] if {[lindex $node 0] eq "Lb"} then { set script [string range $parsed $base\ [expr {[lindex $node 1 1] - 1}]] } else { set script [lindex $node 2] } lset tree $index\ [offset_intervals [basic_parse_script $script] $base] if {[lindex $node 0] eq "Lb"} then { return 2 } else { return 1 } } default { return 0 } } proc parsetcl::walk_tree {tree_var index_var args} { upvar 1 $tree_var tree $index_var idxL set idxL [list] set i 0 while {$i>=0} { if {$i==0} then { uplevel 1 [list switch -regexp --\ [lindex [lindex $tree $idxL] 0] $args] set i 3 } elseif {$i < [llength [lindex $tree $idxL]]} then { lappend idxL $i set i 0 } elseif {[llength $idxL]} then { set i [lindex $idxL end] set idxL [lrange $idxL 0 end-1] incr i } else { set i -1 } } } proc parsetcl::simple_parse_script {script} { set tree [parsetcl::basic_parse_script $script] walk_tree tree indices Cd { switch -- [lindex [lindex $tree $indices] 3 2] if { for {set i 3} {$i < [llength [lindex $tree $indices]]}\ {incr i} { switch -- [lindex [lindex $tree $indices] $i 2]\ if - elseif { incr i; continue } then - else { incr i } parsetcl::reparse_Lb_as_script tree\ [linsert $indices end $i] $script } } while { parsetcl::reparse_Lb_as_script tree [linsert $indices end 5]\ $script } for { parsetcl::reparse_Lb_as_script tree [linsert $indices end 4]\ $script parsetcl::reparse_Lb_as_script tree [linsert $indices end 6]\ $script parsetcl::reparse_Lb_as_script tree [linsert $indices end 7]\ $script } foreach { parsetcl::reparse_Lb_as_script tree [linsert $indices end end]\ $script } catch { parsetcl::reparse_Lb_as_script tree [linsert $indices end 4]\ $script } proc { parsetcl::reparse_Lb_as_script tree [linsert $indices end 6]\ $script } } return $tree } ## ## ## End of file `parsetcl.tcl'. if 0 { As an example of how this works, consider applying '''parsetcl::simple_parse_script''' to the body of the [parray] procedure: } auto_load parray parsetcl::simple_parse_script [info body parray] This returns a long list whose structure is rather hard to follow. However, the utility procedure '''parsetcl::format_tree''' can make the structure more visible. The command parsetcl::format_tree [parsetcl::simple_parse_script [info body parray]] { } { } returns {Rs {0 467} {} {Cd {5 20} {} {Lr {5 9} upvar} {Lr {11 11} 1} {Sv {13 14} {} {Lr {14 14} a}} {Lr {16 20} array} } {Cd {26 90} {} {Lr {26 27} if} {Lb {29 51} {![array exists array]}} {Rs {54 89} {} {Cd {56 84} {} {Lr {56 60} error} {Mq {62 84} {} {Sb {63 64} {"} {Lr {64 64} {"}}} {Sv {65 66} {} {Lr {66 66} a}} {Sb {67 68} {"} {Lr {68 68} {"}}} {Lr {69 83} { isn't an array}} } } } } {Cd {96 105} {} {Lr {96 98} set} {Lr {100 103} maxl} {Lr {105 105} 0} } {Cd {111 244} {} {Lr {111 117} foreach} {Lr {119 122} name} {Sc {124 159} {} {Cd {125 158} {} {Lr {125 129} lsort} {Sc {131 158} {} {Cd {132 157} {} {Lr {132 136} array} {Lr {138 142} names} {Lr {144 148} array} {Sv {150 157} {} {Lr {151 157} pattern}} } } } } {Rs {162 243} {} {Cd {164 238} {} {Lr {164 165} if} {Lb {167 197} {[string length $name] > $maxl}} {Rs {200 237} {} {Cd {206 235} {} {Lr {206 208} set} {Lr {210 213} maxl} {Sc {215 235} {} {Cd {216 234} {} {Lr {216 221} string} {Lr {223 228} length} {Sv {230 234} {} {Lr {231 234} name}} } } } } } } } {Cd {250 297} {} {Lr {250 252} set} {Lr {254 257} maxl} {Sc {259 297} {} {Cd {260 296} {} {Lr {260 263} expr} {Lb {265 296} {$maxl + [string length $a] + 2}} } } } {Cd {303 466} {} {Lr {303 309} foreach} {Lr {311 314} name} {Sc {316 351} {} {Cd {317 350} {} {Lr {317 321} lsort} {Sc {323 350} {} {Cd {324 349} {} {Lr {324 328} array} {Lr {330 334} names} {Lr {336 340} array} {Sv {342 349} {} {Lr {343 349} pattern}} } } } } {Rs {354 465} {} {Cd {356 394} {} {Lr {356 358} set} {Lr {360 369} nameString} {Sc {371 394} {} {Cd {372 393} {} {Lr {372 377} format} {Lr {379 384} %s(%s)} {Sv {386 387} {} {Lr {387 387} a}} {Sv {389 393} {} {Lr {390 393} name}} } } } {Cd {397 460} {} {Lr {397 400} puts} {Lr {402 407} stdout} {Sc {409 460} {} {Cd {410 459} {} {Lr {410 415} format} {Lq {417 427} {%-*s = %s}} {Sv {429 433} {} {Lr {430 433} maxl}} {Sv {435 445} {} {Lr {436 445} nameString}} {Sa {447 459} {} {Lr {448 452} array} {Sv {454 458} {} {Lr {455 458} name}} } } } } } } } This is horrible reading, but much easier for a program to do things with than the original script. ---- Comments? ---- [JJS] - It would be enormously helpful if you provided a script which took the parse tree and turned it back into the original code. Doing such an identity transformation on a large group of tcl scripts would be a very effective unit test for your code, and having that transformation script as a starting point would make it much simpler for folks looking to make use of your code. I realize you acknowledge that'd be a useful next step, but consider this encouragement to actually follow through :-). ----- Here's a partial inversion of the parser - [CMcC] namespace eval parsetcl {} proc ::parsetcl::unparse {tree} { eval $tree } proc ::parsetcl::Lr {interval text args} { return $text } proc ::parsetcl::Lb {interval text args} { return \{$text\} } proc ::parsetcl::Lq {interval text args} { return \"$text\" } proc ::parsetcl::Sb {interval text args} { return "\\$text" } proc ::parsetcl::Sv {interval text args} { return "\$[eval [lindex $args 0]]" } proc ::parsetcl::Sa {interval text args} { foreach a [lrange $args 1 end] { append result [eval $a] } return "\$[eval [lindex $args 0]]($result)" } proc ::parsetcl::Sc {interval text args} { foreach a $args { append cmd " " [eval $a] } return \[${cmd}\] } proc ::parsetcl::Mr {interval text args} { foreach a $args { append result [eval $a] } return ${result} } proc ::parsetcl::Mq {interval text args} { foreach a $args { append result [eval $a] } return \"${result}\" } proc ::parsetcl::Cd {interval text args} { foreach a $args { append cmd [eval $a] " " } return ${cmd} } proc ::parsetcl::Rs {interval text args} { foreach a $args { append cmd [eval $a]\n } return \{\n$cmd\n\} } ---- [TV] ''(May 12 2004)'' Interesting page and comment. Reminds me of a text-to-3Dobject_database program I once made and then added a text generator for editing hierarchical 3D object source files after the OO rep got transformed. Which leads me to the questions: is the parser complete enough ? That could make it interesting to visualize the results in [bwise], and possibly lead to speed improvements and compilation analysis/speedup. ---- [AM] (30 march 2005) I used the above code to create a prototype of a tool to generate a "call-tree". While it is far from complete, it does show the capabilities (as far as I am concerned) of such a parser. Other uses I can think of: * Slicing - flesh out those parts of the code influenced by a particular variable * Instrumentation for test coverage * Generating structure graphs (or whatever the appropriate name is) that show the complexity of a procedure * Determining all kinds of metrics Some remarks about the above code though: * The simple_parse_script procedure does not parse [[switch]] bodies ''[Lars H]: Yes, I know. What prevented me from adding that capability was that then I would also need a list parser (in one form of [switch], the bodies are elements of a list), and at the time I didn't have the time to write that. Anyway '''simple'''_parse_script was never meant to be "the real thing", but only a testing aid and proof of concept.'' -- [AM] I quite understand. I realised the other day that another construct that is missing is [[uplevel]], which behaves more or less like a [[foreach]] loop in the sense of the parser. It is just so close to the "real thing" that one starts to complain about everything :) -- ''[Lars H]: [uplevel] is really most like [eval] (I think at the byte-code level [eval] is equivalent to '''uplevel 0'''), which in general is anther can of worms. My uplevels are quite often on the form "uplevel 1 [[list ::set $var $val]]"...'' * Another thing that needs looking into, I guess, is that not all parsed code is accepted by the [[format_tree]] procedure. I have not looked into this more closely, but the list commands complained over at least one piece of code I have. -- ''|Lars H]: Rereading the code, I cannot see why. Please give an example.'' (More to come ...) ---- [Lars H], 2005-04-02: It occurs to me that this should probably be contributed to [tcllib], but right now I have other things to do. Maybe next month. If in the meantime someone wants to contribute however, then some tests would be nice. ---- [Category Dev. Tools]