Config file using slave interp

A slave interpreter can be put to work as a parser for data in script format.

See Also

TDL
Matthias Hoffmann - Tcl-Code-Snippets - Misc - Readprof
scriptSplit
Also splits a script into commands, but since it isn't acting as an interpreter, it doesn't perform substitutions and doesn't allow for the insertion of commands that could process commands parsed from the script. Since it is a Tcl script itself, scriptSplit is not as performant.
Techniques for reading and writing application configuration files
ycl parser interp
Adapted from AMG's example below.

Description

AMG: While there are many possible uses for this functionality, I think one of its most frequently useful practical applications is for reading configuration or database files. The configuration file is written according to the Dodekalogue, so comments, quoting, whitespace, semicolon, and other goodies can be used to spruce up the formatting. More interesting possibilities arise when variables and commands are thrown into the mix. For example, if there's an if command, only the portion of the script inside the chosen branch will be echoed to the result. Looping commands can cause multiple copies of the script to be emitted, with variations due to variable substitution. Backslashes work. And so on.

package require Tcl 8.6
proc parse {script} {
    set int [interp create -safe]
    try {
        $int eval {unset {*}[info vars]}
        $int eval {rename ::tcl::info::frame infoframe~}
        foreach command [$int eval {info commands}] {$int hide $command}
        $int invokehidden namespace delete\
                {*}[$int invokehidden namespace children]
        $int alias unknown apply {{int args} {
            upvar 1 result result
            lappend result [dict get [$int invokehidden infoframe~ 0] cmd]
            list
        }} $int
        set result {}
        $int eval $script
        return $result
    } finally {
        interp delete $int
    }
}

Also, here is a version that's much simpler but strips out the first word of the command line if that word is "unknown". It also doesn't permit exposing Tcl commands that do things other than get echoed to the output, for example if or foreach.

proc parse {script} {
    set int [interp create -safe]
    try {
        $int eval {namespace delete ::}
        $int alias unknown apply {args {upvar 1 r r; lappend r $args; list}}
        set r {}
        $int eval $script
        return $r
    } finally {
        interp delete $int
    }
}

The only validation this code does is to make sure that its input satisfies the Dodekalogue. It doesn't attempt to interpret the meaning of its input. That is left to the code that calls [parse].
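
As a rough illustration of the calling side, the sketch below repeats the simpler [parse] so that it runs on its own, then feeds it a small made-up configuration. The command names name, window, and theme are hypothetical; [parse] attaches no meaning to them.

```tcl
package require Tcl 8.6-

# The simpler [parse] from above, repeated so this sketch is self-contained.
proc parse {script} {
    set int [interp create -safe]
    try {
        $int eval {namespace delete ::}
        $int alias unknown apply {args {upvar 1 r r; lappend r $args; list}}
        set r {}
        $int eval $script
        return $r
    } finally {
        interp delete $int
    }
}

# Hypothetical configuration text.
set config {
    # Comments and blank lines never reach the result.
    name "My Application"
    window 800 600; theme dark
}

# [parse] returns one list element per command.
# Giving those commands a meaning is the caller's job.
foreach command [parse $config] {
    set arguments [lassign $command keyword]
    puts "$keyword: $arguments"
}
```

Here [parse $config] should return the list {name {My Application}} {window 800 600} {theme dark}; the quoting and the semicolon are handled by Tcl itself.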

When the data has a nested structure, [parse] can be called repeatedly to pull it apart. For example, if the configuration file has a "command" that takes an argument which itself is structured like a Tcl script, just call [parse] on its argument to turn it into an easily-processed list.
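
A minimal sketch of that repeated parsing, assuming a made-up "server NAME {settings}" command whose last argument is itself script-shaped. The proc parseServers and the keywords server, host, and port are hypothetical names chosen for illustration.

```tcl
package require Tcl 8.6-

# The simpler [parse] from above, repeated so this sketch is self-contained.
proc parse {script} {
    set int [interp create -safe]
    try {
        $int eval {namespace delete ::}
        $int alias unknown apply {args {upvar 1 r r; lappend r $args; list}}
        set r {}
        $int eval $script
        return $r
    } finally {
        interp delete $int
    }
}

# Hypothetical helper: turn "server NAME {settings}" commands into a nested dict.
proc parseServers {script} {
    set servers [dict create]
    foreach command [parse $script] {
        lassign $command keyword name body
        if {$keyword ne "server"} {
            error "unexpected command \"$keyword\""
        }
        # The third word is structured like a script, so parse it again.
        foreach setting [parse $body] {
            lassign $setting key value
            dict set servers $name $key $value
        }
    }
    return $servers
}

set config {
    server alpha {
        host 10.0.0.1
        port 8080
    }
    server beta {
        host 10.0.0.2
    }
}
set servers [parseServers $config]
puts [dict get $servers alpha port]
```

Each level of nesting costs one more call to [parse], and the caller decides which arguments are worth parsing again.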

This code works very well with a templating system, so you can write templates with variables that are filled in with data from user-supplied Tcl-like configuration files. I do this at work, with great results. I made a "compiler.tcl" script which (repeatedly) calls [parse], analyzes the results, checks syntax (e.g. invalid commands), then prints a dict that is accepted by the -file argument of my Templates and subst script.
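
The "checks syntax" step could look something like the sketch below. It is not the compiler.tcl mentioned above: the proc validate, the schema format (command name mapped to a required argument count), and the sample commands are all assumptions made for illustration. The input is a list shaped like the result of [parse].

```tcl
package require Tcl 8.6-

# Hypothetical checker: schema maps each allowed command name to the
# number of arguments it requires. Returns a list of problem messages.
proc validate {commands schema} {
    set problems {}
    foreach command $commands {
        set arguments [lassign $command name]
        if {![dict exists $schema $name]} {
            lappend problems "invalid command \"$name\""
        } elseif {[llength $arguments] != [dict get $schema $name]} {
            lappend problems "wrong # args for \"$name\""
        }
    }
    return $problems
}

# A list in the same shape as the output of [parse].
set commands {{title Hello} {colour red} {size 12 14}}
set schema {title 1 size 1}

foreach problem [validate $commands $schema] {
    puts $problem
}
```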


This alternative formulation of unknown changes the output to a dict-like list that alternates command names and argument lists, as described above. Merge this into the above script to make it capable of producing output that is directly compatible with the -file argument of my Templates and subst script.

$int alias unknown apply {{int args} {
    upvar 1 result result
    set args [lassign [dict get [$int invokehidden infoframe~ 0] cmd] name]
    lappend result $name $args
    list
}} $int

Or instead use code similar to the following to format and print the output of [parse]:

foreach line [parse [read stdin]] {
    puts [list [lindex $line 0] [lrange $line 1 end]]
}

One interesting possibility is to selectively expose Tcl commands (or custom procs) to the child interpreter so that the configuration files can use them. For example, if [foreach] is exposed, then this configuration file:

# First specify the foo.
foo bar bas
# Use lots of quux!
foreach a {1 2 3} b {7 8 9} {c d} {p q r s t u} {
   quux $a$b $c$d
}

"compiles" into this list:

{foo bar bas} {quux 17 pq} {quux 28 rs} {quux 39 tu}
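
One possible way to do the selective exposing is sketched below. The proc parseWith and its exposed argument are hypothetical. It hides every command and then re-exposes only the requested ones, and it records the substituted arguments passed to unknown rather than using [info frame]. For brevity it leaves the child namespaces (such as ::tcl) in place, so it is less locked down than the full version above.

```tcl
package require Tcl 8.6-

# Hypothetical variant of [parse]: "exposed" lists the Tcl commands that the
# configuration is allowed to run; everything else is echoed to the result.
proc parseWith {exposed script} {
    set int [interp create -safe]
    try {
        # Hide every visible command, then bring back only the chosen ones.
        foreach command [$int eval {info commands}] {$int hide $command}
        # Caution: exposing a command that the safe interpreter hid on
        # purpose (for example open) would defeat the safety.
        foreach command $exposed {$int expose $command}
        $int alias unknown apply {args {upvar 1 r r; lappend r $args; list}}
        set r {}
        $int eval $script
        return $r
    } finally {
        interp delete $int
    }
}

set config {
    # First specify the foo.
    foo bar bas
    # Use lots of quux!
    foreach a {1 2 3} b {7 8 9} {c d} {p q r s t u} {
        quux $a$b $c$d
    }
}
puts [parseWith {foreach} $config]
```

With only foreach exposed, any other Tcl command name in the configuration (set, if, and so on) is treated as data and echoed like foo and quux.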

A drawback of my approach is that anything can be put into square brackets with nonsensical results:

[] [foo [bar]] [bas]

results in the following:

{bar} {foo ""} {bas} {"" "" ""}

At least this clearly demonstrates Tcl's substitution order rules. :^) Plus I bet you didn't know that [] is yet another way to obtain the empty string.


AMG: Here's a Tcl 8.4-compatible version of the "simpler" variation above. (The more advanced variation isn't possible in Tcl 8.4 due to lack of [info frame].) The [try] command makes things so much easier... :^(

proc parse_helper {args} {
    upvar 1 r r
    lappend r $args
    list
}
proc parse {script} {
    set int [interp create -safe]
    set r {}
    set code [catch {
        $int eval {namespace delete ::}
        $int alias unknown parse_helper
        $int eval $script
    } result]
    interp delete $int
    if {$code} {
        return -code $code $result
    } else {
        return $r
    }
}

AMG: Why did I use [info frame]??? I really don't remember! It's much simpler to just use the arguments to [unknown]. I guess for whatever reason I wanted the unsubstituted form of the data. Well, now I want the substituted form!

PYK 2017-03-30: The info frame method doesn't swallow the word "unknown" when it is an explicit command in the data. Also, the substituted form isn't much use unless one can differentiate between "normal" commands in the data and substitutions, a feature that's high on my wishlist.

[prettyparse] and [prettyprint]

AMG: Here's code both to read and write a configuration or data file using a TDL-like format. The internal representation is a list of dicts, where each dict is a single data object. The "tag" and "body" keys are reserved to identify the tag name and the child data object list, respectively; remaining keys are for whatever attributes you want. There's no schema validation; anything goes. Uncomment the commented-out lines to start with a totally clean interpreter, or leave them commented-out to make macros possible. More on that later.

package require Tcl 8.6

proc prettyparse {script} {
    set i [interp create -safe]
    try {
#       $i eval {unset {*}[info vars]}
#       foreach command [$i eval {info commands}] {$i hide $command}
#       $i invokehidden namespace delete {*}[$i invokehidden namespace children]
        $i alias unknown apply {{i tag args} {
            upvar 1 result result
            set e [concat [list tag $tag]\
                          [lrange $args 0 [expr {([llength $args] & ~1) - 1}]]]
            if {[llength $args] % 2} {
                set saved $result
                set result {}
                $i eval [lindex $args end]
                lappend e body $result
                set result $saved
            }
            lappend result $e
            list
        }} $i
        set result {}
        $i eval $script
        return $result
    } finally {
        interp delete $i
    }
}

proc prettyprint {data {level 0}} {
    set ind [string repeat "    " $level]
    incr level
    set result {}
    foreach e $data {
        set line $ind[concat [list [dict get $e tag]] [dict remove $e tag body]]
        if {[dict exists $e body] && [llength [dict get $e body]]} {
            append line " {\n[prettyprint [dict get $e body] $level]\n$ind}"
        }
        lappend result $line
    }
    join $result \n
}

If the data file looks like this (example by JAL):

# Define some meats.
meat -name ham -calories 200 -usda_grade AA
meat -name turkey -calories 150
meat -name bacon -calories 100 -usda_grade B
# Now make some sandwiches.
sandwich -name italian_bmt {
   meat -name ham
   meat -name turkey
   cheese -name provolone
   extra -name tomato
}
sandwich -name nuclear_sub {
   meat -name turkey
   cheese -name provolone
   extra -name horseradish
}

Then here's the output of [prettyparse] (whitespace added for readability):

{tag meat -name ham -calories 200 -usda_grade AA}
{tag meat -name turkey -calories 150}
{tag meat -name bacon -calories 100 -usda_grade B}
{tag sandwich -name italian_bmt body {
    {tag meat -name ham}
    {tag meat -name turkey}
    {tag cheese -name provolone}
    {tag extra -name tomato}
}}
{tag sandwich -name nuclear_sub body {
    {tag meat -name turkey}
    {tag cheese -name provolone}
    {tag extra -name horseradish}
}}

Running this through [prettyprint] gives the original back, sans comments.
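
Because every element is a dict, the parsed structure can be walked with ordinary dict commands. The sketch below takes the [prettyparse] output shown above as a literal and totals the calories of the meats in each sandwich. The procs calorieTable and sandwichCalories are hypothetical helpers, not part of the code above.

```tcl
package require Tcl 8.6-

# The [prettyparse] output shown above, written out as a literal list of dicts.
set data {
    {tag meat -name ham -calories 200 -usda_grade AA}
    {tag meat -name turkey -calories 150}
    {tag meat -name bacon -calories 100 -usda_grade B}
    {tag sandwich -name italian_bmt body {
        {tag meat -name ham}
        {tag meat -name turkey}
        {tag cheese -name provolone}
        {tag extra -name tomato}
    }}
    {tag sandwich -name nuclear_sub body {
        {tag meat -name turkey}
        {tag cheese -name provolone}
        {tag extra -name horseradish}
    }}
}

# Hypothetical helper: map each top-level meat name to its calories.
proc calorieTable {data} {
    set table [dict create]
    foreach e $data {
        if {[dict get $e tag] eq "meat" && [dict exists $e -calories]} {
            dict set table [dict get $e -name] [dict get $e -calories]
        }
    }
    return $table
}

# Hypothetical helper: total the calories of the known meats in each sandwich.
proc sandwichCalories {data} {
    set table [calorieTable $data]
    set totals [dict create]
    foreach e $data {
        if {[dict get $e tag] ne "sandwich"} continue
        set total 0
        if {[dict exists $e body]} {
            foreach child [dict get $e body] {
                if {[dict get $child tag] ne "meat"} continue
                set name [dict get $child -name]
                if {[dict exists $table $name]} {
                    incr total [dict get $table $name]
                }
            }
        }
        dict set totals [dict get $e -name] $total
    }
    return $totals
}

puts [sandwichCalories $data]
```

Since there is no schema validation, the helpers check for the -calories and body keys before using them.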

Now, the really exciting part: macros! Here's a new data file that defines and uses macros:

# Define some macros.
proc mymeat {name calories {usda_grade ""}} {
    if {$usda_grade eq ""} {
        meat -name $name -calories $calories
    } else {
        meat -name $name -calories $calories -usda_grade $usda_grade
    }
}
proc mysandwich {name meats cheeses extras} {
    # This macro makes one sandwich.
    sandwich -name $name {
        foreach meat $meats {
            meat -name $meat
        }
        foreach cheese $cheeses {
            cheese -name $cheese
        }
        foreach extra $extras {
            extra -name $extra
        }
    }
}
# Use the macros.
set define_meats [expr {2 + 2 == 4}]
if {$define_meats} {
    mymeat ham 200 AA; mymeat turkey 150; mymeat bacon 100 B
}
mysandwich italian_bmt {ham turkey} provolone tomato
mysandwich nuclear_sub turkey provolone horseradish

Passing this text as the argument to [prettyparse] gives the same result as before, which is totally awesome!

Also I should add that the macro (proc) definitions can be put in a separate file which the configuration file sources. Or the source'ing, etc. can be done by the interpreter before evaluating the configuration file.
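
A sketch of the second option, where the macros are loaded before the configuration is evaluated. The proc prettyparseWith and its prelude argument are assumptions made for illustration; the rest is [prettyparse] from above with the commented-out lines dropped, so that proc and if stay available to the macros.

```tcl
package require Tcl 8.6-

# [prettyparse] from above with one assumed addition: a "prelude" script that
# is evaluated in the child interpreter before the configuration itself.
proc prettyparseWith {prelude script} {
    set i [interp create -safe]
    try {
        $i alias unknown apply {{i tag args} {
            upvar 1 result result
            set e [concat [list tag $tag]\
                          [lrange $args 0 [expr {([llength $args] & ~1) - 1}]]]
            if {[llength $args] % 2} {
                set saved $result
                set result {}
                $i eval [lindex $args end]
                lappend e body $result
                set result $saved
            }
            lappend result $e
            list
        }} $i
        set result {}
        $i eval $prelude
        # Discard anything the prelude emitted; keep only the configuration's output.
        set result {}
        $i eval $script
        return $result
    } finally {
        interp delete $i
    }
}

# Macro definitions, which could equally be read from a separate file.
set macros {
    proc mymeat {name calories {usda_grade ""}} {
        if {$usda_grade eq ""} {
            meat -name $name -calories $calories
        } else {
            meat -name $name -calories $calories -usda_grade $usda_grade
        }
    }
}
set config {
    mymeat ham 200 AA
    mymeat turkey 150
}
puts [prettyparseWith $macros $config]
```

Keeping the macros out of the configuration file means the configuration does not have to define them itself.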

Misc

PYK 2016-11-05: One thing is missing that's needed to really unleash this technique: A way for the handler for unknown commands to determine that the interpreter is currently performing a substitution, as opposed to evaluating a command in the main script. Perhaps info frame or a similar command can someday indicate "substitution level", or even better, a list of offsets into the script for the currently-active commands.

PYK 2018-03-29: That said, AMG's implementation above nicely sidesteps this issue by inspecting [info frame 0].