grok

AMG: Here I present a Tcl version of some configuration reading code I describe in Comparing Tcl with Python. It is very flexible about the syntax of the configuration files it reads. It uses apply and child interps to get the job done, and the syntax definitions are built up by a sort of functional composition.

It's also slow... *grumble*.

There are four parts to the program given below:

  • First is the package itself. If you want to use grok, grab this part.
  • Second is a sample syntax definition. Unless you're also writing a flight simulator, you won't need this. :^) It is a fairly exhaustive example, and it's also realistic because it's patterned after something I actually use at work.
  • Third is a hunk of sample configuration data to be read by grok. Most of the numbers and other specifics are made up, so don't bother correcting me if I got the wingspan wrong or anything like that.
  • Fourth is a sample application using the parsed data. It just prints out a list of models formatted as a C enum.

Scroll past the code to find discussion.


 #!/bin/sh
 # The next line restarts with tclsh.\
 exec tclsh "$0" ${1+"$@"}

 ######### Package for reading complex configuration or database files. #########
 namespace eval grok {
     # List of generally useful namespace procedures.
     namespace export interpret element block default_adapter multi_adapter\
                      list_adapter restrict_length restrict_value

     # Next instance ID to assign.
     variable next_id 0

     # [subst] variant that only processes substitution characters preceded by
     # backticks (`).  When nesting multiple invocations of this command, use
     # multiple backticks to select at the level to perform the substitution.
     proc sub {str} {
         set str [string map [list {`[} {[} {[} {\[} {`]} {]} {]} {\]}\
                                   {`$} {$} {$} {\$} {``} {`} \\ \\\\] $str]
         uplevel 1 [list subst $str]
     }

     # Modify a script to link $self against a particular variable.
     proc link {code selfname} {
         return "[list upvar 1 $selfname self]\n$code"
     }

     # Invoke a script with $self linked against a variable.
     proc invoke {arg_list code selfname args} {
         apply [list $arg_list [link $code $selfname]] {*}$args
     }

     # Instantiate a new element.
     proc instantiate {name init} {
         # Create an array in which to store instance data.
         variable next_id
         set id $next_id
         incr next_id
         set new ::grok::elem_$id

         # Pre-seed it with the instance name.
         array set $new [list name $name]

         # Call the initialization code.
         invoke {} $init $new

         # Return the array name to the caller for use with the other methods.
         return $new
     }

     # Interpret a string of configuration data, returning the parsed result.
     proc interpret {def args} {
         set new [instantiate "(toplevel)" [dict get $def init]]
         invoke [dict get $def set_args] [dict get $def set_body] $new {*}$args
         set result [invoke {} [dict get $def get] $new]
         invoke {} [dict get $def destroy] $new
         unset $new
         return $result
     }

     # Install a copy of this namespace to a child interp.  A copy is made to
     # ensure that the new procedures execute inside the child interp.
     proc install {interp} {
         $interp eval {
             namespace eval grok {
                 variable next_id 0
             }
         }
         foreach name [info procs ::grok::*] {
             $interp eval [list proc $name [info args $name] [info body $name]]
         }
     }

     # Return the most basic element syntax definition.
     proc element {} {
         return {
             init {} destroy {} set_args {value}
             set_body {
                 if {[info exists self(value)]} {
                     error "multiple definitions of \"$self(name)\""
                 }
                 set self(value) $value
             } get {
                 if {![info exists self(value)]} {
                     error "no value specified for \"$self(name)\""
                 }
                 return $self(value)
             }
         }
     }

     # Return a block syntax definition.  The argument is a dictionary mapping
     # from element names to element syntax definitions.
     proc block {syntax} {
         # Assemble the initialization, destruction, and get scripts for the
         # child interp.
         set b_init {}
         set b_destroy {}
         set b_get {dict create}
         set index 0
         dict for {cmd def} $syntax {dict with def {
             set b_init [sub {
                 `$b_init
                 set new [grok::instantiate `[list `$cmd`] `[list `$init`]]
                 lappend self(value) $new
                 proc `[list `$cmd`] `[list `$set_args`]\
                         [grok::link `[list `$set_body`] $new]
             }]

             set b_destroy [sub {
                 `$b_destroy
                 grok::invoke {} `[list `$destroy`] [lindex $self(value) `$index]
                 unset [lindex $self(value) `$index]
             }]

             set b_get [sub {`$b_get `[list `$cmd`]\
                 [grok::invoke {} `[list `$get`] [lindex $self(value) `$index]]}]

             incr index
         }}

         # Generate and return the block's syntax definition, including code to
         # run both inside and outside the child interp.
         return [dict create\
             init [sub {
                 set self(value) {}
                 set self(interp) [interp create]
                 grok::install $self(interp)
                 $self(interp) eval `[list `$b_init`]
                 $self(interp) eval [grok::sub {
                     proc unknown {name args} {
                         if {[list ``[list ``$self(name)``]] eq "(toplevel)"} {
                             error "invalid command \"$name\" at top level"
                         } else {
                             error "invalid command \"$name\" inside\
                                     ``$self(name) block"
                         }
                     }
                 }]
             }]\
             destroy [sub {
                 $self(interp) eval `[list `$b_destroy`]
                 interp delete $self(interp)
             }]\
             get [sub {
                 $self(interp) eval `[list `$b_get`]
             }]\
             set_args def\
             set_body {
                 $self(interp) eval $def}\
         ]
     }

     # Modify a syntax definition to have a default value in case none was
     # supplied.
     proc default_adapter {def default} {
         dict with def {
             set init [sub {
                 `$init
                 set self(is_set) false
             }]
             set set_body [sub {
                 `$set_body
                 set self(is_set) true
             }]
             set get [sub {
                 if {!$self(is_set)} {
                     return `[list `$default`]
                 }
                 `$get
             }]
         }
         return $def
     }

     # Modify a syntax definition to support being invoked multiple times.  Each
     # invocation generates a new list element in its value.
     proc multi_adapter {def} {
         dict with def {
             set set_body [sub {
                 set new [grok::instantiate $self(name) `[list `$init`]]
                 lappend self(value) $new
                 grok::invoke {} `[list `$init`] $new
                 grok::invoke `[list `$set_args`]\
                         `[list `$set_body`] $new {*}$args
             }]
             set init {set self(value) {}}
             set set_args args
             set get [sub {
                 set result [list]
                 foreach child $self(value) {
                     lappend result [grok::invoke {} `[list `$get`] $child]
                 }
                 return $result
             }]
             set destroy [sub {
                 foreach child $self(value) {
                     grok::invoke {} `[list `$destroy`] $child
                 }
             }]
         }
         return $def
     }

     # Modify a syntax definition to support accepting multiple sets of
     # arguments.  Each argument set is used to construct a new list element in
     # the element's value.
     proc list_adapter {def} {
         dict with def {
             # Check for some things unsupported by [list_adapter].
             if {[lindex $set_args end] eq "args"} {
                 error "variadic argument lists not supported"
             }
             foreach arg $set_args {
                 if {[llength $arg] == 2} {
                     error "default arguments not supported"
                 }
             }
             set group [llength $set_args]

             set set_body [sub {
                 if {[llength $args] % `$group != 0} {
                     error "wrong # args for element: should be\
                             \"$self(name) `[join `$set_args " "`] ?...?\""
                 }

                 while {[llength $args] != 0} {
                     set batch [lrange $args 0 `[expr {`$group - 1}`]]
                     set args [lrange $args `[list `$group`] end]

                     set new [grok::instantiate $self(name) `[list `$init`]]
                     lappend self(value) $new
                     grok::invoke {} `[list `$init`] $new
                     grok::invoke `[list `$set_args`] `[list `$set_body`] $new\
                             {*}$batch
                 }
             }]

             set init {set self(value) {}}
             set set_args args
             set get [sub {
                 set result [list]
                 foreach child $self(value) {
                     lappend result [grok::invoke {} `[list `$get`] $child]
                 }
                 return $result
             }]
             set destroy [sub {
                 foreach child $self(value) {
                     grok::invoke {} `[list `$destroy`] $child
                 }
             }]
         }
         return $def
     }

     # Modify a syntax definition to fail if an incorrect number of elements is
     # present in the value.
     proc restrict_length {def {min 0} {max inf}} {
         set test "false"
         set msg {}
         if {$min != 0} {
             append test " || \[llength \$self(value)\] < $min"
             lappend msg "at least $min"
         }
         if {$max ne "inf"} {
             append test " || \[llength \$self(value)\] > $max"
             lappend msg "at most $max"
         }
         if {$min == $max} {
             set msg $min
         }

         dict with def {
             set get [sub {
                 if `[list `$test`] {
                     error "wrong # of values supplied for \"$self(name)\":\
                             must be `[join `$msg " and "`]"
                 }
                 `$get
             }]
         }
         return $def
     }

     # Modify a syntax definition to fail if the value is not in a domain.
     proc restrict_value {def valid} {
         set msg "invalid value supplied for \"\$self(name)\": must be\
                  [join $valid ", "]"
         dict with def {
             set get [sub {
                 if {$self(value) ni `[list `$valid`]} {
                     error "invalid value supplied for \"$self(name)\": must be\
                            `[join `$valid ", "`]"
                 }
                 `$get
             }]
         }
         return $def
     }
 }

 ########################### Define a sample syntax. ############################
 namespace import grok::*

 # Basic element, used for scalars and strings.
 set element [element]

 # 3D vector.
 set vector [restrict_length [list_adapter $element] 3 3]

 # Damage characteristics.
 set damage [block [dict create\
     switch_id       [default_adapter $element none]\
     offset          [default_adapter $vector {0 0 0}]\
     fire_scale      [default_adapter $element 1]\
     explosion_scale [default_adapter $element 1]\
 ]]

 # On-rail identification code.
 set code [block [dict create\
     name        $element\
     description $element\
     id          [restrict_length [list_adapter $element] 1 inf]\
 ]]

 # Forward/right/down/heading/pitch/roll offset.
 set offset [block [dict create\
     offset      [default_adapter $vector {0 0 0}]\
     heading     [default_adapter $element 0]\
     pitch       [default_adapter $element 0]\
     roll        [default_adapter $element 0]\
 ]]
 set offset [default_adapter $offset [interpret $offset ""]]

 # Store definition, including offsets from the parent and to zero, one, or two
 # children.
 set store [block [dict create\
     class       [restrict_value $element {mount equip weapon}]\
     code        [multi_adapter $code]\
     parent      $offset\
     left        $offset\
     down        $offset\
     right       $offset\
 ]]

 # Station definition.  This is a store offset for class "station".
 set station [block [dict create\
     name        $element\
     description $element\
     id          $element\
     parent      $offset\
     left        $offset\
     down        $offset\
     right       $offset\
 ]]

 # Degree-of-freedom, also known as an articulated part.
 set dof [block [dict create\
     name        $element\
     description $element\
     id          $element\
 ]]

 # Switch ("component") state.
 set state $dof

 # Light or light point system.
 set light $dof

 # Switch ("component"--- the name switch is taken) on a model.
 set component [block [dict create\
     name        $element\
     description $element\
     id          $element\
     state       [multi_adapter $state]\
 ]]

 # Visual model definition, plus all associated data.
 set model [block [dict create\
     name        $element\
     enum        [default_adapter $element ""]\
     description $element\
     id          [default_adapter $element none]\
     ssig        [list_adapter $element]\
     store       [default_adapter $store none]\
     damage      [default_adapter $damage [interpret $damage ""]]\
     dof         [multi_adapter $dof]\
     component   [multi_adapter $component]\
     light       [multi_adapter $light]\
     station     [multi_adapter $station]\
 ]]

 # Light definition not attached to a particular model.
 set top_light [block [dict create\
     name        $element\
     description $element\
     class       [restrict_value $element {entity environment view
                                           view_group sensor system}]\
     instance    $element\
     component   $element\
 ]]

 # Top level of the configuration data.
 set top [block [dict create\
     model       [multi_adapter $model]\
     light       [multi_adapter $top_light]\
 ]]

 ####################### Parse sample configuration data. #######################
 set data [interpret $top {
     # Missile model.
     model {
         name aim9                                   ;# Short name.
         description "AIM-9 Sidewinder"              ;# Longer name.
         id 123                                      ;# Visual ID.
                                                     ;# Note: no SSIG given.
         store {                                     ;# Missile is a store.
             class weapon                            ;# Its class is "weapon".
             code {                                  ;# Multiple on-rail codes.
                 name aim9l                          ;# Usable as AIM-9L missile.
                 description "AIM-9L Sidewinder"     ;# Long name for this code.
                 id 1                                ;# On-rail code is 1.
             }
             code {
                 name aim9m6                         ;# Also AIM-9M-6.
                 description "AIM-9M-6 Sidewinder"
                 id 2
             }
             code {
                 name aim9m8                         ;# Also AIM-9M-8.
                 description "AIM-9M-8 Sidewinder"
                 id 3
             }
             code {
                 name aim9x                          ;# Also AIM-9X.
                 description "AIM-9X Sidewinder"
                 id 4
             }
             parent {                                ;# Define attach point.
                 offset 0 0 0.065                    ;# Diameter is 13cm.
                 roll 90                             ;# Spin the fins a bit.
             }
         }
     }

     # Sidewinder launch rail for F/A-18 wingtip.
     model {
         name lau7                                   ;# Short name.
         description "LAU-7/A"                       ;# Longer name.
                                                     ;# Launch rail is invisible.
         store {                                     ;# It is a store.
             class mount                             ;# It is mount hardware.
             down {offset 0 0 0.01}                  ;# It's ~10cm thick.
         }
     }

     # F/A-18A: the US's first strike fighter!
     model {
         name fa18a                                  ;# Short name.
         enum f18                                    ;# Really short name.
         description "F/A-18A Hornet"                ;# Longer name.
         id 456                                      ;# Visual ID.
         ssig 42 144                                 ;# Several external IDs.

         damage {                                    ;# Damage characteristics.
             switch_id 5                             ;# Damage mask switch ID.
             fire_scale 4                            ;# Make the fire this big.
             explosion_scale 4.5                     ;# Ditto for the explosion.
         }

         dof {                                       ;# Define an art. part.
             name brake                              ;# Give it a short name.
             description "Speed brake"               ;# Longer name.
             id 2                                    ;# Define visual ID.
         }

         component {                                 ;# Define a switch.
             name nozzle                             ;# Short name.
             description "Engine nozzles"            ;# Longer name.
             id 8                                    ;# Visual ID.
             state {                                 ;# First switch state.
                 name closed                         ;# Short name.
                 description "Minimally open"        ;# Longer name.
                 id 0                                ;# Visual ID.
             }
             state {                                 ;# Second switch state.
                 name open                           ;# Short name.
                 description "Fully open (A/B)"      ;# Afterburner!
                 id 1                                ;# Visual ID.
             }
         }

         light {                                     ;# Light system.
             name strobe                             ;# Short name.
             description "Strobe/anti-collision lights"  ;# Long name.
             id 2                                    ;# Visual ID.
         }

         station {                                   ;# Station/hard point.
             name left_wingtip                       ;# Short name.
             description "Left wingtip station"      ;# Long name.
             id 1                                    ;# Station ID.
             parent {offset 13.5 1 -0.5}             ;# Position.
             down {roll 90}                          ;# Rotate child by 90 deg.
         }
     }

     light {                                         ;# Global light system.
         name culture                                ;# Short name.
         description "Cultural lighting"             ;# Long name.
         class environment                           ;# CIGI class name.
         instance 0                                  ;# CIGI instance ID.
         component 0                                 ;# CIGI component ID.
     }
 }]

 ################################ Sample query. #################################
 puts "enum {"
 foreach model [dict get $data model] {
     if {[dict get $model id] ne "none"} {
         if {[dict get $model enum] ne ""} {
             set name MO_[string toupper [dict get $model enum]]
         } else {
             set name MO_[string toupper [dict get $model name]]
         }
         set id [dict get $model id]
         set comment [dict get $model description].
         if {[llength [dict get $model ssig]] != 0} {
             set ssig "SSIG: [join [dict get $model ssig] ", "]"
         } else {
             set ssig ""
         }
         set line [format "    %-10s =%4d, // %-30s%s" $name $id $comment $ssig]
         puts [regsub { +$} $line ""]
     }
 }
 puts "};"

 # vim: set ts=4 sts=4 sw=4 tw=80 et ft=tcl:

Got all that? :^)

If you run it, you will get the following output:

 enum {
     MO_AIM9    = 123, // AIM-9 Sidewinder.
     MO_F18     = 456, // F/A-18A Hornet.               SSIG: 42, 144
 };

If you don't, you probably don't have support for apply.

After grok does its magic, $data will contain something similar to the following:

 light {{description {Cultural lighting} name culture instance 0 class environment
 component 0}} model {{description {AIM-9 Sidewinder} light {} name aim9 enum {} id
 123 store {down {heading 0 pitch 0 roll 0 offset {0 0 0}} class weapon parent
 {heading 0 pitch 0 roll 90 offset {0 0 0.065}} right {heading 0 pitch 0 roll 0 offset
 {0 0 0}} code {{description {AIM-9L Sidewinder} name aim9l id 1} {description
 {AIM-9M-6 Sidewinder} name aim9m6 id 2} {description {AIM-9M-8 Sidewinder} name
 aim9m8 id 3} {description {AIM-9X Sidewinder} name aim9x id 4}} left {heading 0 pitch
 0 roll 0 offset {0 0 0}}} dof {} ssig {} station {} damage {explosion_scale 1
 fire_scale 1 switch_id none offset {0 0 0}} component {}} {description LAU-7/A light
 {} name lau7 enum {} id none store {down {heading 0 pitch 0 roll 0 offset {0 0 0.01}}
 class mount parent {heading 0 pitch 0 roll 0 offset {0 0 0}} right {heading 0 pitch 0
 roll 0 offset {0 0 0}} code {} left {heading 0 pitch 0 roll 0 offset {0 0 0}}} dof {}
 ssig {} station {} damage {explosion_scale 1 fire_scale 1 switch_id none offset {0 0
 0}} component {}} {description {F/A-18A Hornet} light {{description {Strobe/
 anti-collision lights} name strobe id 2}} name fa18a enum f18 id 456 store none dof
 {{description {Speed brake} name brake id 2}} ssig {42 144} station {{description
 {Left wingtip station} down {heading 0 pitch 0 roll 90 offset {0 0 0}} name
 left_wingtip id 1 parent {heading 0 pitch 0 roll 0 offset {13.5 1 -0.5}} right
 {heading 0 pitch 0 roll 0 offset {0 0 0}} left {heading 0 pitch 0 roll 0 offset {0 0
 0}}}} damage {explosion_scale 4.5 fire_scale 4 switch_id 5 offset {0 0 0}} component
 {{description {Engine nozzles} name nozzle id 8 state {{description {Minimally open}
 name closed id 0} {description {Fully open (A/B)} name open id 1}}}}}}

As messy as this looks, it's actually quite easy for Tcl to process. It's a dictionary with two keys, light and model. The values for each of these keys are lists, with one element per light or model, respectively.


After writing all that code, I don't know if I have the energy to describe it too. :^) Let me try anyway.

The heart of the syntax definition is the element. Elements are defined by dictionaries with the following keys and values:

  • init: Element initialization script.
  • destroy: Element destruction cleanup script.
  • set_args: List of arguments accepted by the set script.
  • set_body: Element value assignment script.
  • get: Element value interrogation script.

The most basic element is defined thusly:

  • init: (empty)
  • destroy: (empty)
  • set_args: {value}
  • set_body: Set self(value) to $value.
  • get: Return $self(value).

For this element, set_body includes a constraint that the value cannot be set more than once, and get has the constraint that the value must have been set at least once.

An element is instantiated, assigned, interrogated, and destroyed by the [interpret] command or by a containing element. One example of a container element is a block, which is defined something like the following:

  • init: Create child interp. Run all contained element initialization scripts inside this child interp. For each contained element, create a proc in the child interp linked to its assignment script.
  • destroy: Run all contained element destruction scripts inside the child interp. Destroy the child interp.
  • set_args: {def} - a script to run in the child interp.
  • set_body: $self(interp) eval $def.
  • get: Build a dictionary mapping from the names of the contained elements to the values returned by their interrogation scripts.

Two more examples are elements modified by [list_adapter] or [multi_adapter]. These commands derive new syntax definitions from the old. The new syntax definitions are a bit like the below:

  • init: Initialize the container, not the contained elements.
  • destroy: Destroy the contained elements, followed by the container.
  • set_args: {args} - binding to arguments is handled by the assignment script.
  • set_body: Instantiate new contained elements and call their assignment scripts with the values passed as $args.
  • get: Call the interrogation scripts for all contained elements and put them in a list.

Other syntax modification functions include [default_adapter], [restrict_length], and [restrict_value], which work just like you'd imagine.


This code is slow. :^( I'm guessing it has to do with the fact that it's composed entirely of large numbers of one-off, generated, customized scripts that usually only run once each. If I can group more of this into a common library, it might be faster. Or is there a better approach? Maybe it should be rewritten using an object system rather than implementing its own.


The child interps are not -safe. But they could be. I decided to leave them non-safe for ease of debugging. (That's a funny thing to say; this thing is a nightmare to debug!) If used in production, just add -safe, and maybe go a step further by stripping out the standard Tcl commands.

For now, have fun [puts]'ing from inside the configuration file!

Actually, this opens up some very interesting possibilities. The configuration file can define procedures (macros, they might be called) and such to pretty much generate itself. It can also [source] other files, with the names determined by [glob] or maybe configuration files of its own. Configuring a configuration file... what a delightfully scary thought!

The [interpret] invocation can probably be changed to [interpret $top {source config.file}] so that the data is neither contained in nor read by the program itself.


Any other comments? I've had enough thinking for one day, so it's your turn.


See also: