Version 3 of data is code

(Page begun by Lars H, 11 March 2007; feel free to expand.)

Data is code is a powerful programming technique for working with complicated data, especially tree-like data. The idea is that the data structure itself can also be regarded as a command, so that traversing a data structure can simply be to evalute it! By providing different definitions (of the basic node types) in different namespaces, it is possible to implement a variety of different operations as evaluations of the same data structure.

The idea supports polymorphism to about the same degree as Object-oriented Programming, but does so in a quite different way. While it in OOP is the data (objects) that have to know all the operations (methods) that may act upon it, data-is-code requires that the operations (in this setting: namespaces) know all the types of data that they may encounter. Both support extensions to about the same degree:

In OOP, a new type can be implemented as a new class, whereas a new operation can be implemented as a new method on the ancestor class.
With data-is-code, a new type is implemented by defining commands for it in all operation namespaces, whereas a new operation is implemented as a new namespace.

Where OOP can use inheritance to share some operation definitions (make them the same as for some other type), data-is-code can do the same by creating aliases. The main difference is that objects, being commands even in cases where they're primarily used as containers for data, need to be explicitly destroyed and therefore objects add the burden of lifetime management. Pure data that is sometimes treated as code does not have that problem.

An example: Algebraic expressions

An example which illustrates many of these points can be provided by a data structure for simple algebraic expressions. Suppose an expression is a list on one of the forms

: + expression expression
: - expression expression
: * expression expression
: / expression expression
: var name
: const value

With the obvious interpretation of +, -, etc., we find that the expression (2x-1)/(x+3) is represented by the data structure

  / {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}}

So far, this is just a data structure; there are no operations which act on it. (The +, -, and friends are node types in the general terminology above.) A very natural operation would however be to evaluate the expression, which can be implemented as follows:

  namespace eval algexpr::eval {
     proc + {term1 term2} {
        expr {[eval $term1] + [eval $term2]}
     }
     proc - {term1 term2} {
        expr {[eval $term1] - [eval $term2]}
     }
     proc * {factor1 factor2} {
        expr {[eval $factor1] * [eval $factor2]}
     }
     proc / {numer denom} {
        expr {[eval $numer] / double([eval $denom])}
     }
     proc const {val} {return $val}
     proc var {name} {
        upvar #0 $name val
        return $val
     }
  }

The command to actually evaluate an expression $datum is then

  namespace inscope algexpr::eval $datum

(or [namespace eval algexpr::eval $datum], but the syntax of namespace inscope is often more appropriate).

  % set datum {/ {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}}}
  % set x 1
  % namespace inscope algexpr::eval $datum
  0.25
  % set x 2
  % namespace inscope algexpr::eval $datum
  0.6
  % set x 3
  % namespace inscope algexpr::eval $datum
  0.833333333333

Works fine, but it is probably not all that practical to fetch the variable values from global variables. A more natural construction is to provide them in the evaluation call, but that requires a different set of definitions, and is thus a different operation.

  namespace eval algexpr::eval2 {
     proc + {term1 term2 args} {
        expr {[eval $term1 $args] + [eval $term2 $args]}
     }
     proc - {term1 term2 args} {
        expr {[eval $term1 $args] - [eval $term2 $args]}
     }
     proc * {factor1 factor2 args} {
        expr {[eval $factor1 $args] * [eval $factor2 $args]}
     }
     proc / {numer denom args} {
        expr {[eval $numer $args] / double([eval $denom $args])}
     }
     proc const {val args} {return $val}
     proc var {name args} {
        foreach {var val} $args {
           if {$name eq $var} then {return $val}
        }
        error "No value specified for '$name'"
     }
  }

  0.25
  % namespace inscope algexpr::eval2 $datum x 2
  0.6
  % namespace inscope algexpr::eval2 $datum x 3
  0.833333333333

variables

  % namespace inscope algexpr::eval2 $datum2 x 1 y 2
  0.2

extra arguments, which is a bit artificial.)

Evaluating expressions is a rather simple operation however, so what about something trickier, like differentiation? This would mean that

differentiation rules are straightforward to implement:

     proc + {term1 term2 var} {
        list + [eval $term1 [list $var]] [eval $term2 [list $var]]
        list + [eval $term1 [list $var]] [eval $term2 [list $var]]
     proc - {term1 term2 var} {
        list - [eval $term1 [list $var]] [eval $term2 [list $var]]
        list - [eval $term1 [list $var]] [eval $term2 [list $var]]
     proc * {factor1 factor2 var} {
        list + [
        list + [
           list * [eval $factor1 [list $var]] $factor2
        ] [
           list * $factor1 [eval $factor2 [list $var]]
        ]
     }
     proc / {factor1 factor2 var} {
        list / [
        list / [
           list - [
              list * [eval $factor1 [list $var]] $factor2
           ] [
              list * $factor1 [eval $factor2 [list $var]]
           ]
        ] [
           list * $factor2 $factor2
        ]
     proc const {val var} {
        list const 0
        list const 0
     proc var {name var} {
        list const [expr {$name eq $var}]
        list const [expr {$name eq $var}]
  }

  / {- {* {- {+ {* {const 0} {var x}} {* {const 2} {const 1}}} {const 0}} {+ {var x} {const 3}}} {* {- {* {const 2} {var x}} {const 1}} {+ {const 1} {const 0}}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}}
  % namespace inscope algexpr::eval2 $ddx_datum x 2
  0.28

What does the $ddx_datum expression look like in more traditional notation, though? Conversion to expr syntax is of course another operation, so now that we're warmed up, it is probably easier to implement it than to decipher the $ddx_datum by hand. The only issue one needs to solve is how to know when to insert parentheses, but that's easily handled by specifying the operation to be to

convert an algexpr to expr form such that it can safely be made an operand of an expr-operation with priority priority, where 0 means no operation, 1 means +, 2 means *, and 3 is as denominator.

The following will then do the trick

     proc + {term1 term2 priority} {
        set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]"
        set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]"
        if {$priority>1} then {
           return ($res)
        } else {
           return $res
        }
     proc - {term1 term2 priority} {
        set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]"
        set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]"
        if {$priority>1} then {
           return ($res)
        } else {
           return $res
        }
     proc * {factor1 factor2 priority} {
        set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]"
        set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]"
        if {$priority>2} then {
           return ($res)
        } else {
           return $res
        }
     proc / {numer denom priority} {
        set res "[eval $numer [list 2]]/[eval $denom [list 3]]"
        set res "[eval $numer [list 2]]/[eval $denom [list 3]]"
        if {$priority>2} then {
           return ($res)
        } else {
           return $res
        }
     proc var {name priority} {
        if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then {
        if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then {
           return "\$$name"
        } else {
           return "\[[list set $name]\]"
        }
     proc const {val priority} {return $val}
  }

  (2*$x-1)/($x+3)

  ((0*$x+2*1-0)*($x+3)-(2*$x-1)*(1+0))/(($x+3)*($x+3))

constant subexpressions that any human would have evaluated during the differentiation operation. One way to get rid of these would be to improve the differentiation operation to do these simplifications on the fly, but another is to implement a more generic operation for partial evaluation of expressions. The actual situations one has to deal with are the same in both cases, so the more generic operation seems preferable.

     proc const {val args} {list const $val}
     proc var {name args} {
        foreach {var val} $args {
        foreach {var val} $args {
           if {$name eq $var} then {return [list const $val]}
        }
        return [list var $name]
     # These two were the central ones, but most of the actual 
     # work remains, since there are many small cases where
     # partial evaluation is possible.
     proc + {term1 term2 args} {
        set res1 [eval $term1 $args]
        set res1 [eval $term1 $args]
        foreach {t1 v1} $res1 break
        set res2 [eval $term2 $args]
        foreach {t2 v2} $res2 break
        if {$t1 eq "const"} then {
           if {$t2 eq "const"} then {
              list const [expr {$v1 + $v2}]
           } elseif {$v1==0} then {
              set res2
           } else {
              list + $res2 $res1
           }
        } elseif {$t2 eq "const" && $v2==0} then {
           set res1
        } else {
           list + $res1 $res2
        }
     proc - {term1 term2 args} {
        set res1 [eval $term1 $args]
        set res1 [eval $term1 $args]
        foreach {t1 v1} $res1 break
        set res2 [eval $term2 $args]
        foreach {t2 v2} $res2 break
        if {$t1 eq "const"} then {
           if {$t2 eq "const"} then {
              list const [expr {$v1 - $v2}]
           } else {
              list - $res1 $res2
           }
        } elseif {$t2 eq "const" && $v2==0} then {
           set res1
        } else {
           list - $res1 $res2
        }
     proc * {factor1 factor2 args} {
        set res1 [eval $factor1 $args]
        set res1 [eval $factor1 $args]
        foreach {t1 v1} $res1 break
        set res2 [eval $factor2 $args]
        foreach {t2 v2} $res2 break
        if {$t1 eq "const"} then {
           if {$t2 eq "const"} then {
              list const [expr {$v1 * $v2}]
           } elseif {$v1==0} then {
              list const 0
           } elseif {$v1==1} then {
              set res2
           } else {
              list * $res1 $res2
           }
        } elseif {$t2 eq "const"} then {
           if {$v2==0} then {
              list const 0
           } elseif {$v2==1} then {
              set res1
           } else {
              list * $res2 $res1
           }
        } else {
           list * $res1 $res2
        }
     proc / {numer denom args} {
        set res1 [eval $numer $args]
        set res1 [eval $numer $args]
        foreach {t1 v1} $res1 break
        set res2 [eval $denom $args]
        foreach {t2 v2} $res2 break
        if {$t2 eq "const"} then {
           if {$t1 eq "const"} then {
              list const [expr {$v1 / double($v2)}]
           } elseif {$v2==1} then {
              set res1
           } else {
              list / $res1 $res2
           }
        } else {
           list / $res1 $res2
        }
  }

Some examples:

  const 0.6
  % namespace inscope algexpr::parteval $datum2 x 2
  / {const 3} {+ {var y} {const 3}}

  / {- {* {const 2} {+ {var x} {const 3}}} {- {* {const 2} {var x}} {const 1}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}}
  % namespace inscope algexpr::toexpr $ddx_datum_simplified 0
  (2*($x+3)-(2*$x-1))/(($x+3)*($x+3))

about the degree expected from a math student is really rather complicated, once you have to instruct a computer to do it, and this representation with binary +, -, *, and / is not very practical for the task. Better solutions are beyond the simple example, however.)

Let us instead consider how to extend the algexpr data structure with a new node type, say exp for the (natural base) exponential function. This requires defining exp procedures in all the namespaces where this node type may be expected:

     proc exp {arg} {expr {exp([eval $arg])}}
  }
  namespace eval algexpr::eval2 {
     proc exp {arg args} {expr {exp([eval $arg $args])}}
  }
  namespace eval algexpr::diff {
     proc exp {arg var} {
        list * [list exp $arg] [eval $arg [list $var]]
        list * [list exp $arg] [eval $arg [list $var]]
  }
  namespace eval algexpr::toexpr {
     proc exp {arg priority} {return "exp([eval $arg [list 0]])"}
  }
  namespace eval algexpr::parteval {
     proc exp {arg args} {
        set ares [eval $arg $args]
        set ares [eval $arg $args]
        if {[lindex $ares 0] eq "const"} then {
           list const [expr {exp([lindex $ares 1])}]
        } else {
           list exp $ares
        }
  }

  % namespace inscope algexpr::eval2 $datum3 x 2 y 1
  7.38905609893
  % set ddy_datum3 [namespace inscope algexpr::diff $datum3 y]
  * {exp {/ {var x} {var y}}} {/ {- {* {const 0} {var y}} {* {var x} {const 1}}} {* {var y} {var y}}}
  % namespace inscope algexpr::toexpr $ddy_datum3 0
  exp($x/$y)*(0*$y-$x*1)/($y*$y)
  % set ddy_datum3 [namespace inscope algexpr::parteval $ddy_datum3]
  * {exp {/ {var x} {var y}}} {/ {- {const 0} {var x}} {* {var y} {var y}}}
  % namespace inscope algexpr::toexpr $ddy_datum3 0
  exp($x/$y)*(0-$x)/($y*$y)

Sharing commands

Sharing commands The examples above never gave an opportunity to use the same command for more than one thing, but using interp alias or namespace import to make one command appear in several places is a standard Tcl programming technique that is explained elsewhere. There are however some special issues one should take note of:

Avoid colons in node types: Colons interact strangely with the namespace separator ::, so you may be unable to make use of, or even speak about, the command you want.
Avoid colons as node types: Colons interact strangely with the namespace separator ::, so you may be unable to make use of, or even speak about, the command you want.
Be aware of the namespace context: If you import or alias a proc into a namespace, then the namespace context of this proc does not follow. For set-ups such as the above, where the namespace context is an essential foundation for the interpretation, this can be disastrous.

There is a rather simple solution to the namespace problem, however: use uplevel instead of eval! This works because it causes a child node to be interpreted as a command in the same context as the parent node was, so if for example the child and parent nodes are of the same type then they will both be handled by the same command.

If this still seem mysterious to you, then remember that namespace eval and kin (e.g. namespace inscope) adds a level visible to uplevel. The following provides an example:

  proc ::a {} {
     puts "- I'm ::a, called as [info level 0]."
     return ::a
  }
  proc ::foo::a {} {
     puts "- I'm ::foo::a, called as [info level 0]."
     return ::foo::a
  }
  proc ::b {} {
     puts -nonewline "- I'm ::b, called as [info level 0]; "
     puts "my namespace is [namespace current]."
     puts "::b says: I've just called [uplevel 1 a]."
  }
  interp alias {} ::foo::b {} ::b

  - I'm ::b, called as b; my namespace is ::.
  - I'm ::a, called as a.
  ::b says: I've just called ::a.

and the call [namespace inscope foo b] prints

  - I'm ::b, called as ::b; my namespace is ::.
  - I'm ::foo::a, called as a.
  ::b says: I've just called ::foo::a.

because b calls the a visible in the namespace from which b itself was called, even if that isn't where b resides.

An alternative approach to the namespace problem, which can be useful for generic utility commands (that via aliases with some extra arguments can implement a whole score of commands), is to supply the intended namespace as an extra argument. Then if one defines

     set leftval [namespace eval $ns $left $args]
     set rightval [namespace eval $ns $right $args]
     expr \$leftval $op \$rightval
  }

can all be given as aliases to this exprbinop:

  interp alias {} algexpr::eval::- {} exprbinop algexpr::eval -
  interp alias {} algexpr::eval::* {} exprbinop algexpr::eval *
  interp alias {} algexpr::eval2::+ {} exprbinop algexpr::eval2 +
  interp alias {} algexpr::eval2::- {} exprbinop algexpr::eval2 -
  interp alias {} algexpr::eval2::* {} exprbinop algexpr::eval2 *

predict what level to uplevel to.

Handling unknown node types

Traversals with state Sometimes, it is important to maintain a state when traversing a data structure, and update that state for each node processed. That each node is processed in a separate local context makes it trickier to maintain such a state than it would be if a single procedure call did all the work, but there are some approaches one can use:

Use a global or namespace variable. Inelegant, but often easy and good enough.
Use a local variable that is accessed using upvar. Gets into trouble if uplevel or namespace eval is being used as shown in the previous section.
Pass the previous state as a command argument and return it as the command result. This is the functional programming idiom.
Pass the previous state as a command argument and return it as the command value. This is the functional programming idiom.

Collecting children in a sublist or placing them directly the node list

Thanks for your very detailed data is code page!

My best example for data is code was my way to read VRML files, to import their structure and their geometrical data:

Read the file data
Convert the file data to a valid list
Reorganize the file data list to have all VRML node attributes in sub lists
Evaluate this list with the scheme, that each VRML node name could be the name of a tcl command or procedure. If such a tcl procedure exists, than evaluate it with the VRML node attributes as arguments, if not ignore this VRML node.

Such becomes VRML data executable code.