(Page begun by Lars H, 11 March 2007; feel free to expand.)
Data is code is a powerful programming technique for working with complicated data, especially tree-like data. The idea is that the data structure itself can also be regarded as a command, so that traversing a data structure can simply be to evalute it! By providing different definitions (of the basic node types) in different namespaces, it is possible to implement a variety of different operations as evaluations of the same data structure.
The idea supports polymorphism to about the same degree as Object-oriented Programming, but does so in a quite different way. While it in OOP is the data (objects) that have to know all the operations (methods) that may act upon it, data-is-code requires that the operations (in this setting: namespaces) know all the types of data that they may encounter. Both support extensions to about the same degree:
Where OOP can use inheritance to share some operation definitions (make them the same as for some other type), data-is-code can do the same by creating aliases. The main difference is that objects, being commands even in cases where they're primarily used as containers for data, need to be explicitly destroyed and therefore objects add the burden of lifetime management. Pure data that is sometimes treated as code does not have that problem.
An example: Algebraic expressions
An example which illustrates many of these points can be provided by a data structure for simple algebraic expressions. Suppose an expression is a list on one of the forms
With the obvious interpretation of +, -, etc., we find that the expression (2x-1)/(x+3) is represented by the data structure
/ {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}}
So far, this is just a data structure; there are no operations which act on it. (The +, -, and friends are node types in the general terminology above.) A very natural operation would however be to evaluate the expression, which can be implemented as follows:
namespace eval algexpr::eval { proc + {term1 term2} { expr {[eval $term1] + [eval $term2]} } proc - {term1 term2} { expr {[eval $term1] - [eval $term2]} } proc * {factor1 factor2} { expr {[eval $factor1] * [eval $factor2]} } proc / {numer denom} { expr {[eval $numer] / double([eval $denom])} } proc const {val} {return $val} proc var {name} { upvar #0 $name val return $val } }
The command to actually evaluate an expression $datum is then
namespace inscope algexpr::eval $datum
(or [namespace eval algexpr::eval $datum], but the syntax of namespace inscope is often more appropriate).
% set datum {/ {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}}} % set x 1 % namespace inscope algexpr::eval $datum 0.25 % set x 2 % namespace inscope algexpr::eval $datum 0.6 % set x 3 % namespace inscope algexpr::eval $datum 0.833333333333
Works fine, but it is probably not all that practical to fetch the variable values from global variables. A more natural construction is to provide them in the evaluation call, but that requires a different set of definitions, and is thus a different operation.
namespace eval algexpr::eval2 { proc + {term1 term2 args} { expr {[eval $term1 $args] + [eval $term2 $args]} } proc - {term1 term2 args} { expr {[eval $term1 $args] - [eval $term2 $args]} } proc * {factor1 factor2 args} { expr {[eval $factor1 $args] * [eval $factor2 $args]} } proc / {numer denom args} { expr {[eval $numer $args] / double([eval $denom $args])} } proc const {val args} {return $val} proc var {name args} { foreach {var val} $args { if {$name eq $var} then {return $val} } error "No value specified for '$name'" } }
0.25 % namespace inscope algexpr::eval2 $datum x 2 0.6 % namespace inscope algexpr::eval2 $datum x 3 0.833333333333
variables
% namespace inscope algexpr::eval2 $datum2 x 1 y 2 0.2
extra arguments, which is a bit artificial.)
Evaluating expressions is a rather simple operation however, so what about something trickier, like differentiation? This would mean that
differentiation rules are straightforward to implement:
proc + {term1 term2 var} { list + [eval $term1 [list $var]] [eval $term2 [list $var]] list + [eval $term1 [list $var]] [eval $term2 [list $var]] proc - {term1 term2 var} { list - [eval $term1 [list $var]] [eval $term2 [list $var]] list - [eval $term1 [list $var]] [eval $term2 [list $var]] proc * {factor1 factor2 var} { list + [ list + [ list * [eval $factor1 [list $var]] $factor2 ] [ list * $factor1 [eval $factor2 [list $var]] ] } proc / {factor1 factor2 var} { list / [ list / [ list - [ list * [eval $factor1 [list $var]] $factor2 ] [ list * $factor1 [eval $factor2 [list $var]] ] ] [ list * $factor2 $factor2 ] proc const {val var} { list const 0 list const 0 proc var {name var} { list const [expr {$name eq $var}] list const [expr {$name eq $var}] }
/ {- {* {- {+ {* {const 0} {var x}} {* {const 2} {const 1}}} {const 0}} {+ {var x} {const 3}}} {* {- {* {const 2} {var x}} {const 1}} {+ {const 1} {const 0}}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}} % namespace inscope algexpr::eval2 $ddx_datum x 2 0.28
What does the $ddx_datum expression look like in more traditional notation, though? Conversion to expr syntax is of course another operation, so now that we're warmed up, it is probably easier to implement it than to decipher the $ddx_datum by hand. The only issue one needs to solve is how to know when to insert parentheses, but that's easily handled by specifying the operation to be to
The following will then do the trick
proc + {term1 term2 priority} { set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]" set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]" if {$priority>1} then { return ($res) } else { return $res } proc - {term1 term2 priority} { set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]" set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]" if {$priority>1} then { return ($res) } else { return $res } proc * {factor1 factor2 priority} { set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]" set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]" if {$priority>2} then { return ($res) } else { return $res } proc / {numer denom priority} { set res "[eval $numer [list 2]]/[eval $denom [list 3]]" set res "[eval $numer [list 2]]/[eval $denom [list 3]]" if {$priority>2} then { return ($res) } else { return $res } proc var {name priority} { if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then { if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then { return "\$$name" } else { return "\[[list set $name]\]" } proc const {val priority} {return $val} }
(2*$x-1)/($x+3)
((0*$x+2*1-0)*($x+3)-(2*$x-1)*(1+0))/(($x+3)*($x+3))
constant subexpressions that any human would have evaluated during the differentiation operation. One way to get rid of these would be to improve the differentiation operation to do these simplifications on the fly, but another is to implement a more generic operation for partial evaluation of expressions. The actual situations one has to deal with are the same in both cases, so the more generic operation seems preferable.
proc const {val args} {list const $val} proc var {name args} { foreach {var val} $args { foreach {var val} $args { if {$name eq $var} then {return [list const $val]} } return [list var $name] # These two were the central ones, but most of the actual # work remains, since there are many small cases where # partial evaluation is possible. proc + {term1 term2 args} { set res1 [eval $term1 $args] set res1 [eval $term1 $args] foreach {t1 v1} $res1 break set res2 [eval $term2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 + $v2}] } elseif {$v1==0} then { set res2 } else { list + $res2 $res1 } } elseif {$t2 eq "const" && $v2==0} then { set res1 } else { list + $res1 $res2 } proc - {term1 term2 args} { set res1 [eval $term1 $args] set res1 [eval $term1 $args] foreach {t1 v1} $res1 break set res2 [eval $term2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 - $v2}] } else { list - $res1 $res2 } } elseif {$t2 eq "const" && $v2==0} then { set res1 } else { list - $res1 $res2 } proc * {factor1 factor2 args} { set res1 [eval $factor1 $args] set res1 [eval $factor1 $args] foreach {t1 v1} $res1 break set res2 [eval $factor2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 * $v2}] } elseif {$v1==0} then { list const 0 } elseif {$v1==1} then { set res2 } else { list * $res1 $res2 } } elseif {$t2 eq "const"} then { if {$v2==0} then { list const 0 } elseif {$v2==1} then { set res1 } else { list * $res2 $res1 } } else { list * $res1 $res2 } proc / {numer denom args} { set res1 [eval $numer $args] set res1 [eval $numer $args] foreach {t1 v1} $res1 break set res2 [eval $denom $args] foreach {t2 v2} $res2 break if {$t2 eq "const"} then { if {$t1 eq "const"} then { list const [expr {$v1 / double($v2)}] } elseif {$v2==1} then { set res1 } else { list / $res1 $res2 } } else { list / $res1 $res2 } }
Some examples:
const 0.6 % namespace inscope algexpr::parteval $datum2 x 2 / {const 3} {+ {var y} {const 3}}
/ {- {* {const 2} {+ {var x} {const 3}}} {- {* {const 2} {var x}} {const 1}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}} % namespace inscope algexpr::toexpr $ddx_datum_simplified 0 (2*($x+3)-(2*$x-1))/(($x+3)*($x+3))
about the degree expected from a math student is really rather complicated, once you have to instruct a computer to do it, and this representation with binary +, -, *, and / is not very practical for the task. Better solutions are beyond the simple example, however.)
Let us instead consider how to extend the algexpr data structure with a new node type, say exp for the (natural base) exponential function. This requires defining exp procedures in all the namespaces where this node type may be expected:
proc exp {arg} {expr {exp([eval $arg])}} } namespace eval algexpr::eval2 { proc exp {arg args} {expr {exp([eval $arg $args])}} } namespace eval algexpr::diff { proc exp {arg var} { list * [list exp $arg] [eval $arg [list $var]] list * [list exp $arg] [eval $arg [list $var]] } namespace eval algexpr::toexpr { proc exp {arg priority} {return "exp([eval $arg [list 0]])"} } namespace eval algexpr::parteval { proc exp {arg args} { set ares [eval $arg $args] set ares [eval $arg $args] if {[lindex $ares 0] eq "const"} then { list const [expr {exp([lindex $ares 1])}] } else { list exp $ares } }
% namespace inscope algexpr::eval2 $datum3 x 2 y 1 7.38905609893 % set ddy_datum3 [namespace inscope algexpr::diff $datum3 y] * {exp {/ {var x} {var y}}} {/ {- {* {const 0} {var y}} {* {var x} {const 1}}} {* {var y} {var y}}} % namespace inscope algexpr::toexpr $ddy_datum3 0 exp($x/$y)*(0*$y-$x*1)/($y*$y) % set ddy_datum3 [namespace inscope algexpr::parteval $ddy_datum3] * {exp {/ {var x} {var y}}} {/ {- {const 0} {var x}} {* {var y} {var y}}} % namespace inscope algexpr::toexpr $ddy_datum3 0 exp($x/$y)*(0-$x)/($y*$y)
Sharing commands The examples above never gave an opportunity to use the same command for more than one thing, but using interp alias or namespace import to make one command appear in several places is a standard Tcl programming technique that is explained elsewhere. There are however some special issues one should take note of:
There is a rather simple solution to the namespace problem, however: use uplevel instead of eval! This works because it causes a child node to be interpreted as a command in the same context as the parent node was, so if for example the child and parent nodes are of the same type then they will both be handled by the same command.
If this still seem mysterious to you, then remember that namespace eval and kin (e.g. namespace inscope) adds a level visible to uplevel. The following provides an example:
proc ::a {} { puts "- I'm ::a, called as [info level 0]." return ::a } proc ::foo::a {} { puts "- I'm ::foo::a, called as [info level 0]." return ::foo::a } proc ::b {} { puts -nonewline "- I'm ::b, called as [info level 0]; " puts "my namespace is [namespace current]." puts "::b says: I've just called [uplevel 1 a]." } interp alias {} ::foo::b {} ::b
- I'm ::b, called as b; my namespace is ::. - I'm ::a, called as a. ::b says: I've just called ::a.
and the call [namespace inscope foo b] prints
- I'm ::b, called as ::b; my namespace is ::. - I'm ::foo::a, called as a. ::b says: I've just called ::foo::a.
because b calls the a visible in the namespace from which b itself was called, even if that isn't where b resides.
An alternative approach to the namespace problem, which can be useful for generic utility commands (that via aliases with some extra arguments can implement a whole score of commands), is to supply the intended namespace as an extra argument. Then if one defines
set leftval [namespace eval $ns $left $args] set rightval [namespace eval $ns $right $args] expr \$leftval $op \$rightval }
can all be given as aliases to this exprbinop:
interp alias {} algexpr::eval::- {} exprbinop algexpr::eval - interp alias {} algexpr::eval::* {} exprbinop algexpr::eval * interp alias {} algexpr::eval2::+ {} exprbinop algexpr::eval2 + interp alias {} algexpr::eval2::- {} exprbinop algexpr::eval2 - interp alias {} algexpr::eval2::* {} exprbinop algexpr::eval2 *
predict what level to uplevel to.
Traversals with state Sometimes, it is important to maintain a state when traversing a data structure, and update that state for each node processed. That each node is processed in a separate local context makes it trickier to maintain such a state than it would be if a single procedure call did all the work, but there are some approaches one can use:
Thanks for your very detailed data is code page!
My best example for data is code was my way to read VRML files, to import their structure and their geometrical data:
Such becomes VRML data executable code.
NEM 12 Mar 2007: Thanks, Lars, this is a useful technique to document. I've only skimmed at [ Category Concept | Category Tutorial | Category Data Structure | Category Control Structure | Category Design | Category Mathematics ]