(Page begun by [Lars H], 11 March 2007; feel free to expand.) '''Data is code''' is a powerful programming technique for working with complicated data, especially tree-like data. The idea is that the data structure itself can also be regarded as a command, so that traversing a data structure can simply be to [eval]ute it! By providing different definitions (of the basic node types) in different [namespace]s, it is possible to implement a variety of different operations as evaluations of the same data structure. The idea supports [polymorphism] to about the same degree as Object-oriented Programming, but does so in a quite different way. While it in [OOP] is the data (objects) that have to know all the operations (methods) that may act upon it, data-is-code requires that the operations (in this setting: namespaces) know all the types of data that they may encounter. Both support extensions to about the same degree: * In OOP, a new type can be implemented as a new class, whereas a new operation can be implemented as a new method on the ancestor class. * With data-is-code, a new type is implemented by defining commands for it in all operation namespaces, whereas a new operation is implemented as a new namespace. Where OOP can use inheritance to share some operation definitions (make them the same as for some other type), data-is-code can do the same by creating aliases. The main difference is that objects, being commands even in cases where they're primarily used as containers for data, need to be explicitly destroyed and therefore objects add the burden of lifetime management. Pure data that is sometimes treated as code does not have that problem. [TOOT] is a bit of a [data is code]/[OO] hybrid. Its basic mechanism is to treat data as code (and thus doesn't have the lifetime management problem), but the only operation implemented is OO-style method dispatch. ---- '''An example: Algebraic expressions''' An example which illustrates many of these points can be provided by a data structure for simple algebraic expressions. Suppose an ''expression'' is a list on one of the forms : '''+''' ''expression'' ''expression'' : '''-''' ''expression'' ''expression'' : '''*''' ''expression'' ''expression'' : '''/''' ''expression'' ''expression'' : '''var''' ''name'' : '''const''' ''value'' With the obvious interpretation of +, -, etc., we find that the expression (2x-1)/(x+3) is represented by the data structure / {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}} So far, this is just a data structure; there are no operations which act on it. (The +, -, and friends are node types in the general terminology above.) A very natural operation would however be to evaluate the expression, which can be implemented as follows: namespace eval algexpr::eval { proc + {term1 term2} { expr {[eval $term1] + [eval $term2]} } proc - {term1 term2} { expr {[eval $term1] - [eval $term2]} } proc * {factor1 factor2} { expr {[eval $factor1] * [eval $factor2]} } proc / {numer denom} { expr {[eval $numer] / double([eval $denom])} } proc const {val} {return $val} proc var {name} { upvar #0 $name val return $val } } The command to actually evaluate an expression $datum is then namespace inscope algexpr::eval $datum (or [[namespace eval algexpr::eval $datum]], but the syntax of [namespace inscope] is often more appropriate). % set datum {/ {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}}} % set x 1 % namespace inscope algexpr::eval $datum 0.25 % set x 2 % namespace inscope algexpr::eval $datum 0.6 % set x 3 % namespace inscope algexpr::eval $datum 0.833333333333 Works fine, but it is probably not all that practical to fetch the variable values from global variables. A more natural construction is to provide them in the evaluation call, but that requires a different set of definitions, and is thus a different operation. namespace eval algexpr::eval2 { proc + {term1 term2 args} { expr {[eval $term1 $args] + [eval $term2 $args]} } proc - {term1 term2 args} { expr {[eval $term1 $args] - [eval $term2 $args]} } proc * {factor1 factor2 args} { expr {[eval $factor1 $args] * [eval $factor2 $args]} } proc / {numer denom args} { expr {[eval $numer $args] / double([eval $denom $args])} } proc const {val args} {return $val} proc var {name args} { foreach {var val} $args { if {$name eq $var} then {return $val} } error "No value specified for '$name'" } } ====== 0.25 % namespace inscope algexpr::eval2 $datum x 2 0.6 % namespace inscope algexpr::eval2 $datum x 3 0.833333333333 ====== variables ====== % namespace inscope algexpr::eval2 $datum2 x 1 y 2 0.2 ====== extra arguments, which is a bit artificial.) Evaluating expressions is a rather simple operation however, so what about something trickier, like differentiation? This would mean that ====== ====== differentiation rules are straightforward to implement: ====== proc + {term1 term2 var} { list + [eval $term1 [list $var]] [eval $term2 [list $var]] list + [eval $term1 [list $var]] [eval $term2 [list $var]] proc - {term1 term2 var} { list - [eval $term1 [list $var]] [eval $term2 [list $var]] list - [eval $term1 [list $var]] [eval $term2 [list $var]] proc * {factor1 factor2 var} { list + [ list + [ list * [eval $factor1 [list $var]] $factor2 ] [ list * $factor1 [eval $factor2 [list $var]] ] } proc / {factor1 factor2 var} { list / [ list / [ list - [ list * [eval $factor1 [list $var]] $factor2 ] [ list * $factor1 [eval $factor2 [list $var]] ] ] [ list * $factor2 $factor2 ] proc const {val var} { list const 0 list const 0 proc var {name var} { list const [expr {$name eq $var}] list const [expr {$name eq $var}] } ====== ====== / {- {* {- {+ {* {const 0} {var x}} {* {const 2} {const 1}}} {const 0}} {+ {var x} {const 3}}} {* {- {* {const 2} {var x}} {const 1}} {+ {const 1} {const 0}}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}} % namespace inscope algexpr::eval2 $ddx_datum x 2 0.28 ====== What does the $ddx_datum expression look like in more traditional notation, though? Conversion to [expr] syntax is of course another operation, so now that we're warmed up, it is probably easier to implement it than to decipher the $ddx_datum by hand. The only issue one needs to solve is how to know when to insert parentheses, but that's easily handled by specifying the operation to be to * convert an algexpr to [expr] form such that it can safely be made an operand of an expr-operation with priority ''priority'', where 0 means no operation, 1 means +, 2 means *, and 3 is as denominator. The following will then do the trick ====== proc + {term1 term2 priority} { set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]" set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]" if {$priority>1} then { return ($res) } else { return $res } proc - {term1 term2 priority} { set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]" set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]" if {$priority>1} then { return ($res) } else { return $res } proc * {factor1 factor2 priority} { set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]" set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]" if {$priority>2} then { return ($res) } else { return $res } proc / {numer denom priority} { set res "[eval $numer [list 2]]/[eval $denom [list 3]]" set res "[eval $numer [list 2]]/[eval $denom [list 3]]" if {$priority>2} then { return ($res) } else { return $res } proc var {name priority} { if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then { if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then { return "\$$name" } else { return "\[[list set $name]\]" } proc const {val priority} {return $val} } ====== ====== (2*$x-1)/($x+3) ====== ====== ((0*$x+2*1-0)*($x+3)-(2*$x-1)*(1+0))/(($x+3)*($x+3)) ====== constant subexpressions that any human would have evaluated during the differentiation operation. One way to get rid of these would be to improve the differentiation operation to do these simplifications on the fly, but another is to implement a more generic operation for ''partial evaluation'' of expressions. The actual situations one has to deal with are the same in both cases, so the more generic operation seems preferable. ====== proc const {val args} {list const $val} proc var {name args} { foreach {var val} $args { foreach {var val} $args { if {$name eq $var} then {return [list const $val]} } return [list var $name] # These two were the central ones, but most of the actual # work remains, since there are many small cases where # partial evaluation is possible. proc + {term1 term2 args} { set res1 [eval $term1 $args] set res1 [eval $term1 $args] foreach {t1 v1} $res1 break set res2 [eval $term2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 + $v2}] } elseif {$v1==0} then { set res2 } else { list + $res2 $res1 } } elseif {$t2 eq "const" && $v2==0} then { set res1 } else { list + $res1 $res2 } proc - {term1 term2 args} { set res1 [eval $term1 $args] set res1 [eval $term1 $args] foreach {t1 v1} $res1 break set res2 [eval $term2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 - $v2}] } else { list - $res1 $res2 } } elseif {$t2 eq "const" && $v2==0} then { set res1 } else { list - $res1 $res2 } proc * {factor1 factor2 args} { set res1 [eval $factor1 $args] set res1 [eval $factor1 $args] foreach {t1 v1} $res1 break set res2 [eval $factor2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 * $v2}] } elseif {$v1==0} then { list const 0 } elseif {$v1==1} then { set res2 } else { list * $res1 $res2 } } elseif {$t2 eq "const"} then { if {$v2==0} then { list const 0 } elseif {$v2==1} then { set res1 } else { list * $res2 $res1 } } else { list * $res1 $res2 } proc / {numer denom args} { set res1 [eval $numer $args] set res1 [eval $numer $args] foreach {t1 v1} $res1 break set res2 [eval $denom $args] foreach {t2 v2} $res2 break if {$t2 eq "const"} then { if {$t1 eq "const"} then { list const [expr {$v1 / double($v2)}] } elseif {$v2==1} then { set res1 } else { list / $res1 $res2 } } else { list / $res1 $res2 } } ====== Some examples: ====== const 0.6 % namespace inscope algexpr::parteval $datum2 x 2 / {const 3} {+ {var y} {const 3}} ====== ====== / {- {* {const 2} {+ {var x} {const 3}}} {- {* {const 2} {var x}} {const 1}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}} % namespace inscope algexpr::toexpr $ddx_datum_simplified 0 (2*($x+3)-(2*$x-1))/(($x+3)*($x+3)) ====== about the degree expected from a math student is really rather complicated, once you have to instruct a computer to do it, and this representation with binary +, -, *, and / is not very practical for the task. Better solutions are beyond the simple example, however.) Let us instead consider how to extend the algexpr data structure with a new node type, say '''exp''' for the (natural base) exponential function. This requires defining exp procedures in all the namespaces where this node type may be expected: ====== proc exp {arg} {expr {exp([eval $arg])}} } namespace eval algexpr::eval2 { proc exp {arg args} {expr {exp([eval $arg $args])}} } namespace eval algexpr::diff { proc exp {arg var} { list * [list exp $arg] [eval $arg [list $var]] list * [list exp $arg] [eval $arg [list $var]] } namespace eval algexpr::toexpr { proc exp {arg priority} {return "exp([eval $arg [list 0]])"} } namespace eval algexpr::parteval { proc exp {arg args} { set ares [eval $arg $args] set ares [eval $arg $args] if {[lindex $ares 0] eq "const"} then { list const [expr {exp([lindex $ares 1])}] } else { list exp $ares } } ====== ====== % namespace inscope algexpr::eval2 $datum3 x 2 y 1 7.38905609893 % set ddy_datum3 [namespace inscope algexpr::diff $datum3 y] * {exp {/ {var x} {var y}}} {/ {- {* {const 0} {var y}} {* {var x} {const 1}}} {* {var y} {var y}}} % namespace inscope algexpr::toexpr $ddy_datum3 0 exp($x/$y)*(0*$y-$x*1)/($y*$y) % set ddy_datum3 [namespace inscope algexpr::parteval $ddy_datum3] * {exp {/ {var x} {var y}}} {/ {- {const 0} {var x}} {* {var y} {var y}}} % namespace inscope algexpr::toexpr $ddy_datum3 0 exp($x/$y)*(0-$x)/($y*$y) ====== ---- **Sharing commands** '''Sharing commands''' The examples above never gave an opportunity to use the same command for more than one thing, but using [interp alias] or [namespace import] to make one command appear in several places is a standard Tcl programming technique that is explained elsewhere. There are however some special issues one should take note of: Avoid colons in node types: Colons interact strangely with the namespace separator ::, so you may be unable to make use of, or even speak about, the command you want. Be aware of the namespace context: If you import or alias a proc into a namespace, then the namespace context of this proc does ''not'' follow. For set-ups such as the above, where the namespace context is an essential foundation for the interpretation, this can be disastrous. There is a rather simple solution to the namespace problem, however: use [uplevel] instead of [eval]! This works because it causes a child node to be interpreted as a command in the same context as the parent node was, so if for example the child and parent nodes are of the same type then they will both be handled by the same command. If this still seem mysterious to you, then remember that [namespace eval] and kin (e.g. [namespace inscope]) adds a level visible to [uplevel]. The following provides an example: ====== proc ::a {} { puts "- I'm ::a, called as [info level 0]." return ::a } proc ::foo::a {} { puts "- I'm ::foo::a, called as [info level 0]." return ::foo::a } proc ::b {} { puts -nonewline "- I'm ::b, called as [info level 0]; " puts "my namespace is [namespace current]." puts "::b says: I've just called [uplevel 1 a]." } interp alias {} ::foo::b {} ::b ====== - I'm ::b, called as b; my namespace is ::. - I'm ::a, called as a. ::b says: I've just called ::a. and the call [[namespace inscope foo b]] prints - I'm ::b, called as ::b; my namespace is ::. - I'm ::foo::a, called as a. ::b says: I've just called ::foo::a. because '''b''' calls the '''a''' visible in the namespace from which '''b''' itself was called, even if that isn't where '''b''' resides. An alternative approach to the namespace problem, which can be useful for generic utility commands (that via aliases with some extra arguments can implement a whole score of commands), is to supply the intended namespace as an extra argument. Then if one defines ====== set leftval [namespace eval $ns $left $args] set rightval [namespace eval $ns $right $args] expr \$leftval $op \$rightval } ====== can all be given as aliases to this exprbinop: ====== interp alias {} algexpr::eval::- {} exprbinop algexpr::eval - interp alias {} algexpr::eval::* {} exprbinop algexpr::eval * interp alias {} algexpr::eval2::+ {} exprbinop algexpr::eval2 + interp alias {} algexpr::eval2::- {} exprbinop algexpr::eval2 - interp alias {} algexpr::eval2::* {} exprbinop algexpr::eval2 * ====== predict what level to uplevel to. ---- **Handling unknown node types** '''Traversals with state''' Sometimes, it is important to maintain a state when traversing a data structure, and update that state for each node processed. That each node is processed in a separate local context makes it trickier to maintain such a state than it would be if a single procedure call did all the work, but there are some approaches one can use: 1. Use a global or namespace variable. Inelegant, but often easy and good enough. 2. Use a local variable that is accessed using [upvar]. Gets into trouble if [uplevel] or [namespace eval] is being used as shown in the previous section. 3. Pass the previous state as a command argument and return it as the command result. This is the [functional programming] idiom. 3. Pass the previous state as a command argument and return it as the command value. This is the [functional programming] idiom. **Collecting children in a sublist or placing them directly the node list** Thanks for your very detailed [data is code] page! My best example for [data is code] was my way to read VRML files, to import their structure and their geometrical data: 1. Read the file data 2. Convert the file data to a valid list 3. Reorganize the file data list to have all VRML node attributes in sub lists 4. Evaluate this list with the scheme, that each VRML node name could be the name of a tcl command or procedure. If such a tcl procedure exists, than evaluate it with the VRML node attributes as arguments, if not ignore this VRML node. Such becomes VRML data executable code. ---- [NEM] ''12 Mar 2007'': Thanks, Lars, this is a useful technique to document. I've only skimmed at present, so apologies if some of these remarks are covered. An alternative to using a namespace would be to just use a [switch] (or some form of [pattern matching]) inside a proc. For instance, the first expression evaluator might be written: ====== proc algexpr::calc expr { switch -exact -- $op { + { expr {[calc $a] + [calc $b]} } + { expr {[calc $a] + [calc $b]} } - { expr {[calc $a] - [calc $b]} } * { expr {[calc $a] * [calc $b]} } / { expr {[calc $a] / double([calc $b])} } const { return $a } var { upvar #0 $a v; return $v } } ====== ''[Lars H]: The only catch in that is that it doesn't really do [data is code], as the data isn't treated as code; in a sense, it is precisely the more traditional technique that data-is-code provides an alternative to!'' [NEM]: Right, but what are the advantages of manipulating data as Tcl code as opposed to just treating it as a bit of data? Speed? Leveraging the existing Tcl interpreter code? The above proc is a bare-bones interpreter for a minimal language of arithmetic expressions, so its data is also code. Code ''is'' just data, after all. (The "real" dichotomy would be between data and processes, where code could be seen as data that describes a process). [Lars H]: Speed and leveraging existing mechanisms are indeed reasons why this is useful. Another reason is extensionality (how do you tell algexpr::calc about the '''exp''' node type?). As for "[code is data]"... well, that would be the subject of another page (every now and then I mix the two up too). [Lars H]: Speed and leveraging existing mechanisms are indeed reasons why this is useful. Another reason is extensionality (how do you tell algexpr::calc about the '''exp''' node type?). As for "code ''is'' data"... well, that would be the subject of another page (every now and then I mix the two up too). [NEM]: Code really is just data... But anyway, to the other points: I'm not so sure about the sub-class and implement the operations), but trickier to add new operations, whereas a pattern matching approach makes it easier to add new operations but harder to add new data types (as you need to update all the operations). This is apparently known as the ''expression problem'' in programming language design, and there are a number of ways in which it is tackled. The usual OO way is the [Visitor pattern], which essentially implements pattern matching over usual OO way is the Visitor pattern, which essentially implements pattern matching over ====== abstract method visit {visitor} } class Num val { inherit Expr method visit {visitor} { $visitor visitNum $val } } class AddOp a b { inherit Expr method visit {visitor} { $visitor visitAddOp $a $b } } ... class ExprVisitor { abstract method visitNum {val} abstract method visitAddOp {a b} ... } class ExprInterp { inherit ExprVisitor method visitNum {val} { return $val } method visitAddOp {a b} { expr {[$a visit $self] + [$b visit $self]} } ... } [AddOp [AddOp [Num 2] [Num 5]] [Num 4]] visit [ExprInterp] ;# => 11 ====== ---- [LV] One of the examples I've read about, in terms of this topic, has to do with saving the state of a Tcl program. The idea is that one would write [proc]s which would go through and save off the values of any variable, proc, namespace, etc. At first, I thought ''well, only the values which are dynamically set are necessary - the procs defined in the program's file itself need not be saved, nor variables which are set within that file.'' However, given that someone might come along and modify the program, or for that matter, given that you would have to write specialized code to avoid getting those values, it is much easier just to save everything. If done properly, the "save file" should be able to be invoked and the application would be back the way things were (with some possible exceptions, depending on the code, extensions, etc.) when last started. ---- **See also** [[ [Category Concept] | [Category Tutorial] | [Category Data Structure] | [Category Control Structure] | [Category Design] | [Category Mathematics] ]]