(Page begun by [Lars H], 11 March 2007; feel free to expand.) '''Data is code''' is a powerful programming technique for working with complicated data, especially tree-like data. The idea is that the data structure itself can also be regarded as a command, so that traversing a data structure can simply be to [eval]uate it! By providing different definitions (of the basic node types) in different [namespace]s, it is possible to implement a variety of different operations as evaluations of the same data structure. The idea supports [polymorphism] to about the same degree as Object-oriented Programming, but does so in a quite different way. While it in [OOP] is the data (objects) that have to know all the operations (methods) that may act upon it, data-is-code requires that the operations (in this setting: namespaces) know all the types of data that they may encounter. Both support extensions to about the same degree: * In OOP, a new type can be implemented as a new class, whereas a new operation can be implemented as a new method on the ancestor class. * With data-is-code, a new type is implemented by defining commands for it in all operation namespaces, whereas a new operation is implemented as a new namespace. Where OOP can use inheritance to share some operation definitions (make them the same as for some other type), data-is-code can do the same by creating aliases. The main difference is that objects, being commands even in cases where they're primarily used as containers for data, need to be explicitly destroyed and therefore objects add the burden of lifetime management. Pure data that is sometimes treated as code does not have that problem. [TOOT] is a bit of a [data is code]/[OO] hybrid. Its basic mechanism is to treat data as code (and thus doesn't have the lifetime management problem), but the only operation implemented is OO-style method dispatch. ---- **An example: Algebraic expressions** An example which illustrates many of these points can be provided by a data structure for simple algebraic expressions. Suppose an ''expression'' is a list on one of the forms : '''+''' ''expression'' ''expression'' : '''-''' ''expression'' ''expression'' : '''*''' ''expression'' ''expression'' : '''/''' ''expression'' ''expression'' : '''var''' ''name'' : '''const''' ''value'' With the obvious interpretation of +, -, etc., we find that the expression (2x-1)/(x+3) is represented by the data structure / {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}} So far, this is just a data structure; there are no operations which act on it. (The +, -, and friends are node types in the general terminology above.) A very natural operation would however be to evaluate the expression, which can be implemented as follows: namespace eval algexpr::eval { proc + {term1 term2} { expr {[eval $term1] + [eval $term2]} } proc - {term1 term2} { expr {[eval $term1] - [eval $term2]} } proc * {factor1 factor2} { expr {[eval $factor1] * [eval $factor2]} } proc / {numer denom} { expr {[eval $numer] / double([eval $denom])} } proc const {val} {return $val} proc var {name} { upvar #0 $name val return $val } } The command to actually evaluate an expression $datum is then namespace inscope algexpr::eval $datum (or [[namespace eval algexpr::eval $datum]], but the syntax of [namespace inscope] is often more appropriate). % set datum {/ {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}}} % set x 1 % namespace inscope algexpr::eval $datum 0.25 % set x 2 % namespace inscope algexpr::eval $datum 0.6 % set x 3 % namespace inscope algexpr::eval $datum 0.833333333333 Works fine, but it is probably not all that practical to fetch the variable values from global variables. A more natural construction is to provide them in the evaluation call, but that requires a different set of definitions, and is thus a different operation. namespace eval algexpr::eval2 { proc + {term1 term2 args} { expr {[eval $term1 $args] + [eval $term2 $args]} } proc - {term1 term2 args} { expr {[eval $term1 $args] - [eval $term2 $args]} } proc * {factor1 factor2 args} { expr {[eval $factor1 $args] * [eval $factor2 $args]} } proc / {numer denom args} { expr {[eval $numer $args] / double([eval $denom $args])} } proc const {val args} {return $val} proc var {name args} { foreach {var val} $args { if {$name eq $var} then {return $val} } error "No value specified for '$name'" } } ====== 0.25 % namespace inscope algexpr::eval2 $datum x 2 0.6 % namespace inscope algexpr::eval2 $datum x 3 0.833333333333 ====== variables ====== % namespace inscope algexpr::eval2 $datum2 x 1 y 2 0.2 ====== extra arguments, which is a bit artificial.) Evaluating expressions is a rather simple operation however, so what about something trickier, like differentiation? This would mean that ====== ====== differentiation rules are straightforward to implement: ====== proc + {term1 term2 var} { list + [eval $term1 [list $var]] [eval $term2 [list $var]] list + [eval $term1 [list $var]] [eval $term2 [list $var]] proc - {term1 term2 var} { list - [eval $term1 [list $var]] [eval $term2 [list $var]] list - [eval $term1 [list $var]] [eval $term2 [list $var]] proc * {factor1 factor2 var} { list + [ list + [ list * [eval $factor1 [list $var]] $factor2 ] [ list * $factor1 [eval $factor2 [list $var]] ] } proc / {factor1 factor2 var} { list / [ list / [ list - [ list * [eval $factor1 [list $var]] $factor2 ] [ list * $factor1 [eval $factor2 [list $var]] ] ] [ list * $factor2 $factor2 ] proc const {val var} { list const 0 list const 0 proc var {name var} { list const [expr {$name eq $var}] list const [expr {$name eq $var}] } ====== ====== / {- {* {- {+ {* {const 0} {var x}} {* {const 2} {const 1}}} {const 0}} {+ {var x} {const 3}}} {* {- {* {const 2} {var x}} {const 1}} {+ {const 1} {const 0}}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}} % namespace inscope algexpr::eval2 $ddx_datum x 2 0.28 ====== What does the $ddx_datum expression look like in more traditional notation, though? Conversion to [expr] syntax is of course another operation, so now that we're warmed up, it is probably easier to implement it than to decipher the $ddx_datum by hand. The only issue one needs to solve is how to know when to insert parentheses, but that's easily handled by specifying the operation to be to * convert an algexpr to [expr] form such that it can safely be made an operand of an expr-operation with priority ''priority'', where 0 means no operation, 1 means +, 2 means *, and 3 is as denominator. The following will then do the trick ====== proc + {term1 term2 priority} { set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]" set res "[eval $term1 [list 1]]+[eval $term2 [list 1]]" if {$priority>1} then { return ($res) } else { return $res } proc - {term1 term2 priority} { set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]" set res "[eval $term1 [list 1]]-[eval $term2 [list 2]]" if {$priority>1} then { return ($res) } else { return $res } proc * {factor1 factor2 priority} { set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]" set res "[eval $factor1 [list 2]]*[eval $factor2 [list 2]]" if {$priority>2} then { return ($res) } else { return $res } proc / {numer denom priority} { set res "[eval $numer [list 2]]/[eval $denom [list 3]]" set res "[eval $numer [list 2]]/[eval $denom [list 3]]" if {$priority>2} then { return ($res) } else { return $res } proc var {name priority} { if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then { if {[regexp -nocase -- {^[a-z_][a-z0-9_]*$} $name]} then { return "\$$name" } else { return "\[[list set $name]\]" } proc const {val priority} {return $val} } ====== ====== (2*$x-1)/($x+3) ====== ====== ((0*$x+2*1-0)*($x+3)-(2*$x-1)*(1+0))/(($x+3)*($x+3)) ====== constant subexpressions that any human would have evaluated during the differentiation operation. One way to get rid of these would be to improve the differentiation operation to do these simplifications on the fly, but another is to implement a more generic operation for ''partial evaluation'' of expressions. The actual situations one has to deal with are the same in both cases, so the more generic operation seems preferable. ====== proc const {val args} {list const $val} proc var {name args} { foreach {var val} $args { foreach {var val} $args { if {$name eq $var} then {return [list const $val]} } return [list var $name] # These two were the central ones, but most of the actual # work remains, since there are many small cases where # partial evaluation is possible. proc + {term1 term2 args} { set res1 [eval $term1 $args] set res1 [eval $term1 $args] foreach {t1 v1} $res1 break set res2 [eval $term2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 + $v2}] } elseif {$v1==0} then { set res2 } else { list + $res2 $res1 } } elseif {$t2 eq "const" && $v2==0} then { set res1 } else { list + $res1 $res2 } proc - {term1 term2 args} { set res1 [eval $term1 $args] set res1 [eval $term1 $args] foreach {t1 v1} $res1 break set res2 [eval $term2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 - $v2}] } else { list - $res1 $res2 } } elseif {$t2 eq "const" && $v2==0} then { set res1 } else { list - $res1 $res2 } proc * {factor1 factor2 args} { set res1 [eval $factor1 $args] set res1 [eval $factor1 $args] foreach {t1 v1} $res1 break set res2 [eval $factor2 $args] foreach {t2 v2} $res2 break if {$t1 eq "const"} then { if {$t2 eq "const"} then { list const [expr {$v1 * $v2}] } elseif {$v1==0} then { list const 0 } elseif {$v1==1} then { set res2 } else { list * $res1 $res2 } } elseif {$t2 eq "const"} then { if {$v2==0} then { list const 0 } elseif {$v2==1} then { set res1 } else { list * $res2 $res1 } } else { list * $res1 $res2 } proc / {numer denom args} { set res1 [eval $numer $args] set res1 [eval $numer $args] foreach {t1 v1} $res1 break set res2 [eval $denom $args] foreach {t2 v2} $res2 break if {$t2 eq "const"} then { if {$t1 eq "const"} then { list const [expr {$v1 / double($v2)}] } elseif {$v2==1} then { set res1 } else { list / $res1 $res2 } } else { list / $res1 $res2 } } ====== Some examples: ====== const 0.6 % namespace inscope algexpr::parteval $datum2 x 2 / {const 3} {+ {var y} {const 3}} ====== ====== / {- {* {const 2} {+ {var x} {const 3}}} {- {* {const 2} {var x}} {const 1}}} {* {+ {var x} {const 3}} {+ {var x} {const 3}}} % namespace inscope algexpr::toexpr $ddx_datum_simplified 0 (2*($x+3)-(2*$x-1))/(($x+3)*($x+3)) ====== about the degree expected from a math student is really rather complicated, once you have to instruct a computer to do it, and this representation with binary +, -, *, and / is not very practical for the task. Better solutions are beyond the simple example, however.) Let us instead consider how to extend the algexpr data structure with a new node type, say '''exp''' for the (natural base) exponential function. This requires defining exp procedures in all the namespaces where this node type may be expected: ====== proc exp {arg} {expr {exp([eval $arg])}} } namespace eval algexpr::eval2 { proc exp {arg args} {expr {exp([eval $arg $args])}} } namespace eval algexpr::diff { proc exp {arg var} { list * [list exp $arg] [eval $arg [list $var]] list * [list exp $arg] [eval $arg [list $var]] } namespace eval algexpr::toexpr { proc exp {arg priority} {return "exp([eval $arg [list 0]])"} } namespace eval algexpr::parteval { proc exp {arg args} { set ares [eval $arg $args] set ares [eval $arg $args] if {[lindex $ares 0] eq "const"} then { list const [expr {exp([lindex $ares 1])}] } else { list exp $ares } } ====== ====== % namespace inscope algexpr::eval2 $datum3 x 2 y 1 7.38905609893 % set ddy_datum3 [namespace inscope algexpr::diff $datum3 y] * {exp {/ {var x} {var y}}} {/ {- {* {const 0} {var y}} {* {var x} {const 1}}} {* {var y} {var y}}} % namespace inscope algexpr::toexpr $ddy_datum3 0 exp($x/$y)*(0*$y-$x*1)/($y*$y) % set ddy_datum3 [namespace inscope algexpr::parteval $ddy_datum3] * {exp {/ {var x} {var y}}} {/ {- {const 0} {var x}} {* {var y} {var y}}} % namespace inscope algexpr::toexpr $ddy_datum3 0 exp($x/$y)*(0-$x)/($y*$y) ====== ---- **Sharing commands** The examples above never gave an opportunity to use the same command for more than one thing, but using [interp alias] or [namespace import] to make one command appear in several places is a standard Tcl programming technique that is explained elsewhere. There are however some special issues one should take note of: Avoid colons in node types: Colons interact strangely with the namespace separator ::, so you may be unable to make use of, or even speak about, the command you want. Be aware of the namespace context: If you import or alias a proc into a namespace, then the namespace context of this proc does ''not'' follow. For set-ups such as the above, where the namespace context is an essential foundation for the interpretation, this can be disastrous. There is a rather simple solution to the namespace problem, however: use [uplevel] instead of [eval]! This works because it causes a child node to be interpreted as a command in the same context as the parent node was, so if for example the child and parent nodes are of the same type then they will both be handled by the same command. If this still seem mysterious to you, then remember that [namespace eval] and kin (e.g. [namespace inscope]) adds a level visible to [uplevel]. The following provides an example: ====== proc ::a {} { puts "- I'm ::a, called as [info level 0]." return ::a } proc ::foo::a {} { puts "- I'm ::foo::a, called as [info level 0]." return ::foo::a } proc ::b {} { puts -nonewline "- I'm ::b, called as [info level 0]; " puts "my namespace is [namespace current]." puts "::b says: I've just called [uplevel 1 a]." } interp alias {} ::foo::b {} ::b ====== - I'm ::b, called as b; my namespace is ::. - I'm ::a, called as a. ::b says: I've just called ::a. and the call [[namespace inscope foo b]] prints - I'm ::b, called as ::b; my namespace is ::. - I'm ::foo::a, called as a. ::b says: I've just called ::foo::a. because '''b''' calls the '''a''' visible in the namespace from which '''b''' itself was called, even if that isn't where '''b''' resides. An alternative approach to the namespace problem, which can be useful for generic utility commands (that via aliases with some extra arguments can implement a whole score of commands), is to supply the intended namespace as an extra argument. Then if one defines ====== set leftval [namespace eval $ns $left $args] set rightval [namespace eval $ns $right $args] expr \$leftval $op \$rightval } ====== can all be given as aliases to this exprbinop: ====== interp alias {} algexpr::eval::- {} exprbinop algexpr::eval - interp alias {} algexpr::eval::* {} exprbinop algexpr::eval * interp alias {} algexpr::eval2::+ {} exprbinop algexpr::eval2 + interp alias {} algexpr::eval2::- {} exprbinop algexpr::eval2 - interp alias {} algexpr::eval2::* {} exprbinop algexpr::eval2 * ====== predict what level to uplevel to. ---- **Handling unknown node types** In some cases, the set of possible node types is not known from start, but all except a known set of nodes are to be given default processing (which is typically to do nothing here, but recursively process all the children). This is a problem for the [namespace inscope] technique, since there is no way to stop Tcl from looking outside the namespace for command definitions: [namespace path] can be used to make Tcl look in additional namespaces, and [namespace unknown] can be used to override the generic [unknown], but there is no way to prevent commands in the :: namespace from being found as legitimate implementations of a the operation on a node. If [list], [set], [expr], or the like can occur as node types then this means you're in trouble. One way to get around this could be to make sure that the :: namespace is empty, by using an [empty interpreter] like so: ====== dispatch hide namespace dispatch invokehidden namespace delete :: ====== : '''dispatch invokehidden namespace inscope''' ''namespace'' ''node'' ?''arg'' ...? but this is rather complicated. As of Tcl 8.5, a better approach is to use an ensemble to do the dispatch, since this does not look outside its given namespace. The set-up closest to that of [namespace inscope] is to do ====== namespace export * namespace ensemble create -prefix 0 } ====== ====== ====== ====== ====== ====== ====== ====== ====== A problem with this set-up is however that it is less obvious how to make recursive calls for children; an [eval] or [uplevel] on a child will not reinvoke the ensemble (thus losing the protection against unknown node types). Hardcoding the fully qualified name of the ensemble where one needs to make a recursive call is one option, but that gets inconvenient for nested namespaces. When the ensemble (as is the default) has the same name as its namespace, its name can be obtained as [[namespace current]], e.g. ====== expr {[[namespace current] {*}$term1 {*}$args] + [[namespace current] {*}$term2 {*}$args]} expr {[[namespace current] {*}$term1 {*}$args] + [[namespace current] {*}$term2 {*}$args]} ====== ====== proc + {term1 term2 args} { expr {["" {*}$term1 {*}$args] + ["" {*}$term2 {*}$args]} expr {["" {*}$term1 {*}$args] + ["" {*}$term2 {*}$args]} proc - {term1 term2 args} { expr {["" {*}$term1 {*}$args] - ["" {*}$term2 {*}$args]} expr {["" {*}$term1 {*}$args] - ["" {*}$term2 {*}$args]} proc * {factor1 factor2 args} { expr {["" {*}$factor1 {*}$args] * ["" {*}$factor2 {*}$args]} expr {["" {*}$factor1 {*}$args] * ["" {*}$factor2 {*}$args]} proc / {numer denom args} { expr {["" {*}$numer {*}$args] / double(["" {*}$denom {*}$args])} expr {["" {*}$numer {*}$args] / double(["" {*}$denom {*}$args])} proc const {val args} {return $val} proc var {name args} { foreach {var val} $args { foreach {var val} $args { if {$name eq $var} then {return $val} } error "No value specified for '$name'" namespace export ?* namespace ensemble create -command [namespace current]:: \ -prefix 0 -unknown [list ::default_to_type +] -prefix 0 -unknown [list ::default_to_type +] } proc default_to_type {type ensemble args} {list $ensemble $type} ====== The [interp alias] is so that the ensemble can still be called by external users as if it had the same name as its namespace. Using ?* as export pattern means all commands in the namespace ''except'' the ensemble command itself are available as node types. The '''-unknown''' handler means all unknown node types are treated as if they were '''+''': ====== 3 % algexpr::eval2 - {const 1} {var x} x 2 -1 % algexpr::eval2 / {const 1} {var x} x 2 0.5 % algexpr::eval2 & {const 1} {var x} x 2 3 ====== ---- **Traversals with state** Sometimes, it is important to maintain a state when traversing a data structure, and update that state for each node processed. That each node is processed in a separate local context makes it trickier to maintain such a state than it would be if a single procedure call did all the work, but there are some approaches one can use: 1. Use a global or namespace variable. Inelegant, but often easy and good enough. 2. Use a local variable that is accessed using [upvar]. Gets into trouble if [uplevel] or [namespace eval] is being used as shown in the previous section. 3. Pass the previous state as a command argument and return it as the command result. This is the [functional programming] idiom. ---- **Collecting children in a sublist or placing them directly the node list** The data in a node can be categorised as: 1. Node type. 2. Annotations (data not in the form of a node). 3. Children (data in node format). In the algebraic examples above, the '''const''' and '''var''' nodes have annotations (value and variable names respectively) but no children, whereas the other node types have children but no annotations. The [grammar_fa] example (see bullet list below) is similar, but adds the complication that some nodes can have an arbitrary number of children. In the [parsetcl] example, all nodes have two pieces of annotation and in addition many have a variable number of children. Most general is the data structure employed by [a little XML parser], since a typical node in that case can have an arbitrary number of annotations and an arbitrary number of children. The issue considered here is how to best design the list structure for the various node types. If the number of annotations and children is determined by the node type, then it probably seems natural to just lay them out in a flat structure ====== ====== proc definition ====== ====== messages for free. It is ''possible'' to use this design also for node types with a variable number of children, since the proc definition can just as easily be ====== ====== it is ''better'' to have just one argument which is the list of children, like so: ====== ====== arguments (like '''eval2''', '''parteval''', '''diff''', and '''toexpr''' above)! Consider what would happen if + nodes looked like : '''+''' ''?term ...?'' (variadic addition is actually more natural for algebra). The '''eval''' operation is straightforward ====== set sum 0 foreach term $args { set sum [expr {$sum + [eval $term]}] set sum [expr {$sum + [eval $term]}] return $sum } ====== exceptional final argument holding the differentiation variable name ====== set var [lindex $args end] set res [list +] foreach term [lrange $args 0 end-1] { lappend res [eval $term [list $var]] lappend res [eval $term [list $var]] return $res } ====== is no sure way to distinguish variable-value arguments from child node arguments. This problem does not arise if the syntax had instead been : '''sum''' ''term-list'' (the '''+''' node type is already assigned to binary addition) since the implementations are then ====== set res [list sum] foreach term $terms { lappend res [eval $term [list $var]] lappend res [eval $term [list $var]] return $res } proc algexpr::eval2::sum {terms args} { set sum 0 foreach term $terms { set sum [expr {$sum + [eval $term $args]}] set sum [expr {$sum + [eval $term $args]}] return $sum } ====== that is a dictionary of annotations and another that is the list of children. After that can follow any number of operation arguments, and it is easy to see which is which. Still, it sometimes happens that one has to implement operations on data in a legacy format, with the list of children expanded into the node list. What can one do then? One possibility is to first implement an operation converting the data to a child-list format, because that operation doesn't need arguments, and then perform the wanted operation on the converted data. Another possibility is to design the argument structure for the operation so that there is an explicit end-of-children argument, like -- to signal end of switches. In this case, the right form of an end-of-children argument is an empty list, since every child must be a list of length at least 1. In that case, one could define ====== set n 0 foreach term $args { if {[llength $term] < 1} then {break} if {[llength $term] < 1} then {break} incr n set sum 0 foreach term [lreplace $args $n end] { set sum [expr {$sum + [eval $term [lrange $args $n end]]}] set sum [expr {$sum + [eval $term [lrange $args $n end]]}] return $sum } ====== ====== ====== is not the same operation as '''eval2'''. ---- **Discussion** [male] - 2007-03-12: Thanks for your very detailed [data is code] page! My best example for [data is code] was my way to read VRML files, to import their structure and their geometrical data: 1. Read the file data 2. Convert the file data to a valid list 3. Reorganize the file data list to have all VRML node attributes in sub lists 4. Evaluate this list with the scheme, that each VRML node name could be the name of a tcl command or procedure. If such a tcl procedure exists, than evaluate it with the VRML node attributes as arguments, if not ignore this VRML node. Such becomes VRML data executable code. ---- [NEM] ''12 Mar 2007'': Thanks, Lars, this is a useful technique to document. I've only skimmed at present, so apologies if some of these remarks are covered. An alternative to using a namespace would be to just use a [switch] (or some form of [pattern matching]) inside a proc. For instance, the first expression evaluator might be written: ====== proc algexpr::calc expr { switch -exact -- $op { + { expr {[calc $a] + [calc $b]} } + { expr {[calc $a] + [calc $b]} } - { expr {[calc $a] - [calc $b]} } * { expr {[calc $a] * [calc $b]} } / { expr {[calc $a] / double([calc $b])} } const { return $a } var { upvar #0 $a v; return $v } } ====== ''[Lars H]: The only catch in that is that it doesn't really do [data is code], as the data isn't treated as code; in a sense, it is precisely the more traditional technique that data-is-code provides an alternative to!'' [NEM]: Right, but what are the advantages of manipulating data as Tcl code as opposed to just treating it as a bit of data? Speed? Leveraging the existing Tcl interpreter code? The above proc is a bare-bones interpreter for a minimal language of arithmetic expressions, so its data is also code. Code ''is'' just data, after all. (The "real" dichotomy would be between data and processes, where code could be seen as data that describes a process). [Lars H]: Speed and leveraging existing mechanisms are indeed reasons why this is useful. Another reason is extensionality (how do you tell algexpr::calc about the '''exp''' node type?). As for "[code is data]"... well, that would be the subject of another page (every now and then I mix the two up too). [NEM]: Code really is just data... But anyway, to the other points: I'm not so sure about the speed, although I'm prepared to believe it for more complex examples. For example, transforming XML data into Tcl code could result in a quite fast parser, but it's tricky to do correctly, and there are of course issues with trust -- Tcl is a much more powerful language than a custom little interpreter. As to the question of extensibility, it is actually fairly simple to extend a switch-based interpreter without altering the original with a little forethought. All you need to do is to use open-recursion and an explicit fixpoint operator (here we add support for environments too; [dict] required): ====== lassign $expr op a b switch -exact -- $op { + { expr {[call $self $a $env] + [call $self $b $env]} } + { expr {[call $self $a $env] + [call $self $b $env]} } - { expr {[call $self $a $env] - [call $self $b $env]} } * { expr {[call $self $a $env] * [call $self $b $env]} } / { expr {[call $self $a $env] / double([call $self $b $env])} } const { return $a } var { dict get $env $a } default { error "unknown op \"$op\"" } } # fixpoint operator (similar to Y combinator): proc call {cmd args} { uplevel #0 $cmd [linsert $args 0 $cmd] } ====== [TOOT] dispatch): ====== / {- {* {const 2} {var x}} {const 1}} {+ {var x} {const 3}} % call calc $datum {x 1} 0.25 ====== extending an OO program: ====== lassign $expr op arg switch -exact -- $op { exp { expr {exp([call $self $arg $env])} } exp { expr {exp([call $self $arg $env])} } default { calc $self $expr $env } } ====== ====== exp {/ {var x} {var y}} % call calcexp $datum3 {x 2 y 1} 7.38905609893065 ====== language have to get it: ====== unknown op "exp" ====== [NEM] (A few minutes later): It's also quite simple to extend the language in different directions. For example (and because it's fun!), here's how to add lexically-scoped [lambda] expressions to the language (something Tcl itself doesn't have built-in): ====== lassign $expr op a b switch -exact -- $op { lambda { list closure $a $b $env } lambda { list closure $a $b $env } app { calcapp $self [call $self $a $env] [call $self $b $env] } default { calc $self $expr $env } } proc calcapp {self f x} { lassign $f op arg body env switch -exact -- $op { closure { call $self $body [dict replace $env $arg $x] } closure { call $self $body [dict replace $env $arg $x] } default { error "not a closure: \"$f\"" } } ====== ====== lambda x {lambda y {* {var x} {var y}}} % set exp [list app [list app $f {const 3}] {const 4}] app {app {lambda x {lambda y {* {var x} {var y}}}} {const 3}} {const 4} % call calclam $exp {} 12 ====== [NEM] ''(cont.)'' As you mention at the start, an [OO] approach makes it easy to add new types (just add a new sub-class and implement the operations), but trickier to add new operations, whereas a pattern matching approach makes it easier to add new operations but harder to add new data types (as you need to update all the operations). This is apparently known as the ''expression problem'' in programming language design, and there are a number of ways in which it is tackled. The usual OO way is the [Visitor pattern], which essentially implements pattern matching over OO dispatch. In this case you'd have something like the following (psuedo-code): ====== abstract method visit {visitor} } class Num val { inherit Expr method visit {visitor} { $visitor visitNum $val } } class AddOp a b { inherit Expr method visit {visitor} { $visitor visitAddOp $a $b } } ... class ExprVisitor { abstract method visitNum {val} abstract method visitAddOp {a b} ... } class ExprInterp { inherit ExprVisitor method visitNum {val} { return $val } method visitAddOp {a b} { expr {[$a visit $self] + [$b visit $self]} } ... } [AddOp [AddOp [Num 2] [Num 5]] [Num 4]] visit [ExprInterp] ;# => 11 ====== ---- [LV] One of the examples I've read about, in terms of this topic, has to do with saving the state of a Tcl program. The idea is that one would write [proc]s which would go through and save off the values of any variable, proc, namespace, etc. At first, I thought ''well, only the values which are dynamically set are necessary - the procs defined in the program's file itself need not be saved, nor variables which are set within that file.'' However, given that someone might come along and modify the program, or for that matter, given that you would have to write specialized code to avoid getting those values, it is much easier just to save everything. If done properly, the "save file" should be able to be invoked and the application would be back the way things were (with some possible exceptions, depending on the code, extensions, etc.) when last started. ---- **See also** Examples found "in the wild" of data structures which support the data-is-code paradigm: [parsetcl]: Represents the syntactic structure of Tcl code as a list-tree. A wiki user contributed a set of procs that made the list-tree convert itself back to Tcl code when evaluated! [grammar_fa]: Has a list-tree representation for regular expressions. Node types are '''S''', '''.''', '''|''', '''&''', '''*''', '''?''', '''+''', and '''!'''. [A little XML parser]: Has tree-list representation (inherited from [tdom]) for XML data. Node types are the tags that may occur, and #text for text content. [Geometry]: Represents geometric objects as list-trees, with POINT and VECTOR as leaf node types. Going one step further: [code is data is code] ---- [NEM]: Singular or plural. "Data are codes" sounds weird. ---- !!!!!! %| [Category Concept] | [Category Control Structure] | [Category Data Structure] | [Category Design] | [Category Mathematics] | [Category Tutorial] |% !!!!!!