code is data

Code is data describes the cozy relationship between code and data in Tcl.

Description

That a piece of code is also a piece of data, and can be handled as such, is one of the hallmarks of dynamic languages. At the machine level this is an obvious consequence of the von Neumann architecture for computers, but few mainstream programming languages provide any facilities for letting code act on code.

All but the most trivial Tcl programs rely on the fact that code is data, since Tcl's control structures are ordinary commands that take one or more scripts as arguments and arrange for them to be evaluated: if, for, foreach, while, proc, catch, uplevel, ... This need not trouble those who are accustomed to the "reserved words and blocks of statements" model used in C and other languages, because the Tcl control structures look just as if they were built around "blocks of statements" when the arguments are constant strings. The dynamic power of these structures is nevertheless available when one needs it, for example to build new control structures like every:

proc every {milliseconds body} {
    # Run the script once now ...
    eval $body
    # ... then schedule this same call, list-quoted, to run again later.
    after $milliseconds [list every $milliseconds $body]
}
every 60000 {puts "Another minute just passed."}

The classical language where code is data is LISP, since (in Tclish terms) any LISP command is also the list of the words of that command, and those words can be accessed like the elements of any other list. Tcl is formally just as powerful since everything is a string, but it is usually not practical to have Tcl code take Tcl scripts apart and modify them, since few facilities are available out of the box for handling Tcl scripts at a higher level than as a string of characters (there are packages for higher-level handling, however). What is straightforward to do is:

  • storing code anywhere data can be stored,
  • manufacturing code from arbitrary data (list-quoting), and
  • combining pieces of code into larger pieces of code.

The net effect is that existing code is mostly opaque to Tcl scripts, but can be used and manipulated.
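
The three operations listed above can be sketched roughly as follows. This is only an illustration: the names greet, handlers and on_save are hypothetical and not part of any library.

```tcl
# Hypothetical example: greet, handlers and on_save are illustrative names.
proc greet {name} {
    return "Hello, $name!"
}

# 1. Storing code anywhere data can be stored: here, as a dict value.
set handlers [dict create on_save {string toupper saved}]
set stored [eval [dict get $handlers on_save]]

# 2. Manufacturing code from arbitrary data: [list] quotes each word, so a
#    value containing spaces, brackets or dollar signs stays one argument
#    and is not substituted again when the command is evaluated.
set name {Dr. [Evil] $x}
set cmd [list greet $name]
set greeting [eval $cmd]

# 3. Combining pieces of code into a larger script, one command per line.
#    [eval] returns the result of the last command in the script.
set script [join [list $cmd [list string length $name]] \n]
set length [eval $script]

puts $stored
puts $greeting
puts $length
```

Note that none of these steps looks inside the code being handled: the scripts are stored, quoted and concatenated as opaque values.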


NEM: Code is, of course, data as far as the interpreter or compiler is concerned. Most mainstream programming languages are sufficiently expressive to allow you to write a compiler or interpreter in them, so they all have the potential to treat code as data. Numerous ways of representing code exist, such as ASTs, S-expressions, strings, etc. The interesting questions then are:

  • Can the language represent its own code as data? (Almost certainly possible)
  • Does the language provide some sort of built-in datatype/operations to construct/manipulate code of the same language? (Usually not)
  • Does the language provide a means of instantiating such constructed code as a process (i.e., running it: eval, compiling it, macros etc)? (Usually not)
  • Does the language provide a means of causally interacting with the current process (i.e., reflection, introspection, etc)? (Usually not)
  • Does the language provide a means of executing the constructed code within the current process? (Usually not)

What is interesting about languages like Lisp, Prolog, Tcl and so on is not that they can represent code as data (any language can), or even that they can represent their own code as data (again, any language can, in principle). The difference is the ease with which you can get hold of the currently running code and manipulate it before it runs. In principle, though, you can write a meta-circular interpreter in any (Turing complete) language. It is just that doing so takes orders of magnitude more effort in, say, C++ than it would in Lisp or Tcl. But even in those more complex languages there are still tools that treat code as data: e.g., custom preprocessors such as Qt's MOC and so on.
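
As a rough illustration of getting hold of existing code and changing it before it next runs, the sketch below uses the built-in info args and info body commands to rewrite a procedure. The helper add_call_counter, the procedure square and the array callCount are hypothetical names. The sketch assumes Tcl 8.5 or later (where incr creates a missing variable) and procedures without default-valued arguments.

```tcl
# Hypothetical helper: redefine an existing proc so that each call is counted.
# Assumes the proc has no default-valued arguments; [info default] would be
# needed to preserve those.
proc add_call_counter {procName} {
    set arguments [info args $procName]
    set body [info body $procName]
    # Prepend one list-quoted command; the old body is handled as a string.
    set newBody "[list incr ::callCount($procName)]\n$body"
    proc $procName $arguments $newBody
}

proc square {x} {
    expr {$x * $x}
}

add_call_counter square
set results [list [square 3] [square 4]]
puts $results
puts $callCount(square)
```

The old body is never parsed here, only extended, which is about as far as the built-in commands go without one of the parsing packages.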


See also

Tools for transforming Tcl code

Sugar
xbody

Tools for parsing Tcl code

parsetcl
Tcl_ParseCommand and tclparser
A Tcl parser in Tcl
ptparser

Code in other languages than Tcl is data too

little language
critcl
Critcl does C++
Critcl goes Fortran

Other

code is data is code
which realises the LISP ideal of having arbitrary programs being nested lists (and possible to manipulate as such), at the price of transforming them to something straightforwardly evaluatable by Tcl but not quite suited for human eyes.
The Emacs Problem, Steve Yegge, 2005-02-28
a rant about Emacs, Lisp, XML, HTML. "...That's because they're using languages that simply can't do a good job of representing data as code."
