A critical mindset about policy

JCW: This page was prompted by recent developments (see CriTcl builds C extensions on-the-fly) and occasional discussions over the past years with a number of people who are heavily into scripting.

The issue addressed here could be summarized as:

  • All policy issues in software must be scripted
  • Or even more succinctly: No poli-C, please!

What I mean by this is that code falls into a number of categories, including: general structure, administration of details, core domain-specific logic, and performance issues. And much more, clearly.

What seems to happen all too frequently is that decisions one makes about how software should behave end up being spread out in many parts of the code. When scripting with some parts coded as C extensions, there is the risk that such choices end up solidifying... in C. When software is deployed to other machines, and when this is done with compiled code, that code may become quite hard to adjust: one has to recompile, test locally, re-distribute, and test remotely.

Here's a little case study which illustrates the problem (this is not a reconstruction of facts, but a loose summary of what I think happened):

  1. Long ago, someone coded a "base64" encoding/decoding algorithm in C. Probably because a Tcl implementation was slow (especially in the pre-8.x days).
  2. The Tcl code remains useful, e.g. when there is no compiler, on new platforms, for old installations, as documentation, etc.
  3. At some point, someone wanted/needed to add a few options to encoding: maximum line length, and a customizable separator between lines.
  4. The Tcl version is easily changed, the results are instantly available.
  5. The C code takes more work (to code, but especially to re-deploy).
  6. The state today is a bit of C code which does some things, and a Tcl wrapper around it which does other things and goes out of its way to make both pure-Tcl and C-optimized scenarios work in the same way.
  7. The pure-Tcl version is now part of Tcllib, and has been further optimized to take advantage of how Tcl works nowadays (some idioms change gradually, once underlying commands become substantially faster).

While the effort is laudable, understandable, and in fact quite logical, it leads to a lot of (IMNSHO) ... "cruft". There is complexity in here which can easily be avoided. Worse still, this complexity will continue to make it hard to add more features - if this is ever needed.

What this example illustrates is what happens all too often: when mixing scripting and C code, the switch to C (usually for performance reasons) ends up being full-scale. Policy decisions, e.g. how to split / join text resulting from base64 encoding, end up being coded in C as well - even though they do not impact performance. Handling of special cases, and errors, often also ends up being fully duplicated in C. The consequence is that a substantial part of extension logic is coded in C, thus killing one of the key benefits of scripting. What if I wanted to separate lines by more than a single character (say "\n ", for presentation purposes perhaps)? Or make different trade-offs w.r.t. speed (knowing that bad input cannot happen, for example)? The C code can't do it without a recompilation (on each platform), whereas in Tcl it would be trivial.

This could very easily have been avoided. Let's assume that the C code exists solely for performance reasons. The performance, in the case of base64, comes from a bit of mapping and bit-fiddling, to perform the 3-byte vs. 4-byte mapping of strings.

Here's how one "ought to" code the base64 extension, IMNSHO:

  • a support function in C which encodes N*3 bytes into N*4 bytes
  • a support function in C which decodes N*4 bytes into N*3 bytes
  • a support function in C which turns a string into a list of X-byte substrings

The rest, and that includes all option handling, special cases when sizes aren't a multiple of 3 or 4, and line length splitting/joining, can easily be coded in Tcl.

Each of the above 3 C functions is no more than two dozen lines of C code (Tcl arg checking and all). They do just one thing each, and they do it fast. In fact, they will do it faster than the current C implementations, because there is no special-case checking at all. They just race through input strings and construct results in a tight loop. If validation of correct input is an issue, a regexp call can easily be added in the Tcl layer.

End Nov 2001: as ascenc illustrates, now in CritLib - JCW
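
To make the first two support functions concrete, here is a minimal sketch in plain C. It is not the ascenc or CritLib code; the names b64_encode_groups and b64_decode_groups are hypothetical, and the Tcl command wrapping (argument checking, result objects) is left out. It assumes the standard base64 alphabet and, as argued above, does no padding, no line splitting and no validation, since those are meant to live in the Tcl layer.

```c
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648). */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode ngroups*3 input bytes into ngroups*4
 * output characters. No padding, no line breaks, no terminating NUL:
 * leftovers and layout are the caller's (i.e. the script's) business. */
void b64_encode_groups(const unsigned char *in, size_t ngroups, char *out)
{
    while (ngroups-- > 0) {
        unsigned long v = ((unsigned long)in[0] << 16)
                        | ((unsigned long)in[1] << 8)
                        |  (unsigned long)in[2];
        out[0] = b64_alphabet[(v >> 18) & 63];
        out[1] = b64_alphabet[(v >> 12) & 63];
        out[2] = b64_alphabet[(v >> 6) & 63];
        out[3] = b64_alphabet[v & 63];
        in += 3;
        out += 4;
    }
}

/* Hypothetical helper: decode ngroups*4 characters into ngroups*3 bytes.
 * Assumes the input contains only alphabet characters; anything else is
 * silently decoded as 0. Validation belongs in the script layer. */
void b64_decode_groups(const char *in, size_t ngroups, unsigned char *out)
{
    /* Reverse lookup table, built on first use (not thread-safe). */
    static unsigned char rev[256];
    static int rev_ready = 0;

    if (!rev_ready) {
        int i;
        for (i = 0; i < 64; i++)
            rev[(unsigned char)b64_alphabet[i]] = (unsigned char)i;
        rev_ready = 1;
    }
    while (ngroups-- > 0) {
        unsigned long v = ((unsigned long)rev[(unsigned char)in[0]] << 18)
                        | ((unsigned long)rev[(unsigned char)in[1]] << 12)
                        | ((unsigned long)rev[(unsigned char)in[2]] << 6)
                        |  (unsigned long)rev[(unsigned char)in[3]];
        out[0] = (unsigned char)(v >> 16);
        out[1] = (unsigned char)(v >> 8);
        out[2] = (unsigned char)v;
        in += 4;
        out += 3;
    }
}
```

Encoding the six bytes of "foobar" with ngroups set to 2 fills the output buffer with "Zm9vYmFy", and decoding that string gives the original bytes back. Everything else (a trailing group of one or two bytes, "=" padding, line length, separators) stays outside these loops, which is the point of the exercise.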

But base64 is really the tip of the iceberg, and not even an important one. What would really be needed is a structured and concerted effort to apply the rule of "no policy gets coded in C" to the Tcl and Tk cores. There are huge amounts of C code in the core which reduce the flexibility of scripting: it is not possible to override sub-commands in Tcl ("info source" was hard to fix in the first Tcl-based VFS implementations), the channel design does not (yet?) allow coding channels in Tcl (again, VFS critically depends on this), and Tk's "text" widget is huge and has things like a b-tree implementation which is totally inaccessible for any other purpose.

Perl implements far more of itself in Perl (without being a slow system at all). Python does not need channels to be stackable; it uses its standard object protocol to provide a "file interface".

To wrap it all up, I propose to look at scripting and C with a different mindset. Extensions coded in C should not aim to be useful as top-level modules, but only focus on doing small things well. With a Tcl layer wrapped on top, one can then build the API that gets used everywhere. This makes the C code smaller, simpler, and sometimes even faster. Note that even pure-Tcl implementations become simpler: only simple bits of C need to be recoded in Tcl; most of the code was already Tcl anyhow.

It is time to let go of the idea that the center of the universe is C, even for Tcl and Tk themselves. The "CriTcl" package makes it feasible to fully leave that approach behind one day. But even in small ways, reducing the number of policy decisions that are coded in C can quickly pay off, in that C code changes (and fixes, and testing) become far less frequent, and that far more fixes/changes/improvements end up in Tcl.

JCW


RS: Though sometimes I feel so, the center of the universe would surely also not be Tcl/Tk - rather, an empty point around which C(++)?, Tcl, Python, Perl, etc. rotate... Fascinating thoughts, well worth continuing. By separating core functionality from policy/configuration/syntax specifics, work spent on Tk and even Tcl internals could be of interest to the other scripting languages, similar to how they embrace Tk already... (vague visions of "the open source answer to .net"...)

JCW - Ah, that brings back memories of a "scripting kernel" which never materialized. For some delightful reading on programming languages, see [L1 ], which talks about Lisp (and we all know how close Tcl is to Lisp, right?), and which was mentioned as reference on [L2 ], which is about Python, which in turn came from the most recent Python-URL! [L3 ] ... It makes one wonder what evolution will do to computer languages in another 10..20 years. What is the holy grail? Is there a way to rise above languages, somehow? (Is there life after death?)

Stephen Trier - Maybe this is obvious, but Jean-Claude's writings above could serve as the manifesto for Tcl 9. Aggressively refactoring to code policy choices in Tcl and a performance toolkit in C would offer much room for new innovations (like VFS) and would help prevent the brittleness that is risked if the focus is only on adding features and fixes.