Tcl Performance - taking it to the next level
The purpose of this page is to provide an anchor point to discuss some half-formed (half-baked?) ideas that various Tcl'ers (including Miguel Sofer, Donal Fellows and Kevin Kenny) discussed, largely in the hallway and the bar, at the 2007 Tcl conference in New Orleans.
The basic idea is that Miguel's work on bytecode engine performance, variable reform, and so on, can only go so far. Eventually some more radical reforms will be needed. These reforms can't be undertaken lightly. For one thing, they will break compatibility with tbcload when they are introduced (although it is possible that a code translator can restore compatibility). Nevertheless, if Tcl is to go much faster, bytecode reform looks like nearly the only path forward. We have nearly exhausted the improvements gained by bytecoding individual commands; indeed, a useful subset of Tcl procedures now converts to 100% bytecode, with no callouts to command procedures.
Some of the possible improvements are fairly easy and benign:
Beyond that, we get into a realm where we need to begin to consider the safety of optimisations. The next big win looks to appear from the unboxing of local variables; essentially, in the (overwhelmingly common) case where a variable local to a procedure has no aliases and is not traced, we can eliminate the variable object altogether. Together with that, the overhead of checking for traces goes away, as does the overhead of repeated resolution of the variable name (the caching of the mapping from variable name to variable object is less effective than it might be).
For a concrete example, let's examine a procedure for computing the inner product of two vectors:
proc dot {a b} { set prod 0.0 foreach x $a y $b { set prod [expr {$prod + $x*$y}] } return $prod }
What we need to do, to promote local variables to registers, is to ascertain that
There are no doubt other rules. It is best to think in terms of proving that code is safe, rather than detecting that it is unsafe. It may in some cases be possible to defer safety checks to interpretation time, and fall back on deoptimized code if the checks fail.
Once variable are lifted to registers, it becomes possible to think about proving useful assertions about the variable contents. For instance, in the procedure above, it is fairly easy to prove that the variable of prod can only be a 'double' - and hence it should be possible to substitute specialized operations that work only on 'double' values, effectively unboxing the value twice (once from the variable and once from the Tcl_Obj).
Finally, if we can infer types on a significant fraction of the operations, it would become profitable to think of generating direct machine code. If this can be done well, it would be quite a tour de force; compiling very high level languages without resorting to extensive programmer-supplied assertions is seldom feasible. It does look to be (just barely) feasible for Tcl, thanks to Tcl's considerable regularities.
All of this, of course, emerges from idle speculation at a conference. (Conferences are a good time to dream big dreams, and a less good time to commit to execution.) But perhaps some of the participants can comment further in this space.
MS has been thinking about the register machine modification; it seems that it might be able to provide a relatively inexpensive way to provide the first-level unboxing for compiled local variables that requires no further analysis. The idea/trick is essentially (recorded here so that it does not go away, and to ellicit commentary):
In this manner access to scalar local variables reduces to R/W access to the corresponding register. The question is what happens in the presence of local arrays, upvars and traces. The idea is:
Note that when upvaring to a foreign Var, if the Var already was VAR_INDIRECT, the pre-existing indirection has to be removed and replaced with the new one.
I know this is vague ...