'''Tcl Performance - taking it to the next level'''

The purpose of this page is to provide an anchor point to discuss some
half-formed (half-baked?) ideas that various Tcl'ers (including
[Miguel Sofer], [Donal Fellows] and [Kevin Kenny]) discussed, largely
in the hallway and the bar, at the 2007 Tcl conference in New Orleans.

The basic idea is that Miguel's work on bytecode engine performance,
variable reform, and so on, can only go so far.  Eventually some more
radical reforms will be needed.  These reforms can't be undertaken
lightly. For one thing, they will break compatibility with ''tbcload''
when they are introduced (although it is possible that a code
translator can restore compatibility). Nevertheless, if Tcl is to go
much faster, bytecode reform looks like nearly the only path
forward. We have nearly exhausted the improvements gained by
bytecoding individual commands; indeed, a useful subset of Tcl
procedures now converts to 100% bytecode, with no callouts to command
procedures.

Some of the possible improvements are fairly easy and benign:

   * Replace the stack-oriented bytecodes with a register machine. Others report that register-oriented bytecode engines are significantly faster than stack-oriented ones.

   * Replace bytecodes with wordcodes. Bytecode interpretation involves a fair amount of misaligned memory access; being able to store, say, 32-bit integers on a 32-bit boundary eliminates a fair amount of loading and shifting.

   * Replace the "big switch statement" with threaded code.  This part needs to be undertaken fairly carefully; it's not clear that threading yields better performance in all cases.

   * There are several other reforms that can be tried in this general area.  Miguel has some ideas for "command reform" similar to the recently-committed "variable reform", for instance.
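
To make the stack-versus-register comparison concrete, here is a toy sketch that evaluates ''prod + x*y'' on two miniature interpreters written in Tcl itself. Everything in it is invented for illustration: the instruction names (''load'', ''store'', ''mul'', ''add'') and the procedures ''runStack'' and ''runReg'' have nothing to do with Tcl's real bytecode set. It only counts dispatched instructions, which is a rough proxy for dispatch overhead and not a timing result.

```tcl
# Toy stack machine. Each instruction is a list: {op ?operand?}.
# Returns the number of instructions dispatched.
proc runStack {code varsName} {
    upvar 1 $varsName vars
    set stack {}
    set count 0
    foreach insn $code {
        incr count
        lassign $insn op arg
        switch -- $op {
            load  { lappend stack [dict get $vars $arg] }
            store {
                dict set vars $arg [lindex $stack end]
                set stack [lrange $stack 0 end-1]
            }
            mul - add {
                # Pop two operands, push the result.
                set b [lindex $stack end]
                set a [lindex $stack end-1]
                set stack [lrange $stack 0 end-2]
                lappend stack [expr {$op eq "mul" ? $a * $b : $a + $b}]
            }
            default { error "unknown op $op" }
        }
    }
    return $count
}

# Toy register machine. Each instruction is: {op dst src1 src2}.
proc runReg {code regsName} {
    upvar 1 $regsName regs
    set count 0
    foreach insn $code {
        incr count
        lassign $insn op dst s1 s2
        set a [dict get $regs $s1]
        set b [dict get $regs $s2]
        switch -- $op {
            mul { dict set regs $dst [expr {$a * $b}] }
            add { dict set regs $dst [expr {$a + $b}] }
            default { error "unknown op $op" }
        }
    }
    return $count
}

set vars [dict create prod 1.0 x 2.0 y 3.0]
set nStack [runStack {{load prod} {load x} {load y} mul add {store prod}} vars]

set regs [dict create prod 1.0 x 2.0 y 3.0 t 0.0]
set nReg [runReg {{mul t x y} {add prod prod t}} regs]

puts "stack: $nStack instructions, register: $nReg instructions"
```

Both runs leave ''prod'' at 7.0; the stack form dispatches six instructions and the register form two. Register instructions are wider and do more decoding each, so whether the trade pays off in a real engine has to be measured.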

Beyond that, we get into a realm where we need to begin to consider the
safety of optimisations.  The next big win looks likely to come from the
unboxing of local variables; essentially, in the (overwhelmingly
common) case where a variable local to a procedure has no aliases and
is not traced, we can eliminate the variable object
altogether. Together with that, the overhead of checking for traces
goes away, as does the overhead of repeated resolution of the variable
name (the caching of the mapping from variable name to variable object
is less effective than it might be).

For a concrete example, let's examine a procedure for computing the
inner product of two vectors:

    proc dot {a b} {
        set prod 0.0
        foreach x $a y $b {
            set prod [expr {$prod + $x*$y}]
        }
        return $prod
    }

What we need to do, to promote local variables to registers, is to
ascertain that

   * Particular blocks of code fire no execution or variable traces. At the very least, traces that do fire must not violate any of these rules with respect to the variables in question.
   * Particular blocks of code establish no new execution or variable traces.
   * Particular blocks of code do not change the semantics of Core commands within the blocks (here, [[set]], [[expr]], [[foreach]] and [[return]]), either by redefinition, overloading in a namespace, or changing command resolution rules.
   * Particular blocks of code do not change the variables to which names resolve: either by aliasing with [[upvar]], [[variable]] or [[global]], or by changing variable name resolvers.
   * There is no opportunity to change a variable in an uncontrolled fashion, for example by [[upvar]] or [[uplevel]] from a called procedure or [[eval]] of unknown code in the current procedure. 
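
Two of the hazards in the list above are easy to demonstrate with ordinary Tcl (8.5 or later, for [[apply]]). The procedure names ''sneaky'', ''victim'' and ''traced'' are made up for this sketch. In both cases a "local" variable ends up with a value that no statement in the procedure body appears to assign, which is exactly what a register-promoted variable could not reproduce.

```tcl
# Hazard 1: a called procedure reaches into its caller with upvar.
proc sneaky {} {
    upvar 1 prod p
    set p 100.0
}
proc victim {} {
    set prod 0.0
    sneaky              ;# silently rewrites our local "prod"
    return $prod
}

# Hazard 2: a write trace rewrites the value being stored.
proc traced {} {
    set x 1
    trace add variable x write {apply {{name1 name2 op} {
        # Runs one frame below the proc, so "upvar 1" reaches its local.
        upvar 1 $name1 v
        set v [expr {$v * 2}]
    }}}
    set x 5             ;# the trace doubles the stored value
    return $x
}

puts [victim]           ;# 100.0
puts [traced]           ;# 10
```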

There are no doubt other rules.  It is best to think in terms of proving that code is safe, rather than detecting that it is unsafe.  It may in some cases be possible to defer safety checks to interpretation time, and fall back on deoptimized code if the checks fail.
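
As a script-level analogy for a deferred safety check (a real implementation would live inside the bytecode engine, not in Tcl code), the sketch below guards a shortcut with a runtime test and falls back to the straightforward path when the test fails. The procedure ''accumulate'' and its return values ''fast'' and ''slow'' are hypothetical. The guard relies on the observation that collapsing several writes into one is only invisible if no trace is watching the variable.

```tcl
# Add every element of "values" to the caller's variable named by accName.
# The variable must already exist and hold a number.
proc accumulate {accName values} {
    upvar 1 $accName acc
    if {[llength [trace info variable acc]] == 0} {
        # Guard passed: nobody can observe intermediate writes,
        # so a single write of the final sum is equivalent.
        set acc [::tcl::mathop::+ $acc {*}$values]
        return fast
    }
    # Guard failed: fall back to the unoptimized path, one write per element.
    foreach v $values {
        set acc [expr {$acc + $v}]
    }
    return slow
}

set a 1
puts [accumulate a {2 3}]       ;# fast

set b 1
set writes 0
trace add variable b write {apply {{args} {incr ::writes}}}
puts [accumulate b {2 3}]       ;# slow
puts $writes                    ;# 2
```

Both calls leave their variable at 6; only the traced one takes the slow path, and its trace sees both intermediate writes.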

Once variables are lifted to registers, it becomes possible to think about proving useful assertions about the variable contents.  For instance, in the procedure above, it is fairly easy to prove that the value of ''prod'' can only be a 'double' - and hence it should be possible to substitute specialized operations that work only on 'double' values, effectively unboxing the value twice (once from the variable and once from the Tcl_Obj).
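
The claim about ''prod'' can be checked informally from the script level. The sketch below repeats the ''dot'' procedure and feeds it integer-only lists: because the accumulator starts at 0.0, every addition promotes to double, and anything that is not a number makes [[expr]] raise an error instead of storing a non-double. This is an observation on a few inputs, not a proof, and it assumes that [[expr]], [[set]] and [[foreach]] have their standard meanings.

```tcl
proc dot {a b} {
    set prod 0.0
    foreach x $a y $b {
        set prod [expr {$prod + $x*$y}]
    }
    return $prod
}

puts [dot {1 2 3} {4 5 6}]      ;# 32.0, a double even though all inputs are integers
puts [dot {} {}]                ;# 0.0

# Non-numeric or missing elements never reach prod: expr throws first.
puts [catch {dot {1 a} {2 3}}]  ;# 1
puts [catch {dot {1 2} {3}}]    ;# 1 (y is the empty string on the second pass)
```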

Finally, if we can infer types on a significant fraction of the operations, it would become profitable to think of generating machine code directly. If this can be done well, it would be quite a ''tour de force'': compiling very high level languages without resorting to extensive programmer-supplied assertions is seldom feasible.  It does look to be (just barely) feasible for Tcl, thanks to Tcl's considerable regularity.

All of this, of course, emerges from idle speculation at a conference. (Conferences are a good time to dream big dreams, and a less good time to commit to execution.)  But perhaps some of the participants can comment further in this space.
----
[Category Internals]