Version 0 of Playing bytecode

Updated 2004-05-20 10:40:14

if 0 {Richard Suchenwirth 2004-05-19 - From Tcl 8.0, bytecode compilation has remarkably increased execution speed - code (e.g. in proc or while bodies) is no longer interpreted word by word, character by character, but rather converted into a compact sequence of bytes once, and this byte sequence is executed by a "virtual machine", Tcl_ExecuteByteCode(), or TEBC for short, as often as needed. The concept isn't new. BASICs in the 1980s had that, as well as the "p-code" into which UCSD Pascal was converted. But it certainly has advantages over naive interpretation of strings as commands.

I have to admit that Tcl's real bytecodes are still sort of a mystery to me. I looked at the C code that implements and executes it, but I never managed to really work on it (in regular Tcl builds, the bytecode debugging isn't very exposed; and the code set happens to change between minor releases). But one can always play...:) This page contains a tiny implementation of my vague ideas about a bytecode engine, much weaker than the real thing, but possibly educational. Compilation of bytecode (here restricted to the 94 printable ASCII characters) is done manually into a compact reverse-Polish language (see RPN again), in which no spaces are needed to separate "words", as they're all one byte long. This allows for shorter (and more mysterious-looking) code...

The bytecode engine ebc (execute byte code) is stack based. Instructions get their arguments from the stack, and push their result to the stack. Unrelated with this stack, I simulate a "local stack frame" by assigning the arguments to ebc to variables $1..$9 (see half-lambda). Constants are limited to 0..9 in this toy; the (n)@ opcode retrieves the (n)-th argument. So Tcl's

   lambda {1 2} {expr ($1+$2)/3}

comes out in this bytecode language as

   1@2@+3/

You can also assign to such local variables, with the = opcode of course. As in TEBC, expr operators (plus some functions) are available as first-class "commands"; I added some other list commands as needed for the examples at end. Finally, the stack is returned.

One could define commands in this way:

 interp alias {} hypot {} ebc 1@d*2@d*+s  ;# sqrt($1*$1 + $2*$2)

but of course this toy runs slower than Tcl's real bytecodes.- What's missing is looping constructs or even conditional branching. For this, the simple foreach iteration isn't sufficient - one would need a "program counter" that points into the bytecode, and can move forward or back as required, as e.g. in Playing Assembler.

Another feature of the real TEBC seems to be that it can fall back to pure interpretation of a string, e.g. for eval - I've tried to approximate this with the {...} construct below. }

 proc ebc {code args} {
    foreach {1 2 3 4 5 6 7 8 9} $args break
    set ::S {} ;# clear stack
    set keep 0
    foreach byte [split $code ""] {
        if {$keep && $byte ne "\}"} {append buffer $byte; continue}
        switch -- $byte {
            0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 {push $byte}
            @ {push [set [pop]] ;# retrieve numbered argument}
            + - "-" - * - / - % - > {swap; push [expr [pop] $byte [pop]]}
            = {set [pop] [pop]  ;# assign to a numbered variable}
            d {dup}
            I {swap; push [lindex [pop] [pop]]}
            i {push [expr int([pop])]}
            L {push [llength [pop]]}
            r {push [expr rand()]}
            S {push [lsort -real [pop]]}
            s {push [expr sqrt([pop])]}
            \{ {set keep 1; set buffer "" ;# start embedded command}
            \} {set keep 0; eval $buffer  ;# execute embedded command}
            default {error "unknown bytecode $byte"}
        }
    }
    set ::S
 }

#-- Stack routines:

 interp alias {} push {} lappend ::S
 proc pop  {}    {
    if ![llength $::S] {error "stack underflow"}
    K [lindex $::S end] [set ::S [lrange $::S 0 end-1]]
 }
 proc K    {a b} {set a}
 proc dup  {}    {push [lindex $::S end]}
 proc swap {}    {push [pop] [pop]}

#-- Testing, and usage examples:

 proc ? {body expected} {
    catch {uplevel 1 $body} res
    if {$res ne $expected} {puts "$body->$res, expected $expected"}
 }
 ? {ebc 123++}  6                    ;# simple starter
 ? {ebc +}      "stack underflow"    ;# deliberate error
 ? {ebc 1@d* 5} 25                   ;# square
 ? {ebc 1@d*2@d*+s 3 4} 5.0          ;# hypot()
 ? {ebc 1@dLr*iI {red green blue}} ? ;# random choice of a list element
 ? {ebc 1@2@+3=3@ 5 7} 12            ;# variable assignment & retrieval
 interp alias {} sgn {} ebc 1@0>01@>- ;# rmax's sign function
 ? {sgn -42} -1
 ? {sgn 0}    0
 ? {sgn 42}   1
 ? {ebc 1@S0I {1.2 1.1 2.3}} 1.1     ;# minimum of a list of numbers
 ? {ebc "12{puts hello}+"} 3         ;# embedded code to eval

if 0 {


Arts and crafts of Tcl-Tk programming }