Version 2 of Playing bytecode

Updated 2004-05-21 22:11:06

if 0 {Richard Suchenwirth 2004-05-19 - From Tcl 8.0, bytecode compilation has remarkably increased execution speed - code (e.g. in proc or while bodies) is no longer interpreted word by word, character by character, but rather converted into a compact sequence of bytes once, and this byte sequence is executed by a "virtual machine", Tcl_ExecuteByteCode(), or TEBC for short, as often as needed. The concept isn't new. BASICs in the 1980s had that, as well as the "p-code" into which UCSD Pascal was converted. But it certainly has advantages over naive interpretation of strings as commands.

I have to admit that Tcl's real bytecodes are still sort of a mystery to me. I looked at the C code that implements and executes it, but I never managed to really work on it (in regular Tcl builds, the bytecode debugging isn't very exposed; and the code set happens to change between minor releases). But one can always play...:) This page contains a tiny implementation of my vague ideas about a bytecode engine, much weaker than the real thing, but possibly educational. Compilation of bytecode (here restricted to the 94 printable ASCII characters) is done manually into a compact reverse-Polish language (see RPN again), in which no spaces are needed to separate "words", as they're all one byte long. This allows for shorter (and more mysterious-looking) code...

The bytecode engine ebc (execute byte code) is stack based. Instructions get their arguments from the stack, and push their result to the stack. Unrelated with this stack, I simulate a "local stack frame" by assigning the arguments to ebc to variables $1..$9 (see half-lambda). Constants are limited to 0..9 in this toy; the (n)@ opcode retrieves the (n)-th argument. So Tcl's

   lambda {1 2} {expr ($1+$2)/3}

comes out in this bytecode language as

   [email protected]@+3/

You can also assign to such local variables, with the = opcode of course. As in TEBC, expr operators (plus some functions) are available as first-class "commands"; I added some other list commands as needed for the examples at end. Finally, the stack is returned.

One could define commands in this way:

 interp alias {} hypot {} ebc [email protected]*[email protected]*+s  ;# sqrt($1*$1 + $2*$2)

but of course this toy runs slower than Tcl's real bytecodes.-

Like in Tcl, nesting {} braces can be used to group without substitution. This way, you can push and eval longer strings, e.g.

 {puts hello}e

Conditional branching is done with the (i)f opcode:

 c{t}{e}f

which, according to the condition c on top of stack, executes either the t(hen) or the e(lse) bytecode embeddedly. What's missing is looping constructs like for, while, foreach. For this, the simple foreach iteration isn't sufficient - one would need a "program counter" that points into the bytecode, and can move forward or back as required, as e.g. in Playing Assembler. }

 proc ebc {code args} {
    foreach {1 2 3 4 5 6 7 8 9} $args break
    set S {} ;# clear stack
    set keep 0; set buffer ""
    foreach byte [split $code ""] {
        #puts [list $code | $byte | $S | $keep | $buffer]
        if $keep {
            if {$byte eq "\}"} {
                if ![incr keep -1] {push S $buffer; set buffer ""}
            }
            if $keep {append buffer $byte}
            if {$byte eq "\{"} {incr keep}
            continue
        }
        switch -- $byte {
            0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 {push S $byte}
            @ {push S [set [pop S]] ;# retrieve numbered argument}
            + - "-" - * - / - % - > - < {
                swap S; push S [expr [pop S] $byte [pop S]]}
            . {swap S; push S [pop S][pop S] ;# string concatenation}
            = {set [pop S] [pop S]  ;# assign to a numbered variable}
            , {puts -nonewline "[pop S] "}
            _ {}
            d {dup S}
            e {push S [eval [pop S]]}
            f {set else [pop S];set then [pop S]
               if [pop S] {push S [ebc $then]} {push S [ebc $else]}}
            I {swap S; push S [lindex [pop S] [pop S]]}
            i {push S [expr int([pop S])]}
            L {push S [llength [pop S]]}
            p {pop S}
            r {push S [expr rand()]}
            S {push S [lsort -real [pop S]]}
            s {push S [expr sqrt([pop S])]}
            x {ebc [pop S]}
            \{ {incr keep; set buffer "" ;# start embedded command}
            default {error "unknown bytecode $byte"}
        }
    }
    set S
 }

#-- Stack routines:

 interp alias {} push {} lappend
 proc pop Sname {
    upvar 1 $Sname S
    if ![llength $S] {error "stack underflow"}
    K [lindex $S end] [set S [lrange $S 0 end-1]]
 }
 proc K    {a b} {set a}
 proc dup  Sname {upvar 1 $Sname S; push S [lindex $S end]}
 proc swap Sname {upvar 1 $Sname S; push S [pop S] [pop S]}

#-- Testing, and usage examples:

 proc ? {body expected} {
    catch {uplevel 1 $body} res
    if {$res ne $expected} {puts "$body->$res, expected $expected"}
 }
 ? {ebc 123++}  6                     ;# simple starter
 ? {ebc +}      "stack underflow"     ;# deliberate error
 ? {ebc [email protected]* 5} 25                    ;# square
 ? {ebc [email protected]*[email protected]*+s 3 4} 5.0           ;# hypot()
 ? {ebc [email protected]*iI {red green blue}} ?  ;# random choice of a list element
 ? {ebc [email protected]@[email protected] 5 7} 12             ;# variable assignment & retrieval
 interp alias {} sgn {} ebc [email protected]>[email protected]<- ;# rmax's sign function
 ? {sgn -42} -1
 ? {sgn 0}    0
 ? {sgn 42}   1
 ? {ebc [email protected] {1.2 1.1 2.3}} 1.1       ;# minimum of a list of numbers
 ? {ebc [email protected]{e}I {1.2 1.1 2.3 0.4}} 2.3 ;# maximum of a list of numbers

 ? {ebc "12{puts hello}ep+"} 3         ;# embedded code to eval
 ? {ebc "12>{{yes}}{{no}}f"} no
 ? {ebc "12<{{yes}}{{no}}f"} yes

if 0 { The test suite should only complain at a random color (I can't predict that ;-), and show "hello" from the puts test.


Arts and crafts of Tcl-Tk programming }