Playing bytecode

if 0 {Richard Suchenwirth 2004-05-19 - From Tcl 8.0, bytecode compilation has remarkably increased execution speed - code (e.g. in proc or while bodies) is no longer interpreted word by word, character by character, but rather converted into a compact sequence of bytes once, and this byte sequence is executed by the Tcl virtual machine, Tcl_ExecuteByteCode(), or TEBC for short, as often as needed. The concept isn't new. BASICS in the 1980s had that, as well as the "p-code" into which UCSD Pascal was converted. But it certainly has advantages over naive interpretation of strings as commands.

I have to admit that Tcl's real bytecodes are still sort of a mystery to me. I looked at the C code that implements and executes it, but I never managed to really work on it (in regular Tcl builds, the bytecode debugging isn't very exposed; and the code set happens to change between minor releases). But one can always play...:) This page contains a tiny implementation of my vague ideas about a bytecode engine, much weaker than the real thing, but possibly educational. Compilation of bytecode (here restricted to the 94 printable ASCII characters) is done manually into a compact reverse-Polish language (see RPN again), in which no spaces are needed to separate "words", as they're all one byte long. This allows for shorter (and more mysterious-looking) code...

The bytecode engine ebc (execute byte code) is stack based. Instructions get their arguments from the stack, and push their result to the stack. Unrelated with this stack, I simulate a "local stack frame" by assigning the arguments to ebc to variables $1..$9 (see half-lambda). Constants are limited to 0..9 in this toy, around higher values you'll need braces; the (n)@ opcode retrieves the (n)-th argument. So Tcl's

lambda {1 2} {expr ($1+$2)/3}

comes out in this bytecode language as

1@2@+3/

You can also assign to such local variables, with the = opcode of course. As in TEBC, expr operators (plus some functions) are available as first-class "commands"; I added some other list commands as needed for the examples at end. Finally, the stack is returned.

One could define commands in this way:

interp alias {} hypot {} ebc 1@d*2@d*+s  ;# sqrt($1*$1 + $2*$2)

but of course this toy runs slower than Tcl's real bytecodes.-

Like in Tcl, nesting {} braces can be used to group without substitution. This way, you can push and eval longer strings, e.g.

{puts hello}e

Conditional branching is done with the (i)f opcode:

c{t}{e}f

which, according to the condition c on top of stack, executes either the t(hen) or the e(lse) bytecode embeddedly. What's missing is looping constructs like for, while, foreach. For this, the simple foreach iteration isn't sufficient - one would need a "program counter" that points into the bytecode, and can move forward or back as required, as e.g. in Playing Assembler. }

 proc ebc {code args} {
    foreach {1 2 3 4 5 6 7 8 9} $args break
    set S {} ;# clear stack
    set keep 0; set buffer ""
    foreach byte [split $code ""] {
        #puts [list $code | $byte | $S | $keep | $buffer]
        if $keep {
            if {$byte eq "\}"} {
                if ![incr keep -1] {push S $buffer; set buffer ""}
            }
            if $keep {append buffer $byte}
            if {$byte eq "\{"} {incr keep}
            continue
        }
        switch -- $byte {
            0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 {push S $byte}
            @ {push S [set [pop S]] ;# retrieve numbered argument}
            + - "-" - * - / - % - > - < {
                swap S; push S [expr [pop S] $byte [pop S]]}
            . {swap S; push S [pop S][pop S] ;# string concatenation}
            = {set [pop S] [pop S]  ;# assign to a numbered variable}
            , {puts -nonewline "[pop S] "}
            _ {}
            d {dup S}
            e {push S [eval [pop S]]}
            f {set else [pop S];set then [pop S]
               if [pop S] {push S [ebc $then]} {push S [ebc $else]}}
            I {swap S; push S [lindex [pop S] [pop S]]}
            i {push S [expr int([pop S])]}
            L {push S [llength [pop S]]}
            p {pop S}
            r {push S [expr rand()]}
            S {push S [lsort -real [pop S]]}
            s {push S [expr sqrt([pop S])]}
            w {set body [pop S]; set cond [pop S]
               while {[ebc $cond]} {push S [ebc $body]}}
            x {ebc [pop S]}
            \{ {incr keep; set buffer "" ;# start embedded command}
            [ {push S \uffff ;# push magic marker}
            ] { # Pop to magic marker and make list
                set idx [expr [llength $S]-1]
                for {} {$idx>=0} {incr idx -1} {
                    if {[lindex $S $idx] eq "\uffff"} {
                        set list [lrange $S [expr $idx+1] end]
                        set S [lrange $S 0 [expr $idx-1]]
                        push S $list
                        break
                    }
                }
            }
            default {error "unknown bytecode $byte"}
        }
    }
    set S
 }

#-- Stack routines:

 interp alias {} push {} lappend
 proc pop Sname {
    upvar 1 $Sname S
    if ![llength $S] {error "stack underflow"}
    K [lindex $S end] [set S [lrange $S 0 end-1]]
 }
 proc K    {a b} {set a}
 proc dup  Sname {upvar 1 $Sname S; push S [lindex $S end]}
 proc swap Sname {upvar 1 $Sname S; push S [pop S] [pop S]}

#-- Testing, and usage examples:

 proc ? {body expected} {
    catch {uplevel 1 $body} res
    if {$res ne $expected} {puts "$body->$res, expected $expected"}
 }
 ? {ebc 123++}  6                     ;# simple starter
 ? {ebc +}      "stack underflow"     ;# deliberate error
 ? {ebc 1@d* 5} 25                    ;# square
 ? {ebc 1@d*2@d*+s 3 4} 5.0           ;# hypot()
 ? {ebc 1@dLr*iI {red green blue}} ?  ;# random choice of a list element
 ? {ebc 1@2@+3=3@ 5 7} 12             ;# variable assignment & retrieval
 interp alias {} sgn {} ebc 1@0>1@0<- ;# rmax's sign function
 ? {sgn -42} -1
 ? {sgn 0}    0
 ? {sgn 42}   1
 ? {ebc 1@S0I {1.2 1.1 2.3}} 1.1       ;# minimum of a list of numbers
 ? {ebc 1@S{e}I {1.2 1.1 2.3 0.4}} 2.3 ;# maximum of a list of numbers

 ? {ebc "12{puts hello}ep+"} 3         ;# embedded code to eval
 ? {ebc "12>{{yes}}{{no}}f"} no
 ? {ebc "12<{{yes}}{{no}}f"} yes

if 0 { The test suite should only complain at a random color (I can't predict that ;-), and show "hello" from the puts test.

DKF: Added concatenation and postscript-like list-building operators. Have fun. :)

DKF: Also added a while operator.


RS 2013-10-30 - Re-reading this page after nine years, and found that "ebc" can do more than advertised. (Without me changing any code.) Constants above 9 just need to be braced, and then go neatly on the stack:

 % ebc "{10}{32}+"
 42

See Brute force meets Goedel for a minimal ebc variant


Arts and crafts of Tcl-Tk programming }