if 0 {Richard Suchenwirth 2004-05-19 - From Tcl 8.0, bytecode compilation has remarkably increased execution speed - code (e.g. in proc or while bodies) is no longer interpreted word by word, character by character, but rather converted into a compact sequence of bytes once, and this byte sequence is executed by the Tcl virtual machine, Tcl_ExecuteByteCode(), or TEBC for short, as often as needed. The concept isn't new. BASICS in the 1980s had that, as well as the "p-code" into which UCSD Pascal was converted. But it certainly has advantages over naive interpretation of strings as commands.
I have to admit that Tcl's real bytecodes are still sort of a mystery to me. I looked at the C code that implements and executes it, but I never managed to really work on it (in regular Tcl builds, the bytecode debugging isn't very exposed; and the code set happens to change between minor releases). But one can always play...:) This page contains a tiny implementation of my vague ideas about a bytecode engine, much weaker than the real thing, but possibly educational. Compilation of bytecode (here restricted to the 94 printable ASCII characters) is done manually into a compact reverse-Polish language (see RPN again), in which no spaces are needed to separate "words", as they're all one byte long. This allows for shorter (and more mysterious-looking) code...
The bytecode engine ebc (execute byte code) is stack based. Instructions get their arguments from the stack, and push their result to the stack. Unrelated with this stack, I simulate a "local stack frame" by assigning the arguments to ebc to variables $1..$9 (see half-lambda). Constants are limited to 0..9 in this toy, around higher values you'll need braces; the (n)@ opcode retrieves the (n)-th argument. So Tcl's
lambda {1 2} {expr ($1+$2)/3}
comes out in this bytecode language as
1@2@+3/
You can also assign to such local variables, with the = opcode of course. As in TEBC, expr operators (plus some functions) are available as first-class "commands"; I added some other list commands as needed for the examples at end. Finally, the stack is returned.
One could define commands in this way:
interp alias {} hypot {} ebc 1@d*2@d*+s ;# sqrt($1*$1 + $2*$2)
but of course this toy runs slower than Tcl's real bytecodes.-
Like in Tcl, nesting {} braces can be used to group without substitution. This way, you can push and eval longer strings, e.g.
{puts hello}e
Conditional branching is done with the (i)f opcode:
c{t}{e}f
which, according to the condition c on top of stack, executes either the t(hen) or the e(lse) bytecode embeddedly. What's missing is looping constructs like for, while, foreach. For this, the simple foreach iteration isn't sufficient - one would need a "program counter" that points into the bytecode, and can move forward or back as required, as e.g. in Playing Assembler. }
proc ebc {code args} { foreach {1 2 3 4 5 6 7 8 9} $args break set S {} ;# clear stack set keep 0; set buffer "" foreach byte [split $code ""] { #puts [list $code | $byte | $S | $keep | $buffer] if $keep { if {$byte eq "\}"} { if ![incr keep -1] {push S $buffer; set buffer ""} } if $keep {append buffer $byte} if {$byte eq "\{"} {incr keep} continue } switch -- $byte { 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 {push S $byte} @ {push S [set [pop S]] ;# retrieve numbered argument} + - "-" - * - / - % - > - < { swap S; push S [expr [pop S] $byte [pop S]]} . {swap S; push S [pop S][pop S] ;# string concatenation} = {set [pop S] [pop S] ;# assign to a numbered variable} , {puts -nonewline "[pop S] "} _ {} d {dup S} e {push S [eval [pop S]]} f {set else [pop S];set then [pop S] if [pop S] {push S [ebc $then]} {push S [ebc $else]}} I {swap S; push S [lindex [pop S] [pop S]]} i {push S [expr int([pop S])]} L {push S [llength [pop S]]} p {pop S} r {push S [expr rand()]} S {push S [lsort -real [pop S]]} s {push S [expr sqrt([pop S])]} w {set body [pop S]; set cond [pop S] while {[ebc $cond]} {push S [ebc $body]}} x {ebc [pop S]} \{ {incr keep; set buffer "" ;# start embedded command} [ {push S \uffff ;# push magic marker} ] { # Pop to magic marker and make list set idx [expr [llength $S]-1] for {} {$idx>=0} {incr idx -1} { if {[lindex $S $idx] eq "\uffff"} { set list [lrange $S [expr $idx+1] end] set S [lrange $S 0 [expr $idx-1]] push S $list break } } } default {error "unknown bytecode $byte"} } } set S }
#-- Stack routines:
interp alias {} push {} lappend proc pop Sname { upvar 1 $Sname S if ![llength $S] {error "stack underflow"} K [lindex $S end] [set S [lrange $S 0 end-1]] } proc K {a b} {set a} proc dup Sname {upvar 1 $Sname S; push S [lindex $S end]} proc swap Sname {upvar 1 $Sname S; push S [pop S] [pop S]}
#-- Testing, and usage examples:
proc ? {body expected} { catch {uplevel 1 $body} res if {$res ne $expected} {puts "$body->$res, expected $expected"} } ? {ebc 123++} 6 ;# simple starter ? {ebc +} "stack underflow" ;# deliberate error ? {ebc 1@d* 5} 25 ;# square ? {ebc 1@d*2@d*+s 3 4} 5.0 ;# hypot() ? {ebc 1@dLr*iI {red green blue}} ? ;# random choice of a list element ? {ebc 1@2@+3=3@ 5 7} 12 ;# variable assignment & retrieval interp alias {} sgn {} ebc 1@0>1@0<- ;# rmax's sign function ? {sgn -42} -1 ? {sgn 0} 0 ? {sgn 42} 1 ? {ebc 1@S0I {1.2 1.1 2.3}} 1.1 ;# minimum of a list of numbers ? {ebc 1@S{e}I {1.2 1.1 2.3 0.4}} 2.3 ;# maximum of a list of numbers ? {ebc "12{puts hello}ep+"} 3 ;# embedded code to eval ? {ebc "12>{{yes}}{{no}}f"} no ? {ebc "12<{{yes}}{{no}}f"} yes
if 0 { The test suite should only complain at a random color (I can't predict that ;-), and show "hello" from the puts test.
DKF: Added concatenation and postscript-like list-building operators. Have fun. :)
DKF: Also added a while operator.
RS 2013-10-30 - Re-reading this page after nine years, and found that "ebc" can do more than advertised. (Without me changing any code.) Constants above 9 just need to be braced, and then go neatly on the stack:
% ebc "{10}{32}+" 42
See Brute force meets Goedel for a minimal ebc variant