Version 29 of dis2asm

Updated 2013-11-29 17:42:40 by suchenwi

Richard Suchenwirth 2013-11-28 - Since Tcl 8.0, proc bodies are byte-compiled. Since 8.5, we can use the tcl::unsupported::disassemble command to see how it is done in the Tcl Assembly language, TAL. Since 8.6, there is also a sort-of converse function tcl::unsupported::assemble to turn a symbolic assembler notation into bytecode.

However, there are quite many inequalities between the language that is produced by disassemble (call it "dis"), and the language accepted by assemble (call it "asm"). "The TAL that can't be assembled is not the real TAL", a Daoist might say..

Some instruction names can be translated (e.g. "incrScalar1Imm" in "dis" corresponds to "incrImm" in "asm", see the normalize function below), for others I haven't found a way yet. "done" (something like return) in "dis" can be ignored when at the end of the code, but not in the middle of it. foreach winds up as two instructions in "dis", but I haven't found matches for them in "asm".

Though ideally, one would expect to be able to feed the disassemble output into assemble to get the same results...

The following code tries to bridge that gap. It is far from feature-complete and will need much more work. But it at least works in several simple cases (see the examples at bottom), and will be extended to cover more. In any case, the "proof of the pudding" is:

  • write a proc, e.g. "f", in Tcl
  • disassemble its body
  • use dis2asm to convert the disassembly to TAL (input) language
  • test it by calling it (only then the assembly takes place), see if it works like before

Some shortcuts to start with:

 interp alias {} asm    {} ::tcl::unsupported::assemble
 interp alias {} disasm {} ::tcl::unsupported::disassemble

I have extended the aproc function, which before just returned the disassembly, to accept an extra -x flag to "reassemble" the disassembly, and eval it as a proc, so you can test it just by calling it. The original disassembly code is also shown in a comment.

proc aproc {name argl body args} {
    proc $name $argl $body
    set res [disasm proc $name]
    if {"-x" in $args} {
        set res [list proc $name [add_locals $argl $res] [list asm [dis2asm $res]]]
        eval $res
    }
    return $res
}

This workaround scans the disassembly for local vars that are not in the argument lists, and appends them to it as optionals, defaulting to "". Otherwise, reassembly fails with

 cannot use this instruction to create a variable in a non-proc context

Note that the awk language has the same mechanism for "declaring" local variables since a long time...

proc add_locals {argl disasm} {
    foreach line [split $disasm \n] {
        if [regexp {slot.+scalar, "(.+)"} $line -> local] {
            lappend argl [list $local ""]
        }
    }
    return $argl
}
 proc dis2asm body {
    set res ""
    set jumptargets {}
    foreach line [split $body \n] {
        if [regexp {\# pc (\d+)} $line -> pc] {lappend jumptargets $pc}
    }
    foreach line [split $body \n] {
        set line [string trim $line]
        if {$line eq ""} continue
        set code ""
        if {[regexp {\((\d+)\) (.+)} $line -> pc instr]} {
            if {$pc in $jumptargets} {
                append res "\n label L$pc;"
            }
            if {[regexp {(.+)#(.+)} $instr -> instr comment]} {
                set arg [lindex $comment end]
                if {$arg eq ""} {set arg "{}"}
                if [string match jump* $instr] {set arg L$arg}
            } else {set arg ""}
            set instr0 [normalize [lindex $instr 0]]
            if {$instr0 in {invokeStk}} {set arg [lindex $instr end]}
            if {$instr0 in {incrImm}} {set arg [list $arg [lindex $instr end]]}
            set code [format " %-24s" "$instr0 $arg"]
            if {$instr0 in {startCommand}} {set code ""}
            append res "\n  $code ;# [string trim $line]"
        }
    }
    append res \n
    return $res
 }

This translates "dis" instruction names to "asm" instruction names, where different:

 proc normalize instr {
    regsub {\d+$} $instr "" instr ;# strip off length indicator
    set instr [string map {
        loadScalar load done "" nop "" storeScalar store
        incrScalar1Imm incrImm
    } $instr]
    return $instr
 }

Now to try it out.

% aproc f x {expr {sqrt($x)+1}} -x
proc f x {asm {
   push tcl::mathfunc::sqrt ;# (0) push1 0         # "tcl::mathfunc::sqrt"
   load x                   ;# (2) loadScalar1 %v0         # var "x"
   invokeStk 2              ;# (4) invokeStk1 2
   push 1                   ;# (6) push1 1         # "1"
   add                      ;# (8) add
                            ;# (9) done}}
% f 2
2.414213562373095

The following test shows that while mostly we have to remove parts of the disassembly, for jump targets we need to insert label pseudo-instructions:

% aproc f x {if {$x==1} {set x 2};return $x} -x
proc f x {asm {
   load x                   ;# (0) loadScalar1 %v0         # var "x"
   push 1                   ;# (2) push1 0         # "1"
   eq                       ;# (4) eq
   jumpFalse L13            ;# (5) jumpFalse1 +8         # pc 13
   push 2                   ;# (7) push1 1         # "2"
   store x                  ;# (9) storeScalar1 %v0         # var "x"
   jump L15                 ;# (11) jump1 +4         # pc 15
 label L13;
   push {}                  ;# (13) push1 2         # ""
 label L15;
   pop                      ;# (15) pop
   load x                   ;# (16) loadScalar1 %v0         # var "x"
                            ;# (18) done}}
% f 3
3
% f 1
2
% aproc hypot {x y} {expr {sqrt($x**2+$y**2)}} -x
proc hypot {x y} {asm {
   push tcl::mathfunc::sqrt ;# (0) push1 0         # "tcl::mathfunc::sqrt"
   load x                   ;# (2) loadScalar1 %v0         # var "x"
   push 2                   ;# (4) push1 1         # "2"
   expon                    ;# (6) expon
   load y                   ;# (7) loadScalar1 %v1         # var "y"
   push 2                   ;# (9) push1 1         # "2"
   expon                    ;# (11) expon
   add                      ;# (12) add
   invokeStk 2              ;# (13) invokeStk1 2
   tryCvtToNumeric          ;# (15) tryCvtToNumeric
                            ;# (16) done}}
% hypot 3 4
5.0
% aproc f x {incr x -1} -x
proc f x {asm {
   incrImm x -1             ;# (0) incrScalar1Imm %v0 -1        # var "x"
                            ;# (3) done}}
% f 5
4

Local variables seem only to be possible if they are, possibly with default values, in the argument list. See also my "add_locals" workaround above. Given that, while loops can already be handled:

 % aproc f {x {i 0}} {while {$i <= $x} {puts $i; incr i}} -x
proc f {x {i 0}} {asm {
   jump L22                 ;# (0) jump1 +22         # pc 22
 label L2;
   push puts                ;# (2) push1 0         # "puts"
   load i                   ;# (4) loadScalar1 %v1         # var "i"
   invokeStk 2              ;# (6) invokeStk1 2
   pop                      ;# (8) pop
   ;# (9) startCommand +12 1         # next cmd at pc 21
   incrImm i +1             ;# (18) incrScalar1Imm %v1 +1         # var "i"
   pop                      ;# (21) pop
 label L22;
   load i                   ;# (22) loadScalar1 %v1         # var "i"
   load x                   ;# (24) loadScalar1 %v0         # var "x"
   le                       ;# (26) le
   jumpTrue L2              ;# (27) jumpTrue1 -25         # pc 2
   push {}                  ;# (29) push1 1         # ""
                            ;# (31) done
}}

... or, what also works, to push the name of a local variable and use (load|store)Stk instructions:

% proc f x {asm {
   push i;
   push 42;
   storeStk; # "i" := "42"
   pop;      # needed - otherwise stack is unbalanced in the end
   push i;
   loadStk;
   dup;
   add
}}
% f x
84

But plain "load" didn't work here (again, now fixed with the "add_locals" workaround above):

% proc f x {asm {push i;push 42;storeStk;load i; dup; add}}

% f x
cannot use this instruction to create a variable in a non-proc context

What doesn't work yet

The following illustrates the issue of mid-code "done" (see (9)).. we can't convert it to input TAL. Maybe a "jump" to a label at the end might help? More specifically, when a "done" is met in non-final position, it shall be converted to "jump done", and a "label done" added at the end.

% aproc f x {if {$x > 0} {return 1} else {return 0}} -x
proc f x {asm {
   load x                   ;# (0) loadScalar1 %v0         # var "x"
   push 0                   ;# (2) push1 0         # "0"
   gt                       ;# (4) gt
   jumpFalse L12            ;# (5) jumpFalse1 +7         # pc 12
   push 1                   ;# (7) push1 1         # "1"
                            ;# (9) done
                            ;# (10) nop
                            ;# (11) nop
 label L12;
   push 0                   ;# (12) push1 0         # "0"
                            ;# (14) done
                            ;# (15) done}}
% f 1
inconsistent stack depths on two execution paths

The above error came from a Tclkit 8.6.1 on a Lubuntu netbook. The same example on tclsh 8.6b2 on Win XP works there (with the solution I had in mind - jump to final "done" ;^)

% aproc f x {if {$x > 0} {return 1} else {return 0}} -x

proc f x {asm {
   load x                   ;# (0) loadScalar1 %v0         # var "x"
   push 0                   ;# (2) push1 0         # "0"
   gt                       ;# (4) gt
   jumpFalse L21            ;# (5) jumpFalse1 +16         # pc 21
   ;# (7) startCommand +12 1         # next cmd at pc 19
   push 1                   ;# (16) push1 1         # "1"
                            ;# (18) done
   jump L33                 ;# (19) jump1 +14         # pc 33
 label L21;
   ;# (21) startCommand +12 1         # next cmd at pc 33
   push 0                   ;# (30) push1 0         # "0"
                            ;# (32) done
 label L33;
                            ;# (33) done
}}
37 % f 3
1

Especially the "dis" language generated for foreach looks so different that at the moment I'm only puzzled. The language accepted by "asm" has no instruction beginning with "foreach"...:

137 % aproc f x {foreach i $x {puts $i}}
ByteCode 0x01513B48, refCt 1, epoch 5, interp 0x00950B70 (epoch 5)
  Source "foreach i $x {puts $i}"
  Cmds 2, src 22, inst 29, litObjs 2, aux 1, stkDepth 2, code/src 0.00
  Proc 0x01523EE8, refCt 1, args 1, compiled locals 4
      slot 0, scalar, arg, "x"
      slot 1, scalar, temp
      slot 2, scalar, temp
      slot 3, scalar, "i"
  Exception ranges 1, depth 1:
      0: level 0, loop, pc 17-22, continue 10, break 26
  Commands 2:
      1: pc 0-27, src 0-21        2: pc 17-22, src 14-20
  Command 1: "foreach i $x {puts $i}"
    (0) loadScalar1 %v0         # var "x"
    (2) storeScalar1 %v1         # temp var 1
    (4) pop 
    (5) foreach_start4 0 
                [data=[%v1], loop=%v2
                 it%v1        [%v3]]
    (10) foreach_step4 0 
                [data=[%v1], loop=%v2
                 it%v1        [%v3]]
    (15) jumpFalse1 +11         # pc 26
  Command 2: "puts $i"
    (17) push1 0         # "puts"
    (19) loadScalar1 %v3         # var "i"
    (21) invokeStk1 2 
    (23) pop 
    (24) jump1 -14         # pc 10
    (26) push1 1         # ""
    (28) done