Version 0 of dis2asm gets better

Updated 2013-11-30 20:02:20 by suchenwi

Richard Suchenwirth 2013-11-30 - Another chapter in the dis2asm saga: The Tcl compiler converts proc bodies into bytecode. With tcl::unsupported::disassemble we can inspect the generated code in assembler notation "dis" . With tcl::unsupported::assemble we can convert a (somewhat different) assembler notation "TAL" to bytecode again. The job of dis2asm is to convert a string in "dis" to another string in TAL so the original proc works equally well.

Looking at the dis2asm output, we sometimes notice that it produces code that could be better - in other words, "optimized": doing the same job in less bytecodes, and hence (at least marginally) less time. Examples:

   push {}
   pop

is an absolutely redundant piece of code: first something is pushed on the stack, and then immediately popped off again. No effect, but 3 bytes in bytecode, and just a little more time needed to run. Or

   jump Lxx
 label Lyy;
   jump Lzz

is also redundant: because the code above jumps away, Lyy can only be reached by code that explicitly jumps there - only to be redirected to Lzz again. It might as well have directly jumped to Lzz.

The code shown on this page deals with such issues. It is a postprocessor which converts dis2asm output to another string in the same TAL language, but optimized where possible. To test it, I have extended the aproc wrapper to accept a -o flag and if present, runs the optimizer on the TAL output:

proc aproc {name argl body args} {
    proc $name $argl $body
    set res [disasm proc $name]
    if {"-x" in $args} {
        set res [list proc $name $argl [list asm [dis2asm $res]]]
        if {"-o" in $args} {set res [optimize $res]}
        eval $res
    }
    return $res
}

The optimize proc is a little longer than that, even though it currently just handles the two cases discussed above. It splits the TAL input in a list of lines, so it can also operate on other than the current line. Lines considered redundant are first marked with the prefix "#o", and removed after one pass, so that indexes don't get confused.

proc optimize tal {
    set last  ""
    set last2 ""
    set lines [split $tal \n]
    for {set i 0} {$i < [llength $lines]} {incr i} {
        set instr [regexp -inline {[A-Za-z0-9_]+} [lindex $lines $i]]
        if {"$last $instr" eq "push pop"} {
            lset lines $i-1 #o[lindex $lines $i-1] ;# mark for deletion
            lset lines $i   #o[lindex $lines $i]
        }
        set last $instr
    }
    while 1 {
        set tmp {} ;# remove marked lines
        foreach line $lines {if ![string match #o* $line] {lappend tmp $line}}
        set lines $tmp
        set found 0
        for {set i 0} {$i < [llength $lines]} {incr i} {
            set instr [regexp -inline {[A-Za-z0-9_]+} [lindex $lines $i]]
            if {"$last2 $last $instr" eq "jump label jump"} {
                set oldTrg [string trimright [lindex $lines $i-1 1] ";"]
                set newTrg [lindex $lines $i   1]
                lset lines $i-1 #o[lindex $lines $i-1] ;# mark for deletion
                lset lines $i   #o[lindex $lines $i]
                set found 1
                break
            }
            set last2 $last
            set last  $instr
        }
        if $found {
            set tmp {} ;# remove marked lines
            foreach line $lines {
                if [regexp "jump.* $oldTrg " $line] {
                    set line [string map [list " $oldTrg " " $newTrg "] $line]
                }
                if ![string match #o* $line] {lappend tmp $line}
            }
            set lines $tmp
        } else break
    }
    return [join $lines \n]
}

Testing: first unoptimized TAL...

% aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x 
proc f x {asm {
   push {a b}               ;# (0) push1 0         # "a b"
   store 1                  ;# (2) storeScalar1 %v1         # temp var 1
   pop                      ;# (4) pop
  push -1; store 2; pop    ;# (5) foreach_start4 0
 label L10;
   incrImm 2 +1;load 1;load 2
   listIndex;store i;pop
   load 1;listLength;lt     ;# (10) foreach_step4 0
   jumpFalse L63            ;# (15) jumpFalse1 +48         # pc 63
                            ;# (17) startCommand +43 1         # next cmd at pc 60
   load x                   ;# (26) loadScalar1 %v0         # var "x"
   store 4                  ;# (28) storeScalar1 %v4         # temp var 4
   pop                      ;# (30) pop
  push -1; store 5; pop    ;# (31) foreach_start4 1
 label L36;
   incrImm 5 +1;load 4;load 5
   listIndex;store j;pop
   load 4;listLength;lt     ;# (36) foreach_step4 1
   jumpFalse L58            ;# (41) jumpFalse1 +17         # pc 58
   push puts                ;# (43) push1 1         # "puts"
   load i                   ;# (45) loadScalar1 %v3         # var "i"
   push ,                   ;# (47) push1 2         # ","
   load j                   ;# (49) loadScalar1 %v6         # var "j"
   concat 3                 ;# (51) concat1 3
   invokeStk 2              ;# (53) invokeStk1 2
   pop                      ;# (55) pop
   jump L36                 ;# (56) jump1 -20         # pc 36
 label L58;
   push {}                  ;# (58) push1 3         # ""
   pop                      ;# (60) pop
   jump L10                 ;# (61) jump1 -51         # pc 10
 label L63;
   push {}                  ;# (63) push1 3         # ""
                            ;# (65) done
 label Done;
}}

and now, with the -o switch added, the optimized version:

% aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x -o
proc f x {asm {
   push {a b}               ;# (0) push1 0         # "a b"
   store 1                  ;# (2) storeScalar1 %v1         # temp var 1
   pop                      ;# (4) pop
  push -1; store 2; pop    ;# (5) foreach_start4 0
 label L10;
   incrImm 2 +1;load 1;load 2
   listIndex;store i;pop
   load 1;listLength;lt     ;# (10) foreach_step4 0
   jumpFalse L63            ;# (15) jumpFalse1 +48         # pc 63
                            ;# (17) startCommand +43 1         # next cmd at pc 60
   load x                   ;# (26) loadScalar1 %v0         # var "x"
   store 4                  ;# (28) storeScalar1 %v4         # temp var 4
   pop                      ;# (30) pop
  push -1; store 5; pop    ;# (31) foreach_start4 1
 label L36;
   incrImm 5 +1;load 4;load 5
   listIndex;store j;pop
   load 4;listLength;lt     ;# (36) foreach_step4 1
   jumpFalse L10            ;# (41) jumpFalse1 +17         # pc 58
   push puts                ;# (43) push1 1         # "puts"
   load i                   ;# (45) loadScalar1 %v3         # var "i"
   push ,                   ;# (47) push1 2         # ","
   load j                   ;# (49) loadScalar1 %v6         # var "j"
   concat 3                 ;# (51) concat1 3
   invokeStk 2              ;# (53) invokeStk1 2
   pop                      ;# (55) pop
   jump L36                 ;# (56) jump1 -20         # pc 36
 label L63;
   push {}                  ;# (63) push1 3         # ""
                            ;# (65) done
 label Done;
}}

Lines 58 to 61 are gone, line 41 now jumps directly to L10... but does it still work as before?

% f {0 1 2}
a,0
a,1
a,2
b,0
b,1
b,2