[Richard Suchenwirth] 2013-11-30 - Another chapter in the [dis2asm] saga: The Tcl compiler converts [proc] bodies into [bytecode]. With tcl::unsupported::'''disassemble''' we can inspect the generated code in "dis" [assembler] notation. With tcl::unsupported::'''assemble''' we can convert a (somewhat different) assembler notation "[TAL]" to bytecode again. The job of [dis2asm] is to convert a string in "dis" to another string in TAL so the original proc works equally well. Looking at the [dis2asm] output, we sometimes notice that it produces code that could be better - in other words, "optimized": doing the same job in less bytecodes, and hence (at least marginally) less time. Examples: push {} pop is an absolutely redundant piece of code: first something is pushed on the stack, and then immediately popped off again. No effect, but 3 bytes in bytecode, and just a little more time needed to run. Or, the second and third line of jump Lxx label Lyy; jump Lzz are also redundant: because the code above jumps away, Lyy can only be reached by code that explicitly jumps there - only to be redirected to Lzz again. It might as well have directly jumped to Lzz. The code shown on this page deals with such issues. It is a postprocessor which converts [dis2asm] output to another string in the same TAL language, but optimized where possible. To test it, I have extended the ''aproc'' wrapper to accept an -o flag and if present, to run the optimizer on the TAL output: ====== proc aproc {name argl body args} { proc $name $argl $body set res [disasm proc $name] if {"-x" in $args} { set res [list proc $name $argl [list asm [dis2asm $res]]] if {"-o" in $args} {set res [optimize $res]} eval $res } return $res } ====== The '''optimize''' proc is a little longer than that, even though it currently just handles the few cases discussed above (and below). It splits the TAL input into a list of lines and iterates over them with [for], so it can also operate on other than the current line. Lines considered redundant are first marked with the prefix "#o", and removed after one pass, so that indexes don't get confused. In testing conditions, the previous and current instructions are put together in a string - the "peephole". I think e.g. "push pop" is quite self-documenting for a peephole condition... :^) ====== proc optimize tal { set last "" set last2 "" set lines [split $tal \n] for {set i 0} {$i < [llength $lines]} {incr i} { set instr [regexp -inline {[A-Za-z0-9_]+} [lindex $lines $i]] if {"$last $instr" eq "push pop"} { lset lines $i-1 #o[lindex $lines $i-1] ;# mark for deletion lset lines $i #o[lindex $lines $i] } elseif {"$last $instr" eq "jump jump"} { ;# unreachable jump lset lines $i #o[lindex $lines $i] } set last $instr } while 1 { set tmp {} ;# remove marked lines foreach line $lines {if ![string match #o* $line] {lappend tmp $line}} set lines $tmp set found 0 for {set i 0} {$i < [llength $lines]} {incr i} { set instr [regexp -inline {[A-Za-z0-9_]+} [lindex $lines $i]] if {"$last2 $last $instr" eq "jump label jump"} { set oldTrg [string trimright [lindex $lines $i-1 1] ";"] set newTrg [lindex $lines $i 1] lset lines $i-1 #o[lindex $lines $i-1] ;# mark for deletion lset lines $i #o[lindex $lines $i] set found 1 break } set last2 $last set last $instr } if $found { set tmp {} ;# remove marked lines foreach line $lines { if [regexp "jump.* $oldTrg " $line] { set line [string map [list " $oldTrg " " $newTrg "] $line] } if ![string match #o* $line] {lappend tmp $line} } set lines $tmp } else break } return [join $lines \n] } ====== Testing: first unoptimized TAL... ====== % aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x proc f x {asm { push {a b} ;# (0) push1 0 # "a b" store 1 ;# (2) storeScalar1 %v1 # temp var 1 pop ;# (4) pop push -1; store 2; pop ;# (5) foreach_start4 0 label L10; incrImm 2 +1;load 1;load 2 listIndex;store i;pop load 1;listLength;lt ;# (10) foreach_step4 0 jumpFalse L63 ;# (15) jumpFalse1 +48 # pc 63 ;# (17) startCommand +43 1 # next cmd at pc 60 load x ;# (26) loadScalar1 %v0 # var "x" store 4 ;# (28) storeScalar1 %v4 # temp var 4 pop ;# (30) pop push -1; store 5; pop ;# (31) foreach_start4 1 label L36; incrImm 5 +1;load 4;load 5 listIndex;store j;pop load 4;listLength;lt ;# (36) foreach_step4 1 jumpFalse L58 ;# (41) jumpFalse1 +17 # pc 58 push puts ;# (43) push1 1 # "puts" load i ;# (45) loadScalar1 %v3 # var "i" push , ;# (47) push1 2 # "," load j ;# (49) loadScalar1 %v6 # var "j" concat 3 ;# (51) concat1 3 invokeStk 2 ;# (53) invokeStk1 2 pop ;# (55) pop jump L36 ;# (56) jump1 -20 # pc 36 label L58; push {} ;# (58) push1 3 # "" pop ;# (60) pop jump L10 ;# (61) jump1 -51 # pc 10 label L63; push {} ;# (63) push1 3 # "" ;# (65) done label Done; }} ====== and now, with the -o switch added, the optimized version: ====== % aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x -o proc f x {asm { push {a b} ;# (0) push1 0 # "a b" store 1 ;# (2) storeScalar1 %v1 # temp var 1 pop ;# (4) pop push -1; store 2; pop ;# (5) foreach_start4 0 label L10; incrImm 2 +1;load 1;load 2 listIndex;store i;pop load 1;listLength;lt ;# (10) foreach_step4 0 jumpFalse L63 ;# (15) jumpFalse1 +48 # pc 63 ;# (17) startCommand +43 1 # next cmd at pc 60 load x ;# (26) loadScalar1 %v0 # var "x" store 4 ;# (28) storeScalar1 %v4 # temp var 4 pop ;# (30) pop push -1; store 5; pop ;# (31) foreach_start4 1 label L36; incrImm 5 +1;load 4;load 5 listIndex;store j;pop load 4;listLength;lt ;# (36) foreach_step4 1 jumpFalse L10 ;# (41) jumpFalse1 +17 # pc 58 push puts ;# (43) push1 1 # "puts" load i ;# (45) loadScalar1 %v3 # var "i" push , ;# (47) push1 2 # "," load j ;# (49) loadScalar1 %v6 # var "j" concat 3 ;# (51) concat1 3 invokeStk 2 ;# (53) invokeStk1 2 pop ;# (55) pop jump L36 ;# (56) jump1 -20 # pc 36 label L63; push {} ;# (63) push1 3 # "" ;# (65) done label Done; }} ====== Lines marked 58 to 61 are gone, line 41 now jumps directly to L10... but does it still work as before? ====== % f {0 1 2} a,0 a,1 a,2 b,0 b,1 b,2 ====== Another potential for optimization appears in the following test: ====== % aproc f x {foreach i $x {if {$i eq "b"} continue;puts $i}} -x proc f x {asm { load x ;# (0) loadScalar1 %v0 # var "x" store 1 ;# (2) storeScalar1 %v1 # temp var 1 pop ;# (4) pop push -1; store 2; pop ;# (5) foreach_start4 0 label L10; incrImm 2 +1;load 1;load 2 listIndex;store i;pop load 1;listLength;lt ;# (10) foreach_step4 0 jumpFalse L61 ;# (15) jumpFalse1 +46 # pc 61 ;# (17) startCommand +34 1 # next cmd at pc 51 load i ;# (26) loadScalar1 %v3 # var "i" push b ;# (28) push1 0 # "b" streq ;# (30) streq jumpFalse L49 ;# (31) jumpFalse1 +18 # pc 49 ;# (33) startCommand +14 1 # next cmd at pc 47 jump L10 ;# (42) jump4 -32 # pc 10 jump L51 ;# (47) jump1 +4 # pc 51 label L49; push {} ;# (49) push1 1 # "" label L51; pop ;# (51) pop push puts ;# (52) push1 2 # "puts" load i ;# (54) loadScalar1 %v3 # var "i" invokeStk 2 ;# (56) invokeStk1 2 pop ;# (58) pop jump L10 ;# (59) jump1 -49 # pc 10 label L61; push {} ;# (61) push1 1 # "" ;# (63) done label Done; }} ====== Quite a mouthful, from a one-liner [proc]... Notice the lines marked (42) and (47), the latter of which is dead code that cannot be reached. I have added code to handle that situation in the ''optimize'' proc above. Another two bytes saved... <>Example