dis2asm does macros

Richard Suchenwirth 2013-11-30 - dis2asm, the converter from the language produced by tcl::unsupported::disassemble, to the one accepted by tcl::unsupported::assemble, TAL, had a troublesome issue: it could do for and while loops, but not foreach. Disassembling a foreach loop came out as

 % aproc f x {foreach j $x {puts $j}} 
 ByteCode 0x0x9eaad68, refCt 1, epoch 16, interp 0x0x9e057a0 (epoch 16)
  Source "foreach j $x {puts $j}"
  Cmds 2, src 22, inst 29, litObjs 2, aux 1, stkDepth 2, code/src 0.00
  Proc 0x0x9ebb820, refCt 1, args 1, compiled locals 4
      slot 0, scalar, arg, "x"
      slot 1, scalar, temp
      slot 2, scalar, temp
      slot 3, scalar, "j"
  Exception ranges 1, depth 1:
      0: level 0, loop, pc 17-22, continue 10, break 26
  Commands 2:
      1: pc 0-27, src 0-21        2: pc 17-22, src 14-20
  Command 1: "foreach j $x {puts $j}"
    (0) loadScalar1 %v0         # var "x"
    (2) storeScalar1 %v1         # temp var 1
    (4) pop 
    (5) foreach_start4 0 
                [data=[%v1], loop=%v2
                 it%v1        [%v3]]
    (10) foreach_step4 0 
                [data=[%v1], loop=%v2
                 it%v1        [%v3]]
    (15) jumpFalse1 +11         # pc 26
  Command 2: "puts $j"
    (17) push1 0         # "puts"
    (19) loadScalar1 %v3         # var "j"
    (21) invokeStk1 2 
    (23) pop 
    (24) jump1 -14         # pc 10
    (26) push1 1         # ""
    (28) done 

and in TAL there are no instructions foreach_start (see (5)) or foreach_step (see (10)). But I wanted them so badly...

Luckily it was weekend, and I had some time on my hands - so I made this my fun project of the day. At first, I manually coded a working solution in TAL:

 proc foreachtest x {asm {
    load x
    listLength
    store _l
    push -1
    store _i

    label loop;
    push puts
    load x
    load _i
    listIndex
    invokeStk 2
    pop
    incrImm _i +1
    load _l
    lt
    jumpTrue loop

    pop
    pop
    push {}
 }}
 foreachtest {a b c d}

This uses two local variables, _i as index into the input list, and _l to hold the length of the list. The code at the puts invocation did however not match the disassembly above. So as next step, I restructured the code to look exactly like the disassembly, except for extra instructions to replace foreach_start and foreach_step - I considered them macros. So here is the second version, that meets that requirement:

 proc foreachtest x {asm {
    load x
    store 1
    pop
    #foreach_start ;#
    push -1        ;#
    store 2        ;#
    pop            ;#
 label L10
    #foreach_step ;#
    incrImm 2 +1  ;#
    load 1        ;#
    load 2        ;#
    listIndex     ;#
    store i       ;#
    pop           ;#
    load 1        ;#
    listLength    ;#
    lt            ;#
    jumpFalse L26
    push puts
    load i
    invokeStk 2
    pop
    jump L10
 label L26
    push {}
 }}
 foreachtest {e f g h}
e
f
g
h

The macros are commented out, followed by their expansions also marked by a trailing comment. In the local variables, I accepted the numeric names

  • "1" (for a local copy of the list to be iterated over, in case it was changed in the loop) and
  • "2" (for the index into the list),

which are valid in Tcl as well. "i" now holds the current list element, like it should. The llength can be checked at runtime, no variable needed for that. There still was the problem that inside the expansion, "i" was mentioned (in the "store i" instruction), and could of course have any other name in practice (my counter-test was to use "j" instead ;^) So the macro had to be parametrized. Looking at the disassembly again, the local variable "slots" are listed:

      slot 0, scalar, arg, "x"
      slot 1, scalar, temp
      slot 2, scalar, temp
      slot 3, scalar, "j"

The sequence is not the same as in the argument list, but that doesn't matter - the "temp" ones can use the slot number as their name. I save the slot info in a temporary array indexed by slot number. Now, the foreach_* macros each span three lines:

    (5) foreach_start4 0 
                [data=[%v1], loop=%v2
                 it%v1        [%v3]]
    (10) foreach_step4 0 
                [data=[%v1], loop=%v2
                 it%v1        [%v3]]

and this is a little troublesome, as dis2asm processes one line at a time. Clearly, in the example

  • %v1 stands for the local copy of the list,
  • %v2 is the index, iterating from 0 to listLength-1,
  • %v3 indicates the named loop variable,

so I can parse these variables out of there. With foreach_start, I had to delay code generation until its third input line had been parsed.

Proof of the pudding: disassemble, reassemble, run:

% aproc f x {foreach j $x {puts $j}} -x
proc f x {asm {
   load x                   ;# (0) loadScalar1 %v0         # var "x"
   store 1                  ;# (2) storeScalar1 %v1         # temp var 1
   pop                      ;# (4) pop
   push -1; store 2;pop     ;# (5) foreach_start4 0
 label L10;
   incrImm 2 +1;load 1;load 2
   listIndex;store j;pop
   load 1;listLength;lt     ;# (10) foreach_step4 0
   jumpFalse L26            ;# (15) jumpFalse1 +11         # pc 26
   push puts                ;# (17) push1 0         # "puts"
   load j                   ;# (19) loadScalar1 %v3         # var "j"
   invokeStk 2              ;# (21) invokeStk1 2
   pop                      ;# (23) pop
   jump L10                 ;# (24) jump1 -14         # pc 10
 label L26;
   push {}                  ;# (26) push1 1         # ""
                            ;# (28) done
}}
% f {T A L}
T
A
L

Boy, was I happy! This is still far away from a real, decent macro assembler, with the tests and the substitutions hard-wired in the code, but it is a good result for the 3 hours it took me to investigate and hack... The add_locals proc turned out to be unnecessary. The aproc wrapper got simpler:

proc aproc {name argl body args} {
    proc $name $argl $body
    set res [disasm proc $name]
    if {"-x" in $args} {
        set res [list proc $name $argl [list asm [dis2asm $res]]]
        eval $res
    }
    return $res
}

The dis2asm proc was extended in various ways. The macro expansions were put in variables fstart, fstep with place-holders @x to be substituted with the appropriate variable names:

  • @i for the loop variable, the current element
  • @l for the local copy of the list (so it may be changed during the loop)
  • @p for the "pointer" (integer index) to the current list element
proc dis2asm body {
    set fstart " push -1; store @p; pop   "
    set fstep  " incrImm @p +1;load @l;load @p
   listIndex;store @i;pop
   load @l;listLength;lt    "
    set res  ""
    set wait ""
    set jumptargets {}
    foreach line [split $body \n] { #-- pass 1: collect jump targets
        if [regexp {\# pc (\d+)} $line -> pc] {lappend jumptargets $pc}
    }
    foreach line [split $body \n] {
        set line [string trim $line]
        if {$line eq ""} continue
        set code ""
        if {[regexp {slot (\d+), (.+)} $line -> number descr]} {
            set slot($number) $descr
        } elseif {[regexp {data=.+loop=%v(\d+)} $line -> ptr]} {
            #got ptr, carry on
        } elseif {[regexp {it%v(\d+).+\[%v(\d+)\]} $line -> copy number]} {
            set loopvar [lindex $slot($number) end]
            if {$wait ne ""} {
                set map [list @p $ptr @i $loopvar @l $copy]
                set code [string map $map $fstart]
                append res "\n  $code ;# $wait"
                set wait ""
            }
        } elseif {[regexp {\((\d+)\) (.+)} $line -> pc instr]} {
            if {$pc in $jumptargets} {append res "\n label L$pc;"}
            if {[regexp {(.+)#(.+)} $instr -> instr comment]} {
                set arg [list [lindex $comment end]]
                if [string match jump* $instr] {set arg L$arg}
            } else {set arg ""}
            set instr0 [normalize [lindex $instr 0]]
            switch -- $instr0 {
                concat - invokeStk {set arg [lindex $instr end]}
                incrImm   {set arg [list $arg [lindex $instr end]]}
            }
            set code [format " %-24s" "$instr0 $arg"]
            switch -- $instr0 {
                startCommand  {set code ""}
                foreach_start {set wait $line; continue}
                foreach_step  {set code [string map $map $fstep]}
            }
            append res "\n  $code ;# $line"
        }
    }
    append res \n
    return $res
}

Finally, to show that dis2asm can now deal well with nested foreach loops too:

% aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x
proc f x {asm {
   push {a b}               ;# (0) push1 0         # "a b"
   store 1                  ;# (2) storeScalar1 %v1         # temp var 1
   pop                      ;# (4) pop
   push -1; store 2; pop    ;# (5) foreach_start4 0
 label L10;
   incrImm 2 +1;load 1;load 2
   listIndex;store i;pop
   load 1;listLength;lt     ;# (10) foreach_step4 0
   jumpFalse L63            ;# (15) jumpFalse1 +48         # pc 63
   ;# (17) startCommand +43 1         # next cmd at pc 60
   load x                   ;# (26) loadScalar1 %v0         # var "x"
   store 4                  ;# (28) storeScalar1 %v4         # temp var 4
   pop                      ;# (30) pop
   push -1; store 5; pop    ;# (31) foreach_start4 1
 label L36;
   incrImm 5 +1;load 4;load 5
   listIndex;store j;pop
   load 4;listLength;lt     ;# (36) foreach_step4 1
   jumpFalse L58            ;# (41) jumpFalse1 +17         # pc 58
   push puts                ;# (43) push1 1         # "puts"
   load i                   ;# (45) loadScalar1 %v3         # var "i"
   push ,                   ;# (47) push1 2         # ","
   load j                   ;# (49) loadScalar1 %v6         # var "j"
   concat 3                 ;# (51) concat1 3
   invokeStk 2              ;# (53) invokeStk1 2
   pop                      ;# (55) pop
   jump L36                 ;# (56) jump1 -20         # pc 36
 label L58;
   push {}                  ;# (58) push1 3         # ""
   pop                      ;# (60) pop
   jump L10                 ;# (61) jump1 -51         # pc 10
 label L63;
   push {}                  ;# (63) push1 3         # ""
                            ;# (65) done
}}
% f {0 1}
a,0
a,1
b,0
b,1

Now to look around for the next challenge, I have only used 25% of the weekend yet...

Consider the lines marked (58) and (60) in the last demo. First, the empty string is pushed (as result of foreach), but then popped away again. These two lines can be optimized out. There remains

 label L58:
   jump L10

which can equally be weeded out, if all instances of "jump L58" (there is only one, at (41)) are rewritten to "jump L10". Tighter code (uses 5 bytes less), and will run (marginally) faster... The solution does not fit on the margin of this page, but see dis2asm gets better.

See also: dis2asm gets things done


RS 2013-12-05 - dkf pointed out that foreach has several more complex ways of calling - multiple lists, multiple iterators on one list. A simple example for both is

% proc f {x y} {foreach i $x {j k} $y {puts $i,$j,$k}}
% f {a b} {c d e f}
a,c,d
b,e,f

% disasm proc f
ByteCode 0x00F2C8D8, refCt 1, epoch 6, interp 0x00A60B70 (epoch 6)
  Source "foreach i $x {j k} $y {puts $i,$j,$k}"
  Cmds 2, src 37, inst 44, litObjs 3, aux 1, stkDepth 6, code/src 0.00
  Proc 0x00F4A558, refCt 1, args 2, compiled locals 8
      slot 0, scalar, arg, "x"
      slot 1, scalar, arg, "y"
      slot 2, scalar, temp
      slot 3, scalar, temp
      slot 4, scalar, temp
      slot 5, scalar, "i"
      slot 6, scalar, "j"
      slot 7, scalar, "k"
  Exception ranges 1, depth 1:
      0: level 0, loop, pc 22-37, continue 15, break 41
  Commands 2:
      1: pc 0-42, src 0-36        2: pc 22-37, src 23-35
  Command 1: "foreach i $x {j k} $y {puts $i,$j,$k}"
    (0) loadScalar1 %v0         # var "x"
    (2) storeScalar1 %v2         # temp var 2
    (4) pop 
    (5) loadScalar1 %v1         # var "y"
    (7) storeScalar1 %v3         # temp var 3
    (9) pop 
    (10) foreach_start4 0 
                [data=[%v2, %v3], loop=%v4
                 it%v2        [%v5],
                 it%v3        [%v6, %v7]]
    (15) foreach_step4 0 
                [data=[%v2, %v3], loop=%v4
                 it%v2        [%v5],
                 it%v3        [%v6, %v7]]
    (20) jumpFalse1 +21         # pc 41
  Command 2: "puts $i,$j,$k"
    (22) push1 0         # "puts"
    (24) loadScalar1 %v5         # var "i"
    (26) push1 1         # ","
    (28) loadScalar1 %v6         # var "j"
    (30) push1 1         # ","
    (32) loadScalar1 %v7         # var "k"
    (34) concat1 5 
    (36) invokeStk1 2 
    (38) pop 
    (39) jump1 -24         # pc 15
    (41) push1 2         # ""
    (43) done 

Evidently, the extra lines following the foreach_start and foreach_step instructions are more expressive than I first thought.

 data=[%v2, %v3]

enumerates the lists to iterate over. I'm not sure about loop=%v4, but the rest is quite clear again:

                 it%v2        [%v5],
                 it%v3        [%v6, %v7]]

specify that %v5 ("i") is to iterate over the list copy %v2, while %v6 ("j") and %v7 ("k") iterate over %v3. As the number of extra lines depends on the number of lists, it will be better to parse for bracket balance, to be sure all extra lines have been processed.

As before, I first hand-coded what I thought may be the right expansion for the more complex foreach_* macros:

proc g {x y} {asm {
    load x         ;# make a local copy of x
    store 2
    pop
    load y          ;# ... and of y
    store 3
    pop
    #foreach_start ;# initialize each of the iterators:
    push -1        ;#
    store 5        ;#
    pop            ;#

    push -2        ;# initial value must be -stride (number of iterators)
    store 6        ;#
    pop            ;#

    push -1        ;# ... and incremented by one for each
    store 7        ;#
    pop            ;#
 label L10
    #foreach_step ;#-------------------- begin macro expansion
    incrImm 5 +1  ;#
    load 2        ;#
    load 5        ;#
    listIndex     ;#
    store i       ;# i := x[%v5]
    pop           ;#
    load 2        ;#
    listLength    ;#
    lt            ;# leave on stack for the end

    incrImm 6 +2  ;# increment by stride
    load 3        ;#
    load 6        ;#
    listIndex     ;#
    store j       ;# j := y[%v6]
    pop           ;#
    load 3        ;#
    listLength    ;#
    lt            ;# leave on stack for the end

    incrImm 7 +2  ;#
    load 3        ;#
    load 7        ;#
    listIndex     ;# 
    store k       ;# k := y[%v7]
    pop           ;#
    load 3        ;#
    listLength    ;#
    lt            ;# Now three "lt" results are on the stack.

    lor           ;# If at least one of them is true, we carry on.
    lor           ;#------------------ end of foreach_step macro expansion

    jumpFalse L26
    push puts
    load i
    push ,
    load j
    push ,
    load k
    concat 5
    invokeStk 2
    pop
    jump L10
 label L26
    push {}        
}}

Now testing, with two iterators running out of data before the end:

 % g {A B} {C D E F G}
 A,C,D
 B,E,F
 ,G,

Seems to work right :^D Now to generalize the code generation...