TAL

Tcl Assembly Language. Assembles into stack-based CISC bytecode as used by Tcl's bytecode compiler.

See Also

bytecode
MS's bytecode engine ideas
The anatomy of a bytecoded command
Kevin's Tcl Optimization Note #1, KBK, 2012-11-21
explores the sorts of analyses that a hypothetical Tcl optimizer could apply to Tcl programs in order to improve them

Parsing, Bytecodes and Execution:

dis2asm: convert disassembled bytecode to TAL that can be assembled

Visual TAL

Description

The toplevel commands available are tcl::unsupported::assemble and tcl::unsupported::disassemble.

Their instruction names sometimes differ slightly: disassemble indicates the length of the argument in bytes (I suppose), while assemble figures that out itself from the data. Examples:

push1 0     - disassemble: push the first local literal, 1-byte pointer/index 0
push  hello - assemble:    push the literal "hello": put it in the literal list, use its index

What disassembles as

(0) loadScalar1 %v0         # var "x"

shall be assembled as

load x

Also, the done that appears at the end of disassemblies is a "bad instruction" for assemble (or, put more kindly, a pseudo-instruction like END in the old days). What worries me more is a done in the middle of a disassembly... it seems to imply a return, but it can't be used in assembly.

For an initial (and provably incomplete) converter from disassembly output to assembly input, see dis2asm.

As usual, Tcl's introspection helps in learning about TAL:

% tcl::unsupported::assemble help

bad instruction "help": must be push, add, append, appendArray, appendArrayStk, appendStk, arrayExistsImm, arrayExistsStk, arrayMakeImm, arrayMakeStk, beginCatch, bitand, bitnot, bitor, bitxor, concat, coroName, currentNamespace, dictAppend, dictExists, dictExpand, dictGet, dictIncrImm, dictLappend, dictRecombineStk, dictRecombineImm, dictSet, dictUnset, div, dup, endCatch, eq, eval, evalStk, exist, existArray, existArrayStk, existStk, expon, expr, exprStk, ge, gt, incr, incrArray, incrArrayImm, incrArrayStk, incrArrayStkImm, incrImm, incrStk, incrStkImm, infoLevelArgs, infoLevelNumber, invokeStk, jump, jump4, jumpFalse, jumpFalse4, jumpTable, jumpTrue, jumpTrue4, label, land, lappend, lappendArray, lappendArrayStk, lappendStk, le, lindexMulti, list, listConcat, listIn, listIndex, listIndexImm, listLength, listNotIn, load, loadArray, loadArrayStk, loadStk, lor, lsetFlat, lsetList, lshift, lt, mod, mult, neq, nop, not, nsupvar, over, pop, pushReturnCode, pushReturnOpts, pushResult, regexp, resolveCmd, reverse, rshift, store, storeArray, storeArrayStk, storeStk, strcmp, streq, strfind, strindex, strlen, strmap, strmatch, strneq, strrange, strrfind, sub, tclooClass, tclooIsObject, tclooNamespace, tclooSelf, tryCvtToNumeric, uminus, unset, unsetArray, unsetArrayStk, unsetStk, uplus, upvar, variable, verifyDict, or yield

Lots of help for the enquiring mind... - 130 instructions so far ;^)

To extract the instruction set into a list (asm is the alias for tcl::unsupported::assemble defined among the helpers below), you can do

  catch {asm .} res
  set is [join [split "[string range $res 28 end-10] yield" ,]]

push add append appendArray appendArrayStk appendStk arrayExistsImm arrayExistsStk arrayMakeImm arrayMakeStk beginCatch bitand bitnot bitor bitxor concat coroName currentNamespace dictAppend dictExists dictExpand dictGet dictIncrImm dictLappend dictRecombineStk dictRecombineImm dictSet dictUnset div dup endCatch eq eval evalStk exist existArray existArrayStk existStk expon expr exprStk ge gt incr incrArray incrArrayImm incrArrayStk incrArrayStkImm incrImm incrStk incrStkImm infoLevelArgs infoLevelNumber invokeStk jump jump4 jumpFalse jumpFalse4 jumpTable jumpTrue jumpTrue4 label land lappend lappendArray lappendArrayStk lappendStk le lindexMulti list listConcat listIn listIndex listIndexImm listLength listNotIn load loadArray loadArrayStk loadStk lor lsetFlat lsetList lshift lt mod mult neq nop not nsupvar over pop pushReturnCode pushReturnOpts pushResult regexp resolveCmd reverse rshift store storeArray storeArrayStk storeStk strcmp streq strfind strindex strlen strmap strmatch strneq strrange strrfind sub tclooClass tclooIsObject tclooNamespace tclooSelf tryCvtToNumeric uminus unset unsetArray unsetArrayStk unsetStk uplus upvar variable verifyDict yield
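
The character offsets 28 and end-10 above depend on the exact wording of the error message. A slightly more forgiving sketch, assuming only that the message keeps the shape 'must be a, b, ..., or z' (the variable names are mine, and lmap needs Tcl 8.6, which assemble requires anyway):

```tcl
# Sketch: pull the instruction names out of the error message without
# hard-coded character offsets. Assumes the message still reads
# 'bad instruction "...": must be a, b, ..., or z'.
catch {::tcl::unsupported::assemble .} res
regexp {must be (.*)$} $res -> names

# Turn the trailing ", or yield" into an ordinary comma-separated item.
set names [string map {", or " ", "} $names]

# Split on commas and trim the blank that follows each comma.
set is [lmap name [split $names ,] {string trim $name}]
puts "[llength $is] instructions"
```

The count printed depends on the Tcl version at hand.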

For experimenting, here are some little helpers:

interp alias {} asm    {} ::tcl::unsupported::assemble
interp alias {} disasm {} ::tcl::unsupported::disassemble

#-- Code a proc in Tcl, see how its bytecode disassembles:
proc aproc {name argl body} {
    proc $name $argl $body
    disasm proc $name
}

Execution happens immediately if the asm command is not in a proc body:

% set x hello
hello
% asm "push $x;strlen"
5

Some more examples of how to experiment with TAL instructions (error messages are good teachers ;^):

% asm {push 1;expon}
stack underflow
% asm {push 2;push 3;expon}
8
% asm {push 2;push 3;over 1;expon}
stack is unbalanced on exit from the code (depth=2)
% asm {push 2;push 3;reverse 2;expon}
9
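
To make the stack effects of the session above easier to follow, here is a small sketch with the stack contents traced in comments. The proc names square and demo are made up for illustration; the instructions themselves are all from the list above.

```tcl
interp alias {} asm {} ::tcl::unsupported::assemble

# dup copies the top of the stack:  x -> x x -> x*x
proc square x {asm {load x; dup; mult}}

# over 1 copies the element just below the top, which is why the
# "over 1; expon" attempt above ended with two values on the stack:
#   2 3 -> (over 1) 2 3 2 -> (expon) 2 9 -> (mult) 18
proc demo {} {asm {push 2; push 3; over 1; expon; mult}}

puts [square 7]   ;# 49
puts [demo]       ;# 18
```

The extra mult in demo consumes the leftover value, so the stack is balanced on exit.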

This one failed my expectation (I wanted a test for lists of length 0):

% proc lempty lst {asm {push lst;listLength;push 0;eq}}
% lempty a
0
% lempty {}
0
% proc lempty lst {asm {push lst;listLength}}
% lempty {}
1
% lempty a
1

Lesson learned: "push" takes a constant as argument. To dereference a (proc-local) variable, use "load":

% proc lempty lst {asm {load lst;listLength;push 0;eq}}
% lempty {}
1
% lempty a
0

Or start writing the smallest macro assembler in the world - masm :^)

I could have learned the push vs. load lesson by disassembling the equivalent Tcl proc:

% aproc f x {expr {[llength $x] == 0}}
ByteCode 0x0x88b7640, refCt 1, epoch 15, interp 0x0x88af7a0 (epoch 15)
  Source "expr {[llength $x] == 0}"
  Cmds 2, src 24, inst 7, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x8965c60, refCt 1, args 1, compiled locals 1
      slot 0, scalar, arg, "x"
  Commands 2:
      1: pc 0-5, src 0-23        2: pc 0-2, src 7-16
  Command 1: "expr {[llength $x] == 0}"
  Command 2: "llength $x"
    (0) loadScalar1 %v0         # var "x"
    (2) listLength 
    (3) push1 0         # "0"
    (5) eq 
    (6) done 

which as input to assemble would have to be converted to

load x
listLength
push 0
eq

The classic sign (of a number) example in TAL, corresponding to expr {($x>0) - ($x<0)} (the parentheses matter, because - binds tighter than > and < in expr):

% proc sign x {asm {load x; push 0; gt; load x; push 0; lt; sub}}
% sign 42
1
% sign -42
-1
% sign 0
0
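
A quick way to gain confidence in a hand-written TAL proc is to compare it against the expr it is meant to replace. The following is only a sketch of such a check; the sample values are arbitrary, and the reference expression is the fully parenthesized form ($x > 0) - ($x < 0).

```tcl
interp alias {} asm {} ::tcl::unsupported::assemble

proc sign x {asm {load x; push 0; gt; load x; push 0; lt; sub}}

# Compare the TAL version with the parenthesized expr form on a few samples.
foreach x {-42 -1 0 1 42 3.5} {
    set expected [expr {($x > 0) - ($x < 0)}]
    if {[sign $x] != $expected} {
        error "sign $x: got [sign $x], expected $expected"
    }
}
puts "sign agrees with expr on all samples"
```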

This illustrates how braced vs. non-braced expressions are byte-compiled differently:

% aproc f x {expr {$x+1}}
ByteCode 0x0x93492f0, refCt 1, epoch 15, interp 0x0x92c07a0 (epoch 15)
  Source "expr {$x+1}"
  Cmds 1, src 11, inst 6, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x935b298, refCt 1, args 1, compiled locals 1
      slot 0, scalar, arg, "x"
  Commands 1:
      1: pc 0-4, src 0-10
  Command 1: "expr {$x+1}"
    (0) loadScalar1 %v0         # var "x"
    (2) push1 0         # "1"
    (4) add 
    (5) done 

% aproc f x {expr $x+1}
ByteCode 0x0x93492f0, refCt 1, epoch 15, interp 0x0x92c07a0 (epoch 15)
  Source "expr $x+1"
  Cmds 1, src 9, inst 8, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x937b8f0, refCt 1, args 1, compiled locals 1
      slot 0, scalar, arg, "x"
  Commands 1:
      1: pc 0-6, src 0-8
  Command 1: "expr $x+1"
    (0) loadScalar1 %v0         # var "x"
    (2) push1 0         # "+1"
    (4) concat1 2 
    (6) exprStk 
    (7) done 
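
Converted by hand to assemble input (dropping the size suffixes and the final done, as described earlier), the two disassemblies would look roughly like this. The proc names f_braced and f_unbraced are invented for the sketch.

```tcl
interp alias {} asm {} ::tcl::unsupported::assemble

# Braced form: the addition is a single "add" on two stack values.
proc f_braced x {asm {load x; push 1; add}}

# Unbraced form: build the string "$x+1" on the stack with concat,
# then let exprStk parse and evaluate that string at run time.
proc f_unbraced x {asm {load x; push +1; concat 2; exprStk}}

puts [f_braced 41]     ;# 42
puts [f_unbraced 41]   ;# 42
```

Both give the same result for plain numbers; the unbraced variant just does more work at run time, since the expression text has to be assembled and parsed on each call.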

See more discussion on this at Brace your Expr-essions.

Examples

Fib in TAL, by kbk, 2012-08-21
expr
Presents some disassembly and nice explanation by AMG.

Discussion

CMcC: I've put some incomplete work on tcl script generation of bytecodes here: https://wiki.tcl-lang.org/_repo/bytecode/

It dates from Apr03, and may not even compile - it's certainly incomplete.


Zarutian 2006-10-28: hmm... specifying the grammar of TAL shouldn't be hard.

Backus Naur Forms:

  <statement> ::= [<label>] <instruction> "\n"
  <label>   ::= <alphanumeric string> ":"
  <instruction> ::= <mnemonic name> [<parameter>]*

AMG: This is not the final BNF of TAL, if the example on the assemble page is any indicator. It seems the grammar is even simpler than you envision, since label is itself a mnemonic. Another difference is the existence of comments, which seem to go from # to end of line, and semicolons as statement separators. I'm guessing semicolons and newlines are interchangeable, and semicolons are only strictly needed when multiple statements appear on a line or when a comment appears on the same line as a statement. I'd like to see a formal definition...
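
In lieu of a formal definition, here is a small sketch that exercises the points AMG lists: label used as an ordinary mnemonic, newlines and semicolons both separating statements, and # comments. It assumes the assembler reads its input with ordinary Tcl script syntax, which is how tcl::unsupported::assemble appears to behave in 8.6; the proc name tal_abs and the label names are made up.

```tcl
interp alias {} asm {} ::tcl::unsupported::assemble

proc tal_abs x {
    asm {
        # a comment on a line of its own
        load x
        push 0
        lt;                  # a trailing comment needs the semicolon
        jumpFalse nonneg
        load x; uminus
        jump end
        label nonneg
        load x
        label end
    }
}

puts [tal_abs -5]   ;# 5
puts [tal_abs 7]    ;# 7
```

Both paths reach label end with exactly one value on the stack, which the assembler seems to insist on.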

Zarutian 2006-10-28: Type of instructions needed:

  • Arithmetic and logic operations
  • Input/Output
  • Memory fetching and storage
  • Branching

But the question is: what architecture does TAL target? Tcl bytecodes?

AK 2007-01-10: AFAIK it would target Tcl bytecodes. As for the grammar, I would say it should be a list representation in general, and anything more formatted can be derived from that, i.e., for display, maybe editing, etc.

DKF (same day): It should be noted that currently it is obscenely difficult to manage branching properly, especially forward jumps. It should also be noted that there are some instructions that are probably difficult to issue outside the main BCC anyway, as they use complex blocks of metadata (especially those relating to foreach loops and switch jump-tables).