TAL

'''Tcl [Assembly Language]'''. Assembles into stack-based CISC [bytecode] as used
by Tcl's bytecode compiler.

** See Also **

   [bytecode]:   

   [MS's bytecode engine ideas]:   

   [The anatomy of a bytecoded command]:   
   
   [http://kbk.is-a-geek.net/Tcl/20121121-numeric.txt%|%Kevin's Tcl Optimization Note #1], [KBK], 2012-11-21:   explores the sorts of analyses that a hypothetical Tcl optimizer could apply to Tcl programs in order to improve them

   [Parsing, Bytecodes and Execution]:   

   [dis2asm]:   converts disassembled bytecode to [TAL] that can be assembled

   [Visual TAL]:   

** Description **

The toplevel commands available are

   * tcl::unsupported::[disassemble] (since 8.5)
   * tcl::unsupported::[assemble]    (since 8.6)


Their instruction names sometimes differ slightly: '''disassemble''' indicates
the length of the argument in bytes as a suffix on the name (I suppose), while
'''assemble''' figures that out itself from the data. Examples:

======none
push1 0     - disassemble: push the first local literal, 1-byte pointer/index 0
push  hello - assemble:    push the literal "hello": put it in the literal list, use its index
======

What disassembles as

======none
(0) loadScalar1 %v0         # var "x"
======

shall be assembled as

======none
load x
======

Also, ''done'', which appears at the end of disassemblies, is a "bad instruction"
(or, put more kindly, a pseudo-instruction like END in the old days) for
''assemble''. What worries me more is ''done'' in the middle of a disassembly... it seems to imply a [return], but it can't be used in assembly.

For an initial (and provably incomplete) converter from disassembly output to assembly input, see [dis2asm].
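
A small check of the ''done'' behaviour, as a sketch assuming Tcl 8.6 or later. It relies on the assumption (in line with the instruction list on this page) that ''done'' is not a mnemonic '''assemble''' accepts, so a hand conversion has to drop it:

======
# Assumption: "done" is not an accepted mnemonic, so this attempt is rejected.
set failed [catch {::tcl::unsupported::assemble {push 1; done}} msg]

# The same code without the trailing "done" assembles and runs.
set ok [::tcl::unsupported::assemble {push 1}]

puts "failed=$failed ok=$ok"
======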

As usual, Tcl's [introspection] helps in learning about TAL:

======none
% tcl::unsupported::assemble help
bad instruction "help": must be push, add, append, appendArray, appendArrayStk, appendStk, arrayExistsImm, arrayExistsStk, arrayMakeImm, arrayMakeStk, beginCatch, bitand, bitnot, bitor, bitxor, concat, coroName, currentNamespace, dictAppend, dictExists, dictExpand, dictGet, dictIncrImm, dictLappend, dictRecombineStk, dictRecombineImm, dictSet, dictUnset, div, dup, endCatch, eq, eval, evalStk, exist, existArray, existArrayStk, existStk, expon, expr, exprStk, ge, gt, incr, incrArray, incrArrayImm, incrArrayStk, incrArrayStkImm, incrImm, incrStk, incrStkImm, infoLevelArgs, infoLevelNumber, invokeStk, jump, jump4, jumpFalse, jumpFalse4, jumpTable, jumpTrue, jumpTrue4, label, land, lappend, lappendArray, lappendArrayStk, lappendStk, le, lindexMulti, list, listConcat, listIn, listIndex, listIndexImm, listLength, listNotIn, load, loadArray, loadArrayStk, loadStk, lor, lsetFlat, lsetList, lshift, lt, mod, mult, neq, nop, not, nsupvar, over, pop, pushReturnCode, pushReturnOpts, pushResult, regexp, resolveCmd, reverse, rshift, store, storeArray, storeArrayStk, storeStk, strcmp, streq, strfind, strindex, strlen, strmap, strmatch, strneq, strrange, strrfind, sub, tclooClass, tclooIsObject, tclooNamespace, tclooSelf, tryCvtToNumeric, uminus, unset, unsetArray, unsetArrayStk, unsetStk, uplus, upvar, variable, verifyDict, or yield
======

Lots of help for the enquiring mind... - 130 instructions so far ;^)

To extract the instruction set into a list, you can do
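======
  # the full command name is used because the asm alias is only defined further down
  catch {::tcl::unsupported::assemble .} res
  # drop the 'bad instruction ".": must be' prefix and the trailing ", or yield"
  set is [join [split "[string range $res 28 end-10] yield" ,]]
======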
push  add  append  appendArray  appendArrayStk  appendStk  arrayExistsImm  arrayExistsStk  arrayMakeImm  arrayMakeStk  beginCatch  bitand  bitnot  bitor  bitxor  concat  coroName  currentNamespace  dictAppend  dictExists  dictExpand  dictGet  dictIncrImm  dictLappend  dictRecombineStk  dictRecombineImm  dictSet  dictUnset  div  dup  endCatch  eq  eval  evalStk  exist  existArray  existArrayStk  existStk  expon  expr  exprStk  ge  gt  incr  incrArray  incrArrayImm  incrArrayStk  incrArrayStkImm  incrImm  incrStk  incrStkImm  infoLevelArgs  infoLevelNumber  invokeStk  jump  jump4  jumpFalse  jumpFalse4  jumpTable  jumpTrue  jumpTrue4  label  land  lappend  lappendArray  lappendArrayStk  lappendStk  le  lindexMulti  list  listConcat  listIn  listIndex  listIndexImm  listLength  listNotIn  load  loadArray  loadArrayStk  loadStk  lor  lsetFlat  lsetList  lshift  lt  mod  mult  neq  nop  not  nsupvar  over  pop  pushReturnCode  pushReturnOpts  pushResult  regexp  resolveCmd  reverse  rshift  store  storeArray  storeArrayStk  storeStk  strcmp  streq  strfind  strindex  strlen  strmap  strmatch  strneq  strrange  strrfind  sub  tclooClass  tclooIsObject  tclooNamespace  tclooSelf  tryCvtToNumeric  uminus  unset  unsetArray  unsetArrayStk  unsetStk  uplus  upvar  variable  verifyDict yield
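
The fixed character offsets above depend on the exact wording of the error message and on ''yield'' being the last entry. A slightly more tolerant sketch follows. The helper name `tal_instructions` is hypothetical, and the code still assumes the message has the form `bad instruction ".": must be a, b, ..., or z`:

======
# Hypothetical helper: returns the mnemonics named in assemble's error message.
# Assumption: the message reads 'bad instruction ".": must be a, b, ..., or z'.
proc tal_instructions {} {
    catch {::tcl::unsupported::assemble .} msg
    set names {}
    # Cut everything up to "must be ", then split the rest at the commas.
    foreach word [split [regsub {^.*must be } $msg {}] ,] {
        set word [string trim $word]
        regsub {^or } $word {} word   ;# the last entry is introduced by "or"
        lappend names $word
    }
    return $names
}

set is [tal_instructions]
puts "[llength $is] instructions"
======

The count printed depends on the Tcl version in use.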


For experimenting, here are some little helpers:

======
interp alias {} asm    {} ::tcl::unsupported::assemble
interp alias {} disasm {} ::tcl::unsupported::disassemble

#-- Code a proc in Tcl, see how its bytecode disassembles:
proc aproc {name argl body} {
    proc $name $argl $body
    disasm proc $name
}
======

Execution happens immediately, if the asm command is not in a proc body:

======
% set x hello
hello
% asm "push $x;strlen"
5
======
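
Note that in the example above the value of x is substituted into the TAL text before it is assembled, so a value containing spaces would be split into several words. Two ways around that, as a sketch assuming Tcl 8.6 or later: quote the operand with [list], or push the variable's ''name'' and let `loadStk` fetch the value at run time:

======
set x "hello world"

# Quote the operand so it stays a single word: push {hello world}; strlen
set n1 [::tcl::unsupported::assemble "[list push $x]; strlen"]

# Or push the variable name and look the variable up from the stack
set n2 [::tcl::unsupported::assemble {push x; loadStk; strlen}]

puts "$n1 $n2"
======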

Some more examples of how to experiment with TAL instructions (error messages are good teachers ;^):

======
% asm {push 1;expon}
stack underflow
% asm {push 2;push 3;expon}
8
% asm {push 2;push 3;over 1;expon}
stack is unbalanced on exit from the code (depth=2)
% asm {push 2;push 3;reverse 2;expon}
9
======
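
To see why the `over 1` variant is unbalanced, it helps to trace the stack by hand. The sketch below (assuming Tcl 8.6 or later) adds one more instruction, `mult`, so that exactly one value is left at the end:

======
# Stack contents, bottom to top, after each instruction:
#   push 2   -> 2
#   push 3   -> 2 3
#   over 1   -> 2 3 2   (copies the element one below the top)
#   expon    -> 2 9     (3**2)
#   mult     -> 18      (2*9)
set r [::tcl::unsupported::assemble {push 2; push 3; over 1; expon; mult}]
puts $r
======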

This one failed my expectation (I wanted a test for lists of length 0):

======
% proc lempty lst {asm {push lst;listLength;push 0;eq}}
% lempty a
0
% lempty {}
0
% proc lempty lst {asm {push lst;listLength}}
% lempty {}
1
% lempty a
1
======

Lesson learned: "push" takes a constant as argument. To dereference a
(proc-local) variable, use "load":

======
% proc lempty lst {asm {load lst;listLength;push 0;eq}}
% lempty {}
1
% lempty a
0
======

Or start writing the smallest macro assembler in the world - [masm] :^)

I could have learned the difference between "push" and "load" by disassembling the equivalent Tcl proc:
======
% aproc f x {expr {[llength $x] == 0}}
ByteCode 0x0x88b7640, refCt 1, epoch 15, interp 0x0x88af7a0 (epoch 15)
  Source "expr {[llength $x] == 0}"
  Cmds 2, src 24, inst 7, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x8965c60, refCt 1, args 1, compiled locals 1
      slot 0, scalar, arg, "x"
  Commands 2:
      1: pc 0-5, src 0-23        2: pc 0-2, src 7-16
  Command 1: "expr {[llength $x] == 0}"
  Command 2: "llength $x"
    (0) loadScalar1 %v0         # var "x"
    (2) listLength 
    (3) push1 0         # "0"
    (5) eq 
    (6) done 
======

which as input to '''assemble''' would have to be converted to

======none
load x
listLength
push 0
eq
======
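
Wrapped in a proc, the converted listing can be tried directly. This is a sketch assuming Tcl 8.6 or later; the name `lempty2` is made up to avoid clashing with the earlier attempts:

======
proc lempty2 lst {
    ::tcl::unsupported::assemble {
        load lst
        listLength
        push 0
        eq
    }
}

puts [lempty2 {}]
puts [lempty2 {a b}]
======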

The classic ''sign'' (of a number) example in TAL, corresponding to ''expr
{($x>0) - ($x<0)}'' (the parentheses matter, since `-` binds tighter than the
comparisons):

======
% proc sign x {asm {load x; push 0; gt; load x; push 0; lt; sub}}
% sign 42
1
% sign -42
-1
% sign 0
0
======
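
A quick cross-check of the TAL version against plain [expr], as a sketch assuming Tcl 8.6 or later. The names `sign_tal` and `sign_expr` are made up for the comparison, and the expression is written with explicit parentheses so that both comparisons are evaluated before the subtraction:

======
proc sign_tal x {
    ::tcl::unsupported::assemble {load x; push 0; gt; load x; push 0; lt; sub}
}
proc sign_expr x {expr {($x > 0) - ($x < 0)}}

# Print both results for each test value; the two columns should agree.
foreach v {42 -42 0} {
    puts "$v: [sign_tal $v] [sign_expr $v]"
}
======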

This illustrates how braced vs. non-braced [expr]essions are byte-compiled
differently:

======
% aproc f x {expr {$x+1}}
ByteCode 0x0x93492f0, refCt 1, epoch 15, interp 0x0x92c07a0 (epoch 15)
  Source "expr {$x+1}"
  Cmds 1, src 11, inst 6, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x935b298, refCt 1, args 1, compiled locals 1
      slot 0, scalar, arg, "x"
  Commands 1:
      1: pc 0-4, src 0-10
  Command 1: "expr {$x+1}"
    (0) loadScalar1 %v0         # var "x"
    (2) push1 0         # "1"
    (4) add 
    (5) done 

% aproc f x {expr $x+1}
ByteCode 0x0x93492f0, refCt 1, epoch 15, interp 0x0x92c07a0 (epoch 15)
  Source "expr $x+1"
  Cmds 1, src 9, inst 8, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x937b8f0, refCt 1, args 1, compiled locals 1
      slot 0, scalar, arg, "x"
  Commands 1:
      1: pc 0-6, src 0-8
  Command 1: "expr $x+1"
    (0) loadScalar1 %v0         # var "x"
    (2) push1 0         # "+1"
    (4) concat1 2 
    (6) exprStk 
    (7) done 

======

See more discussion on this at [Brace your Expr-essions].
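
The difference can also be checked mechanically by searching the disassembly for `exprStk`, which in the listings above appears only in the unbraced case. This is a sketch assuming Tcl 8.6 or later and that the instruction name is spelled the same way in the Tcl build at hand; the proc names are made up:

======
proc braced   x {expr {$x+1}}
proc unbraced x {expr $x+1}

# Report whether each proc's bytecode hands the expression to exprStk at run time.
foreach name {braced unbraced} {
    set listing [::tcl::unsupported::disassemble proc $name]
    puts "$name: uses exprStk = [string match *exprStk* $listing]"
}
======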



** Examples **

   [Fib in TAL], by [kbk], 2012-08-21:   

   `[expr]`:   Presents some disassembly and nice explanation by [AMG].



** Discussion **

[CMcC]: I've put some incomplete work on tcl script generation of bytecodes
here: http://wiki.tcl.tk/_repo/bytecode/

It dates from Apr03, and may not even compile - it's certainly incomplete.

----

[Zarutian] 2006-10-28:  hmm... specifying the grammar of TAL shouldn't be hard.

Backus Naur Forms:
  <statement> ::= [<label>] <instruction> "\n"
  <label>   ::= <alphanumeric string> ":"
  <instruction> ::= <mnemonic name> [<parameter>]*

[AMG]: This is not the final BNF of TAL, if the example on the [assemble] page
is any indicator.  It seems the grammar is even simpler than you envision,
since `label` is itself a mnemonic.  Another difference is the existence of
comments, which seem to go from # to end of line, and semicolons as statement
separators.  I'm guessing semicolons and newlines are interchangeable, and
semicolons are only strictly needed when multiple statements appear on a line
or when a comment appears on the same line as a statement.  I'd like to see a
formal definition...
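
Short of a formal definition, the guess about separators is easy to try out. The sketch below (assuming Tcl 8.6 or later) only shows that the two layouts behave the same for this one snippet, and says nothing about the grammar in general:

======
# Semicolon-separated on one line
set oneLine [::tcl::unsupported::assemble {push 2; push 3; add}]

# The same instructions, one per line
set multiLine [::tcl::unsupported::assemble {
    push 2
    push 3
    add
}]

puts "$oneLine $multiLine"
======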

[Zarutian] 2006-10-28: Types of instructions needed:

   * Arithmetic and logic operations
   * Input/Output
   * Memory fetching and storage
   * Branching

But the question is: what architecture does TAL target? Tcl bytecodes?

[AK] 2007-01-10: AFAIK it would target Tcl bytecodes. As for the grammar, I
would say it should be a list representation in general, and anything more
formatted can be derived from that, i.e., for display, maybe editing, etc.

[DKF] (same day): It should be noted that currently it is obscenely difficult
to manage branching properly, especially forward jumps. It should also be noted
that some instructions are probably difficult to issue outside the main
bytecode compiler (BCC) anyway, as they use complex blocks of metadata
(especially those relating to foreach loops and switch jump-tables).

<<categories>> Glossary