Proc to bytecodes: when, how does it happen

Summary

An explanation, primarily by MS, of when and how a proc is compiled to bytecodes. Why compile to bytecode is a separate question.

See Also

tcl::unsupported::disassemble
Be aware that it forces compilation of whatever it disassembles (not very surprising, that!).

Description

This reflects my present understanding of all this, AFAIK essentially correct - I am aware of a lot of simplifications here, which I hope result in more clarity and not confusion. In particular, there is no mention of Tcl_Obj.

I also do not address the behaviour of variables here.

In case of doubt, read the source, Luke ... and do correct me please!

What happens when you *define* a proc?

Not much, actually: the source is saved as text, a command is created in the corresponding namespace. (The text may still have syntax errors or whatever - see procs as data structures).
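
A minimal sketch of this, assuming a Tcl 8.x interpreter; the proc name bad is made up for illustration. The body is brace-balanced, so [proc] accepts it, but it contains an unterminated command substitution that should only be noticed when the body is first compiled and run:

```tcl
# Defining the proc succeeds: the body is stored as text only.
proc bad {} {set x [}

# The stored text can be read back unchanged.
puts [info body bad]

# The syntax error surfaces only when the proc is invoked.
set rc [catch {bad} msg]
puts "rc=$rc msg=$msg"
```

Here rc is the return code of [catch]; a non-zero value means the invocation failed even though the definition did not.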

What happens when you *invoke* a proc?

  • IF there is a corresponding bytecode, it is checked for validity. If it is no longer valid (for example because a bytecode-compiled command such as set or if has been renamed or redefined since the body was compiled), it is regenerated as if it had never existed (see below).
  • IF there is no corresponding bytecode, the source text is parsed and compiled to bytecodes. The bytecodes are saved for later reuse.
  • The bytecodes are executed by the engine - see details below.
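
The validity check can be observed indirectly. The following is only a sketch, assuming Tcl 8.5 or later (for {*}) and assuming that lindex is one of the bytecode-compiled commands; the names first and _lindex are made up for illustration:

```tcl
proc first {l} {lindex $l 0}
puts [first {a b c}]   ;# first call: the body is compiled, lindex is inlined

# Replace lindex with a wrapper proc. Renaming a bytecode-compiled
# command is one of the events that invalidates existing bytecodes.
rename lindex _lindex
proc lindex {l args} {return "wrapped: [_lindex $l {*}$args]"}

# The stale bytecode of 'first' is regenerated on this call, so the
# wrapper is invoked instead of the previously inlined code.
set r [first {a b c}]
puts $r

# Restore the original command.
rename lindex {}
rename _lindex lindex
```

If the old bytecodes were reused blindly, the second call would still return the plain element instead of going through the wrapper.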

What happens when a command B is invoked in the body of proc A

  • Nothing when A is defined - remember that only text is saved
  • (General case) When A is compiled, B's address is looked up in the hashtable of the corresponding namespace. A (pointer to a) structure is compiled into the bytecodes; the structure contains B's name and B's address (if found). B is *not* compiled at this time, as it was not invoked.
  • (Special case) Some core commands (set, if, ...) are "bytecode compiled": their bytecodes are inlined into the bytecodes of the calling proc, and their name and/or address are not present in the bytecodes.
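
The two cases can be compared with tcl::unsupported::disassemble (available since 8.5, and it forces compilation). This is a sketch only: the command is unsupported, the listing format and instruction names are implementation details that vary between versions, and the proc names are made up. With Tcl 8.6 the listing for A should contain an invokeStk instruction for the call to B, while the listing for C should contain none because set is inlined.

```tcl
proc B {x} {return $x}
proc A {x} {B $x}       ;# general case: B is invoked by name
proc C {} {set a b}     ;# special case: set is compiled inline

set codeA [tcl::unsupported::disassemble proc A]
set codeC [tcl::unsupported::disassemble proc C]
puts $codeA
puts $codeC

# Crude check for a generic "invoke a command" instruction.
puts "A invokes a command: [string match *invokeStk* $codeA]"
puts "C invokes a command: [string match *invokeStk* $codeC]"
```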

What happens when bytecodes are executed?

  • The bytecodes are first checked for validity (several different situations can invalidate a bytecode). If invalid, it is regenerated from the source.
  • At each invocation of a command (B in the example above): check B's address for validity - if invalid, look up B by name in the corresponding hashtable.
  • Prepare B's arguments, then invoke B - like a function call.
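
A small sketch of the lookup by name at execution time; the proc names are made up for illustration. A is defined before B exists, and B is later redefined and deleted without A being touched:

```tcl
proc A {} {B}                ;# B does not exist yet: only text is saved
proc B {} {return first}
set r1 [A]                   ;# A is compiled; B is looked up and invoked

proc B {} {return second}    ;# the address cached for B is now stale
set r2 [A]                   ;# B is looked up by name again

rename B {}                  ;# B no longer exists
set rc [catch {A} msg]       ;# the lookup fails at run time
puts "$r1 $r2 $rc: $msg"
```

In all three calls A is the same proc with the same source; only the result of looking up B changes.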

Example

  • Compilation time
       # define B
       proc B x {
           if $x {
               this is an error
           } else {
               set a b
           }
       }

       # define A
       proc A x {B $x}

       # invoke A: this causes compilation of A, starts running A,
       # then invokes B (which causes its compilation). Both are
       # compiled OK
       A 0 ;# returns b
       A 1 ;# runtime error: invalid command name "this"
       A 0 ;# returns b, all is still OK

       # define a proc "this"
       proc this args {return "OK now"}

       # invoke A; previous bytecodes of both A and B are used; 
       # at the invocation of 'this', it is found, parsed, 
       # compiled and run
       A 1 ;# "this" is now found, returns "OK now"

       # redefine B, now using a bcc'ed command. B's previous 
       # bytecodes are discarded, B is saved as text, no error
       proc B x {
           if $x {
               # this is a compile-time error
               set a b c
           } else {
               set a b
           }
       }

       # invoke A: use existing compilation of A, start running A,
       # then invoke B, which causes its compilation. The error is at
       # compile time due to the bcc'ed [set] (this is an open bug,
       # actually); even the "correct" case fails:
       A 0 ;# wrong # args: should be "set varName ?newValue?"

The global $tcl_traceCompile permits observation of the compiler's activity: 1 tells you when compilation takes place, 2 shows you the bytecodes too.

$tcl_traceExec permits observation of the execution engine: 1 tells you whenever a proc is invoked, 2 whenever a command is invoked, 3 shows the actual bytecodes being executed.

de: You must compile Tcl with the -DTCL_COMPILE_DEBUG flag to make this work.

Cmcc: Is there any good reason that this isn't enabled by default? Does it impose a significant overhead? I suppose it must.

strick: Indeed it does.

DKF: Compilation debugging can now (8.4) be enabled with a suitable flag to configure, at least on UNIX.
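
A minimal usage sketch of the two trace variables, with a made-up proc name. As noted above, it only produces trace output on an interpreter built with TCL_COMPILE_DEBUG; on a regular build the variables are presumably just ordinary globals and nothing extra is printed.

```tcl
set tcl_traceCompile 1   ;# report each compilation
set tcl_traceExec 1      ;# report each proc invocation

proc demo {} {set a b}
set result [demo]        ;# first call: compile, then execute

# Switch tracing off again before doing anything else.
set tcl_traceCompile 0
set tcl_traceExec 0
puts $result
```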

Disabling Compilation of User-Defined Procedures

For Tcl 8.6, there is a one-line patch that will disable all byte compiling except for expressions and substitutions. In TclCompileScript in tclCompile.c, find this code:

if ((cmdPtr != NULL)
        && (cmdPtr->compileProc != NULL)
        && !(cmdPtr->nsPtr->flags&NS_SUPPRESS_COMPILATION)
        && !(cmdPtr->flags & CMD_HAS_EXEC_TRACES)
        && !(iPtr->flags & DONT_COMPILE_CMDS_INLINE)) {
    int code, savedNumCmds = envPtr->numCommands;
    unsigned savedCodeNext =
            envPtr->codeNext - envPtr->codeStart;
    int update = 0;

and change that condition to

if ( 0 && (cmdPtr != NULL)

APN 2014-10-08 The following Tcl'ers chat excerpt from Miguel casts further light on the topic.


clarifying: a 'bcc'ed command' (typical example: set) is a command that has a compileProc, and whose implementation is inlined in the bytecodes of any proc-body that invokes it

when you define 'proc foo $arglst $body', the body is converted to a series of bytecodes before the first run - and bcc'ed commands invoked in $body are inlined in the sequence of bytecodes

however, [foo] itself is NOT bytecoded - any other proc that invokes foo in its body will get bytecodes of the type 'invoke foo', and not an inlined version of foo's body (plus an inlined version of scope creation, argument handling, scope destruction)

miguel trying to clarify what he meant, which surprised at least apn and jima

so we are using "bcc'ed" with two different meanings: 1) a command is bcc'ed if it produces bytecodes that are inlined in its caller; 2) a script (e.g., a proc body) is bcc'ed if it is translated to a series of bytecodes before it runs

jima ahhhhh got you miguel, thanks

'proc foo $arglst $body' hence causes the script $body to be bcc'ed, but not the command [foo]

suchenwi too

it is also true that 8.* implements proc so that the proc body is necessarily bytecompiled - but this is an implementation detail

the definition of proc just says that on invocation: 1) a local scope will be created for the local vars; 2) some local vars (the formal arguments) will be initialized with values coming from the actual arguments; 3) the body will be run as a script in that local scope, returning its last result; 4) the local scope will be destroyed (or at least become meaningless - destruction is there just to avoid leaks)
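
The four steps of that definition can be illustrated with a short sketch. The proc and variable names are made up, and the last line assumes there is no global variable named counter:

```tcl
proc scopeDemo {x} {
    # 1) a fresh local scope: nothing survives from an earlier call
    set survived [info exists counter]
    # 2) the formal argument x was initialised from the actual argument
    set counter $x
    # 3) the result of the last command is the result of the proc
    list $survived $counter
}

set first [scopeDemo 1]
set second [scopeDemo 2]
puts "$first / $second"

# 4) the local scope is gone once the call returns
puts [info exists counter]
```

None of this depends on the body being bytecompiled; a pure interpreter would have to behave the same way.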