Version 10 of A Tcl Compiler Idea

Updated 2003-10-20 10:32:25

GPS Oct 19, 2003 - I've been thinking about how to build a Tcl compiler that allows for efficiently storing the compiled structure to the disk and also provides fast execution. I think this might work.

Rather than using a parse-tree a flat array of Tcl_Objs is used with an ordered list for execution. A LIFO stack is used to find the proper order of execution at compile time.

I propose the functions Tcl_Compile() and Tcl_Run() for dealing with these data structures.

Consider the command:

 set v [foo bar [set bar 20] [baz $bar]]

In that example we first create an array of Tcl_Objs. The first element will be for a result of set v ...

 [0 RESULT] [1 set]

We push the index of the set command, which in this case is 1.

 LIFO stack[] = 1

We continue down the line adding objects for strings until we reach another branch.

 [0 RESULT] [1 set] [2 v] [3 RESULT] 

Now we are at a branch, where we have:

 [foo bar [set bar 20] [baz $bar]]

We append another object for the foo command and push the index of the object, which in this case is 4.

 [0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] 
 LIFO stack[] = 1 4

Now we are at another branch for:

  [set bar 20] [baz $bar]

We append a RESULT object and push an index for the set object.

 [0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] [6 RESULT] [7 set] [8 bar] [9 20]
 LIFO stack[] = 1 4 7

9 was the end of a command ( ] ), so we pop one item off of the LIFO into our order list:

 order list = 7

Now we just have the [baz $bar] to finish in a similar manner. We push the index of [baz $bar]'s baz onto the LIFO and we end up with:

 LIFO stack[] = 1 4 11

 [0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] [6 RESULT] [7 set] [8 bar] [9 20] [10 RESULT] [11 baz] [12 $bar] [13 END OF ALL]

The end of the command [baz $bar] is reached so we pop another item off the stack into our order list.

 order list = 7 11 

The end of the [foo ...] command is reached so we pop another and get:

 order list = 7 11 4 

and so forth until the command is complete

 our order is: 7 11 4 1

Compare that order to these and you will see that it's the proper result:

 [0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] [6 RESULT] [7 set] [8 bar] [9 20] [10 RESULT] [11 baz] [12 $bar] [13 END OF ALL]

One aspect which we haven't discussed is result transfer. Each command copies its result to the object behind its beginning. This makes it so that each command result works properly.

This approach allows us to write the Tcl_Obj structures to the disk more easily, and unlike a parse-tree it doesn't require many little mallocs, frees and evaluation at each branch.

Based on my somewhat naive assesment of the Tcl sources this design would require a new parser, and perhaps some changes to the Tcl_Obj structure or a new Tcl_ObjType. I also propose a preprocessor that would tranlate $foo to [set foo].

To improve performance each head of a command would also have an offset into an array that stores a function pointer. An alternative would be to use an epoch and do hash lookup, but to me that is overkill. The disadvantage of using an array offset is that to add new commands we would need to either increase the array's size and add more pointers, or replace existing offsets with new pointers and risk having some old commands call the new inappropriately.

What do you think?


DKF: I don't yet see my way through this idea far enough. OK, it's Monday morning and I'm not at my sharpest. If the fundamental execution model is stack-based bytecode (and that's an easy target to generate from a compiler), could we have a bit of an illustration of a few opcodes and their effect on the stack? And then perhaps a sample conversion of a Tcl command fragment into those bytecodes? That would help my wooly-thinking very much. Thanks! (And if you've already done some of that above, please forgive me. I'm just having problems seeing where this is all going. Once I 'get' it, I'll probably be able to offer much more useful comments...)

RS would like to know how this differs from the current bytecode compiler, and what advantages it would bring.