GPS Oct 19, 2003 - I've been thinking about how to build a Tcl compiler that allows for efficiently storing the compiled structure to the disk and also provides fast execution. I think this might work.
Rather than using a parse-tree a flat array of Tcl_Objs is used with an ordered list for execution. A LIFO stack is used to find the proper order of execution at compile time.
I propose the functions Tcl_Compile() and Tcl_Run() for dealing with these data structures.
Consider the command:
set v [foo bar [set bar 20] [baz $bar]]
In that example we first create an array of Tcl_Objs. The first element will be for a result of set v ...
[0 RESULT] [1 set]
We push the index of the set command, which in this case is 1.
LIFO stack[] = 1
We continue down the line adding objects for strings until we reach another branch.
[0 RESULT] [1 set] [2 v] [3 RESULT]
Now we are at a branch, where we have:
[foo bar [set bar 20] [baz $bar]]
We append another object for the foo command and push the index of the object, which in this case is 4.
[0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] LIFO stack[] = 1 4
Now we are at another branch for:
[set bar 20] [baz $bar]
We append a RESULT object and push an index for the set object.
[0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] [6 RESULT] [7 set] [8 bar] [9 20] LIFO stack[] = 1 4 7
9 was the end of a command ( ] ), so we pop one item off of the LIFO into our order list:
order list = 7
Now we just have the [baz $bar] to finish in a similar manner. We push the index of [baz $bar]'s baz onto the LIFO and we end up with:
LIFO stack[] = 1 4 11 [0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] [6 RESULT] [7 set] [8 bar] [9 20] [10 RESULT] [11 baz] [12 $bar] [13 END OF ALL]
The end of the command [baz $bar] is reached so we pop another item off the stack into our order list.
order list = 7 11
The end of the [foo ...] command is reached so we pop another and get:
order list = 7 11 4
and so forth until the command is complete
our order is: 7 11 4 1
Compare that order to these and you will see that it's the proper result:
[0 RESULT] [1 set] [2 v] [3 RESULT] [4 foo] [5 bar] [6 RESULT] [7 set] [8 bar] [9 20] [10 RESULT] [11 baz] [12 $bar] [13 END OF ALL]
One aspect which we haven't discussed is result transfer. Each command copies its result to the object behind its beginning. This makes it so that each command result works properly.
This approach allows us to write the Tcl_Obj structures to the disk more easily, and unlike a parse-tree it doesn't require many little mallocs, frees and evaluation at each branch.
Based on my somewhat naive assesment of the Tcl sources this design would require a new parser, and perhaps some changes to the Tcl_Obj structure or a new Tcl_ObjType. I also propose a preprocessor that would tranlate $foo to [set foo].
To improve performance each head of a command would also have an offset into an array that stores a function pointer. An alternative would be to use an epoch and do hash lookup, but to me that is overkill. The disadvantage of using an array offset is that to add new commands we would need to either increase the array's size and add more pointers, or replace existing offsets with new pointers and risk having some old commands call the new inappropriately.
What do you think?
DKF: I don't yet see my way through this idea far enough. OK, it's Monday morning and I'm not at my sharpest. If the fundamental execution model is stack-based bytecode (and that's an easy target to generate from a compiler), could we have a bit of an illustration of a few opcodes and their effect on the stack? And then perhaps a sample conversion of a Tcl command fragment into those bytecodes? That would help my wooly-thinking very much. Thanks! (And if you've already done some of that above, please forgive me. I'm just having problems seeing where this is all going. Once I 'get' it, I'll probably be able to offer much more useful comments...)
RS would like to know how this differs from the current bytecode compiler, and what advantages it would bring.
GPS: The idea is that rather than using a BCC we store the Tcl_Objs in a format that is more easily executed. To execute a command all we have to do is make pointers to the Tcl_Objs in the array, do lookup on the first object for the function pointer (or cache it), and call the function. Basically there is no stack used at runtime (only at Tcl_Compile() time to fetch the order list). I'll work on some examples.
To me the BCC is too complicated. MS and DGP seem to fix a BCC bug a week. It's kind of ridiculous. MS once called Tcl_ExecuteByteCode Medusa, because of the way it snakes around with goto and so on.
MS: I'm not sure I understand correctly, but (modulo possible storage details) I imagine this being the current bcc engine without bcc'ed commands - ie, where INST_INVOKE is the main player. I have thought that that would be a desirable approach for a while. Note that the function pointers are currently cached in the command name Tcl_Objs.
The example above would produce a bytecode stream in the flavor of:
[INST_PUSH "set"] [INST_PUSH "v"] [INST_PUSH "foo"] [INST_PUSH "bar"] [INST_PUSH "set"] [INST_PUSH "bar"] [INST_PUSH "20"] [INST_INVOKE 3] [INST_PUSH "baz"] [INST_LOAD_SCALAR "bar"] [INST_INVOKE 2] [INST_INVOKE 4] [INST_INVOKE 3]
where [INST_INVOKE x] uses the top x stack items to build a command, and pushes the command's result on the stack. Seems to me the idea proposed here is at least somewhat related to storing the object (pointer) list, and separately the instructions:
1. Object list {"set" "v" "foo" "bar" "set" "bar" "20" "baz" "bar"} 2. Instruction list: [PUSH 7] [INVOKE 3] [PUSH 1] [LOAD 1] [INVOKE 2] [INVOKE 4] [INVOKE 3]
which is to be compared to the new approach proposed in this page (as I understand it):
1. Object list {"set" "v" "foo" "bar" "set" "bar" "20" "baz" "bar"} 2. Invocation list (index of object command): {4 7 2 0}
Note: Is this understanding correct, modulo the missing RESULT objects and the not-handling of $bar (to be replaced by a call to set as explained above)?
Details I am missing in the proposed approach:
The problems I wouldn't know how to solve right away in the INVOKE-only-TEBC approach are performance related - and the reason behind the complexity of TEBC. TEBC mainly substitutes jumps for function calls, and indexed access to memory for local variable lookups by name. Some of the work required for caching variable pointers (similar to command name Tcl_Objs) has been started, but I did not manage to bring it to good port yet. Initial timings show a 10-20% core slowdown when using this approach. Replacing the expression operators with function calls (which I had not done) would produce further slowdowns.
GPS: The object pointers would be copied at runtime before a function call. It would use stack allocted memory if possible like the current scheme, and heap otherwise. The proper number of arguments would require another list/array to store the argument index of each object that is on the same level at compile time. Then we would iterate over that array and copy pointers based on that. This way we would avoid recursion and the performance problems that it introduces. More examples may come in the future...