PURPOSE: Provide a working example for Tcl maintainers attempting to add bytecode compilation of a new Tcl command.
Background: Kevin Kenny has been working on a new lset command that makes surgical modifications to Tcl lists. The target is to get decent performance of operations such as Shuffle a list without needing to resort to mysterious usage such as the K combinator.
The lset command takes the syntax:
In its simplest form,
lset myList $i $x
it says, "replace the $i'th element of variable myList with the content of variable x.
Thus, if myList contains:
a b c d
after executing
lset myList 1 e
the variable will contain
a e c d
Additional index parameters index into sublists. For example, if myList contains:
{a b c} {d e f} {g h i}
and we execute
lset myList 1 1 j
then after the operation, myList will contain
{a b c} {d j f} {g h i}
It is an error to address outside any list: each index argument must be at least zero and less than the length of the corresponding list.
The first cut at [lset] is to implement it as an object command. The general flow is:
The C code for the object command is:
int Tcl_LsetObjCmd( clientData, interp, objc, objv ) ClientData clientData; /* Not used. */ Tcl_Interp *interp; /* Current interpreter. */ int objc; /* Number of arguments. */ Tcl_Obj *CONST objv[]; /* Argument values. */ { Tcl_Obj* listPtr; /* Pointer to the list being altered. */ Tcl_Obj* subListPtr; /* Pointer to a sublist of the list */ Tcl_Obj* finalValuePtr; /* Value finally assigned to the variable */ int index; /* Index of the element being replaced */ int result; /* Result to return from this function */ int listLen; /* Length of a list being examined */ Tcl_Obj** elemPtrs; /* Pointers to the elements of a list being examined */ int i; /* Check parameter count */ if ( objc < 4 ) { Tcl_WrongNumArgs( interp, 1, objv, "listVar index ?index...? value" ); return TCL_ERROR; } /* Look up the list variable */ listPtr = Tcl_ObjGetVar2( interp, objv[ 1 ], (Tcl_Obj*) NULL, TCL_LEAVE_ERR_MSG ); if ( listPtr == NULL ) { return TCL_ERROR; } /* Make sure that the list value is unshared. */ if ( Tcl_IsShared( listPtr ) ) { listPtr = Tcl_DuplicateObj( listPtr ); } finalValuePtr = listPtr; /* ** If there are multiple 'index' args, handle each arg except the ** last by diving into a sublist. */ for ( i = 2; i < objc-2; ++i ) { /* ** Take apart the list */ result = Tcl_ListObjGetElements( interp, listPtr, &listLen, &elemPtrs ); if ( result != TCL_OK ) { return result; } /* ** Derive the index of the requested sublist */ result = TclGetIntForIndex( interp, objv[i], (listLen - 1), &index ); if ( result != TCL_OK ) { return result; } if ( ( index < 0 ) || ( index >= listLen ) ) { Tcl_SetObjResult( interp, Tcl_NewStringObj( "list index out of range", -1 ) ); return TCL_ERROR; } /* ** Extract the appropriate sublist, and make sure that it ** is unshared. */ subListPtr = elemPtrs[ index ]; if ( Tcl_IsShared( subListPtr ) ) { subListPtr = Tcl_DuplicateObj( subListPtr ); result = Tcl_ListObjSetElement( interp, listPtr, index, subListPtr ); if ( result != TCL_OK ) { return TCL_ERROR; } } else { Tcl_InvalidateStringRep( listPtr ); } listPtr = subListPtr; } /* Break apart the innermost sublist */ result = Tcl_ListObjGetElements( interp, listPtr, &listLen, &elemPtrs ); if ( result != TCL_OK ) { return result; } /* Convert the last index */ result = TclGetIntForIndex( interp, objv[objc-2], (listLen - 1), &index ); if ( result != TCL_OK ) { return result; } if ( ( index < 0 ) || ( index >= listLen ) ) { Tcl_SetObjResult( interp, Tcl_NewStringObj( "list index out of range", -1 ) ); return TCL_ERROR; } /* Store the result in the list element */ result = Tcl_ListObjSetElement( interp, listPtr, index, objv[objc-1] ); if ( result != TCL_OK ) { return result; } /* Finally, update the variable so that traces fire. */ listPtr = Tcl_ObjSetVar2( interp, objv[1], NULL, finalValuePtr, TCL_LEAVE_ERR_MSG ); if ( listPtr == NULL ) { return TCL_ERROR; } return result; }
This code is much cheaper (30-50% reduction in CPU time on the Shuffle a list trials) than the corresponding logic using the K combinator, but we can do much better still using the bytecode compiler. The chief reason is that the calls to Tcl_ObjGetVar2 and Tcl_ObjSetVar2 are fairly expensive lookups in a hash table, while local variables in the stack frame can be compiled to lookups that operate in constant time. Let's sketch out what the bytecode will have to look like for
lset varName ind1 ind2 value
To begin with, all of the parameters will have to go on the stack, except possibly for the variable. The variable name may or may not go on the stack, depending on whether it uses a frame slot.
In order for traces to fire in order, the parameters must be substituted from left to right. Thus far, then, the compilation procedure for lset has to:
So it appears that we need a new compilation procedure for the lset command, plus two new bytecode instructions: INST_STKINDEX and INST_LSET.
How to add a byte-compiled command:
Most examples are found in tclCompCmds.c, which is the best place to start for examples. Files that must be edited to add a byte-compiled command are (this applies for core commands):
At this time, there is no way to add a byte-compiled command outside of the core.
DKF - if what you're doing requires a new bytecode to compile, then it is probably never going to be practical to do as an extension since the list of core bytecodes is definitely growing. Mind you, if you can make a decent (and generic) case for it, getting a new bytecode into the core is not hard.
KBK - Donal is right, of course; I wrote this piece primarily as a guide for Core maintainers.
Nevertheless, I think there's an important nugget of truth on this page.
There are only a couple of reasons that a command sees a BIG improvement by being bytecode compiled. One is that it's a control structure, and the gains achieved by compiling its dependent scripts inline are significant. The other, and commoner, reason is that it performs some sort of read-modify-write function on a Tcl variable. Examples include [append], [incr], [lappend], and [lset].
I think there's a generic framework that could be exported for these.
Here's a brief explanation of my thoughts with illustrations in Tcl.
We begin with Donal's K combinator:
proc K { x y } { return $x }
Now consider a procedure, [read_modify_write] that accepts a variable name and a function (actually, an incomplete command). It appends the content of the variable to the command using K to do a destructive read. It evaluates the result, and replaces the variables value by the result.
proc read_modify_write { v f } { set cmd1 "K \$[list $v] [list [list set $v {}]]" set cmd2 "$f \[$cmd1\]" set cmd3 "set [list $v] \[$cmd2\]" # Print a trace message to show the command being executed. puts stderr "[info level 0] results in [list $cmd3]" return [uplevel 1 $cmd3] }
What's this good for? Let's do [append] using this command:
proc catenate { y x } { return $x$y } proc Append { xVar args } { upvar 1 $xVar x foreach a $args { read_modify_write x [list catenate $a] } return $x }
And we could do the same with [incr]:
proc y+ { y x } { return [expr { $x + $y }] } proc Incr { xVar { delta 1 } } { upvar 1 $xVar x read_modify_write x [list y+ $delta] }
Or let's do an [Lpop] command that's something like a simplified [lvarpop]: it pulls an element off the front of a list.
proc delete_first { x } { return [lrange $x 1 end] } proc Lpop { listVar } { upvar 1 $listVar q set retval [lindex $q 0] read_modify_write q delete_first return $retval }
Now presume an extension that wanted to bytecode [Lpop]. If we think of [Lpop] as being defined in terms of [delete_first], then the extension can actually bytecode it without introducing any new bytecodes -- assuming that we add Forth-like "exch" and "roll" to the engine. For example, if x is not a local variable, the code burst would look like:
push1 "x" # Get x's name over 0 # Duplicate the name loadScalarStk # Get x's value dup push1 "0" lindexMulti 1 # Extract first element. # at this point stack has retval, $x, "x" exch # now it's $x, retval, "x" push1 "1" push1 "delete_first" invokeStk1 2 # Call 'delete_first' with objc==2 # Now the stack has new_x, retval, "x" roll 3 # "x", new_x, retval exch # new_x, "x", retval storeScalarStk # new_x, retval pop # retval
Now the [delete_first] command has to assume that nobody calls it from outside the [Lpop] procedure: it probably suffices to hide it inside a package. On entry, it knows that the refcount of objv[1] is at least 2: one reference for objv itself, and one reference for the variable. It panics if this condition does not hold, and otherwise decrements the refcount. On return, it increments the count again. In between, Tcl_IsShared gives the correct information.
There's no doubt plenty wrong with this idea, but I think that it could be made to work -- and provide a very generic framework for a certain flavor of bytecoded extension.
AK From discussion on the chat it becomes clear that the framework discussed here would actually be a public API to the compilation environment used by compilation procedures. This would allow extensions to emit bytecodes for their own commands, provided that do not try to declare bytecodes beyond the ones defined in the core. This API would essentially be a low-level bytecode assembler. This meshes well with MS's bytecode engine ideas where he proposes a Tcl Assembly Language (TAL) at the Tcl level. It would simply be a reflection of the lowe-level API up into the tcl level.
DKF 12Dec02 - Just a quick note to people writing bytecoded implementations of structural commands (i.e. those that contain pieces of code that you want to have compiled into the flow of bytecodes; for, if, switch, and while are all structural commands); when you hand off a piece of source for sub-compilation, you must make sure that it is a substring of the string that you yourself are part of (e.g. by handing in one of your parse tokens). Fail to do this (e.g. by passing in a copy instead) and you will get strange and very hard to debug crashes.
DKF 13Sep03 - A few extra notes about the extra non-obvious things to watch out for when writing a new bytecoded expr operator. All new bytecodes must be inserted at the end of the bytecode list to prevent breaking of tbcload. Because that means that expr operator bytecodes will not be contiguous, you must also update IllegalExprOperandType() in tclExecute.c to get the operator string right. Apart from that, adding a new operator should be fairly straight-forward if you grok expr's type-promotion rules.
DKF 16Nov07 - People wishing to experiment should get a copy of Tcl 8.5 and use tcl::unsupported::disassemble.