See also Tcl IO performance, where we reached the
Interim conclusion The time spent in the original benchmark seems to be decomposable in
Further experiments
provide the following timings for the most-exercised opcodes (measured in cpu ticks at 1.6MHz):
8.5a6 time: 1822167696, count: 20004698 ---------------------------------------------------- op %T avgT %ops Nops 6 24.53 446.87 5.00 1000323 INST_INVOKE_STK1 80 22.68 413.32 5.00 1000019 INST_LIST_INDEX 29 10.04 91.47 10.00 2000048 INST_INCR_SCALAR1_IMM 105 9.77 44.52 20.00 4000452 INST_START_CMD 103 7.58 138.16 5.00 1000004 INST_LIST_INDEX_IMM 10 7.20 26.22 25.00 5000750 INST_LOAD_SCALAR1 17 5.78 52.67 10.00 2000236 INST_STORE_SCALAR1 47 5.23 95.32 5.00 1000001 INST_LT 1 4.19 38.11 10.00 2001039 INST_PUSH1 3 2.95 53.83 5.00 1000302 INST_POP
Note: later measurements were done on 5M instead of 1M runs. If you want to compare the total runtime, this first batch should be considered to take 9110838480 clicks (1822167696*5)
(These require an explanation)
Taking the fastest opcode as comparison basis:
(do note that INST_LOAD_SCALAR1 provides a generous upper bound on the cost of opcode dispatch - and the possible savings when improving that part only)
The script being run is
lappend auto_path /home/CVS/emptyFunc/ package require empty exec /usr/bin/taskset -p 0x00000001 [pid] proc main N { set y 0 set a [list foo boo moo] for {set i 0} {$i < $N} {incr i} { empty 1 incr y set z [lindex $a 1] set z 1 lindex $a $z } } if {[llength $argv]} { main [lindex $argv 0] } else { main 1000000 }
2007-11-09 Committed patch #1829248 [L2 ] that caches some frequently accessed TSD fields in the ekeko. INST_INVOKE_STK1 goes down from 503 ticks to 306 (same measurements as above, did it get slower in the meantime??). New timings:
8.5a6 time: 8187552172, count: 100004702 ---------------------------------------------------- op %T avgT %ops Nops 80 25.07 410.45 5.00 5000019 INST_LIST_INDEX 6 18.63 305.00 5.00 5000323 INST_INVOKE_STK1 29 11.37 93.12 10.00 10000048 INST_INCR_SCALAR1_IMM 105 9.67 39.58 20.00 20000453 INST_START_CMD 10 8.20 26.85 25.00 25000750 INST_LOAD_SCALAR1 47 7.35 120.43 5.00 5000001 INST_LT 17 6.07 49.74 10.00 10000236 INST_STORE_SCALAR1 103 5.26 86.07 5.00 5000005 INST_LIST_INDEX_IMM 1 5.11 41.86 10.00 10001040 INST_PUSH1 3 3.26 53.38 5.00 5000302 INST_POP
2007-11-12 A few commits after that: new stack check approach, lindex changes in tebc, improved usage of objType to prevent unneeded C calls, shortcut path for reading ints and indices.
Note: first posting here was better, but a wrong measurement. Sorry.
8.5a6 time: 6083734760, count: 100004702 ---------------------------------------------------- op %T avgT %ops Nops 6 20.32 247.27 5.00 5000321 INST_INVOKE_STK1 29 14.53 88.42 10.00 10000048 INST_INCR_SCALAR1_IMM 105 13.34 40.59 20.00 20000453 INST_START_CMD 10 11.54 28.08 25.00 25000750 INST_LOAD_SCALAR1 17 9.41 57.26 10.00 10000236 INST_STORE_SCALAR1 47 7.75 94.26 5.00 5000001 INST_LT 103 6.41 77.97 5.00 5000005 INST_LIST_INDEX_IMM 80 6.26 76.20 5.00 5000019 INST_LIST_INDEX 1 5.90 35.90 10.00 10001040 INST_PUSH1 3 4.51 54.84 5.00 5000302 INST_POP
By selective removing or duplicating code I estimate that, out of the 250 clicks of INST_INVOKE_STK1 (TEOvI), about 35 each are consumed by Tcl_GetCommandFromObj and Tcl_ResetResult. The trace code (surprisingly?) seems not to make any measurable difference.
Note that commandReform (reducing the cost of Tcl_GetCommandFromObj) has at best a hope of shaving 20 clicks, less than 10% of the total.
Tcl_ResetResult could conceivably be eliminated altogether, but this entails a change in the API: commands are not guaranteed to receive a clean objResult, errorCode, errorInfo or other return options. As they would have to reset the result themselves (when calling Tcl_SetObjResult) , most of the apparent win is not really there: we'd just be saving a Tcl_NewObj/Tcl_IncrRefCount/Tcl_DecrRefCount stanza. This one could conceivably be saved by allowing iPtr->objResultPtr to be NULL (and Tcl_GetObjResult fill it in if necessary), but that will break the deprecated but still used idiom of appending to the result Tcl_Obj. Small win, big pain - a no-go.