**Summary** See also [Tcl IO performance], where we reached the '''Interim conclusion''' The time spent in the original benchmark seems to be decomposable in * 8% in startup and loop overhead (0.19/2.5) * 20% in command call and setting a variable in the called command (0.50/2.50) ''Further experimentation with a C-coded command that just returns TCL_OK shows that the cost of setting the variable is almost negligible'' * 72% doing the file access, reading and conversion within [gets] itself * the cost of setting the variable in the called command seems to be below the measurable threshold for the larger dataset (read1a actually faster than read2 - the difference ''must'' be noise) ---- **Full Results, v1** Further experiments * on a core instrumented with code developed together with [GPS] [https://sourceforge.net/tracker/index.php?func=detail&aid=1828178&group_id=10894&atid=310894] * running a tiny do-almost-nothing script (see below) * using a command '''empty''' defined in C to just ''return Tcl_Ok;'' provide the following timings for the most-exercised opcodes (measured in cpu ticks at 1.6MHz): ====== 8.5a6 time: 1822167696, count: 20004698 ---------------------------------------------------- op %T avgT %ops Nops 6 24.53 446.87 5.00 1000323 INST_INVOKE_STK1 80 22.68 413.32 5.00 1000019 INST_LIST_INDEX 29 10.04 91.47 10.00 2000048 INST_INCR_SCALAR1_IMM 105 9.77 44.52 20.00 4000452 INST_START_CMD 103 7.58 138.16 5.00 1000004 INST_LIST_INDEX_IMM 10 7.20 26.22 25.00 5000750 INST_LOAD_SCALAR1 17 5.78 52.67 10.00 2000236 INST_STORE_SCALAR1 47 5.23 95.32 5.00 1000001 INST_LT 1 4.19 38.11 10.00 2001039 INST_PUSH1 3 2.95 53.83 5.00 1000302 INST_POP ====== ''Note'': later measurements were done on 5M instead of 1M runs. If you want to compare the total runtime, this first batch should be considered to take 9110838480 clicks (1822167696*5) ***Striking observations*** (These require an explanation) Taking the fastest opcode as comparison basis: 1. INST_LOAD_SCALAR1 is the fastest opcode (faster that INST_PUSH1 and INST_POP!), INST_STORE_SCALAR1 is pretty fast too 1. [lindex] is amazingly slow - the non-immediate version is as slow as a command invocation 1. command invocations are expensive (remark that only '''empty''' is invoked in the loop) 1. comparisons (INST_LT) and basic arithmetic (INST_INCR_SCALAR_IMM1) are amazingly slow when compared to basic variable access 1. the "pure loss" INST_START_CMD is amazingly slow (do note that INST_LOAD_SCALAR1 provides a ''generous'' upper bound on the cost of opcode dispatch - and the possible savings when improving that part only) ***Test Script*** The script being run is ====== lappend auto_path /home/CVS/emptyFunc/ package require empty exec /usr/bin/taskset -p 0x00000001 [pid] proc main N { set y 0 set a [list foo boo moo] for {set i 0} {$i < $N} {incr i} { empty 1 incr y set z [lindex $a 1] set z 1 lindex $a $z } } if {[llength $argv]} { main [lindex $argv 0] } else { main 1000000 } ====== ---- **Full Results, v2** 2007-11-09 Committed patch #1829248 [https://sourceforge.net/tracker/index.php?func=detail&aid=1829248&group_id=10894&atid=310894] that caches some frequently accessed TSD fields in the [ekeko]. '''INST_INVOKE_STK1 goes down from 503 ticks to 306''' (same measurements as above, did it get slower in the meantime??). New timings: ====== 8.5a6 time: 8187552172, count: 100004702 ---------------------------------------------------- op %T avgT %ops Nops 80 25.07 410.45 5.00 5000019 INST_LIST_INDEX 6 18.63 305.00 5.00 5000323 INST_INVOKE_STK1 29 11.37 93.12 10.00 10000048 INST_INCR_SCALAR1_IMM 105 9.67 39.58 20.00 20000453 INST_START_CMD 10 8.20 26.85 25.00 25000750 INST_LOAD_SCALAR1 47 7.35 120.43 5 5000001 INST_LT 17 6.07 49.74 10.00 10000236 INST_STORE_SCALAR1 103 5.26 86.07 5.00 5000005 INST_LIST_INDEX_IMM 1 5.11 41.86 10.00 10001040 INST_PUSH1 3 3.26 53.38 5.00 5000302 INST_POP ====== ---- **Full Results, v3** 2007-11-12 A few commits after that: new stack check approach, [lindex] changes in tebc, improved usage of ''objType'' to prevent unneeded C calls, shortcut path for resetting the interp's result in the "normal" state. ====== 8.5a6 time: 5863113191, count: 100004702 ---------------------------------------------------- op %T avgT %ops Nops 6 17.85 209.30 5.00 5000321 INST_INVOKE_STK1 29 15.22 89.24 10.00 10000048 INST_INCR_SCALAR1_IMM 105 14.24 41.74 20.00 20000453 INST_START_CMD 10 10.71 25.13 25.00 25000750 INST_LOAD_SCALAR1 17 10.23 59.99 10.00 10000236 INST_STORE_SCALAR1 47 8.23 96.49 5.00 5000001 INST_LT 103 6.53 76.56 5.00 5000005 INST_LIST_INDEX_IMM 80 6.39 74.98 5.00 5000019 INST_LIST_INDEX 1 6.06 35.55 10.00 10001040 INST_PUSH1 3 4.51 52.93 5.00 5000302 INST_POP ====== ---- !!!!!! %| [Category Internals] | [Category Performance] |% !!!!!!