See also [Tcl IO performance], where we reached the '''Interim conclusion''' The time spent in the original benchmark seems to be decomposable in * 8% in startup and loop overhead (0.19/2.5) * 20% in command call and setting a variable in the called command (0.50/2.50) ''Further experimentation with a C-coded command that just returns TCL_OK shows that the cost of setting the variable is almost negligible'' * 72% doing the file access, reading and conversion within [gets] itself * the cost of setting the variable in the called command seems to be below the measurable threshold for the larger dataset (read1a actually faster than read2 - the difference ''must'' be noise) ---- Further experiments * on a core instrumented with code developed together with [GPS] [https://sourceforge.net/tracker/index.php?func=detail&aid=1828178&group_id=10894&atid=310894] * running a tiny do-almost-nothing script (see below) * using a command '''empty''' defined in C to just ''return Tcl_Ok;'' provide the following timings for the most-exercised opcodes (measured in cpu ticks at 1.6MHz): ====== 8.5a6 time: 1822167696, count: 20004698 ---------------------------------------------------- op %T avgT %ops Nops 6 24.53 446.87 5.00 1000323 INST_INVOKE_STK1 80 22.68 413.32 5.00 1000019 INST_LIST_INDEX 29 10.04 91.47 10.00 2000048 INST_INCR_SCALAR1_IMM 105 9.77 44.52 20.00 4000452 INST_START_CMD 103 7.58 138.16 5.00 1000004 INST_LIST_INDEX_IMM 10 7.20 26.22 25.00 5000750 INST_LOAD_SCALAR1 17 5.78 52.67 10.00 2000236 INST_STORE_SCALAR1 47 5.23 95.32 5.00 1000001 INST_LT 1 4.19 38.11 10.00 2001039 INST_PUSH1 3 2.95 53.83 5.00 1000302 INST_POP ====== '''Striking observations''' (which require an explanation) Taking the fastest opcode as comparison basis: 1. INST_LOAD_SCALAR1 is the fastest opcode (faster that INST_PUSH1 and INST_POP!), INST_STORE_SCALAR1 is pretty fast too 1. [lindex] is amazingly slow - the non-immediate version is as slow as a command invocation 1. command invocations are expensive (remark that only '''empty''' is invoked in the loop) 1. comparisons (INST_LT) and basic arithmetic (INST_INCR_SCALAR_IMM1) are amazingly slow when compared to basic variable access 1. the "pure loss" INST_START_CMD is amazingly slow (do note that INST_LOAD_SCALAR1 provides a ''generous'' upper bound on the cost of opcode dispatch - and the possible savings when improving that part only) ---- The script being run is ====== lappend auto_path /home/CVS/emptyFunc/ package require empty exec /usr/bin/taskset -p 0x00000001 [pid] proc main N { set y 0 set a [list foo boo moo] for {set i 0} {$i < $N} {incr i} { empty 1 incr y set z [lindex $a 1] set z 1 lindex $a $z } } if {[llength $argv]} { main [lindex $argv 0] } else { main 1000000 } ====== ---- 2007-11-09 Committed patch #1829248 [https://sourceforge.net/tracker/index.php?func=detail&aid=1829248&group_id=10894&atid=310894] that caches some frequently accessed TSD fields in the [ekeko]. '''INST_INVOKE_STK1 goes down from 503 ticks to 306''' (same measurements as above, did it get slower in the meantime??). ---- !!!!!! %| [Category Internals] | [Category Performance] |% !!!!!!