** Loop behaviour ** Playing with the bytecode stepper in [Visual TAL], I was a bit surprised to see the bytecode for [foreach] invoking [llength] at every iteration. This made me wonder about the performance of common looping constructs, and somewhat surprising results. As with any microbenchmark, this is probably measuring the wrong thing. *** Specification *** The obvious question is, "how does `foreach` behave if you modify the list under it?". The answer is conveniently omitted from the `foreach` manual, but provided clearly enough by the dodekalogue: `foreach` has its list arguments substituted before it is called (in other words: passed by value) so it cannot be affected by mutations. *** Setup *** Making big lists is easy with `split [string repeat ...`, but I have this in my prelude to start with: ====== proc range {from to} { if {$to<$from} { foreach from to $to $from } set res {} while {$from<=$to} {lappend res $from; incr from} set res } ====== And our test var. I initialise it this way because if pasting in the shell, you don't really want to see the value of `x` echoed back at you. ====== set x [range 0 1e6]; llength x ====== (side note: will such a list always have an odd number of elements? I don't know.) Here are the procs we want to test: ====== proc f {x} { foreach i $x { incr j } set j } proc g {x} { for {set a 0} {$a<[llength $x]} {incr a} { incr j } set j } proc h {x} { set l [llength $x] for {set a 0} {$a<$l} {incr a} { incr j } set j } ====== *** (micro)Benchmarking *** It turns out that iterating over a million-and-one element list is a bit slow on my netbook, so initial timing results are of a small sample: tclsh-8.6.0 % time {f $x} 10 805257.6 microseconds per iteration tclsh-8.6.0 % time {g $x} 10 502229.6 microseconds per iteration tclsh-8.6.0 % time {h $x} 10 285997.6 microseconds per iteration Striking. Of course this benchmark isn't exactly fair, since [foreach] is accessing the contents of the list as well as iterating over it. Suitably handicapping `g` and `h` is easy: ====== rename g g_ proc g {x} { for {set a 0} {$a<[llength $x]} {incr a} { set i [lindex $x $a] incr j } set j } rename h h_ proc h {x} { set l [llength $x] for {set a 0} {$a<$l} {incr a} { set i [lindex $x $a] incr j } set j } ====== This brings the results closer to [foreach], but the same performance degradation is apparent: tclsh-8.6.0 % time {g $x} 10 688677.9 microseconds per iteration tclsh-8.6.0 % time {h $x} 10 460473.2 microseconds per iteration Curiously, `foreach` is still the slowest version. I hope this is an artifact of the system I am testing on. *** Disassembly *** Using [disassemble] on `g`, `h` and their predecessors reveals nothing unexpected. Looking at `f` though: Command 1: "foreach i $x {incr j}" (0) loadScalar1 %v0 # var "x" (2) storeScalar1 %v1 # temp var 1 (4) pop (5) foreach_start4 0 [data=[%v1], loop=%v2 it%v1 [%v3]] (10) foreach_step4 0 [data=[%v1], loop=%v2 it%v1 [%v3]] (15) jumpFalse1 +17 # pc 32 Command 2: "incr j" (17) startCommand +12 1 # next cmd at pc 29 (26) incrScalar1Imm %v4 +1 # var "j" (29) pop (30) jump1 -20 # pc 10 (32) push1 0 # "" (34) pop Command 3: "set j" (35) startCommand +11 1 # next cmd at pc 46 (44) loadScalar1 %v4 # var "j" (46) done we see bytecode entries for `foreach_start4` and `foreach_step4`. It turns out my initial data was wrong: [dis2asm] unrolls these instructions into similar code that we see in `g`'s loop. Indeed these are bytecodes: in `tclExecute.c` you will find about 100 lines of C at jump label `INST_FOREACH_STEP4`, confirming that it is taking the length of every list at every iteration. *** Ruminations *** I don't really know what to draw from this, but it raises some questions. Is this measurement meaningful? If so, why is `foreach` measurably slower than a loop which apparently does the same thing? (I am reminded of the 386's `loop` opcode, which was significantly slower than the unrolled combination `dec cx; jnz` leading to its disuse outside of 4k demos). Is this an indication that rather than more bytecode instructions, what is really needed to boost Tcl performance is commands to support script-level interaction with the compiler? ... I'm dreaming about lisp macros and stuff like [Sugar]). ---- [RS] 2013-12-04 Thanks for trying [Visual TAL]! As I wrote in [dis2asm does macros], I could not find [TAL] instructions corresponding to foreach_start and foreach_step in tcl::unsupported::[assemble], so the "macro expansion" I concocted was a workaround. In Tcl [list] objects, the length is a data field (just as strings know their [string length]), so it does not cost much time to retrieve that. My first attempt had a separate variable for the length, but I was then sure it is not needed. [DKF]: Adding the `foreach*` instructions is one of the TODOs, but they've got some weird side-conditions on them that would need modelling. (Also, TAL can't currently construct auxiliary data blobs, which the foreach instructions use.) The real complexity of the foreach stuff is there to handle the multi-variable multi-list cases. [RS] True... [dis2asm] already fails horribly on the simple variation proc f x {set res {};foreach {i j} $x {if {$i>0} {lappend res $i}};set res} <> Performance