[DKF]: The aim of this page is to collect together stuff on how to make the pips squeak in Tcl. Anything for making the code faster (such as better idioms for compilation, discussions on reference counting, etc.) is fine, though long stuff should probably be shunted off somewhere else and only a link to it left here.

** See Also **

   [Can you run this benchmark 10 times faster]:
   [Why Tcl is so much slower than Perl]:
   [speed issues]:
   [copy-on-write]:   Essential information about Tcl's underpinnings
   [Performance of networking applications]:
   [Tcl Socket Performance]:
   [Tk Performance]:   dependent upon Tcl Performance, but has its own set of considerations
   [How to Measure Performance]:   how to avoid the common pitfalls when performing your own tests
   [Compact Data Storage]:   tips for conserving memory
   [Tcl Performance: catch vs. info]:
   [Comparing Performance of Tcl OO extensions]:   a first attempt at quantifying performance of some of the OO extensions available in the community. The source is available so that others can implement the code in their favorite OO extension and update the page.
   [Cameron Laird%|%Cameron Laird's] [http://phaseit.net/claird/comp.lang.tcl/tcl_performance.html%|%Personal Notes on Crafting Tcl for Performance]:
   [timers]:
   [tclprof]:
   [Dynamic procs as performance monitors]:
   [Performance of Various Stack Implementations]:
   [PerfSuite]:
   [Computer Language Benchmarks Game]:
   [Tcl Benchmarks]:   a benchmark suite by [Jeffrey Hobbs] that demonstrates performance changes over time
   [https://github.com/flightaware/tclbench%|%Tcl benchmarks], by [flightaware]:
   [Radical reform of the execution engine]:   a hot topic in the hallways and bars at the 2007 Tcl conference, including the idea that Tcl could compile down to machine code.
   [http://ptolemy.eecs.berkeley.edu/tycho/tycho0.2.1/tycho0.2.1/doc/coding/performance.html%|%Tycho documentation: performance]:   mentions some Performance Tools and Performance Hints (taken from the first edition of [BOOK Practical Programming in Tcl and Tk, Second edition]). [RWT]: These hints apply primarily to pre-8.0 Tcl. [AMG]: This link is a 404. The document you're seeking is inside this tar.gz file: [http://ptolemy.eecs.berkeley.edu/tycho/tycho0.2.1/tycho0.2.1.src.tar.gz].

** Articles about Tcl Performance **

   [http://playcontrol.net/opensource/LuaHashMap/benchmarks.html%|%Hash Table Shootout 2: Rise of the Interpreter Machines], Eric Wing, 2012-12-23:   Eric tries to figure out why the Tcl hash table implementation is beating the pants off everything else.
   [http://raid6.com.au/~onlyjob/posts/arena/%|%Perl, Python, Ruby, PHP, C, C++, Lua, tcl, javascript and Java comparison]:   criticism of this write-up includes that the benchmarks assume naive programmers for the language, making them useless [http://lua-users.org/lists/lua-l/2013-07/msg00545.html%|%1].
   [http://www.reddit.com/r/programming/comments/68280/tcl_regex_implementation_beats_whole_competition/%|%TCL regex implementation beats whole competition. Even C with PCRE and Boost regex. (shootout.alioth.debian.org)], 2008-02-07:
   [http://www.cs.bell-labs.com/cm/cs/who/bwk/interps/pap.html%|%Timing Trials, or, the Trials of Timing: Experiments with Scripting and User-Interface Languages], [Brian Kernighan], 1998:   older information, but still interesting for its analyses and for the history

** Before anything else, time it! **

`[time]` can help answer questions like "Is this code faster than that code?"

   [Counting Elements in a List]:   includes a basic timing framework.
   `[clock clicks]`:   provides a platform-dependent time stamp.
   Useful when measuring, e.g., event-driven Tk applications.
   [Profiling Tcl]:   how to profile applications to get a breakout of where the time is being spent
   [How to Measure Performance]:

** `[return]` **

In versions of Tcl prior to 8.4, using [return] to return a value was slower than simply arranging for the value to be the last thing in the procedure body. 8.4 corrected this. See [Tcl 8.0 performance advice] for the discussion, which was deleted from this page because it has been inaccurate for several years.

[PSE]: '''Wait a minute!''' What EXACTLY do you mean here?? Let's see what I get:

======none
#!/usr/bin/env tclsh
proc foo1 {} {return foo}
proc foo2 {} {uplevel set bar foo}
proc foo3 {} {set ::bar foo}
puts [time {foo1} 1000]
set bar {}
puts [time {foo2} 1000]
puts [time {foo3} 1000]
======

======none
31 microseconds per iteration
72 microseconds per iteration
15 microseconds per iteration
======

Hmmm... Tcl7.5 could do a return in 5 us. What happened?

[DKF]: The machinery to implement `[return]` is not all that simple, and has a fair old cost associated with it. However, in your code above you should be aware that the first example has completely different semantics from the other two (it returns a value, as opposed to setting a variable in the surrounding/global context.) Compare your first example with:

======
proc foo {} {set foo foo}
======

However, my workstation is sufficiently fast that measuring the difference between these is impossible (the margin of error is just too high!)

[PSE]: Ack. Yes. Same here. I see today that it is impossible to do meaningful benchmarks unless they are run as a script that runs the compared code in as rapid succession as possible; there is too much network-related interference to make any sense of the results otherwise. Guess I need to do all my benchmarking on my Linux box at home and just figure that the Sun boxes will give similar ''relative'' performance.

I have my doubts about this ''optimisation''.
What I get (NT4.0, TclPro1.2) is:

======none
% proc foo1 {} {return foo}
% proc foo2 {} {set foo foo}
% time {foo1} 1000000
4 microseconds per iteration
% time {foo2} 1000000
3 microseconds per iteration
======

jp: Maybe this optimisation is obsolete, or its '''actual''' effect is too small to be worth considering?

DKF: Note that UNIX has a much finer time granularity than Windows (certainly 95/98, and quite probably NT too.) The best way to benchmark a tiny snippet of Tcl (especially where it is platform-independent code you're testing) is to use a slow UNIX box. The slowness of its operation allows for sensitivity, and the fine granularity of its timer lets you pick up what differences there are. And then you have to pray that the relative speeds scale as you move to different platforms/architectures and/or faster machines. Which is usually a good assumption if you're not shimmering between numbers and strings (where performance seems to vary a lot according to the vendor's implementation of the C library.)

** [Brace your expr-essions] **

See [Brace your expr-essions]. Use braces around all conditionals to get the best efficiency out of the compiler. In Tcl7, one might have

======
#warning bad code ahead!
if [...]
======

This should now be

======
if {[...]}
======

`[expr]` expressions [Brace your expr-essions%|%should also be braced]. There will be a few cases where this won't work (building up exprs in a var).

** Constant Strings **

Try to only ever evaluate constant strings. Where this is not possible, try to make sure not to change any particular string very often. Cache it if possible. Tk binding scripts are a prime example of how ''not'' to get maximum speed out of Tcl8.
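As a small runnable sketch of the braced-expression advice (the proc names here are invented for illustration):

```tcl
# Unbraced: Tcl substitutes $x into the argument before expr sees it,
# so the byte-code compiler cannot compile the expression; it is
# reparsed on every call (and is unsafe if $x contains expr syntax).
proc slowDouble {x} {
    expr $x * 2
}

# Braced: expr receives a constant expression, which is compiled once
# into a few byte-code instructions and then reused on every call.
proc fastDouble {x} {
    expr {$x * 2}
}

puts [slowDouble 21]    ;# -> 42
puts [fastDouble 21]    ;# -> 42
```

Compare the two with `time {slowDouble 21} 100000` versus `time {fastDouble 21} 100000`; the braced form should win comfortably on any byte-compiling Tcl.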
----

TFW 2001-01-31: Here's a surprising timing reported by Tom Wilkason on news:comp.lang.tcl on a Win 2K 375MHz box (updated):

======none
# Q is a local drive mounted as a share
time {cd q:/utilities;cd [file dirname [pwd]]} 1000
2163 microseconds per iteration
time {cd q:/utilities;cd q:/} 1000
2083 microseconds per iteration
# This is the same dir as above but from the raw mount
time {cd d:/source/utilities;cd [file dirname [pwd]]} 1000
601 microseconds per iteration
time {cd d:/source/utilities;cd d:/source} 1000
551 microseconds per iteration
======

[DKF] 2000-02-14: [http://www.ruthannzaroff.com/wonderland/curiouser.htm%|%Curiouser and Curiouser], since when I perform the timings with Tcl8.0.4 on my ultrasparc/Solaris box, I get the following timings...

======none
% time {cd /home/fellowsd/arch; cd /home/fellowsd} 100000
48 microseconds per iteration
% time {cd /home/fellowsd/arch; cd [file dirname [pwd]]} 100000
129 microseconds per iteration
% time {cd /home/fellowsd/arch; cd [file join [pwd] ..]} 100000
131 microseconds per iteration
======

So things are not as straightforward as they might seem. (Side note: is NT ''really'' so slow at changing directories, or is it just that the particular NT machine that the above test was done on is grossly underpowered anyway?)

** Trace Byte-Compiled Code **

[DKF]: You can find out just what the byte-code compiler is doing by activating internal tracing. You can set the Tcl variable '''tcl_traceExec''' to one of the following values, and internal trace functions will generate messages on stdout for every byte-code executed.

======none
0: no execution tracing
1: trace invocations of Tcl procs only
2: trace invocations of all (not compiled away) commands
3: display each instruction executed
======

You can set the variable '''`$tcl_traceCompile`''' to one of the following values to get information during byte code compilation of a procedure or toplevel command.
======none
0: no compile tracing
1: one line summary
2: detailed listing of byte codes
======

This is documented in the [http://www.tcl.tk/man/tcl/TclCmd/tclvars.htm%|%'''tclvars(n)''' reference], right at the bottom of the 8.0 version. Note that the output is always generated on stdout at a very low level. Mac people cannot do profiling ''(unless Apple fixes this in MacOSX)'' and Windows people will need to run from a console ''(which might be a problem with wish, IIRC.)'' Still, something that is only ever useful in some situations is almost always better than nothing at all.

[DKF]: Also, in Tcl 8.5 you can use '''`::tcl::unsupported::[disassemble]`''' to pull apart some code and show the bytecodes. No guarantees about formatting, etc., but it is very convenient ''and'' returns the result as a string so you can do further processing, show it in a text widget, and so on.

** Unsharing Variable Values **

See `[K]`.

** Consolidate Trips to C-land **

This and the next few sections come from Jeff Hobbs:

[RWT]: Just because I want to be able to find this again later... I've copied some [http://www.dejanews.com/getdoc.xp?AN=457938595%|%notes] from Jeff Hobbs. (To be updated when he rewrites it. :-)

Compress multiple Tcl commands into as few as possible. This is based on the idea that getting into the C side of Tcl as often as possible is the best way to go. Or, avoid running through Tcl_Eval as often as possible. This is often best done by compounding string calls into single regexps or magic use of regsubs. A classic example is the 9-line HTML parser from Stephen Uhler.

** Comparing Strings **

Use `[string compare ...]` when you want string comparisons in conditionals.

======
if {![string compare $a $b]} ...
======

is faster than

======
if {$a == $b}
======

but don't do this for numbers, because

======
0x3==3 is true
======

but

======
[string compare 0x3 3]==0
======

is not true (they aren't string equal). Interestingly, in tcl8.2

======
if { [ string equal $a $b ] }
======

is always nearly twice as slow as the two above. But in tcl8.3

======
if { [ string equal $a $b ] }
if { $a == $b }
if { ! [ string compare $a $b ] }
======

are all within 4usec of each other on solaris2.7, and 1usec on Windows NT. On solaris, the first two are always quicker than the last, and on Windows NT, the middle is slower than the other two. YMMV.

Warning! Before Tcl 8.4.x, `[string] compare` did not always return a correct value! In fixing that problem, string compare is now as much as 30% slower than before. If you want to compare two strings accurately, `[string equal]` is the best choice. A [JH] example where `[string compare]` failed, which comes from the [ActiveState] [ActiveTcl] mailing list:

======none
woset [~] 106 > echo 'puts [string compare \x00 \x01]' | tclsh8.3
1
woset [~] 107 > echo 'puts [string compare \x00 \x01]' | tclsh8.4
-1
======

** Be aware of [Tcl_Obj]'s **

Don't change object representation without reason. This starts with [Changes in Tcl/Tk 8.0%|%Tcl8.0] and the fact that lists and strings no longer have equal internal representations. Switching between these comes at a cost, and it is often easy to avoid when you become "one with the Tcl". For example, many people write Tcl procs with an ''args'' argument. This will come in as a list to your proc. Don't do

======
[string compare {} $args]
======

to see if it has something in it; instead do

======
[llength $args]
======

Doing the string compare would convert the `$args` list to a string, then back to a list when it was actually used, a phenomenon known as [shimmering]. Keep this in mind for numbers as well, with the caveats of point 2. For example, `[incr]` is faster if the internal representation remains a number.
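The ''args'' point can be sketched concretely (proc names invented for illustration; both behave the same, but the second avoids generating a string rep of `$args`):

```tcl
# Forces $args to shimmer: string compare needs a string rep of the
# list, which is thrown away again as soon as the value is next used
# as a list.
proc countArgs1 {args} {
    if {[string compare {} $args]} {
        return [llength $args]
    }
    return 0
}

# Keeps $args as a pure list: llength reads the list internal rep
# directly, so no conversion (shimmering) happens.
proc countArgs2 {args} {
    if {[llength $args]} {
        return [llength $args]
    }
    return 0
}

puts [countArgs1 a b c]    ;# -> 3
puts [countArgs2]          ;# -> 0
```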
** [Changes in Tcl/Tk 8.0%|%Tcl8] lists provide a fast representation of 1D arrays **

`[lindex]` is now [http://en.wikipedia.org/wiki/Big_O_notation%|%O](1) on lists that already have an internal [representation]. This also assists in things like `[lrange]` and `[lreplace]`, and even `[lappend]`. If your indices are numeric, the list is significantly faster than arrays.

'''Note''': I believe it was Paul Duffin who presented us with a full Tcl_Obj'ified version of arrays, which will also speed up the array implementation in Tcl8.

KBK: But be aware that you may inadvertently copy a [Tcl_Obj] if you're not careful, and destroy the performance advantage. See [Shuffle a list] for an example.

** Walking Lists with `[foreach]` **

It's faster than `[for]` and `[lindex]`. This effect was particularly pronounced before [Changes in Tcl/Tk 8.0%|%8.0], since you only had to parse a string into a list once, but it is still noticeable after.

======
set cities [list Rochester Minneapolis St.Paul Mankato Bemidji]
set ncities [llength $cities]
for {set i 0} {$i<$ncities} {incr i} {
    set city [lindex $cities $i]
    # This is slower
}
foreach city $cities {
    # This is faster, and more elegant
}
======

Even if you need to know the item index, the following `[foreach]` [idiom] is faster than the `[for]` form:

======
set i -1
foreach city $cities {
    incr i
    # Normal body comes here
}
======

`[incr]` has to come first in the body if the body might `[continue]`, as the index would otherwise not always be updated to match. `[foreach]` can operate on several [list%|%lists] in parallel:

======
set temperatures [list -10 -12 -14 -18 -32]
foreach city $cities temp $temperatures {
    # Fast, elegant, and even cool
}
# with those temperatures, it's downright chilly - glennj
======

** Comments and Whitespace Have No Effect in Compiled Procedures **

Once a proc is first compiled (upon first use), any amount of whitespace and commenting in it is removed in the compiled representation.
The load time for heavily commented procs will be slightly longer (Tcl has to read all that in before it knows what to avoid), but after its first call, Tcl will byte-compile it and use that rep. Of course, if you keep changing the procs around, you're going to have problems. I believe there are also events, like redefining certain Tcl core commands, that invalidate all byte-compiled proc representations.

** Save Time with Extended `[foreach]` Syntax **

Combining points "multiple commands" and "don't change objects", we can reach the conclusion that

======
foreach {a b c d e} [cmdReturningList] {break}
======

is faster than

======
set list [cmdReturningList]
set a [lindex $list 0]
set b [lindex $list 1]
....
======

`[break]` is helpful to ensure that we don't actually do any loops in case `cmdReturningList` is changed to return more than 5 args. Note that when we want the result from two similar calls, we get even better advantages out of:

======
foreach {a b c} [cmdRetList] {a' b' c'} [cmdRetList'] {break}
======

From [Changes in Tcl/Tk 8.5%|%8.5] onwards, you can use the more readable `[lassign]`.

[AMG]: When 8.5 isn't available and I use `[foreach]` in this way, I usually omit `[break]` when the list is guaranteed to iterate only once.

** Caching Array Variables Can Be a Win **

Yes, accessing stuff in the hash table does take a little longer, so if you have a number that you want to loop 1000 times on, put it into a simple variable first.

** Put Everything in a [proc] **

Inline Tcl code doesn't get all the optimizations that a [proc%|%procedure] can get. For example, this command outside a proc

======
for {set i 0} {$i<10000} {incr i} {lappend a $i}
======

takes three times longer (on my machine, running Tcl 8.0.4) than

======
proc init_me {} {
    global b
    for {set i 0} {$i<10000} {incr i} {lappend b $i}
}
init_me
======

The differences can be even more dramatic in Tcl 8.1, because the compiler was split into a parser front-end and a compiler back-end.
Global code is just parsed into tokens, similarly to the way that Tcl operated before 8.0, as this incurs a smaller performance hit when running scripts that only execute once (notably Tk bindings, but many other kinds of callback come into this category too.) Only `[proc]` and `[apply]` bodies are fully compiled.

** Preset Variables for `[uplevel]` **

For `[uplevel]` and other commands that arrange to evaluate scripts, that evaluation will have better performance if those scripts refer to variables that already exist in the scope they are evaluated in. Setting a variable to the empty string, or declaring it as a namespace variable, is sufficient. Reference: [MS], [Tcl Chatroom], 2015-12-10. Illustration ($l is a list with 100000 elements):

======
proc p l {
    uplevel 1 [list foreach v $l {}]
}
proc q l {
    p $l
}
proc r l {
    set v {}
    p $l
}

% time {q $l} 10
32848.0 microseconds per iteration
% time {r $l} 10
27137.4 microseconds per iteration
======

Of course, this implies you know what variables the uplevel is going to access.

** [Regular Expressions%|%Regular Expression] Tricks **

Regular expressions are expensive to compile, and non-constant `[regexp]` patterns must be recompiled every time they are scanned for. Suppose you want to match `a b c` with arbitrary (including none) amounts of whitespace between the letters. In 8.0, the following is the most obvious way to do it, but it is hard to write:

======
regexp "a\[ \t\n\r\]*b\[ \t\n\r\]*c\[ \t\n\r\]" $string
======

For clarity's sake it is far better to do:

======
set ws "\[ \t\n\r\]"
regexp "a${ws}*b${ws}*c" $string
======

However, this is not very efficient, since it forces recompilation each time (unlike the first version, where the string is constant and therefore can support Tcl_Obj-based saving of compiled values.)
The following does not suffer from this problem, and is probably clearer as well in practical programs:

======
# Only execute these lines once, at startup time
set ws "\[ \t\n\r\]"
set abcRE "a${ws}*b${ws}*c"

# Now refer to the variable each time you want to match
regexp $abcRE $string
======

If you are using 8.1 or later, you have access to the new regular expression engine, which allows several things to make this bite less (special escapes for common sequences, a regular expression cache (in later versions), and ''implemented'' caching of REs in [Tcl_Obj]s...) The best solution for our example problem is probably

======
set pat {a\s*b\s*c}    ;# Globally
regexp $pat $string    ;# Each time needed
======

See comment below for additional speedup. In general, do whatever you can to avoid losing the object that contains the compiled expression.

Solution: We need to change the 8.1 implementation to introduce a second-level cache similar to the one in 8.0. This will allow us to avoid recompilation in cases where the same computed string value is passed in, even when it comes from a different object. We can probably use a thread-local cache, instead of a per-interp cache, to improve sharing and make the regexp code more modular. -- JC

Actually, for performance's sake, the best solution is to inline the pattern:

======
regexp {a\s*b\s*c} $string
======

This will be even faster (yes, I've verified this with timings). For simple regexps, this is more readable anyway since it keeps the `[regexp]` definition where it is being used.

Another significant way to speed up regular expressions is to avoid making the regexp retry matches (i.e., backtrack) if possible. As a simple case, consider this:

======
regexp a.*b.*c abbbbbbbbbbbbbbbbbbbb
======

[DL]: In this statement, the regexp engine is going to match `a`, then `b`, and then fail to match `c`. But it won't stop there; it will match every single `b` and, using those matches, will try, and fail, to match `c` after each subsequent `b`.
This is extremely inefficient. It is much more efficient to break this into two searches, so that you prevent the regexp engine from doing all that pointless backtracking. This is just the tip of the iceberg, alas.

[JH]: I think the above backtrack issue went away with the updated 8.1+ RE, but the problem does still exist in 8.4.16/8.5b2 for the equivalent `[glob]` case:

======
string match *a*b*c* abbbbbbbbbbbbbbbbb
======

There, too, you end up with pointless backtracking.

** `[read]`ing Large Data Files **

[RWT] & [DKF]: In Tcl 8.1 and above, this has been fixed, and `[read]` performance on large files is satisfactory. See the READ tests on the [Tcl Benchmarks] page.

One common Tcl-ism is to read an entire data file into memory in one fell swoop, instead of reading and processing each line in turn. This works well if your file is small compared to system memory, and you can afford to do all the processing in memory. You can also use the ''split'' command to break the file down into a list of individual lines, like this:

======
set fp [open myFile r]
set data [split [read $fp] \n]
close $fp
======

But for larger files (say in the megabyte range) Tcl 8.0's `[read]` can be very slow because it is using a buffer size of 4096 bytes. You can improve performance by increasing the channel buffer size with `[fconfigure]`, but the most efficient way is to tell `[read]` exactly how much data it will be reading.

======
set fp [open myFile r]
set size [file size myFile]
set data [split [read $fp $size] \n]
close $fp
======

** Bootstrap Scripts **

[http://web.archive.org/web/20000823092517/http://www-dse.doc.ic.ac.uk/~np2/patterns/scripting/%|%Patterns for Scripted Applications], by Nat Pryce, particularly Bootstrap Script.

** Global env Array Slower than Most **

From Scott Stanton comes this Tcl 8.1 performance tip:

: ... you should not use the global [env] array as a general data storage area.
There is a lot of mechanism behind "env" that is needed to keep it in sync with multiple interps and the C environ array.

** More 8.1 Notes **

   * Apparently, the compiler has been changed somewhat in 8.1 so as to handle one-off scripts better. The compiler has been split up into two bits, a parser (front-end) and a byte-code generator (back-end), and an alternative back-end has also been provided that just takes the intermediate output of the parser and evaluates it. I don't know what the performance implications of this are (other than for Tk binding scripts, where it pretty much obviously improves performance) as I've not switched to 8.1 yet. http://www.deja.com/getdoc.xp?AN=477828246
   * Also, string manipulation is substantially slower (in 8.1.0 - work is being put in to correct this) due to the Unicode support. This is because Tcl currently uses UTF as its internal encoding, and because that encodes Unicode characters using a variable number of octets ''(that's charset speak for bytes)'', it makes operations like taking the length of a string or getting a character at a particular position into O(n) operations from O(1). Massively yucky, I know! http://www.deja.com/getdoc.xp?AN=475452431
   * Regexps are now handled more sensibly. This makes it much better to put your (non-constant) regexps in a variable instead of rebuilding them on the fly. [http://www.deja.com/getdoc.xp?AN=476397350]

======
set someLetters {[abcxyz]+}

# The bad way to do it
foreach string $someList {
    set m($string) [regexp ${someLetters}0${someLetters} $string]
}

# The good way to do it
set RE ${someLetters}0${someLetters}
foreach string $someList {set m($string) [regexp $RE $string]}
======

'''DKF'''

----

[DKF]: OK, the string performance bug has now been fixed properly in 8.1.2, and virtually all operations are now as fast in 8.1.2 as in 8.0.* Much kudos to [Ajuba Solutions] for fixing it so quickly and in such a non-blecherous fashion.
[http://www.deja.com/=dnc/getdoc.xp?AN=490351575]

** Operations on Lists of Data **

Comparing various methods of summing a list of numbers (the mathop trial needs 8.5+ for `{*}` and `::tcl::mathop`):

======
#!/usr/bin/env tclsh
for {set i 15} {$i < 100} {incr i} {
    lappend L $i
}
set iterations 10000

proc sumup list {
    set sum 0
    foreach n $list {
        incr sum $n
    }
    set sum
}

set time [time {set sum [::tcl::mathop::+ {*}$L]} $iterations]
puts "mathop     : $time, $sum"

set time [time {set sum [sumup $L]} $iterations]
puts "sumup      : $time, $sum"

set time [time {set sum [expr [join [concat 0 $L] +]]} $iterations]
puts "join/concat: $time, $sum"

proc reduce {f i l} {
    foreach e $l {
        set i [$f $i $e]
    }
    return $i
}
proc sum {x y} {expr {$x+$y}}
set time [time {set sum [reduce sum 0 $L]} $iterations]
puts "reduce     : $time, $sum"
======

Results with Tcl-8.6, Intel Core 2 Duo, 3.06 GHz, OS X 10.6.8:

======none
mathop     : 7.2078723 microseconds per iteration, 4845
sumup      : 12.1550874 microseconds per iteration, 4845
join/concat: 43.089464500000005 microseconds per iteration, 4845
reduce     : 43.4680041 microseconds per iteration, 4845
======

** When Does the Bytecode Compiler Kick in? **

[DKF]: In 8.2 (and presumably 8.1 as well) it seems that the bytecode compiler is only used some of the time (the most notable time is when you have a procedure body, of course), so any benchmark - it is usually benchmarks that hit this - that does not use the compiler will take a noticeable hit, especially when it uses a loop of some kind. Avoid this by the use of procedures wherever possible. Thus, this is slow:

======
for {set a 0} {$a<100000} {incr a} {expr $a + 77}
======

But this is much faster:

======
proc main {} {
    for {set a 0} {$a<100000} {incr a} {
        expr $a + 77
    }
}
main
======

And this is faster still ''(due to expression optimisation)'':

======
proc main {} {
    for {set a 0} {$a<100000} {incr a} {
        expr {$a + 77}
    }
}
main
======

** Splitting strings faster than `[string index]` **

The tip for walking down lists applies also to strings (amazingly?)

======
#!/usr/bin/env tclsh
set str {
The tyrannous and bloody deed is done.
The most arch of piteous massacre
That ever yet this land was guilty of.
Dighton and Forrest, whom I did suborn
To do this ruthless piece of butchery,
Although they were flesh'd villains, bloody dogs,
Melting with tenderness and kind compassion
Wept like two children in their deaths' sad stories.
'Lo, thus' quoth Dighton, 'lay those tender babes:'
'Thus, thus,' quoth Forrest, 'girdling one another
Within their innocent alabaster arms:
Their lips were four red roses on a stalk,
Which in their summer beauty kiss'd each other.
A book of prayers on their pillow lay;
Which once,' quoth Forrest, 'almost changed my mind;
But O! the devil'--there the villain stopp'd
Whilst Dighton thus told on: 'We smothered
The most replenished sweet work of nature,
That from the prime creation e'er she framed.'
Thus both are gone with conscience and remorse;
They could not speak; and so I left them both,
To bring this tidings to the bloody king.
And here he comes.
}
set iterations 10000

set time [time {
    foreach {a} [split $str {}] {
        set b $a
    }
} $iterations]
puts "split: $time"

set time [time {
    for {set a 0} {$a < [string length $str]} {incr a} {
        set b [string index $str $a]
    }
} $iterations]
puts "index: $time"
======

Results with Tcl-8.6, Intel Core 2 Duo, 3.06 GHz, OS X 10.6.8:

======none
split: 597.7267553 microseconds per iteration
index: 967.8046357000001 microseconds per iteration
======

** Performance Enhancements in 8.3 **

[DKF]: OK, today's goodies are a little discussion of one of the speed optimisations in Tcl8.3. It is part of the definition of canonically-formatted lists that when you evaluate them, the first word of the list will be the name of the command, the second word of the list will be the first argument, etc. But in 8.0, 8.1 and 8.2 this case was always handled by converting to a string (putting in all the appropriate quoting) and then evaluating (which just strips all the quoting back out again before handing off to the command handler in question.)
Which is stupidly slow, especially when you have many arguments that would not otherwise have string forms. So, for ''any'' xxx, yyy and zzz (including variable and command substitutions) the following two lines are precisely equivalent:

======
xxx yyy zzz
eval [list xxx yyy zzz]
======

So, to fix this, 8.3 short-cuts this and hands off the contents of the list directly to the command handler when it knows that doing so will not change the semantics of the language. Which turns out to be exactly when the command to evaluate is a ''pure'' list (i.e. a list whose object form does not have a string representation), as that is the only time you can prove that there are no hidden gotchas lurking in the string form. This gives us a nice implementation of `[lconcat]` that is nearly as efficient as a [C] version (whose only advantage would be that it could pre-allocate the correct length of result list before copying.)

======
proc lconcat args {
    set result [list]
    foreach list $args {
        eval [linsert $list 0 lappend result]
    }
    set result;    # FASTER THAN RETURN
}
======

To make this sort of thing easier to write, there is an additional optimisation for `[concat]` (and other places that use concat internally, like `[eval]`) so that where all its arguments are pure lists, the result is a pure list formed by sticking the contents of the arguments together in the obvious way. The only reason I didn't use that above (which would have been noticeably clearer, admittedly) is that this prevents trouble with non-pure list arguments.
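For comparison, from [Changes in Tcl/Tk 8.5%|%8.5] onwards the usual way to get the same effect is `{*}` argument expansion, which sidesteps `[eval]` and the pure-list question entirely; a sketch (the proc is named lconcat85 here to keep it distinct):

```tcl
# 8.5+ take on lconcat: {*} expands each argument list in place, so
# every element is appended directly with no eval and no reparsing.
proc lconcat85 args {
    set result [list]
    foreach list $args {
        lappend result {*}$list
    }
    set result
}

puts [lconcat85 {a b} {c d} {e}]    ;# -> a b c d e
```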
[DGP]: Note that this performance optimization is available for `[uplevel]` as well as `[eval]`, but to take advantage of it you must explicitly include the optional ''level'' argument of `[uplevel]`:

======
proc test args {
    uplevel 1 $args    ;# Can use pure list optimization
    uplevel $args      ;# Attempt to interpret $args as a level
                       ;# generates a string rep, which prevents
                       ;# use of the pure list optimization
}
======

** Performance enhancements in 8.4 **

[DKF]:

   * '''`[return]`''' is now a byte-compiled command (finally!) so using it to get a value out of a procedure is efficient, and should be used as it is also the ''clearest'' technique...
   * '''`[expr]`''' now includes string comparison operators, and these are much faster than either `[string compare]` or `[string equal]` for the most common kinds of comparison.
   * '''`[split]`''' now works efficiently when splitting into individual characters, so the idiom ''[[foreach char [[split $longString {}]] {...}]]'' is now reasonable, even when the string being split is several megabytes long.
   * ...?

** Numerical Performance **

''Remember'', the actual numerical performance of computers varies widely with operating system and processor speed, and comparing actual figures is very tricky. If you're going to put up benchmarking info, at least do everyone the favour of [Finding Out Your Processor and Operating System Configuration]...

** Be aware of when Tcl objects will be copied **

See [Shuffle a list] for an example.

[File I/O performance] is a considerable subject in its own right.

The possible impact on performance of different ways to use indirection to call a proc is (crudely) measured in [Procedure calling timed] and [Speed of different method dispatching methods].

See [Can you run this benchmark 10 times faster] for an example of how to use self-modifying scripts to improve performance.

** `[lsort]` **

The performance of the '''-command''' option of [lsort] is awful.
Avoid it wherever possible; the [lsort] page shows how.

** Shootout **

Has anyone taken a good look at http://shootout.alioth.debian.org/ and the reported performance of Tcl to see whether

   * the results point out places where Tcl needs to be improved, or
   * the results point out places where the benchmarks need to be improved?

** Inspecting Bytecodes **

[DKF]: It is often interesting (both from a performance and a debugging point of view) to '''see the bytecode''' for a piece of code. Here's how with 8.4:

   * rebuild Tcl with '''TCL_COMPILE_DEBUG''' defined (there is a suitable line to uncomment in the ''Makefile'')
   * declare a procedure like this one:

======
proc traced {script {compileLevel 2} {execLevel 3}} {
    proc traced_body {} "set ::tcl_traceExec 0\n\
            set ::tcl_traceCompile 0\n$script"
    set ::tcl_traceExec $execLevel
    set ::tcl_traceCompile $compileLevel
    uplevel 1 traced_body
}
======

   * now use that '''traced''' procedure and you will get printed out (by default) the bytecode for the code and what each instruction does as it executes. For example:

======
traced {foreach {a b c} [list 1 2 3 4 5 6] {puts [expr {$a+$b/$c}]}}
======

What could be easier?

[AMG]: In 8.6, [[[tcl::unsupported::disassemble]]] is easier!

----

[Dossy] 2003-08-31: In dealing with arrays, I find that I often have non-trivial references to arrays. Sometimes the array name and the element are both stored in separate vars, so when I need to access the array value, I have two idioms:

======
set a(key) value
set b a
set c key
set value [set [set b]($c)]
======

OR

======
set value [set $b\($c)]
======

Obviously, I'm curious which one people prefer and consider "idiomatic Tcl", and which one performs better.
The first question I'm hoping to answer by posting it to the wiki and gathering some opinions, but the latter I could settle by benchmarking:

======
proc foo {n} {
    set a(j) abc
    set b a
    set k j
    for {set i 0} {$i < $n} {incr i} {
        set $b\($k)
    }
}

proc bar {n} {
    set a(j) abc
    set b a
    set k j
    for {set i 0} {$i < $n} {incr i} {
        set [set b]($k)
    }
}

% info patchlevel
8.4.3
% time "foo 1000000"
2504742 microseconds per iteration
% time "bar 1000000"
2523562 microseconds per iteration
======

It's not significant, but it seems the "$arrayName\($arrayKey)" form is marginally faster. So, if performance isn't a concern, which style do people prefer? Me, being a purist, I prefer the `[set] [[[set] arrayName]]($arrayKey)` way.

----

[JMN]: It hadn't occurred to me to use:

======
set value [set $b\($c)]
======

I tend to use

======
set value [set ${b}($c)]
======

for a once-off, as it's a more general way to delimit the extent of a variable name, and I like to keep it consistent with other substitution situations such as:

======
set x ca
puts ${x}b
======

where puts $x\b of course doesn't give the desired effect. If accessing an indirected array variable in a loop where performance matters, I'd use upvar to bring it back to 'normal' syntax. It seems significantly faster:

======
proc baz {n} {
    set a(j) abc
    set b a
    set k j
    upvar 0 $b arr
    for {set i 0} {$i < $n} {incr i} {
        set arr($k)
    }
}
======

[Dossy]: Julian, excellent points. And yes, upvar does seem the way to go:

======none
% time "baz 1000000"
471348 microseconds per iteration
======

That's a huge difference. Neat.

----

[JMN]: If startup time of a Tcl/Tk application is a concern, consider that 'package require' of a package that doesn't exist on your system can take a relatively long time if you have a large number of packages installed on your auto_path.
Some packages try to load other packages using something like:

======
if {[catch {package require tcllibc}]} {
======

On one of my systems such calls can take around 900 ms.

If you're using Tcl >= 8.5, consider using [tcl modules] to reduce the time the system spends loading packages.

[1S] ... Or, you could check [Gathering packages' ifneeded scripts] as well.

----

** Performance enhancements in 8.5 **

The major change on the performance front in Tcl 8.5 was the increased coverage of bytecoded commands. These notably include:

   * [switch]
   * [info exists]

as well as (most of) [dict], which was new in 8.5 anyway.

Be aware that bignums are ''substantially'' slower than native integers; make your code avoid them if it is important to be quick.

Use `tcl::unsupported::[disassemble]` if you want to get into optimizing at the low level.

[[To write up...]]

** Performance enhancements in 8.6 **

Tcl 8.6 is slower than Tcl 8.5; the non-recursive execution engine imposes a measurable performance penalty, though for most code the cost will be fairly small. (It is particularly the cost of starting to execute a procedure that has increased, and this is doubly noticeable with [TclOO] methods.)

New tools that can help track down performance problems include `tcl::unsupported::[representation]`.

8.6.1 (will) include(s) a (very basic!) bytecode optimizer.

<<categories>> Performance