The aim of this page is to collect together stuff on how to make the pips squeak in Tcl. Anything for making the code faster (such as better idioms for compilation, discussions on reference counting, etc.) is fine, though long stuff should probably be shunted off somewhere else and only a link to it left here. '''DKF'''

A good companion reference is [Tcl Benchmarks], a benchmark suite by [Jeffrey Hobbs] that demonstrates performance changes over time. [Tk Performance] is dependent upon Tcl Performance, but has its own set of considerations.

In the hallways and bars at the 2007 Tcl conference, there was a fair amount of discussion about [Radical reform of the execution engine], up to and including the idea that Tcl could compile down to machine code.

If you are looking to perform your own tests, you should read up on [How to Measure Performance] so you can avoid the common pitfalls. See also [Tcl Performance: catch vs. info].

If you run out of memory rather than time, then [Compact Data Storage] might give you some hints.
----
**Before anything else, time it!**

Tcl has a wonderful '''[time]''' command, which can help you answer questions like "Is this code faster than that code?" Check out the man page. Or check out this example of [Counting Elements in a List], which includes a basic timing framework. If you need to record time stamps (e.g., in event-driven Tk applications), then take a look at '''clock clicks''', which gives you a (platform-dependent) time stamp.

Sometimes you may want to profile an application to find out where all the time is spent. There are several solutions. See [Profiling Tcl] for some suggestions. RWT (See [How to Measure Performance]!)
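A minimal sketch of such a timing session (the proc names and iteration counts here are just illustrative assumptions, not from this page):

```tcl
# Compare two ways of building a 1000-element list.
proc buildWithLappend {} {
    set l {}
    for {set i 0} {$i < 1000} {incr i} {lappend l $i}
    return $l
}
proc buildWithConcat {} {
    set l {}
    for {set i 0} {$i < 1000} {incr i} {set l [concat $l [list $i]]}
    return $l
}

# Run each body many times so per-call noise averages out.
puts "lappend: [time {buildWithLappend} 100]"
puts "concat:  [time {buildWithConcat} 100]"

# And a raw (platform-dependent) timestamp, as you might record in
# an event-driven Tk application:
set stamp [clock clicks]
```

The absolute numbers mean little on their own; it is the ratio between the two timings that tells you which idiom to prefer on your platform.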
----
Here's a surprising timing reported by "Tom Wilkason" on news:comp.lang.tcl on a Win 2K 375MHz box (updated) '''TFW''' 2001-01-31:

 # Q is a local drive mounted as a share
 time {cd q:/utilities;cd [file dirname [pwd]]} 1000
 2163 microseconds per iteration
 time {cd q:/utilities;cd q:/} 1000
 2083 microseconds per iteration
 # This is the same dir as above but from the raw mount
 time {cd d:/source/utilities;cd [file dirname [pwd]]} 1000
 601 microseconds per iteration
 time {cd d:/source/utilities;cd d:/source} 1000
 551 microseconds per iteration

'''DKF''' 2000-02-14 - Curiouser and curiouser [http://www.ruthannzaroff.com/wonderland/curiouser.htm] since when I perform the timings with Tcl8.0.4 on my ultrasparc/Solaris box, I get the following timings...

 % time {cd /home/fellowsd/arch; cd /home/fellowsd} 100000
 48 microseconds per iteration
 % time {cd /home/fellowsd/arch; cd [file dirname [pwd]]} 100000
 129 microseconds per iteration
 % time {cd /home/fellowsd/arch; cd [file join [pwd] ..]} 100000
 131 microseconds per iteration

So things are not as straightforward as they might seem. (Side note: is NT ''really'' so slow at changing directories, or is it just that the particular NT machine the above test was done on is grossly underpowered anyway?)
----
Allow me to shamelessly plug my package ''timers.tcl'' [http://wiki.tcl.tk/671] -PSE
----
Simple stuff for making the compiled code faster in 8.0:

 * In versions of Tcl prior to 8.4, using [return] to return a value was slower than simply arranging for the value to be the last thing in the procedure body. 8.4 corrected this. See [Tcl 8.0 performance advice] for the discussion, which was deleted from this page because it had been inaccurate for several years.
 * Enclose all expressions in {} unless you really know what you are doing and ''want'' double-evaluation.
 * Try to only ever evaluate constant strings. Where this is not possible, try to make sure that you don't change any particular string very often.
Cache it if you can. Tk binding scripts are a prime example of how ''not'' to get maximum speed out of Tcl8.

 * ''Any more?''

'''DKF'''
----
**What is the byte-code compiler doing, anyway?**

You can find out just what the byte-code compiler is doing by activating internal tracing. You can set the Tcl variable '''tcl_traceExec''' to one of the following values, and internal trace functions will generate messages on stdout for every byte-code executed:

 0: no execution tracing
 1: trace invocations of Tcl procs only
 2: trace invocations of all (not compiled away) commands
 3: display each instruction executed

You can set the variable '''tcl_traceCompile''' to one of the following values to get information during byte code compilation of a procedure or toplevel command:

 0: no compile tracing
 1: one line summary
 2: detailed listing of byte codes

This is documented in the '''tclvars(n)''' manpage (right at the bottom of the 8.0 version.) Note that the output is always generated on stdout at a very low level. Mac people cannot do profiling ''(unless Apple fixes this in MacOSX)'' and Windows people will need to run from a console ''(which might be a problem with wish, IIRC.)'' Still, something that is only ever useful in some situations is almost always better than nothing at all. '''DKF'''

[DKF]: Also, in Tcl 8.5 you can use '''`::tcl::unsupported::[disassemble]`''' to pull apart some code and show the bytecodes. No guarantees about formatting, etc., but it is very convenient ''and'' returns the result as a string, so you can do further processing, show it in a text widget, and so on.
----
**A fast in-place list update scheme.**

This uses [K] to make the [lreplace] work in-place:

 proc K {x y} {set x}
 set theList [lreplace [K $theList [set theList {}]] 7 42]

See [http://www.dejanews.com/getdoc.xp?AN=453831830] for details.
Since this link seems to be broken as of 2010-04-01, try [http://groups.google.com/group/comp.lang.tcl/tree/browse_frm/thread/d001f2a7c8defbef/b1f0e3a4ea625dc5?rnum=1&_done=%2Fgroup%2Fcomp.lang.tcl%2Fbrowse_frm%2Fthread%2Fd001f2a7c8defbef%3F#doc_70dbdac2e65abd46] :

''[DKF]: It seems from digging around that it depended on a discovery of mine from no later than 1999, i.e., that several Tclers were writing posts to c.l.t at the time describing it as a well-known technique. I've not yet identified where it started, but it may have started from [http://groups.google.co.uk/group/comp.lang.tcl/browse_frm/thread/518c2ec3b0030a7f/1e73df85c34b2fe5?lnk=gst&q=k+lreplace#1e73df85c34b2fe5%|%this thread%|%]. It's quite difficult to search back in the 1998–99 period; you keep finding irrelevant stuff from later in the way.'' '''DKF'''

Donal, this deserves to be made more widely known; it just came up again as I was benchmarking Bob Techentin's code to [Shuffle a list]. It's incredible what a difference it made in the performance. '''KBK'''

[DKF]: In 8.5 you can also do this:

 set theList [lreplace $theList[set theList {}] 7 42]

[pooryorick] 2012-10: [MS] explained on the IRC channel that `$theList[[set theList {}]]` is more effective than `$theList[[unset theList]]` because (1) you do not have to update the varname hashtable (assuming the variable is actually in a hashtable); (2) you don't have to worry about the variable being at either end of a link, and (3) [set] is bytecompiled but [unset] is not (too complicated). In short, setting to {} is faster because it does a lot less. That depends on the fact that the low-level (bytecode) string concatenation operator knows not to do anything to an object when joining an empty string onto the end of it.

[DKF]: In 8.6, [unset] is bytecode compiled. (And don't use [[`catch {unset foo}`]]; the operationally-equivalent [[`unset -nocomplain foo`]] generates much better bytecode.)
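To package the 8.5 idiom up for reuse, something like the following helper works (the proc name `ldeleteRange` is made up here for illustration; it is not a standard command):

```tcl
# Delete a range of elements from a list variable, in place.
# The inline [set l {}] empties the variable so the list value
# becomes unshared, letting lreplace modify it without copying.
proc ldeleteRange {listVar first last} {
    upvar 1 $listVar l
    set l [lreplace $l[set l {}] $first $last]
}

set nums {a b c d e f}
ldeleteRange nums 1 2
# nums is now: a d e f
```

The concatenation `$l[set l {}]` appends an empty string, which the bytecode engine knows is a no-op, so the original list object reaches [lreplace] with a reference count of one.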
----
Just because I want to be able to find this again later... I've copied some notes[http://www.dejanews.com/getdoc.xp?AN=457938595] from Jeff Hobbs. (To be updated when he rewrites it. :-) RWT

The sensitivity and fine granularity of [time]'s timer lets you pick up what differences there are. And then you have to pray that the relative speeds scale as you move to different platforms/architectures and/or faster machines. Which is usually a good assumption if you're not shimmering between numbers and strings (where performance seems to vary a lot according to the vendor's implementation of the C library.)

Comparing strings with

 if { $a == $b }
 if { ! [ string compare $a $b ] }

behaves much the same, unless you have a situation where the string compare would be false and the other true. Interestingly, in tcl8.2

 if { [ string equal $a $b ] }

is always nearly twice as slow as the two above. But in tcl8.3

 if { [ string equal $a $b ] }
 if { $a == $b }
 if { ! [ string compare $a $b ] }

are all within 4usec of each other on solaris2.7, and 1usec on Windows NT. On solaris, the first two are always quicker than the last, and on Windows NT, the middle is slower than the other two. YMMV.

Warning! Before Tcl 8.4.x, ''[string] compare'' did not always return a correct value! In fixing that problem, string compare is now as much as 30% slower than before. If you want to compare two strings accurately, ''string equal'' is the best choice.
A [JH] example where ''string compare'' failed, which comes from the [ActiveState] [ActiveTcl] mailing list:

 woset [~] 106 > echo 'puts [string compare \x00 \x01]' | tclsh8.3
 1
 woset [~] 107 > echo 'puts [string compare \x00 \x01]' | tclsh8.4
 -1

----
'''Be aware of Tcl_Obj's''' - don't change object representation without reason. This starts with Tcl8.0 and the fact that lists and strings no longer have equal internal representations. Switching between these comes at a cost, and it is often easy to avoid once you become "one with the Tcl". For example, many people write Tcl procs with an ''args'' argument. This will come in as a list to your proc. Don't do [string compare {} $args] to see if it has something in it; instead do [llength $args]. Doing the string compare would convert the $args list obj to a string obj, then back to a list when it was actually used. Keep this in mind for numbers as well, with the caveats of point 2 (incr is faster if you keep treating the var as a number).
----
'''Tcl8 lists provide a fast representation of 1D arrays.''' lindex is now O(1) on lists that are already in list obj rep. This also assists in things like lrange and lreplace, and even lappend. If your indices are numeric, the list is significantly faster than arrays. (NOTE: I believe it was Paul Duffin who presented us with a full Tcl_Obj'ified version of arrays, which will also speed up the array implementation in Tcl8). ''KBK:'' But be aware that you may inadvertently copy a Tcl_Obj if you're not careful, and destroy the performance advantage. See [Shuffle a list] for an example.
----
**Walking Lists**

Walking lists with '''[foreach]''' is faster than '''[for]''' and '''[lindex]'''. This effect is particularly pronounced before 8.0 (since you only had to parse a string into a list once) but it is still noticeable after.
 set cities [list Rochester Minneapolis St.Paul Mankato Bemidji]
 set ncities [llength $cities]
 for {set i 0} {$i<$ncities} {incr i} {
     set city [lindex $cities $i]
     # This is slower
 }
 foreach city $cities {
     # This is faster, and more elegant
 }

Even if you need to know the item index, the following [foreach] [idiom] is faster than the [for] form:

 set i -1
 foreach city $cities {
     incr i
     # Normal body comes here
 }

(The [incr] has to come first in the body if the body might do a [continue], as the index would otherwise not always be updated to match.)

Remember that the ''foreach'' command can operate on several lists in parallel. So you can work with multiple parallel lists:

 set temperatures [list -10 -12 -14 -18 -32]
 foreach city $cities temp $temperatures {
     # Fast, elegant, and even cool
 }
 # with those temperatures, it's downright chilly - glennj

----
**Brace Expressions**

Use braces around all conditionals to get the best efficiency out of the compiler. In Tcl7, one might have

 if [...]

this should now be

 if {[...]}

expr expressions should also be braced. There will be a few cases where this won't work (building up exprs in a var). [Brace your expr-essions]!
----
**Comments and whitespace have no effect in compiled procs**

Once a proc is first compiled (upon first use), any amount of whitespace and commenting in it is removed in the compiled representation. The load time for heavily commented procs will be slightly longer (Tcl has to read all that in before it knows what to avoid), but after its first call, Tcl will byte-compile it and use that rep. Of course, if you keep changing the procs around, you're going to have problems. I believe there are also events, like redefining certain Tcl core commands, that invalidate all byte-compiled proc representations.
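A small sketch of how you might verify this for yourself (the proc names are invented for the demonstration; the claim is the page's, the timing harness is an assumption):

```tcl
# Two procs with identical logic; one is heavily commented.
proc plain {n} {expr {$n * 2}}
proc commented {n} {
    # All of this commentary is discarded when the proc body is
    # byte-compiled on its first invocation, so in steady state
    # it costs nothing at all.
    expr {$n * 2}
}

plain 1; commented 1                ;# force both to be compiled

puts [time {plain 21} 10000]        ;# after compilation, these two
puts [time {commented 21} 10000]    ;# should be indistinguishable
```

Only the very first call of `commented` pays for parsing the comments; thereafter both procs run the same bytecode apart from their names.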
----
**Using extended foreach syntax is a time saver**

Combining points "multiple commands" and "don't change objects", we can reach the conclusion that

 foreach {a b c d e} [cmdReturningList] {break}

is faster than

 set list [cmdReturningList]
 set a [lindex $list 0]
 set b [lindex $list 1]
 ....

The break is helpful to ensure that we don't actually do any loops, in case cmdReturningList is changed to return more than 5 args. Note that when we want the result from two similar calls, we get even better advantages out of:

 foreach {a b c} [cmdRetList] {a' b' c'} [cmdRetList'] {break}

Note that from 8.5 onwards, you can use the more readable [lassign] command.

[AMG]: When 8.5 isn't available and I use [[[foreach]]] in this way, I usually skip the [[[break]]] when the list is guaranteed to iterate only once.
----
**Caching array variables can be a win**

Yes, accessing stuff in the hash table does take a little longer, so if you have a number that you want to loop 1000 times on, put it into a simple variable first.
----
**Put everything in a proc**

Inline Tcl code doesn't get all the optimizations that a [proc]edure can get. For example, the inline statement

 for {set i 0} {$i<10000} {incr i} {lappend a $i}

takes three times longer (on my machine, running Tcl 8.0.4) than

 proc init_me {} {
     global b
     for {set i 0} {$i<10000} {incr i} {lappend b $i}
 }
 init_me

The differences can be even more dramatic in Tcl 8.1, because the compiler was split into a parser front-end and a compiler back-end. Global code is just parsed into tokens, similarly to the way that Tcl operated before 8.0, as this offers a smaller performance hit when running scripts that only execute once (notably Tk bindings, but many other kinds of callback come into this category too.) Only procedure bodies ''(and lambda terms)'' are fully compiled.
----
**RegExp tricks**

Regular expressions are expensive to compile, and non-constant regexps must be compiled every time they are scanned for.
Suppose you want to check for the case where there is "a b c" with arbitrary (including none) amounts of whitespace between the letters. In 8.0, the following is the most obvious way to do it, but it is hard to write:

 regexp "a\[ \t\n\r\]*b\[ \t\n\r\]*c\[ \t\n\r\]" $string

For clarity's sake it is far better to do:

 set ws "\[ \t\n\r\]"
 regexp "a${ws}*b${ws}*c" $string

However, this is not very efficient since it forces recompilation each time (unlike the first version where the string is constant, and therefore can support Tcl_Obj-based saving of compiled values.) The following does not suffer from this problem, and is probably clearer as well in practical programs:

 # Only execute these lines once, at startup time
 set ws "\[ \t\n\r\]"
 set abcRE "a${ws}*b${ws}*c"

 # Now refer to the (probably global) variable each time you want to match
 regexp $abcRE $string

If you are using 8.1 or later, you have access to the new regular expression engine, which allows several things to make this bite less (special escapes for common sequences, a regular expression cache (in later versions), and ''implemented'' caching of REs in Tcl_Objs...) The best solution for our example problem is probably

 set pat {a\s*b\s*c}   ;# Globally
 regexp $pat $string   ;# Each time needed

(See comment below for additional speedup.) In general, do whatever you can to avoid losing the object that contains the compiled expression.

Solution: We need to change the 8.1 implementation to introduce a second-level cache similar to the one in 8.0. This will allow us to avoid recompilation in cases where the same computed string value is passed in, even when it comes from a different object. We can probably use a thread-local cache, instead of a per-interp cache, to improve sharing and make the regexp code more modular. -- JC

Actually, for performance's sake, the best solution is to inline the pattern:

 regexp {a\s*b\s*c} $string

This will be even faster (yes, I've verified this with timings).
For simple regexps, this is more readable anyway since it keeps the regexp definition where it is being used.

Another significant way to speed up regular expressions is to avoid making the regexp retry matches (i.e., backtrack) if possible. As a simple case, consider this:

 regexp {a.*b.*c} "abbbbbbbbbbbbbbbbbbbb"

In this statement, the regexp engine is going to match the "a", then the "b", and then fail to match the "c". But it won't stop there: it will match every single b and, using those matches, will try (and fail) to match the c in every position where there are b's. This is extremely inefficient. It is much more efficient to break this into two searches so that you can prevent the regexp engine from doing all the pointless work of backtracking. This is just the tip of the iceberg, alas. -- DL

[JH] I think the above backtrack issue went away with the updated 8.1+ RE, but the problem does still exist in 8.4.16/8.5b2 for the equivalent glob case:

 string match "*a*b*c*" "abbbbbbbbbbbbbbbbb"

so that you end up with pointless backtracking.
----
**"Slurping up" data files**

One common Tcl-ism is to read an entire data file into memory in one fell swoop, instead of reading and processing each line in turn. This works well if your file is small compared to system memory, and you can afford to do all the processing in memory. You can also use the ''split'' command to break the file down into a list of individual lines, like this:

 set fp [open myFile r]
 set data [split [read $fp] "\n"]
 close $fp

But for larger files (say in the megabyte range) Tcl 8.0's ''read'' can be very slow because it is using a buffer size of 4096 bytes. You can improve performance by increasing the channel buffer size with the ''fconfigure'' command. But the most efficient way is to tell the ''read'' command exactly how much data it will be reading:

 set fp [open "myFile" r]
 set size [file size "myFile"]
 set data [split [read $fp $size] "\n"]
 close $fp

Note that this is only a problem with Tcl 8.0.
More recent versions of [read] have been fixed. See the READ tests on the [Tcl Benchmarks] page. RWT & DKF
----
Cameron Laird has a page with some stuff on Tcl performance too. [http://www.phaseit.net/claird/comp.lang.tcl/tcl_performance.html]

The Tycho package website has a page mentioning some Performance Tools and Performance Hints (taken from the first edition of [BOOK Practical Programming in Tcl and Tk, Second edition]), see [http://ptolemy.eecs.berkeley.edu/tycho/tycho0.2.1/tycho0.2.1/doc/coding/performance.html] -- JC (These hints apply primarily to pre-8.0 Tcl. -- RWT)
----
Nat Pryce has some patterns for scripting and Tcl [http://www-dse.doc.ic.ac.uk/~np2/patterns/scripting/]. These patterns include some for improving performance, particularly Bootstrap Script.
----
**Global env array slower than most**

From Scott Stanton comes this Tcl 8.1 performance tip: "... you should not use the global "env" array as a general data storage area. There is a lot of mechanism behind "env" that is needed to keep it in sync with multiple interps and the C environ array."
----
'''More 8.1 Notes:'''

 * Apparently, the compiler has been changed somewhat in 8.1 so as to handle one-off scripts better. The compiler has been split up into two bits, a parser (front-end) and a byte-code generator (back-end), and an alternative back-end has also been provided that just takes the intermediate stuff out of the parser and evaluates it. I don't know what the performance implications of this are (other than for Tk binding scripts, where it pretty much obviously improves performance) as I've not switched to 8.1 yet. http://www.deja.com/getdoc.xp?AN=477828246
 * Also, string manipulation is substantially slower (in 8.1.0 - work is being put in to correct this) due to the Unicode support.
This is because Tcl currently uses UTF as its internal encoding, and because that encodes Unicode characters using a variable number of octets ''(that's charset speak for bytes)'' it makes operations like taking the length of a string or getting a character at a particular position into O(n) operations from O(1). Massively yucky, I know! http://www.deja.com/getdoc.xp?AN=475452431

 * Regexps are now handled more sensibly. This makes it much better to put your (non-constant) regexps in a variable instead of rebuilding them on the fly. [http://www.deja.com/getdoc.xp?AN=476397350]

 set someLetters {[abcxyz]+}

 # The bad way to do it
 foreach string $someList {
     set m($string) [regexp "${someLetters}0${someLetters}" $string]
 }

 # The good way to do it
 set RE "${someLetters}0${someLetters}"
 foreach string $someList {
     set m($string) [regexp $RE $string]
 }

'''DKF'''
----
OK, the string performance bug has now been fixed properly in 8.1.2, and virtually all operations are now as fast in 8.1.2 as in 8.0.*. Much kudos to [Ajuba Solutions] for fixing it so quickly and in such a non-blecherous fashion. [http://www.deja.com/=dnc/getdoc.xp?AN=490351575] '''DKF'''
----
**Operations on Lists of Data**

Sometimes, you can wind up with some really surprising performance results. In particular, if you want to apply a general binary operation to a long list of numbers, the fastest technique is:

 set sum [eval expr [join [concat 0 $L] +]]

([ZB] Actually, why that "eval"? There doesn't seem to be a need for any script evaluation...)
This is substantially (nearly twice) faster ''(given a list of about 60 numbers to add up)'' than the (slightly more flexible) technique based on a generalised command application procedure:

 proc reduce {f i l} {
     foreach e $l {
         set i [$f $i $e]
     }
     return $i
 }
 proc sum {x y} {expr {$x+$y}}
 set sum [reduce sum 0 $L]

But the fastest technique of the lot (around twice as fast again as the method with [[join]]) is to use something utterly application specific:

 proc sumup list {set sum 0; foreach n $list {incr sum $n}; set sum}
 set sum [sumup $L]

The advice given earlier in this page really applies here - whenever you want to check the performance of several ways of implementing a particular operation, use [[time]]. '''DKF'''

In 8.5, consider using '''[tcl::mathop]::[+]'''.
----
**When does the bytecode compiler kick in?**

In 8.2 (and presumably 8.1 as well) it seems that the bytecode compiler is only used some of the time (the most notable time is when you have a procedure body, of course) so any benchmark - it is usually benchmarks that hit this - that does not use the compiler will take a noticeable hit, especially when it uses a loop of some kind. Avoid this by the use of procedures wherever possible. Thus, this is slow:

 for {set a 0} {$a<100000} {incr a} {expr $a + 77}

But this is much faster:

 proc main {} {
     for {set a 0} {$a<100000} {incr a} {expr $a + 77}
 }
 main

And this is faster still ''(due to expression optimisation)'':

 proc main {} {
     for {set a 0} {$a<100000} {incr a} {
         expr {$a + 77}
     }
 }
 main

'''DKF'''
----
**Splitting strings faster than [string index]**

The tip for walking down lists applies also to strings (amazingly?). It is faster to do

 foreach {a} [split $str {}] {
     set b $a
 }

rather than

 for {set a 0} {$a < [string length $str]} {incr a} {
     set b [string index $str $a]
 }

The advantage is not lost even if you do need the position number inside the loop and do an incr ...
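As a concrete sketch of the split-into-characters idiom (the proc name `charCount` is invented for this example):

```tcl
# Tally character frequencies by walking the string as a list of
# single-character elements, rather than repeated [string index] calls.
proc charCount {str} {
    foreach ch [split $str {}] {
        if {[info exists count($ch)]} {
            incr count($ch)
        } else {
            set count($ch) 1
        }
    }
    array get count
}

# e.g. charCount "abca" yields the pairs a 2, b 1, c 1
# (the order of pairs from [array get] is unspecified)
```

The whole string is converted to a list object once, up front, instead of paying for an index lookup on every iteration.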
----
**Performance enhancements in 8.3**

OK, today's goodies are a little discussion of one of the speed optimisations in Tcl8.3. It is part of the definition of canonically-formatted lists that when you evaluate them, the first word of the list will be the name of the command, the second word of the list will be the first argument, etc. But in 8.0, 8.1 and 8.2 this case was always handled by converting to a string (putting in all the appropriate quoting) and then evaluating (which just strips all the quoting back out again before handing off to the command handler in question.) Which is stupidly slow, especially when you have many arguments that would not otherwise have string forms. So, for ''any'' xxx, yyy and zzz (including variable and command substitutions) the following two lines are precisely equivalent:

 xxx yyy zzz
 eval [list xxx yyy zzz]

So, to fix this, 8.3 short-cuts this and hands off the contents of the list directly to the command handler when it knows that doing so will not change the semantics of the language. Which turns out to be exactly when the command to evaluate is a ''pure'' list (i.e. a list whose object form does not have a string representation) as that is the only time you can prove that there are no hidden gotchas lurking in the string form, because there is no string form to hide anything unpleasant in. This gives us a nice implementation of [[lconcat]] that is nearly as efficient as a C version (whose only advantage would be that it could know to pre-allocate the correct length of result list before copying.)

 proc lconcat args {
     set result [list]
     foreach list $args {
         eval [linsert $list 0 lappend result]
     }
     set result;    # FASTER THAN RETURN
 }

To make this sort of thing easier to write, there is an additional optimisation for [[concat]] (and other places that use concat internally, like eval) so that where all its arguments are pure lists, the result is a pure list formed by sticking the contents of the arguments together in the obvious way.
The only reason I didn't use that above (which would have been noticeably clearer, admittedly) is that this prevents trouble with non-pure list arguments. '''DKF'''

Note that this performance optimization is available for [[uplevel]] as well as [[eval]], but to take advantage of it you must explicitly include the optional ''level'' argument of [[uplevel]]:

 proc test args {
     uplevel 1 $args ;# Can use pure list optimization
     uplevel $args   ;# Attempt to interpret $args as a level
                     ;# generates a string rep, which prevents
                     ;# use of the pure list optimization
 }

'''DGP'''
----
**Performance enhancements in 8.4**

 * '''return''' is now a byte-compiled command (finally!) so using it to get a value out of a procedure is efficient, and should be used as it is also the ''clearest'' technique...
 * '''expr''' now includes string comparison operators, and these are much faster than either [[string compare]] or [[string equal]] for the most common kinds of comparison.
 * '''split''' now works efficiently when splitting into individual characters, so the idiom ''[[foreach char [[split $longString ""]] {...}]]'' is now reasonable, even when the string being split is several megabytes long.
 * ...?

'''DKF'''
----
''Remember'', the actual numerical performance of computers varies widely with operating system and processor speed, and comparing actual figures is very tricky. If you're going to put up benchmarking info, at least do everyone the favour of [Finding Out Your Processor and Operating System Configuration]...
----
**Be aware of when Tcl objects will be copied**

See [Shuffle a list] for an example.

[File I/O performance] is a considerable subject in its own right. The possible impact on performance of different ways to use indirection to call a proc is (crudely) measured in [Procedure calling timed] and [Speed of different method dispatching methods]. See [Can you run this benchmark 10 times faster] for an example of how to use self-modifying scripts to improve performance.
----
The performance of the '''-command''' option of [lsort] is awful. Avoid it wherever possible; the [lsort] page shows how.
----
Has anyone taken a good look at http://shootout.alioth.debian.org/ and the reported performance of Tcl to see if either

 * the results point out places where Tcl needs to be improved, or
 * the results point out places where the benchmarks need to be improved?

----
[[Explain pertinence of page on [Performance of networking applications].]]
----
''DKF'' - It is often interesting (both from a performance and a debugging point-of-view) to '''see the bytecode''' for a piece of code. Here's how with 8.4:

 * rebuild Tcl with '''TCL_COMPILE_DEBUG''' defined (there is a suitable line to uncomment in the ''Makefile'')
 * declare a procedure like this one:

 proc traced {script {compileLevel 2} {execLevel 3}} {
     proc traced_body {} "set ::tcl_traceExec 0\n\
         set ::tcl_traceCompile 0\n$script"
     set ::tcl_traceExec $execLevel
     set ::tcl_traceCompile $compileLevel
     uplevel 1 traced_body
 }

 * now use that '''traced''' procedure and you will get printed out (by default) the bytecode for the code and what each instruction does as it executes. For example:

 traced {foreach {a b c} [list 1 2 3 4 5 6] {puts [expr {$a+$b/$c}]}}

What could be easier?
----
See [Comparing Performance of Tcl OO extensions] for a first attempt at quantifying performance of some of the OO extensions available in the community. The source is available so that others can implement the code in their favorite OO extension and update the page.
----
31aug2003: In dealing with arrays, I find that I often have non-trivial references to arrays. Sometimes, the array name and the element are both stored in separate vars, so when I need to access the array value, I have two idioms:

 set a(key) value
 set b a
 set c key

 set value [set [set b]($c)]

OR

 set value [set $b\($c)]

Obviously, I'm curious which one people prefer and consider "idiomatic Tcl", and which one performs better.
The first question I'm hoping to answer by posting this question to the wiki and gathering some opinions, but the latter I could do by benchmarking:

 proc foo {n} {
     set a(j) abc
     set b a
     set k j
     for {set i 0} {$i < $n} {incr i} {
         set $b\($k)
     }
 }
 proc bar {n} {
     set a(j) abc
     set b a
     set k j
     for {set i 0} {$i < $n} {incr i} {
         set [set b]($k)
     }
 }

 % info patchlevel
 8.4.3
 % time "foo 1000000"
 2504742 microseconds per iteration
 % time "bar 1000000"
 2523562 microseconds per iteration

It's not significant, but it seems the "$arrayName\($arrayKey)" form is minimally faster. So, if performance isn't a concern, which style do people prefer? Me, being a purist, I prefer the "set [[set arrayName]]($arrayKey)" way. -- [Dossy]
----
[JMN] It hadn't occurred to me to use:

 set value [set $b\($c)]

I tend to use

 set value [set ${b}($c)]

for a once-off as it's a more general way to specify the extent of a variable, and I like to keep it consistent with other substitution situations such as:

 set x ca
 puts ${x}b

where

 puts $x\b

of course doesn't give the desired effect. If accessing an indirected array variable in a loop where performance matters, I'd use upvar to bring it back to 'normal' syntax. It seems significantly faster.

 proc baz {n} {
     set a(j) abc
     set b a
     set k j
     upvar 0 $b arr
     for {set i 0} {$i < $n} {incr i} {
         set arr($k)
     }
 }

[Dossy] - Julian, excellent points. And, yes, upvar does seem the way to go:

 % time "baz 1000000"
 471348 microseconds per iteration

That's a huge difference. Neat.
----
[JMN] If startup time of a Tcl/Tk application is a concern, consider that 'package require' of a package that doesn't exist on your system can take a relatively long time if you have a large number of packages installed on your auto_path.
Some packages try to load other packages using something like:

 if {[catch {package require tcllibc}]} {

On one of my systems such calls can take around 900msec.

If you're using Tcl >= 8.5, consider the use of [tcl modules] to reduce the time taken by the system in loading packages.

[1S] ... Or, you could check [Gathering packages' ifneeded scripts] as well.
----
**Performance enhancements in 8.5**

[[To write up...]]
----
 * [timers.tcl]
 * [speed issues]
 * [tclprof]
 * [Dynamic procs as performance monitors]
 * [Tcl Socket Performance]
 * [Performance of Various Stack Implementations]
 * [PerfSuite]
----
[Computer Language Benchmarks Game]

<> Performance