How to Measure Performance

See Also

Profiling Tcl
time
timers.tcl
PSE: Allow me to shamelessly plug my package

Description

Before anything else, time it! For newer revisions of Tcl 8.6, the timerate command is preferable; see TIP 527.
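
A minimal sketch of timerate in action (the figures shown are illustrative, not real measurements; by default timerate repeats the script for about a second, here limited to 500 milliseconds, and reports calibrated statistics):

% timerate {expr {min(100,10)}} 500
0.09 µs/# 5346667 # 11111111 #/sec 481.2 net-ms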

The wonderful time command can help you answer questions like "Is this code faster than that code?" Check out the man page, or the example at Counting Elements in a List, which includes a basic timing framework.

If you need to record time stamps (e.g., in event-driven Tk applications), then take a look at clock clicks, which gives you a time stamp of the finest granularity available on the platform.
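
For instance, to bracket a piece of work with two stamps (do_something is a hypothetical stand-in for whatever your event handler does):

set t0 [clock clicks -microseconds]
do_something
set t1 [clock clicks -microseconds]
puts "that took [expr {$t1 - $t0}] microseconds"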

Sometimes you may want to profile an application to find out where all the time is spent. There are several solutions. See Profiling Tcl for some suggestions.

One idea is to build timing routines into an extension or application. For instance, md5pure::time lets anyone time the md5pure extension and compare the results to those of others (see the chart of timings on the md5pure page).

Tools

PDQ
an analytical solver in which resources are represented by queues

Timing Example

Use time to compare two different coding techniques. For this simple example, we will compare list creation using the lappend list command with string building using the append string command. But the technique applies to many different situations.

Put your test code in a proc so that it gets byte-compiled. Run enough iterations of the test code to make a meaningful timed test. (Or maybe just use a slower computer.)

proc test1 {} {
    for {set i 0} {$i < 1000} {incr i} {
        lappend mylist $i
    }
}

proc test2 {} {
    for {set i 0} {$i < 1000} {incr i} {
        append mylist "$i "
    }
}
% time test1 100
349 microseconds per iteration
% time test2 100
1018 microseconds per iteration

DKF: You should be aware that UNIX has a much finer time granularity than Windows (certainly 95/98, and quite probably NT too).

KBK 2000-12-24: The problem with Windows clock resolution is fixed in the latest alpha of 8.4.

The best way to benchmark a tiny snippet of Tcl (especially where it is platform-independent code you're testing) is to use a slow UNIX box. The slowness of its operation allows for sensitivity, and the fine granularity of its timer lets you pick up what differences there are. And then you have to pray that the relative speeds scale as you move to different platforms/architectures and/or faster machines, which is usually a good assumption if you're not shimmering between numbers and strings (where performance seems to vary a lot according to the vendor's implementation of the C library).

Unix has millisecond-sized granularity (i.e., around 1000 Hz or 1 kHz). Windows has, for reasons relating to the hardware used in the original IBM PC, roughly 42.7 Hz granularity. (KBK 2000-12-24: The latest 8.4 alpha uses the 1.193 MHz 8254 timer in place of the Windows 1- or 10-ms time quantum.) Obviously, this makes it harder to perform finer measurements. Good thing most operations in Tcl are actually platform-independent, so the relative differences in execution speed tend to carry across.
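
If you want to see what granularity your own platform actually delivers, here is a quick empirical probe (just a sketch; it busy-waits until the counter ticks over):

set t0 [clock clicks]
set t1 $t0
while {$t1 == $t0} {set t1 [clock clicks]}
puts "smallest observable step: [expr {$t1 - $t0}] clicks"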

KBK: The patch for using the 8254 clock in earlier Tcl releases is at [L1 ] (AMG: 404). Note that multi-CPU Windows systems are still going to be limited to the 1-ms resolution, because of a bug in the Hardware Abstraction Layer that Microsoft ships.


Remember, the actual numerical performance of computers varies widely with operating system and processor speed, and comparing absolute figures is very tricky. If you're going to put up benchmarking info, at least do everyone the favour of Finding Out Your Processor and Operating System Configuration...
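
The quickest way to capture that from Tcl itself (the same idiom appears in the test transcripts further down this page):

puts $tcl_patchLevel
parray tcl_platform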


MAK: While working on optimizations for TkPNG, alternating between time and gprof for various measurements, I sometimes saw my numbers change in odd ways, with some going up and some going down for a change that should affect all images in one direction or the other. I was interested in seeing error margins to prove to myself I wasn't insane, so I whipped up this "errortime" proc. It works similarly to time, but it runs the script through time with a given group size the specified number of times, then calculates an average and error margin.

proc errortime {script groupsize {iters 2}} {
    set times {}
    if {$iters < 2} {set iters 2}

    for {set i 0} {$i < $iters} {incr i} {
        set result [uplevel [list time $script $groupsize]]
        lappend times [lindex $result 0]
    }

    set average 0.0
    set s2 0.0

    # mean of the per-group results
    foreach time $times {
        set average [expr {$average + double($time)/$iters}]
    }

    # sample variance of the per-group results
    foreach time $times {
        set s2 [expr {$s2 + (($time-$average)*($time-$average) / ($iters-1))}]
    }

    set sigma [expr {int(sqrt($s2))}]
    set average [expr {int($average)}]

    return "$average +/- $sigma microseconds per iteration"
}

Example:

proc test1 {} {
    for {set i 0} {$i < 1000} {incr i} {
        lappend mylist $i
    }
}

proc test2 {} {
    for {set i 0} {$i < 1000} {incr i} {
        append mylist "$i "
    }
}
% time test1 10000
855 microseconds per iteration
% time test2 10000
2376 microseconds per iteration
% errortime test1 100 100
843 +/- 42 microseconds per iteration
% errortime test2 100 100
2316 +/- 29 microseconds per iteration

nektomk - 2023-06-20 02:13:54

Are time results unstable, depending on their position in the script?

proc mimo1 { a b } {
        tcl::mathfunc::min $a $b
}
proc mimo2 { a b } {
        expr { min($a,$b) } 
}

mimo1 200 20
mimo2 200 20
time {}
# slowest results
set c 0
puts [ time {
        incr c [ mimo2 100 10 ]
} 50000 ]
puts "----"
# faster results
set c 0
puts [ time {
        incr c [ mimo2 100 10 ]
} 50000 ]

Usually the second result is faster, regardless of the measured function (mimo1 or mimo2).

HE - 2023-06-21: This is not something I would subscribe to. Without looking inside the source code, my expectation of time is that it takes a timestamp in microseconds at the beginning, executes the code the given number of times, takes another timestamp afterwards, and calculates the time per iteration.

Measured that way, the timing catches everything that happens to your computer during the measurement. That can be additional load influencing the execution time of your script, like regular background tasks, or simply a change of processor speed during execution.

You (nektomk) did not write how long your tests take on average. On my computer, around 0.9 microseconds. The differences between measurements with 50000 repetitions are around 0.3 microseconds. With 500000 repetitions it is only 0.02 microseconds, and on average it was faster than with the smaller number of repetitions.

Why does 'time' need repetitions? A single run of the code executes far too fast to be measured. The finest reliable time unit available in Tcl is the microsecond, and 0.9 is below what can be measured directly. So Tcl repeats the script x times and divides the result by the number of repetitions (e.g. 0.9 microseconds x 50000 repetitions is about 45 milliseconds, which is easily measurable).

Also, we have nearly the same overhead for every measurement, which means it influences our result less at higher repetition counts. On my computer it is 0.05 microseconds per test (calling time with an empty body), or 0.2 microseconds if the call of an empty procedure is included.

Such short time measurements are more accurate when the total measurement time is around one or two seconds. If it is too short, measurement errors increase. If it takes too long, the risk is higher that the processor steps up to optimize the execution, or, in today's summer weather, steps down because of increasing processor temperature.

I also often execute tests a couple of times in a changing sequence, like test1, test2, test3, test3, test2, test1. That averages out influences from outside, like a changed processor speed; a minimal sketch of the idea follows.
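
Here is that mirrored order, reusing the test1 and test2 procs from the timing example above (extend the list for more tests):

foreach t {test1 test2 test2 test1} {
    puts "$t: [time $t 1000]"
}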

As always it is important to check what we want to measure and how we build up the measurement setup.

% proc mimo0 { a b } {
        return
}
% proc mimo1 { a b } {
        tcl::mathfunc::min $a $b
}
% proc mimo2 { a b } {
        expr { min($a,$b) } 
}
% 
% time {mimo1 100 10 } 50000
0.94096 microseconds per iteration
% time {mimo1 100 10 } 50000
0.9596 microseconds per iteration
% time {mimo1 100 10 } 50000
0.93186 microseconds per iteration
% time {mimo1 100 10 } 50000
0.96122 microseconds per iteration
% time {mimo2 100 10 } 50000
0.90258 microseconds per iteration
% time {mimo2 100 10 } 50000
0.91626 microseconds per iteration
% time {mimo2 100 10 } 50000
0.92198 microseconds per iteration
% time {mimo2 100 10 } 50000
0.9034 microseconds per iteration

% time {mimo0 100 10 } 5000000
0.2092264 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2164206 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2109234 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2165014 microseconds per iteration

% time {mimo1 100 10 } 5000000
0.9046864 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.9301398 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.924608 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.9071432 microseconds per iteration

% time {mimo2 100 10 } 5000000
0.8818558 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8812846 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8886016 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8705476 microseconds per iteration

% time {} 5000000
0.0523706 microseconds per iteration
% time {} 5000000
0.0517532 microseconds per iteration
% time {} 5000000
0.0524266 microseconds per iteration
% time {} 5000000
0.0512242 microseconds per iteration

nektomk - 2023-06-22 00:26:34

Rewriting the first test to increase the number of repetitions (to 500000) and simplify the printing:

proc mimo1 { a b } {
        tcl::mathfunc::min $a $b
}
proc mimo2 { a b } {
        expr { min($a,$b) } 
}

mimo1 200 20
mimo2 200 20
time {}
# first time invocation: slowest results
set c 0
set n1 [ lindex [ time {
        incr c [ mimo2 100 2 ]
} 500000 ] 0 ]
# second time invocation: faster results
set c 0
set n2 [ lindex [ time {
        incr c [ mimo2 100 2 ]
} 500000 ] 0 ]
puts "$n1 $n2"

Then we run an external driver script, to compare and count the results:

set test1  "test_minmax.tcl"
set repeats 100
set k 0
for { set pass 0 } { $pass < $repeats } { incr pass } {
        set data [ exec tclsh $test1 ]
        lassign $data n1 n2
        if { $n1>$n2 } {
                incr k 
        }
        puts -nonewline [ format "%03d %0.9f %0.9f %03d\r" $pass $n1 $n2 $k ]
        flush stdout
}
if { $pass != 0 } {
        set percents [ expr { 100.0 * double($k) / double($pass) } ]
        puts "finally passes $pass k=$k percents=$percents"
}
puts $tcl_patchLevel
parray tcl_platform

Under Windows 10 on my PC (Ryzen 7 2700 with 16 GB RAM), typical results look like this:

C:\src>tclsh -encoding utf-8 test_minmax_2.tcl
finally passes 100 k=86 percents=86.0
8.6.13
tcl_platform(byteOrder)     = littleEndian
tcl_platform(engine)        = Tcl
tcl_platform(machine)       = amd64
tcl_platform(os)            = Windows NT
tcl_platform(osVersion)     = 10.0
tcl_platform(pathSeparator) = ;
tcl_platform(platform)      = windows
tcl_platform(pointerSize)   = 8
tcl_platform(threaded)      = 1
tcl_platform(user)          = Professional
tcl_platform(wordSize)      = 4

In more than 80% of cases, the second call to the time command shows the better result. This goes beyond randomness and calls into question some performance tests (and comparisons, optimizations).

Maybe this reflects features of the scheduler, the "warm-up" of the program, and which real-time timer is used.

Then we would need a function to measure performance more correctly.
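
If one really wanted that, here is a minimal sketch of the idea (warmtime is a hypothetical helper, not a standard Tcl command): run one throwaway pass so that byte-compilation and caches are warm, and keep only the second measurement.

proc warmtime {script count} {
    # warm-up pass: result deliberately discarded
    uplevel 1 [list time $script $count]
    # the measurement we actually keep
    return [uplevel 1 [list time $script $count]]
}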

HE 2023-06-24

I tried your code by storing it in a separate file and executing it, similar to what you did on Windows. The result:

$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 100 k=55 percents=55.0
8.6.13
tcl_platform(byteOrder)     = littleEndian
tcl_platform(engine)        = Tcl
tcl_platform(machine)       = x86_64
tcl_platform(os)            = Linux
tcl_platform(osVersion)     = 5.15.94-1-MANJARO
tcl_platform(pathSeparator) = :
tcl_platform(platform)      = unix
tcl_platform(pointerSize)   = 8
tcl_platform(threaded)      = 1
tcl_platform(user)          = foo
tcl_platform(wordSize)      = 8

So the result is much different from yours, even though the 1st attempt was the slower one slightly more often than chance (by 5 percentage points).

For the following I will provide the results without tcl_patchLevel and tcl_platform. I am using a notebook with an Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz and 16GB RAM.

Another try, where my only addition was to collect an overall time measurement, leads to an even better result:

set test1  "howToMeasurePerformance.tcl"
set repeats 100
set k 0
set timeStart [clock microseconds]
for { set pass 0 } { $pass < $repeats } { incr pass } {
        set data [ exec tclsh $test1 ]
        lassign $data n1 n2
        if { $n1>$n2 } {
                incr k
        }
        puts -nonewline [ format "%03d %0.9f %0.9f %03d\r" $pass $n1 $n2 $k ]
        flush stdout
}
set timeEnd [clock microseconds]
if { $pass != 0 } {
        set percents [ expr { 100.0 * double($k) / double($pass) } ]
        puts "finally passes $pass k=$k percents=$percents"
}
puts "Duration of the test loop:"
puts "\tIn us: [set timeDiff [expr {$timeEnd - $timeStart}]]"
puts "\tIn ms: [expr {$timeDiff/1000}]"
puts "\tIn s:  [expr {$timeDiff/1000000}]"

I tried it a couple of times:

$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 100 k=51 percents=51.0
Duration of the test loop:
        In us: 90599801
        In ms: 90599
        In s:  90
$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 100 k=38 percents=38.0
Duration of the test loop:
        In us: 90810254
        In ms: 90810
        In s:  90
$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 100 k=49 percents=49.0
Duration of the test loop:
        In us: 90190455
        In ms: 90190
        In s:  90
$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 100 k=54 percents=54.0
Duration of the test loop:
        In us: 91319530
        In ms: 91319
        In s:  91
$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 100 k=57 percents=57.0
Duration of the test loop:
        In us: 90449596
        In ms: 90449
        In s:  90

The overall result: in 51 + 38 + 49 + 54 + 57 = 249 out of 5 * 100 = 500 passes the 1st run was slower, which is in the range of the expected average.

Ok, let us try one run of howToMeasurePerformance_ctrl.tcl with 500 retries:

$ ./howToMeasurePerformance_ctrl.tcl 
finally passes 500 k=250 percents=50.0
Duration of the test loop:
        In us: 460193622
        In ms: 460193
        In s:  460

We can see it has the expected result, but the test run became slightly slower. Nothing to worry about: the processor had to execute everything without interruption and possibly started to step down.

Keep in mind that a repetition of the 500 runs will also provide slightly different results.

I tried the same on another Manjaro system with an AMD Ryzen 7 5700U with Radeon Graphics and 16GB RAM:

8.6.12
tcl_platform(byteOrder)     = littleEndian
tcl_platform(engine)        = Tcl
tcl_platform(machine)       = x86_64
tcl_platform(os)            = Linux
tcl_platform(osVersion)     = 6.1.26-1-MANJARO
tcl_platform(pathSeparator) = :
tcl_platform(platform)      = unix
tcl_platform(pointerSize)   = 8
tcl_platform(threaded)      = 1
tcl_platform(user)          = foo
tcl_platform(wordSize)      = 8

Besides being overall slower, there are no real differences. Here is only the 500-run result:

$ ./howToMeasurePerformance_ctrl.tcl
finally passes 500 k=249 percents=49.8
Duration of the test loop:
        In us: 611350423
        In ms: 611350
        In s:  611

You found the differences on a Windows system. I remembered that I have one on the same computer, so I restarted into Win10 for another test run. I started directly with the 500-retry test:

>C:\HE\tcl\bin\tclsh howToMeasurePerformance_ctrl.tcl
finally passes 500 k=255 percents=51.0
Duration of the test loop:
        In us: 678437127
        In ms: 678437
        In s:  678

We see again the expected result.

I don't know what it is about your computer that makes you see something different.

But I stand by my statement: "As always it is important to check what we want to measure and how we build up the measurement setup."

And we do not need "a function to measure performance more correctly"; we have to learn how to use our measurement equipment.

Keep in mind that you always measure other usage of your computer happening in parallel. This can't be detected by any internal performance measurement. The only way around it is to build up an empty physical device where only the test program is running. And even then you are a victim of the power supply, temperature, and other physical influences. A virtual machine is not enough, because it is still influenced by other virtual machines and by the host itself. I doubt that it is possible to set up such a system with a Windows kernel.