'''How to Measure Performance'''

** See Also **

   [Profiling Tcl]:
   [time]:
   [timers.tcl]: [PSE]: Allow me to shamelessly plug my package

** Description **

Before anything else, '''`[time]`''' it!

For newer revisions of Tcl 8.6, the [http://www.tcl-lang.org/man/tcl8.6/TclCmd/timerate.htm%|%timerate%|%] command is preferable. See [https://core.tcl-lang.org/tips/doc/trunk/tip/527.md%|%TIP 527%|%].

The wonderful `[time]` command can help you answer questions like "Is this code faster than that code?" Check out the man page, or the example at [Counting Elements in a List], which includes a basic timing framework.

If you need to record time stamps (e.g., in event-driven Tk applications), take a look at `[clock clicks]`, which gives you a time stamp of the finest granularity available on the [platform].

Sometimes you may want to '''profile''' an application to find out where all the time is spent. There are several solutions; see [Profiling Tcl] for suggestions.

One idea is to build timing routines into an extension or application. For instance, `md5pure::time` lets anyone time the [md5pure] extension and compare the results to those of others (see the chart of timings on the [md5pure] page).

** Tools **

   [PDQ with Tcl%|%PDQ]: an analytical solver in which resources are represented by queues

** Timing Example **

Use `[time]` to compare two different coding techniques. For this simple example, we compare building a list with the `[lappend]` list command against building a string with the `[append]` string command, but the technique applies to many different situations. Put your test code in a proc so that it gets byte-compiled, and run enough iterations of the test code to make a meaningful timed test. (Or maybe just use a slower computer.)

======
proc test1 {} {
    for {set i 0} {$i < 1000} {incr i} {
        lappend mylist $i
    }
}

proc test2 {} {
    for {set i 0} {$i < 1000} {incr i} {
        append mylist "$i "
    }
}

% time test1 100
349 microseconds per iteration
% time test2 100
1018 microseconds per iteration
======

----

[DKF]: '''You should be aware that [UNIX] has a much finer time granularity than Windows''' (certainly [Windows 95%|%95]/[Windows 98%|%98], and quite probably NT too.)

''[KBK] 2000-12-24: The problem with Windows clock resolution is fixed in the latest alpha of 8.4.''

The best way to benchmark a tiny snippet of Tcl (especially where it is platform-independent code you're testing) is to use a slow [UNIX] box: the slowness of its operation allows for sensitivity, and the fine granularity of its timer lets you pick up what differences there are. Then you have to pray that the relative speeds scale as you move to different platforms/architectures and/or faster machines, which is usually a good assumption if you're not [shimmering] between numbers and strings (where performance seems to vary a lot according to the vendor's implementation of the [C] library.)

[UNIX] has millisecond-sized granularity (i.e., around 1000 Hz, or 1 kHz.) Windows has, for reasons relating to the hardware used in the original IBM PC, 42.7 Hz (or thereabouts) granularity. (''KBK 2000-12-24: The latest 8.4 alpha uses the 1.193 MHz 8254 timer in place of the Windows 1- or 10-ms time quantum.'') Obviously, this makes it harder to perform finer measurements. Good thing most operations in Tcl are actually platform-independent, so the relative differences in execution speed tend to carry across.

[KBK]: The patch for using the 8254 clock in earlier Tcl releases is at [http://www.deja.com/getdoc.xp?AN=666545441&fmt=text] (''[AMG]: 404''). Note that multi-CPU Windows systems are still going to be limited to the 1-ms resolution, because of a bug in the Hardware Abstraction Layer that Microsoft ships.
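Given those granularity caveats, the `timerate` command mentioned above (newer Tcl 8.6 revisions, TIP 527) sidesteps much of the problem: it repeats the script for a fixed wall-clock window rather than a fixed iteration count, and it can subtract a calibrated measurement overhead. A minimal sketch of its use; consult the man page for the exact result format:

======
# run the script repeatedly for 500 wall-clock milliseconds; the first
# element of the result is the mean time per iteration in microseconds
timerate {lappend mylist x} 500

# measure the empty-script overhead first and subtract it, so that very
# cheap operations are not swamped by the cost of the measurement itself
set overhead [lindex [timerate {} 1000] 0]
timerate -overhead $overhead {lappend mylist x} 500
======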
----

''Remember'', the actual numerical performance of computers varies widely with operating system and processor speed, and comparing actual figures is very tricky. If you're going to put up benchmarking info, at least do everyone the favour of [Finding Out Your Processor and Operating System Configuration]...

----

[MAK]: While working on optimizations for [TkPNG], alternating between [time] and gprof for various measurements, I sometimes saw my numbers change in odd ways, with some going up and some going down for a change that should have affected all images in one direction or the other. I was interested in seeing error margins, to prove to myself I wasn't insane. So I whipped up this "errortime" proc. It works like [time], but it runs the script through [time] with a given group size, repeats that the specified number of times, and then calculates the average and error margin:

======
proc errortime {script groupsize {iters 2}} {
    set times {}
    if {$iters < 2} {set iters 2}

    # collect $iters group timings, each averaged over $groupsize runs
    for {set i 0} {$i < $iters} {incr i} {
        set result [uplevel [list time $script $groupsize]]
        lappend times [lindex $result 0]
    }

    # mean of the group timings
    set average 0.0
    foreach time $times {
        set average [expr {$average + double($time)/$iters}]
    }

    # sample standard deviation of the group timings
    set s2 0.0
    foreach time $times {
        set s2 [expr {$s2 + (($time-$average)*($time-$average) / ($iters-1))}]
    }
    set sigma [expr {int(sqrt($s2))}]
    set average [expr {int($average)}]

    return "$average +/- $sigma microseconds per iteration"
}
======

Example:

======
proc test1 {} {
    for {set i 0} {$i < 1000} {incr i} {
        lappend mylist $i
    }
}

proc test2 {} {
    for {set i 0} {$i < 1000} {incr i} {
        append mylist "$i "
    }
}
======

======none
% time test1 10000
855 microseconds per iteration
% time test2 10000
2376 microseconds per iteration
% errortime test1 100 100
843 +/- 42 microseconds per iteration
% errortime test2 100 100
2316 +/- 29 microseconds per iteration
======

----

'''[nektomk] - 2023-06-20 02:13:54''' Are `time` results unstable, depending on their position in the script?

======
proc mimo1 { a b } {
    tcl::mathfunc::min $a $b
}
proc mimo2 { a b } {
    expr { min($a,$b) }
}
mimo1 200 20
mimo2 200 20
time {}

# first invocation: slowest results
set c 0
puts [ time { incr c [ mimo2 100 10 ] } 50000 ]
puts "----"

# second invocation: faster results
set c 0
puts [ time { incr c [ mimo2 100 10 ] } 50000 ]
======

Usually the second result is faster, regardless of the measured function (mimo1 or mimo2).

'''[HE] - 2023-06-21''': This is nothing I would subscribe to. Without looking at the source code, my expectation is that `time` takes a timestamp in microseconds at the beginning, executes the code the given number of times, takes another timestamp afterwards, and calculates the time per iteration. Doing so, the measurement catches everything that happens on your computer while it runs. That can be additional load influencing the execution time of your script, like regular background tasks, or simply a change of processor speed during execution.

You ([nektomk]) didn't write how long your tests take on average. On my computer, around 0.9 microseconds. The differences between measurements with 50000 repetitions are around 0.3 microseconds. With 500000 repetitions it is only 0.02 microseconds, and on average it was faster than with the smaller number of repetitions.

Why does `time` need repetitions? One run of the code executes far too quickly to be measured.
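A simplified model of what `[time]` does, as described above, can be sketched in Tcl itself (`mytime` is a hypothetical name, invented here; the real command is implemented in C):

======
# a simplified model of [time]: timestamp, run the script $count times,
# timestamp again, divide; mytime is a made-up helper, not the real command
proc mytime {script count} {
    set t0 [clock microseconds]
    for {set i 0} {$i < $count} {incr i} {
        uplevel 1 $script
    }
    set t1 [clock microseconds]
    # dividing by the repetitions makes sub-microsecond costs visible
    # even though a single run is below the timer's resolution
    return "[expr {($t1 - $t0) / double($count)}] microseconds per iteration"
}
======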
The finest reliable unit of time available in Tcl is the microsecond, and 0.9 µs is below that, so Tcl repeats the script x times and divides the result by the number of repetitions. We also have nearly the same overhead for every measurement, which means it influences the result less with higher repetition counts. On my computer it is 0.05 microseconds per test (calling `time` with an empty body), or 0.2 microseconds when the call of an empty procedure is included.

Such short time measurements are more accurate when the total measurement time is around one or two seconds. If it is too short, measurement errors increase. If it takes too long, the risk is higher that the processor steps up to optimize the execution, or, in today's summer weather, steps down because of increasing processor temperature.

I also often execute tests a couple of times in a changing sequence, like test1, test2, test3, test3, test2, test1 (see the sketch after the transcript below). That averages out influences from outside, like changed processor speed.

As always, it is important to check what we want to measure and how we build up the measurement setup.

======
% proc mimo0 { a b } { return }
% proc mimo1 { a b } { tcl::mathfunc::min $a $b }
% proc mimo2 { a b } { expr { min($a,$b) } }
%
% time {mimo1 100 10 } 50000
0.94096 microseconds per iteration
% time {mimo1 100 10 } 50000
0.9596 microseconds per iteration
% time {mimo1 100 10 } 50000
0.93186 microseconds per iteration
% time {mimo1 100 10 } 50000
0.96122 microseconds per iteration
% time {mimo2 100 10 } 50000
0.90258 microseconds per iteration
% time {mimo2 100 10 } 50000
0.91626 microseconds per iteration
% time {mimo2 100 10 } 50000
0.92198 microseconds per iteration
% time {mimo2 100 10 } 50000
0.9034 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2092264 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2164206 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2109234 microseconds per iteration
% time {mimo0 100 10 } 5000000
0.2165014 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.9046864 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.9301398 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.924608 microseconds per iteration
% time {mimo1 100 10 } 5000000
0.9071432 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8818558 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8812846 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8886016 microseconds per iteration
% time {mimo2 100 10 } 5000000
0.8705476 microseconds per iteration
% time {} 5000000
0.0523706 microseconds per iteration
% time {} 5000000
0.0517532 microseconds per iteration
% time {} 5000000
0.0524266 microseconds per iteration
% time {} 5000000
0.0512242 microseconds per iteration
======
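The alternating-sequence idea from the comment above can be written down directly. A sketch, assuming the `mimo1`/`mimo2` procs from earlier; the helper name `interleave` is invented for this illustration:

======
# run every script in forward order, then in reverse order, and average
# the two timings, so a drift in processor speed (step-up, thermal
# step-down) hits all candidates roughly equally
proc interleave {scripts count} {
    foreach s $scripts {set total($s) 0.0}
    foreach s [concat $scripts [lreverse $scripts]] {
        set total($s) [expr {$total($s) + [lindex [uplevel 1 [list time $s $count]] 0]}]
    }
    foreach s $scripts {
        puts "$s: [expr {$total($s) / 2.0}] microseconds per iteration"
    }
}

interleave {{mimo1 100 10} {mimo2 100 10}} 50000
======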
format "%03d %0.9f %0.9f %03d\r" $pass $n1 $n2 $k ] flush stdout } if { $pass != 0 } { set percents [ expr { 100.0 * double($k) / double($pass) } ] puts "finally passes $pass k=$k percents=$percents" } puts $tcl_patchLevel parray tcl_platform ====== under windows10 on my pc (Ryzen7 2700 with 16Gb ram), typical results seen as: ====== C:\src>tclsh -encoding utf-8 test_minmax_2.tcl finally passes 100 k=86 percents=86.0 8.6.13 tcl_platform(byteOrder) = littleEndian tcl_platform(engine) = Tcl tcl_platform(machine) = amd64 tcl_platform(os) = Windows NT tcl_platform(osVersion) = 10.0 tcl_platform(pathSeparator) = ; tcl_platform(platform) = windows tcl_platform(pointerSize) = 8 tcl_platform(threaded) = 1 tcl_platform(user) = Professional tcl_platform(wordSize) = 4 ====== in more than 80% of cases, the '''second call to the "time" function will show the best result'''. This goes beyond randomness and calls into question some performance tests (and comparationis,optimizations). maybe it affects the features of the scheduler, the "warm-up" of the program and the real-time timer is used then we need a function to measure performance more correctly '''[HE] 2023-06-24''' I tried your code by storing it in separate file and execute them. Similar you did on Windows. The result: ====== $ ./howToMeasurePerformance_ctrl.tcl finally passes 100 k=55 percents=55.0 8.6.13 tcl_platform(byteOrder) = littleEndian tcl_platform(engine) = Tcl tcl_platform(machine) = x86_64 tcl_platform(os) = Linux tcl_platform(osVersion) = 5.15.94-1-MANJARO tcl_platform(pathSeparator) = : tcl_platform(platform) = unix tcl_platform(pointerSize) = 8 tcl_platform(threaded) = 1 tcl_platform(user) = foo tcl_platform(wordSize) = 8 ====== So the result is much different from yours even if the 1st attempt is slightly slower (by 5% of the attempts). For the following I will provide the results without tcl_patchlevel and tcl_platform. And I use a notebook with an Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz and 16GB RAM. 
Another try, where I only added collecting some time measurements, leads to an even better result:

======
set test1 "howToMeasurePerformance.tcl"
set repeats 100
set k 0
set timeStart [clock microseconds]
for { set pass 0 } { $pass < $repeats } { incr pass } {
    set data [ exec tclsh $test1 ]
    lassign $data n1 n2
    if { $n1 > $n2 } { incr k }
    puts -nonewline [ format "%03d %0.9f %0.9f %03d\r" $pass $n1 $n2 $k ]
    flush stdout
}
set timeEnd [clock microseconds]
if { $pass != 0 } {
    set percents [ expr { 100.0 * double($k) / double($pass) } ]
    puts "finally passes $pass k=$k percents=$percents"
}
puts "Duration of the test loop:"
puts "\tIn us: [set timeDiff [expr {$timeEnd - $timeStart}]]"
puts "\tIn ms: [expr {$timeDiff/1000}]"
puts "\tIn s:  [expr {$timeDiff/1000000}]"
======

I tried it a couple of times:

======
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 100 k=51 percents=51.0
Duration of the test loop:
	In us: 90599801
	In ms: 90599
	In s:  90
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 100 k=38 percents=38.0
Duration of the test loop:
	In us: 90810254
	In ms: 90810
	In s:  90
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 100 k=49 percents=49.0
Duration of the test loop:
	In us: 90190455
	In ms: 90190
	In s:  90
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 100 k=54 percents=54.0
Duration of the test loop:
	In us: 91319530
	In ms: 91319
	In s:  91
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 100 k=57 percents=57.0
Duration of the test loop:
	In us: 90449596
	In ms: 90449
	In s:  90
======

The overall result is 51 + 38 + 49 + 54 + 57 = 249 of 5 * 100 = 500 times that the first run was slower, which is in the range of the expected average.

OK, let us try one run of howToMeasurePerformance_ctrl.tcl with 500 retries:

======
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 500 k=250 percents=50.0
Duration of the test loop:
	In us: 460193622
	In ms: 460193
	In s:  460
======

We can see it has the expected result, but the test run became slightly slower. Nothing to worry about: the processor had to execute everything without interruption and possibly started to step down. Keep in mind that a repetition of the 500 runs will also give slightly different results.

I tried the same on another Manjaro system, with an AMD Ryzen 7 5700U with Radeon Graphics and 16GB RAM:

======
8.6.12
tcl_platform(byteOrder)     = littleEndian
tcl_platform(engine)        = Tcl
tcl_platform(machine)       = x86_64
tcl_platform(os)            = Linux
tcl_platform(osVersion)     = 6.1.26-1-MANJARO
tcl_platform(pathSeparator) = :
tcl_platform(platform)      = unix
tcl_platform(pointerSize)   = 8
tcl_platform(threaded)      = 1
tcl_platform(user)          = foo
tcl_platform(wordSize)      = 8
======

Besides being slower overall, there are no real differences. Here only the 500-run result:

======
$ ./howToMeasurePerformance_ctrl.tcl
finally passes 500 k=249 percents=49.8
Duration of the test loop:
	In us: 611350423
	In ms: 611350
	In s:  611
======

You found the differences on a Windows system, and I remembered that I had Windows on the same computer. So: restart into Win10 and another test run, starting directly with the 500-retry test:

======
>C:\HE\tcl\bin\tclsh howToMeasurePerformance_ctrl.tcl
finally passes 500 k=255 percents=51.0
Duration of the test loop:
	In us: 678437127
	In ms: 678437
	In s:  678
======

We see again the expected result. I don't know what it is with your computer that you see something different. But I stand by my statement: "As always it is important to check what we want to measure and how we build up the measurement setup."
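A common way to neutralize the warm-up effect discussed above is to run the measurement once, discard that result, and keep only subsequent ones. A sketch, assuming the `mimo2` proc from earlier; the helper name `warmTime` is invented for this illustration:

======
# discard the first measurement, which often includes one-time costs
# (byte-compilation, cold caches, scheduler settling), and report only
# the second; warmTime is a hypothetical helper, not a standard command
proc warmTime {script count} {
    uplevel 1 [list time $script $count]          ;# warm-up run, ignored
    return [uplevel 1 [list time $script $count]] ;# the measurement we keep
}

proc mimo2 { a b } { expr { min($a,$b) } }
puts [warmTime {mimo2 100 10} 50000]
======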
And we do not need "a function to measure performance more correctly"; we have to learn how to use our measurement equipment.

Keep in mind that you always measure other usage of your computer happening in parallel. This can't be detected by any internal performance measurement. The only way around it is to set up a bare physical device on which only the test program is running, and even then you are at the mercy of power supply, temperature, and other physical influences. A virtual machine is not enough, because it is still influenced by other virtual machines and by the host itself. I doubt that it is possible to set up such a system with a Windows kernel.

<<categories>> Performance