Version 13 of Profiling with execution traces

Updated 2011-11-01 17:21:45 by dkf

Richard Suchenwirth 2008-07-16 - The following little script demonstrates how performance data for a program (number how often a command is called, average time) can be collected via trace execution. You can specify a number of commands (procs or built-ins) for which at start and end, the microseconds timestamp is taken. The difference between the two is added to an array indexed by command name.

I first tried just asking for profiling [[info commands], but that took so much time that I couldn't stand it. See discussion below.

 #---------------------------------------- begin package profiler
 package require Tcl 8.5
 namespace eval profiler {variable T; variable S}
 proc profiler::enter args {
    variable S
    ::profiler::push S [::tcl::clock::microseconds]
 }
 proc profiler::leave {str args} {
    variable T; variable S
    set t0 [::profiler::pop S]
    ::profiler::push T([lindex $str 0]) [expr {[::tcl::clock::microseconds]-$t0}]
 }
 proc profiler::do {what cmdlist} {
    set cmd ""
    foreach i $cmdlist {
        if {!([string match *profiler::* $i])} {
            append cmd "trace $what execution $i enter ::profiler::enter\n"
            append cmd "trace $what execution $i leave ::profiler::leave\n"
        }
    }
    eval $cmd
 }
 interp alias {} profiler::push {} lappend
 proc profiler::pop _stack {
    upvar 1 $_stack stack
    K [lindex $stack end] [set stack [lrange $stack 0 end-1]]
 }
 proc profiler::K {a b} {set a}
 proc profiler::stats list {
    set sum [expr double([join $list +])]
    format "%10.0f %6d %10.2f" $sum [llength $list]  [expr {$sum/[llength $list]}]
 }
 proc profiler::report {{channel stdout}}  {
    variable T
    foreach i [lsort -dic [array names T]] {
        puts $channel [format "%-20s %s" $i [stats $T($i)]]
    }
 }
 #---------------------------------------- end package profiler

Here we build our "test bed", a main and a very simple function, which we then instrument:

 proc main argv {
    foreach i {1 2 3 4} {puts ==[f $i]}
 }
 proc f x {expr {$x*$x}}
 #-- We activate profiling on all procs, and a few selected Tcl commands:
 profiler::do add [concat [info procs] {expr lindex foreach upvar puts clock}]
 main $argv
 profiler::do remove [info commands] ;# no error if more removed than added

 profiler::report

This is seen on stdout of my desktop machine:

 ==1
 ==4
 ==9
 ==16
 expr                137662     84    1638.83
 f                    29036      4    7259.00
 foreach              14879      1   14879.00
 lindex                     105756    116     911.69
 main                 15969      1   15969.00
 puts                 34906      4    8726.50
 upvar               134692     92    1464.04

I noticed that it is dangerous to trace some essential commands. Adding set to the above list slowed the process down very much, and brought these ugly figures:

 expr                   51470212    654   78700.63
 f                 12083216      4 3020804.00
 foreach            7130413      1 7130413.00
 lindex                   41924889    848   49439.73
 main               8306913      1 8306913.00
 puts              15131320      4 3782830.00
 set               43121372    906   47595.33
 upvar             51070069    722   70734.17

Not a pretty sight. If someone can explain, I'd gladly listen :^)

Lars H: Does it make a difference if you change the definition of K to

 proc profiler::K {a b} {return $a}

? Get rid of the t0 variable in profiler::leave? It seems every call to this procedure does 3 sets, 2 lindexes, and only one each of all other commands, which could make a difference if the profiler spends a lot of time profiling itself. I think traces on a command are disabled inside the trace procedure for that command, but this is on a per-command basis, so traces on all traced commands in profiler::leave will fire before trace-of-trace recursion stops. It could simply be that adding set to the mix is what is required to make the recursive tracing overhead dominate the I/O latency in puts.

RS K can be dropped altogether, I just used it to avoid the set res ...; set stack ...; return $res pattern. Less fancy, but probably more efficient:

     set res   [lindex $stack end] 
     set stack [lrange $stack 0 end-1]
     return $res

But t0 is needed as intermediate variable - otherwise I get negative times... Wiki issue: the tables above were reformatted to alignment by me several times, but it looks as if browsers tabify spaces, so it came up unaligned again :(

Lars H: It's not the browser, AFAICT (sockspy is your friend), but the checkout routine (that gives you a http://wiki.tcl.tk/_edit/ page) that for some bizarre reason converts all stretches of eight spaces to tabs. (I meant to file a bug on the matter several months ago, but the bug tracking system on google is so awkward, at least IMHO.)

Regarding the need for t0, it sounds as though you have a race condition (or something): The item you pop off S is not always the one that was at the top of S when you called profiler::pop? If so, this error is not prevented by having the t0 variable, but it becomes less noticable; rearranging the terms as

 proc profiler::leave {str args} {
    variable T; variable S
    ::profiler::push T([lindex $str 0]) [expr {-[::profiler::pop S] + [::tcl::clock::microseconds]}]
 }

should have the same effect (no negative timinigs, but some very small positive timings). This bug should only hit if you trace commands called from the traces, however.

Maybe it is better to just lappend all data (time, enter/leave, command name) to a large list and not compute anything until you've done? This should reduce overhead, and also prevent "race condition" errors. FF 2008-07-17 - That's how (another) command profiler does. (although, pushing execution times in a stack, produces a wrong total time if commands are nested; it needs some tweaking)