Version 9 of Profiling with execution traces

Updated 2008-07-16 10:57:40 by suchenwi

Richard Suchenwirth 2008-07-16 - The following little script demonstrates how performance data for a program (number how often a command is called, average time) can be collected via trace execution. You can specify a number of commands (procs or built-ins) for which at start and end, the microseconds timestamp is taken. The difference between the two is added to an array indexed by command name.

I first tried just asking for profiling [[info commands], but that took so much time that I couldn't stand it. See discussion below.

 #---------------------------------------- begin package profiler
 package require Tcl 8.5
 namespace eval profiler {variable T; variable S}
 proc profiler::enter args {
    variable S
    ::profiler::push S [::tcl::clock::microseconds]
 proc profiler::leave {str args} {
    variable T; variable S
    set t0 [::profiler::pop S]
    ::profiler::push T([lindex $str 0]) [expr {[::tcl::clock::microseconds]-$t0}]
 proc profiler::do {what cmdlist} {
    set cmd ""
    foreach i $cmdlist {
        if {!([string match *profiler::* $i])} {
            append cmd "trace $what execution $i enter ::profiler::enter\n"
            append cmd "trace $what execution $i leave ::profiler::leave\n"
    eval $cmd
 interp alias {} profiler::push {} lappend
 proc profiler::pop _stack {
    upvar 1 $_stack stack
    K [lindex $stack end] [set stack [lrange $stack 0 end-1]]
 proc profiler::K {a b} {set a}
 proc profiler::stats list {
    set sum [expr double([join $list +])]
    format "%10.0f %6d %10.2f" $sum [llength $list]  [expr {$sum/[llength $list]}]
 proc profiler::report {{channel stdout}}  {
    variable T
    foreach i [lsort -dic [array names T]] {
        puts $channel [format "%-20s %s" $i [stats $T($i)]]
 #---------------------------------------- end package profiler

if 0 {Here we build our "test bed", a main and a very simple function, which we then instrument:}

 proc main argv {
    foreach i {1 2 3 4} {puts ==[f $i]}
 proc f x {expr {$x*$x}}
 #-- We activate profiling on all procs, and a few selected Tcl commands:
 profiler::do add [concat [info procs] {expr lindex foreach upvar puts clock}]
 main $argv
 profiler::do remove [info commands] ;# no error if more removed than added


if 0 {This is seen on stdout of my desktop machine:

 expr                137662     84    1638.83
 f                    29036      4    7259.00
 foreach              14879      1   14879.00
 lindex                     105756    116     911.69
 main                 15969      1   15969.00
 puts                 34906      4    8726.50
 upvar               134692     92    1464.04

I noticed that it is dangerous to trace some essential commands. Adding set to the above list slowed the process down very much, and brought these ugly figures:

 expr                   51470212    654   78700.63
 f                 12083216      4 3020804.00
 foreach            7130413      1 7130413.00
 lindex                   41924889    848   49439.73
 main               8306913      1 8306913.00
 puts              15131320      4 3782830.00
 set               43121372    906   47595.33
 upvar             51070069    722   70734.17

Not a pretty sight. If someone can explain, I'd gladly listen :^)

Lars H: Does it make a difference if you change the definition of K to

 proc profiler::K {a b} {return $a}

? Get rid of the t0 variable in profiler::leave? It seems every call to this procedure does 3 sets, 2 lindexes, and only one each of all other commands, which could make a difference if the profiler spends a lot of time profiling itself. I think traces on a command are disabled inside the trace procedure for that command, but this is on a per-command basis, so traces on all traced commands in profiler::leave will fire before trace-of-trace recursion stops. It could simply be that adding set to the mix is what is required to make the recursive tracing overhead dominate the I/O latency in puts.

RS K can be dropped altogether, I just used it to avoid the set res ...; set stack ...; return $res pattern. Less fancy, but probably more efficient:

     set res   [lindex $stack end] 
     set stack [lrange $stack 0 end-1]
     return $res

But t0 is needed as intermediate variable - otherwise I get negative times...