Version 1 of Profiling Tcl with gprof

Updated 2005-03-03 00:51:03

George Peter Staplin: Mar 2, 2004 - I've been curious about where Tcl spends most of its time during the evaluation/execution of scripts, so I learned how to use gprof to profile the tcl core.

The steps are:

  1. get tcl from cvs here: http://sourceforge.net/projects/tcl/
  2. get the tclbench module from cvs here: http://sourceforge.net/projects/tcllib
  3. mkdir tcl_bld
  4. export CC="gcc -pg"
  5. ../tcl/unix/configure --prefix=/tmp --disable-shared
  6. make
  7. env SCRIPT="../tclbench/runbench.tcl -verbose -notk -iterations 500" make shell
  8. gprof tclsh gmon.out | less

Note: there is no need for 'make install'.

With NetBSD 2.0 running on a Pentium 4 1.8 GHz system, the following was the result:

 Flat profile:

 Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 11.43      0.04     0.04     5097     0.01     0.01  read
  5.71      0.06     0.02    75266     0.00     0.00  malloc_bytes
  5.71      0.08     0.02     4372     0.00     0.01  Tcl_ParseCommand
  5.71      0.10     0.02      526     0.04     0.12  TclExecuteByteCode
  5.71      0.12     0.02      132     0.15     0.15  __fork
  2.86      0.13     0.01    71098     0.00     0.00  TclpAlloc
  2.86      0.14     0.01    70100     0.00     0.00  memcpy
  2.86      0.15     0.01    61545     0.00     0.00  free_bytes
  2.86      0.16     0.01    27661     0.00     0.00  HashStringKey
  2.86      0.17     0.01    22235     0.00     0.00  Tcl_CreateHashEntry
  2.86      0.18     0.01    20035     0.00     0.00  CompareStringKeys
  2.86      0.19     0.01    17996     0.00     0.00  Tcl_UniCharIsLower
  2.86      0.20     0.01    13392     0.00     0.00  Tcl_DStringInit
  2.86      0.21     0.01    11408     0.00     0.00  Tcl_LimitReady
  2.86      0.22     0.01     4974     0.00     0.00  Tcl_ExternalToUtf
  2.86      0.23     0.01     4964     0.00     0.01  GetInput
  2.86      0.24     0.01     4664     0.00     0.00  Tcl_ListObjAppendElement
  2.86      0.25     0.01     2935     0.00     0.00  TclVarTraceExists
  2.86      0.26     0.01     2633     0.00     0.01  Tcl_FindCommand
  2.86      0.27     0.01      884     0.01     0.01  close
  2.86      0.28     0.01      512     0.02     0.02  __stat13
  2.86      0.29     0.01      365     0.03     0.03  open
  2.86      0.30     0.01      290     0.03     0.03  Tcl_SpliceChannel
  2.86      0.31     0.01      287     0.03     0.03  Tcl_Close
  2.86      0.32     0.01      287     0.03     0.24  Tcl_ReadChars
  2.86      0.33     0.01      191     0.05     0.05  TclExpandTokenArray
  2.86      0.34     0.01      151     0.07     0.08  WriteChars
  1.43      0.34     0.01     1657     0.00     0.00  FreeListInternalRep
  1.43      0.35     0.01                             TclListObjSetElement
  0.00      0.35     0.00   112762     0.00     0.00  Tcl_UtfToUniChar

The following items in the list were 0.00, so they didn't take as much time during the run as other items, and have been omitted.

For further study of the Tcl sources I will compile the some of the files for use with gcov, so I can see which parts of Tcl_ParseCommand and TclExecuteByteCode are consuming the most time.