Profiling Tcl with gprof

George Peter Staplin: Mar 2, 2004 - I've been curious about where Tcl spends most of its time during the evaluation/execution of scripts, so I learned how to use gprof to profile the tcl core.

The steps are:

  1. get tcl from cvs here: http://sourceforge.net/projects/tcl/
  2. get the tclbench module from cvs here: http://sourceforge.net/projects/tcllib
  3. mkdir tcl_bld ; cd tcl_bld
  4. export CC="gcc -pg"
  5. ../tcl/unix/configure --prefix=/tmp --disable-shared
  6. make
  7. create do_bench.sh
  8. sh do_bench.sh
  9. gprof tclsh test_gmon.out

Note: there is no need for 'make install'.

Here is the do_bench.sh script:

 #!/bin/sh

 for f in ../tclbench/tcl/*.bench 
 do
  echo $f
  env SCRIPT="../tclbench/runbench.tcl -verbose -notk -iterations 500 $f" make shell
  mv gmon.out `basename $f`_gmon.out
 done

The statistics are written upon the exit of the tclsh, and currently tclbench execs each test file, so we can't run a group of tests and have valid results.

With NetBSD 2.0 running on a Pentium 4, 1.8 GHz system, the following was the result for gprof tclsh array.bench_gmon.out:

 Flat profile:

 Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 11.11      0.01     0.01    20394     0.00     0.00  malloc_bytes
 11.11      0.02     0.01    19490     0.00     0.00  TclpAlloc
 11.11      0.03     0.01    14686     0.00     0.00  free_bytes
 11.11      0.04     0.01    10926     0.00     0.00  memcpy
 11.11      0.05     0.01     1290     0.01     0.01  Tcl_ParseBraces
 11.11      0.06     0.01      386     0.03     0.03  __stat13
 11.11      0.07     0.01      357     0.03     0.03  Tcl_ObjGetVar2
 11.11      0.08     0.01      221     0.05     0.05  InitCompiledLocals
 11.11      0.09     0.01       94     0.11     0.11  TclCompileReturnCmd
  0.00      0.09     0.00    35662     0.00     0.00  TclParseWhiteSpace

For further study we may use gcov to get finer results for individual functions.


MS george, thanks for this. Note that tclbench may not be that well balanced for this particular purpose. In any case, it was not designed for it. Not that I'd know what well balanced might mean.

(after talking with MS in the chat) George Peter Staplin: yep, and we will work something out. (MS and I discussed using new interps for each test file, rather than new processes.)


George Peter Staplin: I wrote a hacky patch for tclbench tonight to allow running all benchmarks in a single process. The patch for the tclbench HEAD is here: http://www.xmission.com/~georgeps/engineering/software/patches/tcl_bench_Mar_2_2005_b.diff (DEAD LINK)

It isn't a complete patch, but it provided enough capability for me to find the gprof results with all benchmarks.

My script to start runbench is:

 #!/bin/sh

 env SCRIPT="../tclbench/runbench.tcl -verbose -notk -iterations 500 -singleprocess" make shell

Here are the gprof tclsh gmon.out results:

 Flat profile:

 Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 20.89     31.34    31.34  4530233     0.00     0.00  TclExecuteByteCode
  9.51     45.61    14.27 42151760     0.00     0.00  memcpy
  3.59     51.00     5.39   925617     0.00     0.00  FreeListInternalRep
  3.05     55.58     4.58   501918     0.00     0.00  DupListInternalRep
  2.98     60.05     4.47   906974     0.00     0.00  memmove
  2.97     64.50     4.45 46178718     0.00     0.00  malloc_bytes
  2.23     67.85     3.35 46151153     0.00     0.00  free_bytes
  1.93     70.75     2.90     3364     0.00     0.00  TclUniCharMatch
  1.73     73.34     2.60 44449352     0.00     0.00  Tcl_CreateHashEntry
  1.25     75.21     1.87  3252510     0.00     0.00  TableToUtfProc
  1.25     77.08     1.87   935198     0.00     0.00  __syscall
  1.11     78.74     1.66 17069796     0.00     0.00  TclEvalObjvInternal
  1.09     80.38     1.64  8228996     0.00     0.00  miss
  1.02     81.91     1.53 38764876     0.00     0.00  ResetObjResult
  0.95     83.34     1.43 35481596     0.00     0.00  Tcl_ListObjAppendElement
  0.90     84.69     1.35 34128887     0.00     0.00  Tcl_GetCommandFromObj
  0.87     85.99     1.30   211379     0.00     0.00  __stat13
  0.86     87.28     1.29  1438022     0.00     0.00  Tcl_StringObjCmd
  0.82     88.50     1.23 52457749     0.00     0.00  Tcl_GetThreadData
  0.80     89.70     1.20 45503044     0.00     0.00  Tcl_Alloc
  0.77     90.86     1.15 58877099     0.00     0.00  Tcl_GetStringFromObj
  0.77     92.00     1.15  3936217     0.00     0.00  TclObjInterpProc
  0.76     93.14     1.14 98732472     0.00     0.00  Tcl_InvalidateStringRep
  0.75     94.27     1.12 46605190     0.00     0.00  ifree
  0.71     95.33     1.06   152642     0.00     0.00  SetListFromAny
  0.70     96.38     1.05 45995249     0.00     0.00  malloc
  0.69     97.41     1.03 38764718     0.00     0.00  Tcl_ResetResult
  0.68     98.43     1.02 49664344     0.00     0.00  Tcl_UniCharToLower
  0.68     99.45     1.02   228790     0.00     0.00  TableFromUtfProc
  0.67    100.45     1.00 18099244     0.00     0.00  Tcl_SetObjResult
  0.66    101.44     0.99 45453081     0.00     0.00  Tcl_SetLongObj
  0.62    102.38     0.93  7886398     0.00     0.00  pickss
  0.61    103.30     0.92  8598571     0.00     0.00  Tcl_UniCharToUtfDString
  0.57    104.15     0.85 14440572     0.00     0.00  AppendUtfToUtfRep
  0.56    105.00     0.84 47111137     0.00     0.00  Tcl_GetString
  0.55    105.83     0.83 46003853     0.00     0.00  free
  0.55    106.65     0.82    35201     0.00     0.00  Tcl_SplitObjCmd
  0.55    107.47     0.82 34336900     0.00     0.00  Tcl_ListObjLength
  0.54    108.28     0.81 24284081     0.00     0.00  HashStringKey
  0.54    109.09     0.81     3073     0.00     0.00  brk
  0.53    109.89     0.80  4676175     0.00     0.00  TclLookupSimpleVar
  0.51    110.66     0.76 46612082     0.00     0.00  imalloc
  0.49    111.39     0.74   123229     0.00     0.00  open
  0.44    112.06     0.67  9766679     0.00     0.00  TclFindElement
  0.44    112.72     0.67  1512790     0.00     0.00  Tcl_GetsObj
  0.43    113.37     0.65  4512497     0.00     0.00  TclFormatInt
  0.41    113.99     0.62 84895926     0.00     0.00  Tcl_UniCharToUtf
  0.40    114.59     0.60 23306949     0.00     0.00  CompareStringKeys
  0.39    115.17     0.58 13720118     0.00     0.00  TclPtrSetVar
  0.39    115.75     0.58  9477816     0.00     0.00  TclLindexFlat
  0.39    116.33     0.58   184250     0.00     0.00  chdir
  0.38    116.91     0.57  8107886     0.00     0.00  Tcl_FindHashEntry
  0.38    117.47     0.57    52545     0.00     0.00  Tcl_RegsubObjCmd
  0.37    118.03     0.56 14865956     0.00     0.00  TclLookupArrayElement
  0.35    118.56     0.53 45837682     0.00     0.00  TclpAlloc
  0.35    119.09     0.53  1231201     0.00     0.00  shortest
  0.33    119.59     0.50 26374110     0.00     0.00  SetStringFromAny
  0.33    120.09     0.50 21594049     0.00     0.00  TclpCheckStackSpace
  0.32    120.57     0.48 45809952     0.00     0.00  Tcl_Free
  0.32    121.05     0.48 45809952     0.00     0.00  TclpFree
  0.30    121.50     0.46  2977677     0.00     0.00  Tcl_BinaryObjCmd
  0.30    121.95     0.45  4379643     0.00     0.00  strstr
  0.30    122.41     0.45    14118     0.00     0.00  write
  0.29    122.84     0.44 28352669     0.00     0.00  TclGetIntForIndex
  0.27    123.25     0.40 21594049     0.00     0.00  TclInterpReady
  0.27    123.64     0.40    74603     0.00     0.00  read
  0.26    124.03     0.39  3884612     0.00     0.00  TclDeleteCompiledLocalVars
  0.26    124.42     0.39  7886398     0.00     0.00  getvacant
  0.25    124.80     0.38  5508812     0.00     0.00  TclObjLookupVar
  0.25    125.18     0.38   101609     0.00     0.00  __getcwd
  0.25    125.56     0.38  2000100     0.00     0.00  Tcl_SetByteArrayLength
  0.25    125.92     0.37 26304535     0.00     0.00  Tcl_AsyncReady
  0.25    126.30     0.37 17684600     0.00     0.00  Tcl_DStringSetLength
  0.24    126.66     0.36 17779843     0.00     0.00  Tcl_GetObjResult
  0.23    127.01     0.35   123481     0.00     0.00  close
  0.23    127.35     0.34  6408784     0.00     0.00  ScanNumber
  0.23    127.69     0.34  3096591     0.00     0.00  Tcl_ExternalToUtf
  0.22    128.02     0.33 17229166     0.00     0.00  Tcl_NewLongObj
  0.22    128.34     0.33  8490527     0.00     0.00  Tcl_UniCharToUpper
  0.22    128.67     0.33  7897768     0.00     0.00  SortCompare
  0.21    128.99     0.32    53161     0.00     0.00  access
  0.21    129.31     0.32 17085127     0.00     0.00  TclCleanupCommand
  0.21    129.62     0.31 11585918     0.00     0.00  Tcl_ListObjGetElements
  0.20    129.92     0.30 14622665     0.00     0.00  TclFreeObj
  0.20    130.22     0.30    89167     0.00     0.00  __lstat13
  0.19    130.51     0.29 26299501     0.00     0.00  Tcl_LimitReady
  0.19    130.80     0.29 16778728     0.00     0.00  SelectObjFromSublist
  0.19    131.09     0.29   173882     0.00     0.00  malloc_make_chunks
  0.19    131.37     0.28    89217     0.00     0.00  fchdir
  0.18    131.65     0.28 21603229     0.00     0.00  Tcl_LimitExceeded
  0.18    131.91     0.27  4524251     0.00     0.00  TclCompEvalObj
  0.18    132.19     0.27  4129276     0.00     0.00  Tcl_AppendObjToObj
  0.17    132.44     0.26  7877404     0.00     0.00  TclStackAlloc
  0.17    132.71     0.26  1997210     0.00     0.00  AllocStringEntry
  0.17    132.96     0.25  6229846     0.00     0.00  Tcl_NewStringObj
  0.17    133.21     0.25  3282618     0.00     0.00  Tcl_GetAssocData
  0.16    133.44     0.24  1993138     0.00     0.00  ExprRandFunc
  0.16    133.68     0.23  3671835     0.00     0.00  Tcl_SetObjLength
  0.15    133.90     0.22  7877402     0.00     0.00  TclStackFree
  0.15    134.12     0.22  4836977     0.00     0.00  Tcl_ListObjIndex
  0.15    134.34     0.22    16968     0.00     0.00  CopyAndTranslateBuffer
  0.14    134.56     0.21  8598568     0.00     0.00  AppendUnicodeToUtfRep
  0.14    134.77     0.21  5894322     0.00     0.00  Tcl_NewIntObj
  0.14    134.98     0.21 11227377     0.00     0.00  Tcl_AppendUnicodeToObj
  0.13    135.18     0.20  1617303     0.00     0.00  Tcl_ListObjReplace
  0.13    135.38     0.20    24868     0.00     0.00  TranslateInputEOL
  0.13    135.57     0.20     3453     0.00     0.00  TclThreadDataKeyGet
  0.13    135.76     0.19 14908149     0.00     0.00  Tcl_UtfToUniChar
  0.13    135.96     0.19  9478817     0.00     0.00  TclLindexList
  0.13    136.15     0.19  2497064     0.00     0.00  TclLookupVar
  0.13    136.34     0.19  1752393     0.00     0.00  ExprIntFunc
  0.12    136.52     0.18  1567820     0.00     0.00  FilterInputBytes
  0.12    136.70     0.18  3610585     0.00     0.00  Tcl_NewDoubleObj
  0.12    136.88     0.18  3279889     0.00     0.00  Tcl_GetChannel
  0.12    137.06     0.18  2494299     0.00     0.00  TclVarTraceExists
  0.12    137.24     0.18   809092     0.00     0.00  Tcl_AttemptSetObjLength
  0.12    137.41     0.17  8627161     0.00     0.00  Tcl_GetLongFromObj
  0.12    137.59     0.17  4377459     0.00     0.00  UpdateStringOfInt
  0.11    137.76     0.17  7256550     0.00     0.00  Tcl_GetIndexFromObj
  0.11    137.93     0.17  3936217     0.00     0.00  TclProcCompileProc
  0.11    138.09     0.16  1998880     0.00     0.00  UpdateStringOfByteArray
  0.11    138.25     0.16   374304     0.00     0.00  Tcl_ScanObjCmd
  0.10    138.41     0.15  9624841     0.00     0.00  Tcl_DStringFree
  0.10    138.56     0.15  4801045     0.00     0.00  Tcl_NewWideIntObj
  0.10    138.72     0.15  3678888     0.00     0.00  Tcl_GetCharLength
  0.10    138.87     0.15  6441980     0.00     0.00  Tcl_GetDoubleFromObj
  0.10    139.01     0.14  3936939     0.00     0.00  Tcl_PushCallFrame
  0.09    139.15     0.14   716160     0.00     0.00  __qdivrem
  0.09    139.29     0.14   209406     0.00     0.00  vfprintf_unlocked
  0.09    139.43     0.14    16882     0.00     0.00  Tcl_UtfToUpper
  0.09    139.56     0.14  9765386     0.00     0.00  TclCopyAndCollapse
  0.09    139.70     0.14  3521845     0.00     0.00  Tcl_ObjSetVar2
  0.09    139.84     0.14                             L2
  0.09    139.97     0.13  9627551     0.00     0.00  Tcl_DStringInit
  0.09    140.09     0.13  8490706     0.00     0.00  UtfCount
  0.09    140.22     0.13  4031242     0.00     0.00  Tcl_NewObj
  0.09    140.35     0.13    23583     0.00     0.00  Tcl_NumUtfChars
  0.08    140.47     0.12  4415780     0.00     0.00  Tcl_NextHashEntry
  0.08    140.59     0.12  3081154     0.00     0.00  FormatNumber
  0.08    140.72     0.12  2121681     0.00     0.00  memset
  0.08    140.84     0.12   110080     0.00     0.00  FillUnicodeRep
  0.07    140.94     0.11  5158785     0.00     0.00  GetFormatSpec
  0.07    141.06     0.11  3936939     0.00     0.00  TclPushStackFrame
  0.07    141.16     0.11  3936938     0.00     0.00  Tcl_PopCallFrame
  0.07    141.28     0.11  2494299     0.00     0.00  InfoExistsCmd
  0.07    141.38     0.11  1237399     0.00     0.00  Tcl_LreplaceObjCmd
  0.07    141.50     0.11  1231201     0.00     0.00  find
  0.07    141.60     0.11  1022100     0.00     0.00  Tcl_DeleteHashTable
  0.07    141.72     0.11   862680     0.00     0.00  Tcl_Tell
  0.07    141.82     0.11                             Tcl_DbCkfree
  0.07    141.93     0.10  2496778     0.00     0.00  Tcl_InfoObjCmd
  0.07    142.03     0.10   873733     0.00     0.00  longest
  0.07    142.13     0.10  2104934     0.00     0.00  initialize
  0.07    142.24     0.10  1648122     0.00     0.00  strtoul
  0.07    142.34     0.10  1527933     0.00     0.00  FileWatchProc
  0.07    142.44     0.10  1306318     0.00     0.00  MergeLists
  0.07    142.53     0.10   692920     0.00     0.00  Tcl_DuplicateObj
  0.07    142.63     0.10   594475     0.00     0.00  Tcl_EvalObjEx
  0.07    142.74     0.10   374304     0.00     0.00  ValidateFormat
  0.07    142.84     0.10   295163     0.00     0.00  Tcl_FormatObjCmd
  0.07    142.94     0.10    19709     0.00     0.00  RebuildTable
  0.06    143.03     0.10  2290165     0.00     0.00  Tcl_ScanCountedElement
  0.06    143.12     0.10  1511163     0.00     0.00  Tcl_GetsObjCmd
  0.06    143.22     0.09  1231201     0.00     0.00  TclReExec
  0.06    143.31     0.09   862680     0.00     0.00  Tcl_TellObjCmd
  0.06    143.40     0.09   606234     0.00     0.00  free_pages

(Further entries omitted)


VK 2005-03-03: It is no surprise that "malloc_bytes" is used often, and I, personally, tried and succeeded following trick. This trick takes into collaboration Perl, and I performed this trick on Win32. I compiled Perl and Tcl/Tk to use common memory allocator. I used Perl's memory allocator when building Perl, and replaced in tclAlloc.c call to malloc with Perl_malloc (and 2 other: free and realloc)

After building this, I got Tcl.dll dependent on Perl.dll (because now Tcl uses Perl's allocator) and gained faster Tcl.

It appears that

  1. Perl's memory allocator is faster than Tcl's, and I think it is quite normal to benefit from this fact.
  2. This trick works for building on PocketPC, and Tcl/Tk became visually much faster with this approach.
  3. Rebuilding Tk after this trick with Tcl was not necessary.

DKF: It seems non-trivial to find out what is going on inside Perl without grabbing the whole source tree; haven't those people heard of online-browsable source trees? (Or is that something that Perforce doesn't support?)

Aside from that, on point 3 I'd just like to note that Tk is rigorous about only ever using Tcl's memory allocation API. The mess that is Win32 compilation environments is largely responsible too... :^/