Version 1 of Optimized compilation of tcl

Updated 2002-07-22 15:26:57

MS: I ran the tclbench suite on tclsh compiled with three different compilers and several optimisation combinations. This page summarizes the results.

These tests were run on a PIII/600MHz/192MB laptop running Red Hat Linux 7.2.

The compilers were gcc 2.96, gcc 3.1 and Intel's icc 6.0.

Results

  SPEED   SIZE  COMPILER
  1.00    1.00   gcc2.96 -O  -march=pentiumpro
  1.05    1.00   gcc2.96 -O  -march=pentiumpro -fomit-frame-pointer
  1.01    1.01   gcc2.96 -O2 -march=pentiumpro
  1.07    1.02   gcc2.96 -O2 -march=pentiumpro -fomit-frame-pointer
  1.01    0.97   gcc2.96 -Os -march=pentiumpro
  1.05    0.98   gcc2.96 -Os -march=pentiumpro -fomit-frame-pointer
  0.99    1.07   gcc3.1  -O  -march=pentium3
  1.03    1.08   gcc3.1  -O  -march=pentium3 -fomit-frame-pointer
  1.02    1.12   gcc3.1  -O2 -march=pentium3
  1.06    1.13   gcc3.1  -O2 -march=pentium3 -fomit-frame-pointer
  1.03    1.14   gcc3.1  -O3 -march=pentium3
  1.08    1.15   gcc3.1  -O3 -march=pentium3 -fomit-frame-pointer
  1.04    0.97   gcc3.1  -Os -march=pentium3
  1.06    0.97   gcc3.1  -Os -march=pentium3 -fomit-frame-pointer
  1.11    1.47   icc6.0  -O3 -xK -ip

Conclusions (?)

  • The "-fomit-frame-pointer" flag produces faster code with gcc in every tested pair, by roughly 2 to 6 percent. Whether that is worth losing a traceable core file is an open question; then again, tcl shouldn't dump core.
  • The default optimisation flag for gcc, "-O", seems suboptimal: both GNU compilers produce faster and smaller code with "-Os".
  • Intel's compiler produces slightly faster code than gcc (as measured by tclbench), but a much larger image (in the only tested configuration).
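
The first conclusion can be checked directly against the table by dividing each "-fomit-frame-pointer" speed by the speed of the matching build without it. The following is only a sketch: the row layout and the name omit_fp_gains are made up for illustration, the "-march" flags are left out because they do not vary within a compiler, and the icc row is skipped because it has no matching pair.

```python
# Rows copied from the results table:
# (speed, size, compiler, optimisation level, built with -fomit-frame-pointer?)
# The -march flags are omitted; they are constant for each compiler.
ROWS = [
    (1.00, 1.00, "gcc2.96", "-O", False),
    (1.05, 1.00, "gcc2.96", "-O", True),
    (1.01, 1.01, "gcc2.96", "-O2", False),
    (1.07, 1.02, "gcc2.96", "-O2", True),
    (1.01, 0.97, "gcc2.96", "-Os", False),
    (1.05, 0.98, "gcc2.96", "-Os", True),
    (0.99, 1.07, "gcc3.1", "-O", False),
    (1.03, 1.08, "gcc3.1", "-O", True),
    (1.02, 1.12, "gcc3.1", "-O2", False),
    (1.06, 1.13, "gcc3.1", "-O2", True),
    (1.03, 1.14, "gcc3.1", "-O3", False),
    (1.08, 1.15, "gcc3.1", "-O3", True),
    (1.04, 0.97, "gcc3.1", "-Os", False),
    (1.06, 0.97, "gcc3.1", "-Os", True),
]


def omit_fp_gains(rows):
    """Map (compiler, level) to speed_with_flag / speed_without_flag."""
    speeds = {(c, lvl, fp): speed for speed, _size, c, lvl, fp in rows}
    return {
        (c, lvl): speeds[(c, lvl, True)] / speeds[(c, lvl, False)]
        for (c, lvl, fp) in speeds
        if fp
    }


if __name__ == "__main__":
    for (compiler, level), gain in sorted(omit_fp_gains(ROWS).items()):
        print(f"{compiler} {level}: {gain:.3f}")
```

Since the table is rounded to two decimals, the ratios are only indicative; they all come out above 1.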

Notes

  • These were all static builds of tclsh from the current (01-22-02) HEAD
  • The data presented are size and speed relative to the reference build "gcc2.96 -O", which produced a 702kB tclsh that ran the tclbench suite in 00:04:35. A higher SPEED value means faster code; a lower SIZE value means a smaller binary.
  • All compilers were set to produce binaries exploiting the processor's features ("-march" and "-x" flags).
  • The Intel compiler was benchmarked in a single configuration, which I suppose gives the best optimisation. I have not checked for the Intel equivalent to gcc's "-Os" flag.
  • The "-fomit-frame-pointer" flag to gcc produces code that is non-debuggable - the stack trace in core files is not usable. This behaviour is also present (I think) in the optimised code produced by icc.
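
To read the table in absolute terms, the relative figures can be scaled by the reference build's 702kB and 00:04:35. A minimal sketch, assuming SPEED is reference time divided by build time (so larger is faster) and SIZE is build size divided by reference size; the function names are illustrative only:

```python
# Reference build from the notes: gcc2.96 -O, 702 kB, tclbench in 00:04:35.
REF_SIZE_KB = 702
REF_TIME_S = 4 * 60 + 35  # 275 seconds


def estimated_time_s(speed):
    """Assumes SPEED = reference time / build time (larger is faster)."""
    return REF_TIME_S / speed


def estimated_size_kb(size):
    """Assumes SIZE = build size / reference size."""
    return REF_SIZE_KB * size


if __name__ == "__main__":
    # icc6.0 -O3 -xK -ip row: SPEED 1.11, SIZE 1.47
    print(round(estimated_time_s(1.11)), "s")
    print(round(estimated_size_kb(1.47)), "kB")
```

Under those assumptions the icc build works out to about 248 seconds and about 1032 kB. These are estimates from rounded ratios, not measured values.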