Optimized compilation of tcl

MS I ran the tclbench suite on tclsh built with three different compilers and several combinations of optimisation flags. This page summarizes the results.

These tests were run on a Pentium III 600 MHz laptop with 192 MB of RAM, running Red Hat Linux 7.2.

The compilers were:

  • gcc 2.96
  • gcc 3.1
  • Intel icc 6.0

Results

SPEED  SIZE  COMPILER  OPTIONS
1.00   1.00  gcc2.96   -O -march=pentiumpro
1.05   1.00  gcc2.96   -O -march=pentiumpro -fomit-frame-pointer
1.01   1.01  gcc2.96   -O2 -march=pentiumpro
1.07   1.02  gcc2.96   -O2 -march=pentiumpro -fomit-frame-pointer
1.01   0.97  gcc2.96   -Os -march=pentiumpro
1.05   0.98  gcc2.96   -Os -march=pentiumpro -fomit-frame-pointer
0.99   1.07  gcc3.1    -O -march=pentium3
1.03   1.08  gcc3.1    -O -march=pentium3 -fomit-frame-pointer
1.02   1.12  gcc3.1    -O2 -march=pentium3
1.06   1.13  gcc3.1    -O2 -march=pentium3 -fomit-frame-pointer
1.03   1.14  gcc3.1    -O3 -march=pentium3
1.08   1.15  gcc3.1    -O3 -march=pentium3 -fomit-frame-pointer
1.04   0.97  gcc3.1    -Os -march=pentium3
1.06   0.97  gcc3.1    -Os -march=pentium3 -fomit-frame-pointer
1.06   1.24  icc6.0    -O3 -xK
1.11   1.47  icc6.0    -O3 -xK -ip

Conclusions (?)

  • The "-fomit-frame-pointer" flag produces faster code with gcc, by roughly 2-6% in these runs. Whether that is worth losing a usable stack trace in core files is an open question; then again, Tcl shouldn't dump core in the first place.
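  • The default optimisation flag "-O" (used for the reference build) seems suboptimal: both GNU compilers produce code that is both faster and smaller with "-Os" (for gcc3.1, "-Os" gives speed 1.04 and size 0.97, versus 0.99 and 1.07 with "-O").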
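  • The new gcc3.1 is not a big improvement on 2.96 for our purposes: its fastest build (1.08) barely beats the fastest gcc2.96 build (1.07), and except with "-Os" its binaries are larger.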
  • Intel's compiler with "-ip" produces slightly faster code than the best gcc build (1.11 vs. 1.08, as measured by tclbench), but a much larger binary (1.47 vs. 1.15). Without "-ip", icc produces larger but not faster code.
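The gcc conclusions can be checked directly against the table. The following is a minimal Python sketch, assuming (as the conclusions imply) that a higher SPEED value means faster code and a lower SIZE value means a smaller binary. It copies the gcc rows of the results table and computes, for each gcc configuration, the speed ratio gained by adding "-fomit-frame-pointer", and whether "-Os" beats "-O" on both speed and size. The names ROWS, lookup, frame_pointer_gains and os_beats_o are hypothetical, chosen for illustration only. Because the table is rounded to two decimals, the computed gains are approximate.

```python
# gcc rows of the results table above (icc rows omitted).
# Each entry: (relative speed, relative size, compiler, optimisation level,
#              built with -fomit-frame-pointer?)
# Assumption: higher speed = faster, lower size = smaller, both relative
# to the reference build "gcc2.96 -O -march=pentiumpro".
ROWS = [
    (1.00, 1.00, "gcc2.96", "-O",  False),
    (1.05, 1.00, "gcc2.96", "-O",  True),
    (1.01, 1.01, "gcc2.96", "-O2", False),
    (1.07, 1.02, "gcc2.96", "-O2", True),
    (1.01, 0.97, "gcc2.96", "-Os", False),
    (1.05, 0.98, "gcc2.96", "-Os", True),
    (0.99, 1.07, "gcc3.1",  "-O",  False),
    (1.03, 1.08, "gcc3.1",  "-O",  True),
    (1.02, 1.12, "gcc3.1",  "-O2", False),
    (1.06, 1.13, "gcc3.1",  "-O2", True),
    (1.03, 1.14, "gcc3.1",  "-O3", False),
    (1.08, 1.15, "gcc3.1",  "-O3", True),
    (1.04, 0.97, "gcc3.1",  "-Os", False),
    (1.06, 0.97, "gcc3.1",  "-Os", True),
]


def lookup(compiler, opt, omit):
    """Return (speed, size) for one gcc configuration."""
    for speed, size, c, o, f in ROWS:
        if (c, o, f) == (compiler, opt, omit):
            return speed, size
    raise KeyError((compiler, opt, omit))


def frame_pointer_gains():
    """Map (compiler, opt) -> speed with -fomit-frame-pointer / speed without."""
    gains = {}
    for speed, _size, c, o, omit in ROWS:
        if omit:
            base_speed, _ = lookup(c, o, False)
            gains[(c, o)] = speed / base_speed
    return gains


def os_beats_o(compiler):
    """True if -Os is strictly faster and strictly smaller than -O
    (both without -fomit-frame-pointer)."""
    o_speed, o_size = lookup(compiler, "-O", False)
    s_speed, s_size = lookup(compiler, "-Os", False)
    return s_speed > o_speed and s_size < o_size


if __name__ == "__main__":
    for (c, o), g in sorted(frame_pointer_gains().items()):
        print(f"{c} {o}: -fomit-frame-pointer speed gain {100 * (g - 1):.1f}%")
    for c in ("gcc2.96", "gcc3.1"):
        print(c, "-Os faster and smaller than -O:", os_beats_o(c))
```

Running it shows every gcc configuration gaining between about 2% (gcc3.1 "-Os") and 6% (gcc2.96 "-O2") from "-fomit-frame-pointer", and "-Os" beating "-O" on both axes for both GNU compilers.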

Notes

  • These were all static builds of tclsh from the HEAD of the Tcl sources as of 22 January 2002.
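  • The SPEED and SIZE figures are relative to the reference build "gcc2.96 -O -march=pentiumpro", which produced a 702 kB tclsh that ran the tclbench suite in 00:04:35 (4 min 35 s). A higher SPEED means faster code; a lower SIZE means a smaller binary.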
  • All compilers were set to produce binaries exploiting the processor's features (the "-march" and "-x" flags).
  • I have not looked for an Intel equivalent of gcc's "-Os" flag.
  • The "-fomit-frame-pointer" flag to gcc produces code that is hard to debug: the stack trace in core files is not usable. I think the optimised code produced by icc has the same problem.
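Since only ratios are published, absolute sizes and run times have to be derived from the reference build (702 kB, 00:04:35 = 275 s). Below is a minimal Python sketch of that conversion, assuming relative SIZE = size / 702 kB and relative SPEED = 275 s / run time (so a SPEED of 1.11 means the suite finished 1.11 times faster, not in 1.11 times the time). The constants and the helpers absolute and fmt_time are hypothetical names for illustration; the results are estimates limited by the two-decimal rounding of the table.

```python
# Reference build "gcc2.96 -O -march=pentiumpro", from the notes above.
REF_SIZE_KB = 702              # size of the reference tclsh in kB
REF_TIME_S = 4 * 60 + 35       # reference tclbench run time 00:04:35 = 275 s


def absolute(speed, size):
    """Convert relative (speed, size) into (estimated run time in s, estimated size in kB).

    Assumes speed = REF_TIME_S / run_time, hence run_time = REF_TIME_S / speed,
    and size = size_kB / REF_SIZE_KB.
    """
    return REF_TIME_S / speed, REF_SIZE_KB * size


def fmt_time(seconds):
    """Format a duration (rounded to whole seconds) as HH:MM:SS, like the notes."""
    h, rem = divmod(round(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


if __name__ == "__main__":
    for label, speed, size in [
        ("gcc2.96 -O (reference)", 1.00, 1.00),
        ("gcc3.1 -Os", 1.04, 0.97),
        ("icc6.0 -O3 -xK -ip", 1.11, 1.47),
    ]:
        t, kb = absolute(speed, size)
        print(f"{label}: ~{kb:.0f} kB, ~{fmt_time(t)}")
```

Under these assumptions, the fastest build (icc with "-ip") would be a tclsh of roughly 1032 kB running the suite in about 00:04:08, compared with 702 kB and 00:04:35 for the reference.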

Brett Schwarz These links may be of interest as well:

  • http://www.coyotegulch.com/acovea/index.html
  • http://freshmeat.net/articles/view/730/