MS I ran the tclbench suite on tclsh compiled with three different compilers and several optimisation combinations. This page summarizes the results.
These tests were run on a PIII/600MHz/192MB laptop running Red Hat Linux 7.2.
The compilers were gcc 2.96, gcc 3.1 and icc 6.0 (Intel).
Results
SPEED SIZE COMPILER
1.00 1.00 gcc2.96 -O -march=pentiumpro
1.05 1.00 gcc2.96 -O -march=pentiumpro -fomit-frame-pointer
1.01 1.01 gcc2.96 -O2 -march=pentiumpro
1.07 1.02 gcc2.96 -O2 -march=pentiumpro -fomit-frame-pointer
1.01 0.97 gcc2.96 -Os -march=pentiumpro
1.05 0.98 gcc2.96 -Os -march=pentiumpro -fomit-frame-pointer
0.99 1.07 gcc3.1 -O -march=pentium3
1.03 1.08 gcc3.1 -O -march=pentium3 -fomit-frame-pointer
1.02 1.12 gcc3.1 -O2 -march=pentium3
1.06 1.13 gcc3.1 -O2 -march=pentium3 -fomit-frame-pointer
1.03 1.14 gcc3.1 -O3 -march=pentium3
1.08 1.15 gcc3.1 -O3 -march=pentium3 -fomit-frame-pointer
1.04 0.97 gcc3.1 -Os -march=pentium3
1.06 0.97 gcc3.1 -Os -march=pentium3 -fomit-frame-pointer
1.11 1.47 icc6.0 -O3 -xK -ip
Conclusions (?)
- The "-fomit-frame-pointer" flag produces faster code with gcc. Whether that is worth losing a traceable core file is an open question, although Tcl shouldn't dump core anyway.
- The default optimisation flag for gcc, "-O", seems suboptimal: both GNU compilers produce faster and smaller code with "-Os".
- Intel's compiler produces slightly faster code than gcc (as measured by tclbench), but a much larger image (in the only tested configuration).
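To put a number on the first conclusion, here is a small sketch that computes the relative speed gain from "-fomit-frame-pointer" for each gcc configuration. The figures are copied from the Results table; the names (GCC_PAIRS, frame_pointer_gains) are made up for this illustration and are not part of tclbench.

```python
# (speed, size) pairs copied from the Results table.
# For each gcc configuration: (plain build, build with -fomit-frame-pointer).
GCC_PAIRS = {
    "gcc2.96 -O":  ((1.00, 1.00), (1.05, 1.00)),
    "gcc2.96 -O2": ((1.01, 1.01), (1.07, 1.02)),
    "gcc2.96 -Os": ((1.01, 0.97), (1.05, 0.98)),
    "gcc3.1 -O":   ((0.99, 1.07), (1.03, 1.08)),
    "gcc3.1 -O2":  ((1.02, 1.12), (1.06, 1.13)),
    "gcc3.1 -O3":  ((1.03, 1.14), (1.08, 1.15)),
    "gcc3.1 -Os":  ((1.04, 0.97), (1.06, 0.97)),
}


def frame_pointer_gains(pairs):
    """Return {config: relative speed gain from -fomit-frame-pointer}."""
    return {
        name: omitted[0] / plain[0] - 1.0
        for name, (plain, omitted) in pairs.items()
    }


gains = frame_pointer_gains(GCC_PAIRS)
mean_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:12s} {gain:+.1%}")
print(f"mean gain    {mean_gain:+.1%}")
```

With the table as given, every configuration gains speed and the mean comes out at roughly 4%. The table values are rounded to two decimals, so the per-row percentages are only approximate.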
Notes
- These were all static builds of tclsh from the current (01-22-02) HEAD
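- The data presented are speed and size relative to the reference build "gcc2.96 -O -march=pentiumpro" (the first row of the table); a SPEED above 1.00 means faster than the reference. The reference build produced a 702kB tclsh which ran the tclbench suite in 00:04:35.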
- All compilers were set to produce binaries exploiting the processor's features ("-march" and "-x" flags).
- The Intel compiler was benchmarked in a single configuration, which I suppose gives the best optimisation. I have not checked for the Intel equivalent of gcc's "-Os" flag.
- The "-fomit-frame-pointer" flag to gcc produces code that cannot be debugged properly: the stack trace in core files is not usable. This behaviour is also present (I think) in the optimised code produced by icc.
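The relative figures can be turned back into rough absolute numbers using the reference build's size and run time from the notes above. The sketch below does this for the icc row. It assumes that SPEED is the reference run time divided by the build's run time; the page does not state how SPEED was computed, so treat the time estimate as a guess. The function names are invented for this example.

```python
REF_SIZE_KB = 702         # size of the reference tclsh, from the Notes
REF_TIME_S = 4 * 60 + 35  # 00:04:35 expressed in seconds


def absolute_size_kb(rel_size):
    """Scale the reference binary size by the SIZE column."""
    return REF_SIZE_KB * rel_size


def estimated_time_s(rel_speed):
    """Estimate run time from the SPEED column.

    Assumption (not stated on the page): SPEED = reference time / build time.
    """
    return REF_TIME_S / rel_speed


# icc6.0 -O3 -xK -ip row of the Results table: SPEED 1.11, SIZE 1.47
icc_size = absolute_size_kb(1.47)
icc_time = estimated_time_s(1.11)

print(f"icc size: about {icc_size:.0f} kB")
print(f"icc time: about {icc_time:.0f} s (if the SPEED assumption holds)")
```

Under that assumption the icc build is a bit over 1000 kB and finishes the suite in a little over four minutes, compared with 702 kB and 4:35 for the reference build.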