MS (Miguel Sofer):
While trying to improve Tcl's results in The Great Computer Language Shootout [L1 ] (page from Internet Archive), I did manage to improve the speed of one particular benchmark by a factor of 10. I thought it might possibly be interesting to document what I did ...
The problem
The program defines a pseudo-random number generator, and then generates a large number (900000 in The shootout, 100000 here) of pseudo-random numbers.
The most accurate description is code: this is the original program in [L2 ] (remark that it was updated since then to incorporate some of the improvements described here):
#!/usr/local/bin/tclsh # Program 0: 100% runtime set IM 139968 set IA 3877 set IC 29573 set last 42 proc gen_random {max} { global IM IA IC last set last [expr {($last * $IA + $IC) % $IM}] expr {$max * $last / $IM} } proc main {} { global argv set N [lindex $argv 0] set result 0.0 while {$N} { set result [gen_random 100.0] incr N -1 } puts [format "%.13f" $result] } main
The results (to get you to read further??)
Running the original Program 0 and the succesive improvements on this page reduces the runtime by a factor 10 (almost, anyway ...)
% runall 6 100000 Program 0: 13853369 microseconds (100.00 %) - Result is 56.5686442615455 Program 1: 12506569 microseconds ( 90.00 %) - Result is 56.5686442615455 Program 2: 5555411 microseconds ( 40.00 %) - Result is 56.5686442615455 Program 3: 5444521 microseconds ( 39.00 %) - Result is 56.5686442615455 Program 4: 4849454 microseconds ( 35.00 %) - Result is 56.5686442615455 Program 5: 1686481 microseconds ( 12.00 %) - Result is 56.5686442615455 Program 6: 1592469 microseconds ( 11.00 %) - Result is 56.5686442615455
Properties of the problem
Improving on Program 0
1. Global variables are expensive (but see footnote 1); let us reduce the number of global variable accesses by passing the parameters in a list.
#!/usr/local/bin/tclsh # Program 1: 90% runtime set params {139968 3877 29573} set last 42 proc gen_random {max} { global params last foreach {IM IA IC} $params {} set last [expr {($last * $IA + $IC) % $IM}] expr {$max * $last / $IM} } proc main {} { global argv set N [lindex $argv 0] set result 0.0 while {$N} { set result [gen_random 100.0] incr N -1 } puts [format "%.13f" $result] } main
2. Well, that was nice. Let us reduce global variable accesses still further and pass the parameters as arguments:
#!/usr/local/bin/tclsh # Program 2: 40% runtime set params {139968 3877 29573} set last 42 proc gen_random {max IM IA IC} { global last set last [expr {($last * $IA + $IC) % $IM}] expr {$max * $last / $IM} } proc main {} { global argv params set N [lindex $argv 0] foreach {IM IA IC} $params {} set result 0.0 while {$N} { set result [gen_random 100.0 $IM $IA $IC] incr N -1 } puts [format "%.13f" $result] } main
3. Amazing! Are we still being hit by unnecessary global variable accesses? There's one easy reduction: do not read the value of 'last' after setting it, use the result of the expr directly:
#!/usr/local/bin/tclsh # Program 3: 39% runtime set params {139968 3877 29573} set last 42 proc gen_random {max IM IA IC} { global last expr {$max * [set last [expr {($last * $IA + $IC) % $IM}]] / $IM} } proc main {} { global argv params set N [lindex $argv 0] foreach {IM IA IC} $params {} set result 0.0 while {$N} { set result [gen_random 100.0 $IM $IA $IC] incr N -1 } puts [format "%.13f" $result] } main
4. There is no more milk in that cow ... The parameters (IM,IA,IC) do not vary from call to call; how about inlining their values? But we still want to be able to change them and have the program use the new values automatically, so let us update the generator with a trace:
#!/usr/local/bin/tclsh # Program 4: 35% runtime trace variable params w make_gen_random proc make_gen_random {args} { global params foreach {IM IA IC} $params {} set parsub [list \$IM $IM \$IA $IA \$IC $IC] set body [string map $parsub { global last expr {($max * [set last [expr {($last * $IA + $IC) % $IM}]]) / $IM} }] proc gen_random max $body } set params {139968 3877 29573} set last 42 proc main {} { global argv set N [lindex $argv 0] set result 0.0 while {$N} { set result [gen_random 100.0] incr N -1 } puts [format "%.13f" $result] } main
5. Cannot reduce variable accesses anymore. Are we being hit by the function call overhead? Let us try to get rid of it by inlining the complete generator:
#!/usr/local/bin/tclsh # Program 5: 12% runtime trace variable params w make_main proc make_main {args} { global params set randBody [string map $params { expr {(100.0 * [set last [expr {($last * IA + IC) % IM}]]) / IM} }] set mainBody [string map [list randBody $randBody] { global argv last set N [lindex $argv 0] set result 0.0 while {$N} { set result [randBody] incr N -1 } puts [format "%.13f" $result] }] proc main {} $mainBody } set params {IM 139968 IA 3877 IC 29573} set last 42 main
6. Come to think of it, there is one more unnecessary variable access in the loop: we 'incr N -1', then read the result instead of using it directly. What if we get rid of that too?
#!/usr/local/bin/tclsh # Program 6: 11% runtime trace variable params w make_main proc make_main {args} { global params set randBody [string map $params { expr {(100.0 * [set last [expr {($last * IA + IC) % IM}]]) / IM} }] set mainBody [string map [list randBody $randBody] { global argv last set N [lindex $argv 0] set result 0.0 incr N while {[incr N -1]} { set result [randBody] } puts [format "%.13f" $result] }] proc main {} $mainBody } set params {IM 139968 IA 3877 IC 29573} set last 42 main
7. I'm out of ideas now ...
DGP How does this do?
#!/usr/local/bin/tclsh # Program 7: use built-in functions expr {srand(42)} proc gen_random {max} { expr {$max * rand()} } proc main {} { global argv set N [lindex $argv 0] set result 0.0 while {$N} { set result [gen_random 100.0] incr N -1 } puts [format "%.13f" $result] } main
MS Much better, I bet - except that it is immediately disqualified from the competition, which specified (up to a point) the way things were to be coded. AFAIR, the shootout's 'umpire' only accepted version 4.
Donald Arseneau I don't suppose the umpire imposed a -noinline restriction on the compiled languages. I am amazed at the improvement you achieved on what was not originally a terrible implementation (expressions in braces etc.)
Why was this possible ? Can you do this for any program ?
I wish I could ...
The original 'Program 0' did not take into account the peculiarities of the programming language, and their impact in light of the peculiarities of the problem.
This here was easy in a sense: the simplicity and artificiality of this (and every other?) benchmark program simplifies the analysis. Real problems are typically nastier ...
I believe the above is a nice illustration of the wonders of event-based programming: the first solutions (up to Program 3) implement solutions that are able to handle any eventuality; from then on, we implement solutions to the particular problem at hand, and react to the event the problem has changed by adapting the implementation to the new particular problem at hand. This strategy is likely to payoff whenever the event does not occur all too often, as was the case here.
Footnotes
proc gen_random {max} { set last [expr {($::last * $::IA + $::IC) % $::IM}] expr {$max * $::last / $::IM} }