Can you run this benchmark 10 times faster

MS (Miguel Sofer):

While trying to improve Tcl's results in The Great Computer Language Shootout [L1 ] (page from Internet Archive), I did manage to improve the speed of one particular benchmark by a factor of 10. I thought it might possibly be interesting to document what I did ...


The problem

The program defines a pseudo-random number generator, and then generates a large number (900000 in The shootout, 100000 here) of pseudo-random numbers.

The most accurate description is code: this is the original program in [L2 ] (remark that it was updated since then to incorporate some of the improvements described here):

 #!/usr/local/bin/tclsh
 # Program 0: 100% runtime

 set IM 139968
 set IA 3877
 set IC 29573
 set last 42

 proc gen_random {max} {
     global IM IA IC last
     set last [expr {($last * $IA + $IC) % $IM}]
     expr {$max * $last / $IM}
 } 
 proc main {} {
     global argv 
     set N [lindex $argv 0]
     set result 0.0
     while {$N} {
        set result [gen_random 100.0]
        incr N -1
     }
     puts [format "%.13f" $result]
 }

 main

The results (to get you to read further??)

Running the original Program 0 and the succesive improvements on this page reduces the runtime by a factor 10 (almost, anyway ...)

 % runall 6 100000
 Program 0:  13853369 microseconds (100.00 %) - Result is 56.5686442615455
 Program 1:  12506569 microseconds ( 90.00 %) - Result is 56.5686442615455
 Program 2:   5555411 microseconds ( 40.00 %) - Result is 56.5686442615455
 Program 3:   5444521 microseconds ( 39.00 %) - Result is 56.5686442615455
 Program 4:   4849454 microseconds ( 35.00 %) - Result is 56.5686442615455
 Program 5:   1686481 microseconds ( 12.00 %) - Result is 56.5686442615455
 Program 6:   1592469 microseconds ( 11.00 %) - Result is 56.5686442615455

Properties of the problem

  • A lot of calls to a very small function: variable access times & function call overhead might be relevant
  • Need to keep up with changes in some parameters that do not change often: if (IM,IA,IC) change, the random numbers should reflect the change immediately; however, they are almost constant (really constant for this main)

Improving on Program 0

1. Global variables are expensive (but see footnote 1); let us reduce the number of global variable accesses by passing the parameters in a list.

 #!/usr/local/bin/tclsh
 # Program 1: 90% runtime

 set params {139968 3877 29573}
 set last 42

 proc gen_random {max} {
     global params last
     foreach {IM IA IC} $params {}
     set last [expr {($last * $IA + $IC) % $IM}]
     expr {$max * $last / $IM}
 } 
 proc main {} {
     global argv 
     set N [lindex $argv 0]
     set result 0.0
     while {$N} {
        set result [gen_random 100.0]
        incr N -1
     }
     puts [format "%.13f" $result]
 }

 main

2. Well, that was nice. Let us reduce global variable accesses still further and pass the parameters as arguments:

 #!/usr/local/bin/tclsh
 # Program 2: 40% runtime 

 set params {139968 3877 29573}
 set last 42

 proc gen_random {max IM IA IC} {
     global last
     set last [expr {($last * $IA + $IC) % $IM}]
     expr {$max * $last / $IM}
 } 
 proc main {} {
     global argv params
     set N [lindex $argv 0]
     foreach {IM IA IC} $params {}
     set result 0.0
     while {$N} {
        set result [gen_random 100.0 $IM $IA $IC]
        incr N -1
     }
     puts [format "%.13f" $result]
 }

 main

3. Amazing! Are we still being hit by unnecessary global variable accesses? There's one easy reduction: do not read the value of 'last' after setting it, use the result of the expr directly:

 #!/usr/local/bin/tclsh
 # Program 3: 39% runtime  

 set params {139968 3877 29573}
 set last 42

 proc gen_random {max IM IA IC} {
     global last
     expr {$max * [set last [expr {($last * $IA + $IC) % $IM}]] / $IM}
 } 
 proc main {} {
     global argv params
     set N [lindex $argv 0]
     foreach {IM IA IC} $params {}
     set result 0.0
     while {$N} {
        set result [gen_random 100.0 $IM $IA $IC]
        incr N -1
     }
     puts [format "%.13f" $result]
 }

 main

4. There is no more milk in that cow ... The parameters (IM,IA,IC) do not vary from call to call; how about inlining their values? But we still want to be able to change them and have the program use the new values automatically, so let us update the generator with a trace:

 #!/usr/local/bin/tclsh
 # Program 4: 35% runtime   

 trace variable params w make_gen_random
 proc make_gen_random {args} {
     global params
     foreach {IM IA IC} $params {}
     set parsub [list \$IM $IM \$IA $IA \$IC $IC]
     set body [string map $parsub {
        global last
         expr {($max * [set last [expr {($last * $IA + $IC) % $IM}]]) / $IM}
     }]
     proc gen_random max $body
 }

 set params {139968 3877 29573}
 set last 42

 proc main {} {
     global argv
     set N [lindex $argv 0]
     set result 0.0
     while {$N} {
         set result [gen_random 100.0]
         incr N -1
     }
     puts [format "%.13f" $result]
 }

 main

5. Cannot reduce variable accesses anymore. Are we being hit by the function call overhead? Let us try to get rid of it by inlining the complete generator:

 #!/usr/local/bin/tclsh
 # Program 5: 12% runtime    

 trace variable params w make_main
 proc make_main {args} {
     global params
     set randBody [string map $params {
         expr {(100.0 * [set last [expr {($last * IA + IC) % IM}]]) / IM}
     }]
     set mainBody [string map [list randBody $randBody] {
        global argv last
        set N [lindex $argv 0]
        set result 0.0
        while {$N} {
            set result [randBody]
            incr N -1
        }
        puts [format "%.13f" $result]
     }]
     proc main {} $mainBody     
 }

 set params {IM 139968 IA 3877 IC 29573}
 set last 42

 main

6. Come to think of it, there is one more unnecessary variable access in the loop: we 'incr N -1', then read the result instead of using it directly. What if we get rid of that too?

 #!/usr/local/bin/tclsh
 # Program 6: 11% runtime      

 trace variable params w make_main
 proc make_main {args} {
     global params
     set randBody [string map $params {
         expr {(100.0 * [set last [expr {($last * IA + IC) % IM}]]) / IM}
     }]
     set mainBody [string map [list randBody $randBody] {
        global argv last
        set N [lindex $argv 0]
        set result 0.0
        incr N
        while {[incr N -1]} {
            set result [randBody]
        }
        puts [format "%.13f" $result]
     }]
     proc main {} $mainBody     
 }

 set params {IM 139968 IA 3877 IC 29573}
 set last 42

 main

7. I'm out of ideas now ...

DGP How does this do?

 #!/usr/local/bin/tclsh
 # Program 7: use built-in functions

 expr {srand(42)}
 proc gen_random {max} {
     expr {$max * rand()}
 }
 proc main {} {
     global argv
     set N [lindex $argv 0]
     set result 0.0
     while {$N} {
        set result [gen_random 100.0]
        incr N -1
     }
     puts [format "%.13f" $result]
 }

 main

MS Much better, I bet - except that it is immediately disqualified from the competition, which specified (up to a point) the way things were to be coded. AFAIR, the shootout's 'umpire' only accepted version 4.

Donald Arseneau I don't suppose the umpire imposed a -noinline restriction on the compiled languages. I am amazed at the improvement you achieved on what was not originally a terrible implementation (expressions in braces etc.)


Why was this possible ? Can you do this for any program ?

I wish I could ...

The original 'Program 0' did not take into account the peculiarities of the programming language, and their impact in light of the peculiarities of the problem.

This here was easy in a sense: the simplicity and artificiality of this (and every other?) benchmark program simplifies the analysis. Real problems are typically nastier ...

I believe the above is a nice illustration of the wonders of event-based programming: the first solutions (up to Program 3) implement solutions that are able to handle any eventuality; from then on, we implement solutions to the particular problem at hand, and react to the event the problem has changed by adapting the implementation to the new particular problem at hand. This strategy is likely to payoff whenever the event does not occur all too often, as was the case here.


Footnotes

  1. Globals are expensive if you call them only a few times in the proc - the cost of the initial linking to a local variable weighs heavily; they are quite fast if you use them a lot [the breakeven seems to be around 15 reads ...]. For fewer calls, use fully qualified variable names instead. Doing this in Program 0 (shown below) already reduces the runtime to 54% !
      proc gen_random {max} {
          set last [expr {($::last * $::IA + $::IC) % $::IM}]
          expr {$max * $::last / $::IM}
      }

Category Performance