Binomial Coefficients

MS: moved here the discussion/proposals from Additional math functions

KBK Before cutting-and-pasting these procedures into your code, you probably ought to go have a look at the 'math::combinatorics' module of tcllib. It offers, as of this writing, factorials and their logarithms, binomial coefficients, the Beta and Gamma functions (and the logarithm of the Gamma function), and several others. Incomplete gamma, and normal, chi-square, Poisson, hypergeometric, and F distributions are planned but not yet implemented.

AM (11 january 2007) Many of the statistical distributions mentioned above are available via the statistics module in Tcllib

Binomial coefficient:

 proc over {n k} {
    # binomial coefficient: e.g. [over 5 3] = (5*4*3)/(1*2*3) = 10
        set k [expr {(($n-$k) < $k) ? $n-$k : $k}] 
        if {$k==0} {return 1}
        set num $n
        set den $k
        while {$k>1} {
                incr n -1
                incr k -1
                set num [expr {$num*$n}]
                set den [expr {$den*$k}]
        }
        expr {$num/$den}
 } ;#RS

Note that combinatoric functions are notorious for rewarding algorithmic "trickiness". In this case, the multiple "expr" invocations will toggle to floating point when n is sufficiently large. This is both slow and inaccurate, in general, although certainly quite adequate for many common cases of interest.

Higher precision is available with algorithms which exploit common factors repeated in the numerator and denominator (and there will be many of them for most calculations). CL will make a point some day of returning here to offer an [over] written in terms of [primefactors] (see below). - RS: Here's a variant which uses mostly lists: collect factors for num and den, remove from num all that are in den, finally computes one integer expression:

 proc over2 {n k} {
    # binomial coefficient: e.g. [over 5 3] = (5*4*3)/(1*2*3) = 10
        set k [expr {(($n-$k) < $k) ? $n-$k : $k}] 
        if {$k==0} {return 1}
        set num $n
        set den $k
        while {$k>1} {
                lappend num [incr n -1]
                lappend den [incr k -1]
        }
        set t 1 ;# neutral element of multiplication
        foreach i $num {
           if {[set pos [lsearch $den $i]]<0} {
                lappend t $i
           } else {
                set den [lreplace $den $pos $pos]
           }
        }
        expr [join $t *]/([join $den *])
 }

Recursive binomial calculations can be surprisingly well-conditioned (someone compare performance on these):

    proc binom {m n} {
        set diff [expr $m - $n]
        if {$diff < $n} {
            return [binom $m $diff]
        }
        switch $n {
            0 {
                return 1
            }
            1 {
                return $m
            }
            default {
                    # This could use a bit of common-factor elimination.
                return [expr [binom [expr $m - 1] [expr $n - 1]] * $m / $n]
            }
        }
    }

MS: This looks like too much of a good thing, you do not need to recurse over both variables. Recursing over just m gives the following algorithm.

 proc binom2 {m n} {
     set n [expr {(($m-$n) > $n) ? $m-$n : $n}]
     set k [expr {$m - $n}] 
     if {$k < 0} {return 0}
     switch $k {
         0 {return 1}
         1 {return $m}
         default {
             return [expr {($m*[binom [expr {$m - 1}] $n])/$k}]
         }
     }
 } ;#MS

Both recursive algorithms are stable in the sense of not producing unnecessarily large intermediate results. If you are interested in performance, avoid recursion and [switch]. An iterative implementation of binom2 is:

 proc binom3 {m n} {
     set n [expr {(($m-$n) > $n) ? $m-$n : $n}]

     if {$n > $m}  {return 0}
     if {$n == $m} {return 1}

     set res 1
     set d 0
     while {$n < $m} {
         set res [expr {($res*[incr n])/[incr d]}]
     }
     set res
 } ;#MS

Remark that this is very much the same as over, with the loop being traversed in reverse order and avoiding the accumulation in den and num which causes an earlier overflow.

---

MS: So, which is best? While trying to compute all coefficients in the expansion of (a+b)^20, both [over] and [over2] fail to produce correct results as some intermediate computations overflow. The timing of the recursive/iterative solutions gives:

 % time {for {set i 0} {$i <= 20} {incr i} {set x [binom 20 $i]}}
 51716 microseconds per iteration
 % time {for {set i 0} {$i <= 20} {incr i} {set x [binom2 20 $i]}}
 47071 microseconds per iteration
 % time {for {set i 0} {$i <= 20} {incr i} {set x [binom3 20 $i]}}
 3490 microseconds per iteration

---

 % time {for {set i 0} {$i <= 20} {incr i} {set x [binom4 20 $i]}}
 36949 microseconds per iteration

The best stability (but low speed) is obtained by using set operations on the lists of prime factors of numerator and denominator:

 proc binom4 {m n} {
     set n [expr {(($m-$n) > $n) ? $m-$n : $n}]
     if {$n > $m}  {return 0}
     if {$n == $m} {return 1}

     # Find the list of prime factors in the numerator and denominator
     set num {}
     set den {}
     set d 0
     while {$n < $m} {
         set num [appendPrimeFactors [incr n] $num]
         set den [appendPrimeFactors [incr d] $den]
     }

     # We now need to multiply all factors in $num that are not in $den 
     set num [lsort -integer $num]
     set den [lsort -integer $den]
     set res 1
     set i 0 ;# position counter for num
     foreach f $den {
         while {[set g [lindex $num $i]] != $f} {
             set res [expr {$res * $g}]
             incr i
         }              pr {$res * $g}]
         incr i
     }
     foreach g [lrange $num $i end] {
         set res [expr {$res * $g}]
     }
     set res
 } ;#MS

 proc appendPrimeFactors {n res} {
    # a number x is prime if [llength [primefactors $x]]==1
    set f 2
    while {$f<=$n} {
        while {$n%$f==0} {
            set n [expr {$n/$f}]
            lappend res $f
        }
        set f [expr {$f+2-($f==2)}]
    }
    set res
 } ;#RS

There should however exist very few cases where binom3 overflows and binom4 doesn't ...

But, if you're worried, read on:

Log gamma -- or, factorials the hard way.

There was a little discussion on the chat about how to do factorials inside [expr]. Larry Virden pointed the questioner to several pages with classical recursive or iterative approaches. Kevin Kenny saw a challenge in the discussion, and decided to do factorials the hard way -- by computing the gamma function:

This is actually useful for avoiding overflows in computing binomial coefficients. It can also be used as an obscure way of computing pi.

 # Returns log(gamma(x)), where x >= 0
 #
 #           +inf
 #             _
 #            |    x-1  -t
 # gamma(x)= _|   t    e   dt
 #            
 #            0
 
 proc lgamma { xx } {
     set x [expr { $xx - 1.0 }]
     set tmp [expr { $x + 5.5 }]
     set tmp [ expr { ( $x + 0.5 ) * log( $tmp ) - $tmp }]
     set ser 1.0
     foreach cof {
        76.18009173 -86.50532033 24.01409822
        -1.231739516 .00120858003 -5.36382e-6
     } {
        set x [expr { $x + 1.0 }]
        set ser [expr { $ser + $cof / $x }]
     }
     return [expr { $tmp + log( 2.50662827465 * $ser ) }]
 }
 
 # Return x! given x.
 
 proc fac { x } {
     return [expr { exp( [lgamma [expr { $x + 1 }]] ) }]
 }
 
 # Demonstrate by printing a table of factorials
 
 for { set i 0 } { $i <= 10 } { incr i } {
     puts "$i! == [expr { round( [fac $i] ) }]"
 }
 
 # Compute binomial coefficients.
 
 proc choose { n k } {
     set r [expr { [lgamma [expr { $n + 1 }]]
                  - [lgamma [expr { $k + 1 }]]
                  - [lgamma [expr { $n - $k + 1 }]] }]
     return [expr { exp( $r ) }]
 }
 
 # Demonstrate binomial coefficients by printing Pascal's triangle.
 
 puts "Pascal's triangle:"
 for { set n 0 } { $n < 15 } { incr n } {
     set line {}
     for { set k 0 } { $k <= $n } { incr k } {
        set r [expr { round( [choose $n $k] ) }]
        append line [format %5d $r]
     }
     puts $line
 }
 
 # Compute pi, the hard way!

 set sqrtpi [expr { exp( [lgamma 0.5] ) }]
 puts "pi is approximately [expr { $sqrtpi * $sqrtpi }]"

For those that don't like ASCII art, the following Tcl script puts the formula on a canvas:

 grid [canvas .c -width 300 -height 200]

 font create big -family times -size 24
 font create bigitalic -family times -size 24 -slant italic
 font create small -family times -size 18
 font create smallitalic -family times -size 18 -slant italic

 .c create text 50 100 -anchor w -text "\u0393(" -font big -tags t1
 foreach { - - x - } [.c bbox t1] break
 .c create text $x 100 -anchor w -text "x" -font bigitalic -tags t2
 foreach { - - x - } [.c bbox t2] break
 .c create text $x 100 -anchor w -text ") = " -font big -tags t3
 foreach { - - x - } [.c bbox t3] break
 .c create text $x 100 -anchor w -text "\u222b" -font big -tags t4
 foreach { x0 y0 x1 y1 } [.c bbox t4] break
 set x2 [expr { ( $x0 + $x1 ) / 2 }]
 .c create text $x2 $y0 -anchor s -text "\u221e" -font small
 .c create text $x2 $y1 -anchor n -text "0" -font small
 .c create text $x1 100 -anchor w -text " t" -font bigitalic -tags t5
 foreach { - y x - } [.c bbox t5] break
 .c create text $x $y -anchor w -text "x" -font smallitalic -tags t6
 foreach { - - x - } [.c bbox t6] break
 .c create text $x $y -anchor w -text "-1" -font small -tags t7
 foreach { - - x - } [.c bbox t7] break
 .c create text $x 100 -anchor w -text "e" -font bigitalic -tags t8
 foreach { - y x - } [.c bbox t8] break
 .c create text $x $y -anchor w -text "-" -font small -tags t9
 foreach { - - x - } [.c bbox t9] break
 .c create text $x $y -anchor w -text "t" -font smallitalic -tags t10
 foreach { - - x - } [.c bbox t10] break
 .c create text $x 100 -text "  dt" -font bigitalic

Arjen Markus Of course the gamma function is only one of several hundreds of ways to compute pi, mostly in very mysterious ways. The best (i.e. most unexpected and obscure) one that I know is:

Get two random integers
Decide if they have some factor in common or not (i.e. gcd(n,m) > 1)
Repeat this over and over again: the chance that they have a factor in common is related to pi: 6/pi^2 (if I am not mistaken).

But, more seriously: the above script might lead one to contemplate using the canvas for drawing all manner of mathematical formulae. Has anybody ever done something like that? RS: How about Fun with functions? Arjen Markus I will need to run this one, but I meant more visualising integral signs, partial differential equations and so on, in other words a formula viewer :-)

KBK observes that Arjen answered his own question about rendering mathematical formulae.

DKF: I wanted to get binomial coefficients (for a primality test ) but using the math::choose routine described in tcllib's math::combinatorics page didn't work; it had the right right scale, but I needed exact bigint math which the tcllib code doesn't use. So I wrote this, using lmap to simplify things:

proc coeffs {p {signs 1}} {
    set clist 1
    for {set i 0} {$i < $p} {incr i} {
        set clist [lmap x [list 0 {*}$clist] y [list {*}$clist 0] {
            expr {$x + $y}
        }]
    }
    if {$signs} {
        set s -1
        set clist [lmap c $clist {expr {[set s [expr {-$s}]] * $c}}]
    }
    return $clist
}

(The signs argument is because I was really interested in the polynomial expansion of (x-1)^p.)

I leave it as an exercise to the reader to write a version that caches previous results for lower values so that recalculation is minimised. However, testing in 8.6 indicates that this is only half the speed of binom3 above even without such optimisations.

Category Mathematics

Category Concept