Benford's Law

Keith Vetter : 2006-10-11 : To quote Wikipedia [L1 ]:

Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is 1 almost one-third of the time, and further, larger numbers occur as the leading digit with less and less frequency as they grow in magnitude, to the point that 9 is the leading digit less than one time in twenty.

This counter-intuitive result applies to a wide variety of figures from the natural world or of social significance, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).

It is named after physicist Frank Benford, who stated it in 1938. However, it was earlier stated by Simon Newcomb, in 1881. The first rigorous formulation and proof appears to be due to Theodore P.Hill in 1988.

One cool application of Benford's Law is in fraud detection. People who make up numbers, say for a fraudelent insurance claim, tend to distribute their digits uniformly. So by checking if the leading digits distribution matches Benford's Law you can spot anomalous claims.

Here is a visualization of Benford's Law. It produces a list of numbers using the recurrence A(n+1) = A(n) * B, and plots the count of the leading digit.

NB. this program requires some sort of bignum package. Tcl 8.5 should work as is--see tip #237 [L2 ] (untested). Otherwise it looks for the bignum package (or the bignum starkit). Failing that it tries to load the mpexpr package (or the library directly). As a last result, it uses the tcllib math::bignum package, which works fine but is a bit slow.


 ##+##########################################################################
 #
 # benford.tcl -- Simulation demonstrating Benford's Law
 # by Keith Vetter, Oct 10, 2006
 #
 
 package require Tk
 package require tile
 package require Plotchart
 
 array set S {title "Benford's Law"}
 array set N {iter 100}
 
 set S(about) {
 Wikipedia:
 
    Benford's law, also called the first-digit law, states that in
    lists of numbers from many real-life sources of data, the leading
    digit is 1 almost one-third of the time, and further, larger
    numbers occur as the leading digit with less and less frequency as
    they grow in magnitude, to the point that 9 is the leading digit
    less than one time in twenty.
 
    This counter-intuitive result applies to a wide variety of figures
    from the natural world or of social significance, including
    electricity bills, street addresses, stock prices, population
    numbers, death rates, lengths of rivers, physical and mathematical
    constants, and processes described by power laws (which are very
    common in nature).
 
    It is named after physicist Frank Benford, who stated it in 1938.
    However, it was earlier stated by Simon Newcomb, in 1881. The first
    rigorous formulation and proof appears to be due to Theodore P.
    Hill in 1988.
 
    http://en.wikipedia.org/wiki/First_digit_law
 
 This simulation produces a list of numbers which follows Benford's Law
 by counting the leading digit as two numbers are repeatedly multiplied
 together:  A(n+1) = A(n) * B
 
 }
 
 proc DoDisplay {} {
    global S
 
    wm title . $S(title)
    wm minsize . 500 300
    frame .f -bd 2 -relief ridge
    canvas .c -width 500 -height 400 -bd 0 -highlightthickness 0
    bind .c <Configure> {ReCenter %W %h %w}
    bind all <F2> {console show}
 
    frame .ctrl
    ::ttk::button .go     -text "Generate" -command Go
    ::ttk::button .random -text "Random" -command Random
    ::ttk::button .clear  -text "Clear" -command Clear
    ::ttk::button .about  -text "About" -command About
 
    ::ttk::label .lstart -text "Start:"
    entry .estart -textvariable ::N(start) -width 10 -validate key \
        -vcmd {string is integer %P}
    ::ttk::label .lfactor -text "Factor:"
    entry .efactor -textvariable ::N(factor) -width 10 -validate key \
        -vcmd {string is integer %P}
    ::ttk::label .liter -text "Iterations:"
    entry .eiter -textvariable ::N(iter) -width 10 -validate key \
        -vcmd {string is integer %P}
    ::ttk::label .lcurr -text "Current:"
    entry .ecurr -textvariable ::N(current) -width 10 -state disabled \
        -disabledforeground [.eiter cget -fg]
 
 
    foreach i [trace info variable ::N] { eval trace remove variable ::N $i }
    trace variable ::N w Tracer
    Tracer a b c
 
    pack .f -side left -fill both -expand 1
    pack .c -in .f -side left -fill both -expand 1
    pack .ctrl -side right -fill y -padx 10 -pady 5 -before .f
    grid .lstart .estart -in .ctrl -sticky e -pady 5
    grid .lfactor .efactor -in .ctrl -sticky e -pady 5
    grid .liter .eiter -in .ctrl -sticky e -pady 5
    grid .lcurr .ecurr -in .ctrl -sticky e -pady 5
    grid .go - -in .ctrl -pady 15
    grid .random - -in .ctrl -pady 0
    grid .clear  - -in .ctrl -pady 5
    grid rowconfigure .ctrl 99 -weight 1
    grid .about - -in .ctrl -row 100 -pady 10
    catch {
        ::ttk::sizegrip .sg
        place .sg -in . -relx 1 -rely 1 -anchor se
    }
 }
 proc Tracer {var1 var2 op} {
    if {[regexp {^[0-9]*$} $::N(start)] &&
        [string is integer -strict $::N(factor)] &&
        [string is integer -strict $::N(iter)]} { set how !disabled
    } else { set how disabled }
    .go state $how
 }
 proc ReCenter {W h w} {                   ;# Called by configure event
    #set h2 [expr {$h / 2}] ; set w2 [expr {$w / 2}]
    #$W config -scrollregion [list -$w2 -$h2 $w2 $h2]
    DrawGraph
 }
 proc DrawGraph {} {
    global S CNT
 
    .c delete all
    .c config -width [winfo width .c] -height [winfo height .c]
 
    foreach {a b c} [::Plotchart::determineScale 0 \
                         [expr {$::CNT(max) > 40 ? $::CNT(max)+1 : 40}]] break
    set S(chart) [::Plotchart::createBarchart .c {1 2 3 4 5 6 7 8 9} \
                      [list 0 [expr {int($b)}] [expr {int($c)}]] 1]
    $S(chart) title "Benford's Law"
    .c itemconfig title -font {Times 20 bold}
    $S(chart) xtext "Significant Digit"
    $S(chart) ytext "Frequency"
    set ydata {}
    foreach i {1 2 3 4 5 6 7 8 9} { lappend ydata $CNT($i) }
    $S(chart) plot 1 $ydata {red yellow green cyan blue magenta red yellow green}
 }
 
 proc Init {} {
    global CNT
    unset -nocomplain CNT
    array set CNT {max 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0}
    set ::N(current) ""
 }
 
 proc Go:8.5 {} {
    global CNT N
 
    set a [expr {$N(current) ne "" ? $N(current) : $N(start)}]
    for {set i 0} {$i < $N(iter)} {incr i} {
        set a [expr {$a * $N(factor)}]
        set digit [string range $a 0 0]
        if {[incr CNT($digit)] > $CNT(max)} {
            set CNT(max) $CNT($digit)
        }
        DrawGraph
        update
    }
    set N(current) $a
 }
 proc Go:Bignum {} {
    global CNT N
 
    set a [expr {$N(current) ne "" ? $N(current) : $N(start)}]
    set b $N(factor)
 
    for {set i 0} {$i < $N(iter)} {incr i} {
        set a [::bigint::mul $a $b]
        set digit [string range $a 0 0]
        if {[incr CNT($digit)] > $CNT(max)} {
            set CNT(max) $CNT($digit)
        }
        DrawGraph
        update
    }
    set N(current) $a
 }
 proc Go:Mpexpr {} {
    global CNT N
 
    set a [expr {$N(current) ne "" ? $N(current) : $N(start)}]
    for {set i 0} {$i < $N(iter)} {incr i} {
        set a [mpexpr {$a * $N(factor)}]
        set digit [string range $a 0 0]
        if {[incr CNT($digit)] > $CNT(max)} {
            set CNT(max) $CNT($digit)
        }
        DrawGraph
        update
    }
    set N(current) $a
 }
 proc Go:Math.Bignum {} {
    global CNT N
 
    set a [expr {$N(current) ne "" ? $N(current) : $N(start)}]
    set a [::math::bignum::fromstr $a]
    set b [::math::bignum::fromstr $N(factor)]
 
    for {set i 0} {$i < $N(iter)} {incr i} {
        set a   [::math::bignum::mul $a $b]
        set str [::math::bignum::tostr $a]
        set digit [string range $str 0 0]
        if {[incr CNT($digit)] > $CNT(max)} {
            set CNT(max) $CNT($digit)
        }
        DrawGraph
        update
    }
    set N(current) [::math::bignum::tostr $a]
 }
 proc Clear {} {
    Init
    DrawGraph
 }
 proc Random {} {
    set ::N(start)  [expr {1 + int(rand() * 300)}]
    set ::N(factor) [expr {1 + int(rand() * 300)}]
    set ::N(current) ""
    Init
    Go
 }
 proc About {} {
    set msg "Benford's Law\nby Keith Vetter, Oct 2006\n$::S(about)"
    append msg "Using $::S(bignum) for multi-precision arithmetic."
    tk_messageBox -title "About" -message $msg
 }
 ################################################################
 #
 # Load a multi-precision math package of some sort
 #
 while {1} {
    # tcl 8.5 should work correctly
    if {[package vsatisfies $::tcl_version 8.5] > 0} {
        set S(bignum) "Tcl 8.5"
        interp alias {} Go {} Go:8.5
        break
    }
 
    # Bignum package
    set n [catch {package require bignum}]
    if {$n} {
        set n [catch {source bignum.kit}]
    }
    set n [catch {package require bignum}]
    if {! $n} {
        set S(bignum) "bignum package"
        interp alias {} Go {} Go:Bignum
        break
    }
 
    # mpexpr (how should it be loaded?)
    set n [catch {package require Mpexpr}]
    if {$n} {
        set fname "mpexpr10[info sharedlibextension]"
        set n [catch {load $fname}]
    }
    if {! $n} {
        set S(bignum) "mpexpr package"
        interp alias {} Go {} Go:Mpexpr
        break
    }
 
    # Fall back to tcllib's math::bignum package
    set S(bignum) "math::bignum package"
    package require math::bignum
    interp alias {} Go {} Go:Math.Bignum
    break
 }
 
 Init
 DoDisplay
 update
 Random
 return