## Benford's Law

Keith Vetter : 2006-10-11 : To quote Wikipedia [L1 ]:

Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is 1 almost one-third of the time, and further, larger numbers occur as the leading digit with less and less frequency as they grow in magnitude, to the point that 9 is the leading digit less than one time in twenty.

This counter-intuitive result applies to a wide variety of figures from the natural world or of social significance, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).

It is named after physicist Frank Benford, who stated it in 1938. However, it was earlier stated by Simon Newcomb, in 1881. The first rigorous formulation and proof appears to be due to Theodore P.Hill in 1988.

One cool application of Benford's Law is in fraud detection. People who make up numbers, say for a fraudelent insurance claim, tend to distribute their digits uniformly. So by checking if the leading digits distribution matches Benford's Law you can spot anomalous claims.

Here is a visualization of Benford's Law. It produces a list of numbers using the recurrence A(n+1) = A(n) * B, and plots the count of the leading digit.

NB. this program requires some sort of bignum package. Tcl 8.5 should work as is--see tip #237 [L2 ] (untested). Otherwise it looks for the bignum package (or the bignum starkit). Failing that it tries to load the mpexpr package (or the library directly). As a last result, it uses the tcllib math::bignum package, which works fine but is a bit slow.

``` ##+##########################################################################
#
# benford.tcl -- Simulation demonstrating Benford's Law
# by Keith Vetter, Oct 10, 2006
#

package require Tk
package require tile
package require Plotchart

array set S {title "Benford's Law"}
array set N {iter 100}

Wikipedia:

Benford's law, also called the first-digit law, states that in
lists of numbers from many real-life sources of data, the leading
digit is 1 almost one-third of the time, and further, larger
numbers occur as the leading digit with less and less frequency as
they grow in magnitude, to the point that 9 is the leading digit
less than one time in twenty.

This counter-intuitive result applies to a wide variety of figures
from the natural world or of social significance, including
electricity bills, street addresses, stock prices, population
numbers, death rates, lengths of rivers, physical and mathematical
constants, and processes described by power laws (which are very
common in nature).

It is named after physicist Frank Benford, who stated it in 1938.
However, it was earlier stated by Simon Newcomb, in 1881. The first
rigorous formulation and proof appears to be due to Theodore P.
Hill in 1988.

http://en.wikipedia.org/wiki/First_digit_law

This simulation produces a list of numbers which follows Benford's Law
by counting the leading digit as two numbers are repeatedly multiplied
together:  A(n+1) = A(n) * B

}

proc DoDisplay {} {
global S

wm title . \$S(title)
wm minsize . 500 300
frame .f -bd 2 -relief ridge
canvas .c -width 500 -height 400 -bd 0 -highlightthickness 0
bind .c <Configure> {ReCenter %W %h %w}
bind all <F2> {console show}

frame .ctrl
::ttk::button .go     -text "Generate" -command Go
::ttk::button .random -text "Random" -command Random
::ttk::button .clear  -text "Clear" -command Clear

::ttk::label .lstart -text "Start:"
entry .estart -textvariable ::N(start) -width 10 -validate key \
-vcmd {string is integer %P}
::ttk::label .lfactor -text "Factor:"
entry .efactor -textvariable ::N(factor) -width 10 -validate key \
-vcmd {string is integer %P}
::ttk::label .liter -text "Iterations:"
entry .eiter -textvariable ::N(iter) -width 10 -validate key \
-vcmd {string is integer %P}
::ttk::label .lcurr -text "Current:"
entry .ecurr -textvariable ::N(current) -width 10 -state disabled \
-disabledforeground [.eiter cget -fg]

foreach i [trace info variable ::N] { eval trace remove variable ::N \$i }
trace variable ::N w Tracer
Tracer a b c

pack .f -side left -fill both -expand 1
pack .c -in .f -side left -fill both -expand 1
pack .ctrl -side right -fill y -padx 10 -pady 5 -before .f
grid .lstart .estart -in .ctrl -sticky e -pady 5
grid .lfactor .efactor -in .ctrl -sticky e -pady 5
grid .liter .eiter -in .ctrl -sticky e -pady 5
grid .lcurr .ecurr -in .ctrl -sticky e -pady 5
grid .go - -in .ctrl -pady 15
grid .random - -in .ctrl -pady 0
grid .clear  - -in .ctrl -pady 5
grid rowconfigure .ctrl 99 -weight 1
grid .about - -in .ctrl -row 100 -pady 10
catch {
::ttk::sizegrip .sg
place .sg -in . -relx 1 -rely 1 -anchor se
}
}
proc Tracer {var1 var2 op} {
if {[regexp {^[0-9]*\$} \$::N(start)] &&
[string is integer -strict \$::N(factor)] &&
[string is integer -strict \$::N(iter)]} { set how !disabled
} else { set how disabled }
.go state \$how
}
proc ReCenter {W h w} {                   ;# Called by configure event
#set h2 [expr {\$h / 2}] ; set w2 [expr {\$w / 2}]
#\$W config -scrollregion [list -\$w2 -\$h2 \$w2 \$h2]
DrawGraph
}
proc DrawGraph {} {
global S CNT

.c delete all
.c config -width [winfo width .c] -height [winfo height .c]

foreach {a b c} [::Plotchart::determineScale 0 \
[expr {\$::CNT(max) > 40 ? \$::CNT(max)+1 : 40}]] break
set S(chart) [::Plotchart::createBarchart .c {1 2 3 4 5 6 7 8 9} \
[list 0 [expr {int(\$b)}] [expr {int(\$c)}]] 1]
\$S(chart) title "Benford's Law"
.c itemconfig title -font {Times 20 bold}
\$S(chart) xtext "Significant Digit"
\$S(chart) ytext "Frequency"
set ydata {}
foreach i {1 2 3 4 5 6 7 8 9} { lappend ydata \$CNT(\$i) }
\$S(chart) plot 1 \$ydata {red yellow green cyan blue magenta red yellow green}
}

proc Init {} {
global CNT
unset -nocomplain CNT
array set CNT {max 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0}
set ::N(current) ""
}

proc Go:8.5 {} {
global CNT N

set a [expr {\$N(current) ne "" ? \$N(current) : \$N(start)}]
for {set i 0} {\$i < \$N(iter)} {incr i} {
set a [expr {\$a * \$N(factor)}]
set digit [string range \$a 0 0]
if {[incr CNT(\$digit)] > \$CNT(max)} {
set CNT(max) \$CNT(\$digit)
}
DrawGraph
update
}
set N(current) \$a
}
proc Go:Bignum {} {
global CNT N

set a [expr {\$N(current) ne "" ? \$N(current) : \$N(start)}]
set b \$N(factor)

for {set i 0} {\$i < \$N(iter)} {incr i} {
set a [::bigint::mul \$a \$b]
set digit [string range \$a 0 0]
if {[incr CNT(\$digit)] > \$CNT(max)} {
set CNT(max) \$CNT(\$digit)
}
DrawGraph
update
}
set N(current) \$a
}
proc Go:Mpexpr {} {
global CNT N

set a [expr {\$N(current) ne "" ? \$N(current) : \$N(start)}]
for {set i 0} {\$i < \$N(iter)} {incr i} {
set a [mpexpr {\$a * \$N(factor)}]
set digit [string range \$a 0 0]
if {[incr CNT(\$digit)] > \$CNT(max)} {
set CNT(max) \$CNT(\$digit)
}
DrawGraph
update
}
set N(current) \$a
}
proc Go:Math.Bignum {} {
global CNT N

set a [expr {\$N(current) ne "" ? \$N(current) : \$N(start)}]
set a [::math::bignum::fromstr \$a]
set b [::math::bignum::fromstr \$N(factor)]

for {set i 0} {\$i < \$N(iter)} {incr i} {
set a   [::math::bignum::mul \$a \$b]
set str [::math::bignum::tostr \$a]
set digit [string range \$str 0 0]
if {[incr CNT(\$digit)] > \$CNT(max)} {
set CNT(max) \$CNT(\$digit)
}
DrawGraph
update
}
set N(current) [::math::bignum::tostr \$a]
}
proc Clear {} {
Init
DrawGraph
}
proc Random {} {
set ::N(start)  [expr {1 + int(rand() * 300)}]
set ::N(factor) [expr {1 + int(rand() * 300)}]
set ::N(current) ""
Init
Go
}
proc About {} {
set msg "Benford's Law\nby Keith Vetter, Oct 2006\n\$::S(about)"
append msg "Using \$::S(bignum) for multi-precision arithmetic."
tk_messageBox -title "About" -message \$msg
}
################################################################
#
# Load a multi-precision math package of some sort
#
while {1} {
# tcl 8.5 should work correctly
if {[package vsatisfies \$::tcl_version 8.5] > 0} {
set S(bignum) "Tcl 8.5"
interp alias {} Go {} Go:8.5
break
}

# Bignum package
set n [catch {package require bignum}]
if {\$n} {
set n [catch {source bignum.kit}]
}
set n [catch {package require bignum}]
if {! \$n} {
set S(bignum) "bignum package"
interp alias {} Go {} Go:Bignum
break
}

# mpexpr (how should it be loaded?)
set n [catch {package require Mpexpr}]
if {\$n} {
set fname "mpexpr10[info sharedlibextension]"
set n [catch {load \$fname}]
}
if {! \$n} {
set S(bignum) "mpexpr package"
interp alias {} Go {} Go:Mpexpr
break
}

# Fall back to tcllib's math::bignum package
set S(bignum) "math::bignum package"
package require math::bignum
interp alias {} Go {} Go:Math.Bignum
break
}

Init
DoDisplay
update
Random
return```

 Applications using Tile Category Statistics