Benford’s Law predicts that in data spread roughly evenly across several orders of magnitude (more precisely, data that are roughly uniform on a logarithmic scale), numbers whose first digit is 1 or 2 are more common than numbers whose first digit is 8 or 9. Specifically, Benford’s Law predicts the following proportions of leading digits:
Leading Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Proportion | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
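The proportions in the table are given by

\[ P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9. \]

A quick simulation shows the "several orders of magnitude" condition at work. The sketch below assumes idealized, hypothetical data whose base-10 logarithm is uniform over a whole number of decades (real data only approximate this). The seed, sample size, and range are arbitrary choices, and the names `x`, `d`, `simulated`, and `benford` are illustrative only.

```r
set.seed(1)
# Hypothetical idealized data: log10(x) uniform on [1, 7], i.e. six full orders of magnitude.
x = 10^runif(100000, min = 1, max = 7)
# Leading digit: scale each value into [1, 10) and drop the fractional part.
d = floor(x / 10^floor(log10(x)))
# Proportion of each leading digit; the factor keeps all nine digits as categories.
simulated = as.vector(table(factor(d, levels = 1:9))) / length(x)
benford = log10(1 + 1 / (1:9))
round(rbind(simulated, benford), 3)
```

With this many draws the simulated proportions should agree with \(P(d)\) to within about a percentage point.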
We can use a \(\chi^2\) goodness-of-fit test to check whether the populations of the world’s countries follow Benford’s Law. The code below loads population data for 2017.
```r
world = read.csv('UNworldPopulation2017.csv')
pop = world$population
```
The first thing we need to do is find the leading digit of each number in our data. In R, we can write a custom function to do this for us.
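```r
leadingDigit = function(num) {
  # Returns the leading (first nonzero) digit of each element of num.
  # Works element-wise on vectors; num must be nonzero and finite.
  # log10() is used instead of log(num)/log(10): the change-of-base quotient
  # can fall just below an integer for exact powers of ten such as 1000.
  magnitude = floor(log10(abs(num)))
  return(floor(abs(num) / 10^magnitude))
}
```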
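Exact powers of ten are a good edge case for any leading-digit function, because the exponent has to come out as an exact integer. The sketch below compares the change-of-base quotient `log(x)/log(10)` with R’s `log10()` on the powers of ten from 1 to 10^9. On typical IEEE 754 systems the quotient for 1000 lands just below 3, so its floor is off by one, and a function built on it would report 10 as the "leading digit" of 1000. The variable names here are illustrative only.

```r
# Powers of ten, 10^0 through 10^9.
x = 10^(0:9)
viaQuotient = floor(log(x) / log(10))  # change-of-base formula
viaLog10 = floor(log10(x))             # base-10 logarithm computed directly
# TRUE marks a power of ten where the two exponent calculations disagree.
viaQuotient != viaLog10
```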
Now we can use R’s built-in \(\chi^2\) test function, `chisq.test()`, to test how well the population numbers follow Benford’s Law. The vector `p` below contains the probability of each leading digit according to Benford’s Law.
```r
p = (log(2:10) - log(1:9)) / log(10)
p
## [1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679
## [7] 0.05799195 0.05115252 0.04575749
```
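Each entry of `p` is \((\ln(d+1) - \ln d)/\ln 10 = \log_{10}((d+1)/d)\), so the same vector can be written with `log10()` directly. Because the differences telescope, the nine probabilities sum to \(\log_{10} 10 - \log_{10} 1 = 1\), which matters because `chisq.test()` stops with an error when `p` does not sum to 1 (unless `rescale.p = TRUE`). A small check, with illustrative names `pDiff` and `pDirect`:

```r
pDiff = (log(2:10) - log(1:9)) / log(10)  # the form used in the text
pDirect = log10(1 + 1 / (1:9))            # log10((d + 1) / d)
all.equal(pDiff, pDirect)                 # same probabilities up to rounding
sum(pDirect)                              # telescoping sum: log10(10) - log10(1)
```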
Since the data contain `length(pop)` = 233 countries, the expected count for each leading digit under Benford’s Law is its probability times 233:
```r
expected = p * length(pop)
expected
## [1] 70.13999 41.02926 29.11073 22.58003 18.44923 15.59860 13.51212 11.91854
## [9] 10.66150
```
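The \(\chi^2\) approximation used by `chisq.test()` is usually trusted only when every expected count is at least 5, and R warns ("Chi-squared approximation may be incorrect") when any falls below that. A quick check, assuming n = 233, the total of the observed counts in this section:

```r
n = 233                              # number of countries (assumed from the counts in the text)
expected = n * log10(1 + 1 / (1:9))  # Benford expected counts
min(expected)                        # smallest expected count, for leading digit 9
all(expected >= 5)                   # rule-of-thumb condition for the approximation
```

Here even the rarest digit has an expected count of roughly 10.7, so the approximation is reasonable.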
Here are the actual observed counts.
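```r
table(leadingDigit(pop))
##
##  1  2  3  4  5  6  7  8  9
## 70 37 26 27 23 16  8 14 12
# Convert the table to a plain vector of counts. A factor with levels 1:9
# keeps a zero count for any digit that never occurs, so x always lines up with p.
x = as.vector(table(factor(leadingDigit(pop), levels = 1:9)))
x
## [1] 70 37 26 27 23 16  8 14 12
```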
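`table()` only lists values that actually occur. Every digit from 1 to 9 appears in the population data, but in a smaller data set a missing digit would make the count vector shorter than `p`, and `chisq.test()` would stop with an error because `x` and `p` must have the same length. The sketch below uses a hypothetical set of leading digits to show the difference:

```r
# Hypothetical leading digits in which 4 through 8 never occur.
digits = c(1, 1, 2, 3, 1, 9)
length(table(digits))                                 # only 4 categories: 1, 2, 3, 9
counts = as.vector(table(factor(digits, levels = 1:9)))
counts                                                # zeros kept for digits 4 to 8
```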
```r
chisq.test(x, p = p)
##
##  Chi-squared test for given probabilities
##
## data:  x
## X-squared = 5.5066, df = 8, p-value = 0.7023
```
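To see what `chisq.test()` computes, here is a sketch of the same test by hand, using the observed counts printed above. The statistic is \(X^2 = \sum_{d=1}^{9} (O_d - E_d)^2 / E_d\), compared with a \(\chi^2\) distribution on \(9 - 1 = 8\) degrees of freedom (no parameters are estimated from the data). The variable names are illustrative only.

```r
observed = c(70, 37, 26, 27, 23, 16, 8, 14, 12)     # observed counts from the text
p = log10(1 + 1 / (1:9))                            # Benford probabilities
expected = sum(observed) * p                        # expected counts under Benford's Law
X2 = sum((observed - expected)^2 / expected)        # Pearson chi-squared statistic
dof = length(observed) - 1                          # 9 categories minus 1
pValue = pchisq(X2, df = dof, lower.tail = FALSE)   # upper-tail probability
round(c(X2 = X2, df = dof, p.value = pValue), 4)    # matches the chisq.test() output
```

The large p-value (about 0.70) means these counts give no evidence that the 2017 country populations depart from Benford’s Law.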