World Population and Benford’s Law

Benford’s Law predicts that in data spread roughly evenly on a logarithmic scale across several orders of magnitude, numbers whose first digit is 1 or 2 are more common than numbers that begin with 8 or 9. The proportions of leading digits predicted by Benford’s Law are as follows.

Leading Digit   1      2      3      4      5      6      7      8      9
Proportion      0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

We can use a \(\chi^2\) goodness-of-fit test to check whether the populations of the countries of the world follow Benford’s Law. The code below loads population data from 2017.

world = read.csv('UNworldPopulation2017.csv')
pop = world$population

The first thing we need to do is find the leading digit for each number in our data. Using R, we can create a custom function that will find this for us.

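leadingDigit = function(num) {
  # Returns the leading nonzero digit of any nonzero number (vectorized).
  # log10() is used instead of log(num)/log(10): the ratio of logs can fall
  # just below an integer for exact powers of ten (e.g. 1000), which would
  # make the function return 10 instead of 1.
  order = floor(log10(abs(num)))
  return(floor(abs(num)/(10^order)))
}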
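A quick check on a few hand-picked values can confirm that the function behaves as intended. This is a minimal sketch: the test values are arbitrary and not part of the population data, and the function is restated with log10() so that the block runs on its own.

```r
leadingDigit <- function(num) {
  # Leading nonzero digit of each element of a positive numeric vector
  order <- floor(log10(num))
  floor(num / 10^order)
}

# Arbitrary test values, including an exact power of ten
values <- c(7, 42, 1000, 123456, 98765432)
digits <- leadingDigit(values)
digits

# The ratio-of-logs form can land just below an integer for powers of ten,
# which is why log10() is the safer choice; this comparison shows whether
# the ratio is exactly 3 on the current machine
log(1000) / log(10) == 3
```

The digits should come out as 7, 4, 1, 1, 9. Because the arithmetic is vectorized, the same call works on the whole pop vector at once.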

Now we can use R’s built-in \(\chi^2\) test function, chisq.test, to test how well the population numbers follow Benford’s Law. The vector p below contains the probability of each leading digit according to Benford’s Law.

p = (log(2:10)-log(1:9))/log(10)
p
## [1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679
## [7] 0.05799195 0.05115252 0.04575749
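Benford’s Law is commonly written in closed form as \(P(d) = \log_{10}(1 + 1/d)\) for \(d = 1, \dots, 9\), which is the same quantity as the difference of logarithms used for p. The sketch below checks that equivalence and confirms that the probabilities sum to 1, as a set of goodness-of-fit probabilities must. The names d, p_diff and p_closed are illustrative and not part of the original analysis.

```r
d <- 1:9

# Difference-of-logs form, as used to build p
p_diff <- (log(d + 1) - log(d)) / log(10)

# Closed form of Benford's Law
p_closed <- log10(1 + 1 / d)

# The two forms agree up to floating-point tolerance
all.equal(p_diff, p_closed)

# The sum telescopes to log10(10) - log10(1) = 1
sum(p_closed)
```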

Since the data set contains length(pop) = 233 countries, multiplying each probability by that count gives the expected counts under Benford’s Law:

expected = p*length(pop)
expected
## [1] 70.13999 41.02926 29.11073 22.58003 18.44923 15.59860 13.51212 11.91854
## [9] 10.66150

Here are the actual observed counts, first as a table and then as a plain vector x that can be passed to chisq.test.

table(leadingDigit(pop))
## 
##  1  2  3  4  5  6  7  8  9 
## 70 37 26 27 23 16  8 14 12
x = as.vector(table(leadingDigit(pop)))
x
## [1] 70 37 26 27 23 16  8 14 12
chisq.test(x,p=p)
## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 5.5066, df = 8, p-value = 0.7023
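The statistic reported by chisq.test can be reproduced by hand, which makes it easier to see what the test measures. The sketch below uses only the observed counts printed above; the names observed, benford, expected_counts, stat and p_value are illustrative and not from the original analysis. It computes \(\chi^2 = \sum_i (O_i - E_i)^2 / E_i\) and the upper-tail probability with 9 - 1 = 8 degrees of freedom.

```r
# Observed leading-digit counts, copied from the table above
observed <- c(70, 37, 26, 27, 23, 16, 8, 14, 12)

# Benford probabilities and the expected counts they imply
benford <- log10(1 + 1 / (1:9))
expected_counts <- sum(observed) * benford

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
stat <- sum((observed - expected_counts)^2 / expected_counts)

# Upper-tail p-value with 9 - 1 = 8 degrees of freedom
p_value <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

# Common rule of thumb: every expected count should be at least 5
all(expected_counts >= 5)

round(c(statistic = stat, p.value = p_value), 4)
```

These values should agree with the chisq.test output (X-squared = 5.5066, p-value = 0.7023). A p-value this large gives no evidence against the null hypothesis, so the 2017 population counts are consistent with Benford’s Law. The test does not prove that the law holds; it only fails to reject it.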