Bootstrap Example: Standard Deviation for Farmville Rainfall

We we look at rainfall data for Farmville, VA from the Southeastern Regional Climate Center.

rainfall=read.csv("http://people.hsc.edu/faculty-staff/blins/StatsExamples/rainfall.csv")
dim(rainfall)

## [1] 81 14

head(rainfall)

##   year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec total
## 1 1931 2.12 2.15 3.76 3.94 7.92 4.05 6.11 5.89 3.98 0.50 0.15 3.52 44.09
## 2 1932 6.12 1.89 8.07 1.89 5.02 2.33 0.82 0.85 4.10 8.54 6.24 5.78 51.65
## 3 1933 3.76 2.93 2.16 4.86 5.96 4.08 7.70 4.10 0.88 0.30 1.12 3.49 41.34
## 4 1934 1.14 4.07 4.91 1.62 5.26 3.79 3.62 4.13 5.07 2.36 4.30 3.07 43.34
## 5 1935 5.01 3.36 5.06 4.93 3.52 3.41 4.79 2.71 5.33 1.66 3.97 2.47 46.22
## 6 1936 8.21 4.57 5.88 3.00 0.43 5.37 1.80 7.86 4.87 1.48 0.46 4.73 48.66

The data includes the annual rainfall totals in Farmville from 1931 to 2012.

hist(rainfall$total,col='gray',main='Annual Rainfall Totals',xlab='Inches')

Suppose we wanted to know how variable rainfall totals are. The right way to compute standard deviation from a sample is to use the sample standard deviation: \[s = \sqrt{\frac{\sum (x_i - \bar{x})}{N-1}}.\] The standard deviation of the annual rainfall totals is 6.9845129 inches per year. That is only a point estimate. We might like to make a confidence interval for the parameter \(\sigma_\text{rainfall}\), that is, the true population standard deviation for Farmville annual rainfall amounts. Since we don’t know how to make confidence intervals for standard deviations, we can use a bootstrap distribution.

boot.dist = c()
for (i in 1:10000) {
  boot.sample = sample(rainfall$total,81,replace=T)
  boot.stat = sd(boot.sample)
  boot.dist = c(boot.dist,boot.stat)
}
hist(boot.dist,col='gray',main='Bootstrap Distribution for s',xlab='Inches')

This looks roughly normal. When the bootstrap distribution is roughly normal, you can make a bootstrap confidence interval using the t-distribution. The equation is:

\[\text{statistic} \pm t^* \text{SE}_{boot}\] where \(\text{SE}_{boot}\) is the standard deviation of the bootstrap distribution (it’s called the bootstrap standard error). You calculate it with the command sd(boot.dist) and in this example we get 0.5404. Our statistic is \(s\) which is 6.9845129 inches per year, and the critical t-value for a 95% confidence interval with 80 degrees of freedom is given by qt(0.975,80).

s = sd(rainfall$total)
t = qt(0.975,80)
boot.SE = sd(boot.dist)
c(s-t*boot.SE,s+t*boot.SE)

## [1] 5.909083 8.059943

So we are 95% certain that the true population standard deviation of Farmville annual rainfall amounts is between 5.9090826 and 8.0599433 inches of rain per year.