We we look at rainfall data for Farmville, VA from the Southeastern Regional Climate Center.
rainfall=read.csv("http://people.hsc.edu/faculty-staff/blins/StatsExamples/rainfall.csv")
dim(rainfall)
## [1] 81 14
head(rainfall)
## year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec total
## 1 1931 2.12 2.15 3.76 3.94 7.92 4.05 6.11 5.89 3.98 0.50 0.15 3.52 44.09
## 2 1932 6.12 1.89 8.07 1.89 5.02 2.33 0.82 0.85 4.10 8.54 6.24 5.78 51.65
## 3 1933 3.76 2.93 2.16 4.86 5.96 4.08 7.70 4.10 0.88 0.30 1.12 3.49 41.34
## 4 1934 1.14 4.07 4.91 1.62 5.26 3.79 3.62 4.13 5.07 2.36 4.30 3.07 43.34
## 5 1935 5.01 3.36 5.06 4.93 3.52 3.41 4.79 2.71 5.33 1.66 3.97 2.47 46.22
## 6 1936 8.21 4.57 5.88 3.00 0.43 5.37 1.80 7.86 4.87 1.48 0.46 4.73 48.66
The data includes the annual rainfall totals in Farmville from 1931 to 2012.
hist(rainfall$total,col='gray',main='Annual Rainfall Totals',xlab='Inches')
Suppose we wanted to know how variable rainfall totals are. The right way to compute standard deviation from a sample is to use the sample standard deviation: \[s = \sqrt{\frac{\sum (x_i - \bar{x})}{N-1}}.\] The standard deviation of the annual rainfall totals is 6.9845129 inches per year. That is only a point estimate. We might like to make a confidence interval for the parameter \(\sigma_\text{rainfall}\), that is, the true population standard deviation for Farmville annual rainfall amounts. Since we don’t know how to make confidence intervals for standard deviations, we can use a bootstrap distribution.
boot.dist = c()
for (i in 1:10000) {
boot.sample = sample(rainfall$total,81,replace=T)
boot.stat = sd(boot.sample)
boot.dist = c(boot.dist,boot.stat)
}
hist(boot.dist,col='gray',main='Bootstrap Distribution for s',xlab='Inches')
This looks roughly normal. When the bootstrap distribution is roughly normal, you can make a bootstrap confidence interval using the t-distribution. The equation is:
\[\text{statistic} \pm t^* \text{SE}_{boot}\] where \(\text{SE}_{boot}\) is the standard deviation of the bootstrap distribution (it’s called the bootstrap standard error). You calculate it with the command sd(boot.dist)
and in this example we get 0.5404. Our statistic is \(s\) which is 6.9845129 inches per year, and the critical t-value for a 95% confidence interval with 80 degrees of freedom is given by qt(0.975,80)
.
s = sd(rainfall$total)
t = qt(0.975,80)
boot.SE = sd(boot.dist)
c(s-t*boot.SE,s+t*boot.SE)
## [1] 5.909083 8.059943
So we are 95% certain that the true population standard deviation of Farmville annual rainfall amounts is between 5.9090826 and 8.0599433 inches of rain per year.