When we looked at confidence intervals using the formula \(\displaystyle \bar{x} \pm z^* \frac{s}{\sqrt{N}}\), we saw that the sample size mattered a lot. Even when the population has a normal distribution, the formula didn’t work well for small sample sizes.
ages = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring19/math222/Examples/townies.csv")
bell = ages$bellville
N = 4
zstar = 1.96 # Critical value for a 95% confidence interval
popMean = mean(bell) # Population mean of Bellville
reps = 10000
results = logical(reps) # This will contain TRUE for each simulated confidence interval that contains popMean.
tvals = numeric(reps) # This vector will contain each simulated t-value.
for (i in 1:reps) {
  mySample = sample(bell, N)
  xbar = mean(mySample)
  s = sd(mySample)
  t = (xbar - popMean)/(s/sqrt(N))
  lower = xbar - zstar*s/sqrt(N)
  upper = xbar + zstar*s/sqrt(N)
  results[i] = (lower < popMean) & (upper > popMean)
  tvals[i] = t
}
table(results)
## results
## FALSE  TRUE
##  1466  8534
As you can see in the table above, our confidence intervals contained the population mean only about 85% of the time, well short of the 95% that \(z^* = 1.96\) is supposed to deliver. The problem is that we are using the normal distribution to model the distribution of the t-values: \[t = \frac{\bar{x} - \mu}{s/\sqrt{N}}.\]
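To see why the t-values are the right thing to look at, note that the interval \(\bar{x} \pm z^* s/\sqrt{N}\) contains \(\mu\) exactly when \(|t| < z^*\). The sketch below checks this numerically. It is only an illustration: `pop` is a hypothetical, roughly normal stand-in population (its mean and standard deviation are made up, not taken from the Bellville data), and `oneSample` is a helper name introduced here.

```r
set.seed(1)
pop = rnorm(5000, mean = 40, sd = 12) # Hypothetical population (assumed values)
popMean = mean(pop)
N = 4
zstar = 1.96

# Draw one sample; return its t-value and whether its interval covers popMean.
oneSample = function() {
  mySample = sample(pop, N)
  xbar = mean(mySample)
  se = sd(mySample)/sqrt(N)
  c(t = (xbar - popMean)/se,
    covered = (xbar - zstar*se < popMean) & (xbar + zstar*se > popMean))
}

sims = replicate(10000, oneSample()) # 2 x 10000 matrix with rows "t" and "covered"
tvals = sims["t", ]
covered = sims["covered", ] == 1

mean(covered)            # Proportion of intervals that contain popMean
mean(abs(tvals) < zstar) # The same proportion, computed from the t-values alone
2*pnorm(zstar) - 1       # What the normal model predicts (about 0.95)
```

The first two proportions should agree, and both should fall noticeably below the third. In other words, the poor coverage is entirely a statement about how often the t-values land outside \(\pm z^*\).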
As the normal Q-Q plot below shows, t-values aren’t normal!
qqnorm(tvals)
qqline(tvals)
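A Q-Q plot compares sample quantiles with normal quantiles, so the same comparison can be made with numbers. The following is a rough sketch under the same kind of assumption as before: `pop` is a hypothetical normal stand-in population rather than the Bellville ages, and `probs` and `comparison` are names chosen for this example.

```r
set.seed(2)
pop = rnorm(5000, mean = 40, sd = 12) # Hypothetical population (assumed values)
popMean = mean(pop)
N = 4

# Simulate 10000 t-values from samples of size N.
tvals = replicate(10000, {
  mySample = sample(pop, N)
  (mean(mySample) - popMean)/(sd(mySample)/sqrt(N))
})

# Put quantiles of the simulated t-values next to the matching normal quantiles.
probs = c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison = rbind(simulated = quantile(tvals, probs), normal = qnorm(probs))
round(comparison, 2)
```

If the t-values were normal, the two rows would nearly match. With \(N = 4\), the outer quantiles of the simulated t-values should sit much farther from zero than \(\pm 1.96\), which is the heavy-tailed pattern that makes the points peel away from the line at both ends of a Q-Q plot.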