Imagine that Hampden-Sydney had exactly 1,000 students, and that exactly half of them were born in Virginia. If we used a sample to estimate the proportion of students from Virginia, how accurate would the formula for a 95% confidence interval be?
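N = 47                                # sample size
HSpop = c(rep(1,500),rep(0,500))      # 1 = born in Virginia, 0 = not
zstar = 1.96                          # critical value for 95% confidence
results = logical(100000)             # preallocated: did each interval capture 0.5?
for (i in 1:100000) {
  HSsample = sample(HSpop,N)          # simple random sample, without replacement
  phat = sum(HSsample)/N              # sample proportion
  lower = phat - zstar*sqrt(phat*(1-phat)/N)
  upper = phat + zstar*sqrt(phat*(1-phat)/N)
  results[i] = (lower < 0.5) & (upper > 0.5)
}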
table(results)
## results
## FALSE  TRUE
##  7126 92874
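As you can see, the interval falls short of its nominal level: only 92,874 of the 100,000 simulated intervals (about 92.9%) captured the true proportion of 0.5. With a sample of 47, we can't really trust the formula to catch the parameter of interest 95% of the time.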
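The simulation can be cross-checked without any random sampling. The sketch below is not part of the original analysis. It assumes the same setup (500 Virginians among 1,000 students, a sample of 47 drawn without replacement), so the number of Virginians in the sample follows a hypergeometric distribution. The names `k`, `margin`, `covers`, and `coverage` are illustrative. The code adds up the probability of every sample count whose interval contains 0.5, and the total should land near the simulated proportion above.

```r
# Exact coverage of the interval, assuming sampling without replacement
# from a population of 500 ones and 500 zeros (same setup as the simulation).
N = 47
zstar = 1.96
k = 0:N                                   # every possible count of Virginians in the sample
phat = k/N                                # the sample proportion for each count
margin = zstar*sqrt(phat*(1-phat)/N)      # margin of error for each count
covers = (phat - margin < 0.5) & (phat + margin > 0.5)
# dhyper(x, m, n, k): probability of x ones in k draws from m ones and n zeros
coverage = sum(dhyper(k[covers], 500, 500, N))
coverage
```

Because this sums exact probabilities rather than counting random draws, it gives the same answer on every run, which makes it a handy check that a shortfall from 95% is a property of the formula at this sample size and not simulation noise.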