Imagine that Hampden-Sydney had exactly 1,000 students, and that exactly half of them were born in Virginia. If we used a sample to estimate the proportion of students from Virginia, how accurate would the formula for a 95% confidence interval be?
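
For reference, the formula in question is the usual large-sample interval: the sample proportion plus or minus zstar times sqrt(phat*(1-phat)/n). Here is a minimal sketch of that calculation for one sample. The count of 20 Virginians out of 47 is a made-up value for illustration, and the names `n_example` and `x_example` are hypothetical, chosen so they don't collide with the variables used later.

```r
# Hypothetical single sample: 20 of 47 sampled students were born in Virginia
n_example = 47
x_example = 20
zstar = 1.96                                  # critical value for 95% confidence

phat = x_example/n_example                    # sample proportion
margin = zstar*sqrt(phat*(1-phat)/n_example)  # margin of error
lower = phat - margin
upper = phat + margin

c(lower, upper)                               # the interval for this one sample
(lower < 0.5) & (upper > 0.5)                 # does it catch the true proportion?
```

A single interval either catches 0.5 or misses it, so judging the formula takes many repeated samples.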

N = 47                             # sample size
HSpop = c(rep(1,500),rep(0,500))   # 1 = born in Virginia, 0 = born elsewhere
zstar = 1.96                       # critical value for 95% confidence

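results = logical(100000)             # one TRUE/FALSE per simulated sample
for (i in 1:100000) {
  HSsample = sample(HSpop,N)          # draw N students without replacement
  phat = sum(HSsample)/N              # sample proportion from Virginia
  margin = zstar*sqrt(phat*(1-phat)/N)
  lower = phat - margin
  upper = phat + margin
  # TRUE when the interval contains the true proportion, 0.5
  results[i] = (lower < 0.5) & (upper > 0.5)
}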
table(results)
## results
## FALSE  TRUE 
##  7126 92874

As you can see, the interval caught the true proportion of 0.5 in only about 92.9% of the 100,000 samples. With a sample of 47, then, we can’t really trust that our confidence interval will be 95% accurate at catching the parameter of interest.
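
The simulation can be cross-checked without any random sampling. This is a sketch under the same assumed setup (500 Virginians among 1,000 students, samples of 47 drawn without replacement): the number of Virginians in a sample follows a hypergeometric distribution, so we can add up the probabilities of every count whose interval contains 0.5. The names `x`, `covers`, `probs`, and `coverage` are introduced here for illustration.

```r
# Same assumed setup as the simulation above
N = 47
zstar = 1.96

x = 0:N                                   # every possible count of Virginians
phat = x/N
margin = zstar*sqrt(phat*(1-phat)/N)
covers = (phat - margin < 0.5) & (phat + margin > 0.5)

# dhyper(x, m, n, k): chance of x ones when drawing k items without
# replacement from a population holding m ones and n zeros
probs = dhyper(x, 500, 500, N)

coverage = sum(probs[covers])             # exact chance the interval catches 0.5
coverage
range(x[covers])                          # counts that produce a successful interval
```

The value of `coverage` should land close to the TRUE proportion in the table above, which suggests that the shortfall from 95% is a property of the formula at this sample size and not just simulation noise.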