The file townies.csv contains data about the ages of the residents of two imaginary towns, Bellville and Skewton. Both towns have a population of 3600 people, but the age distributions in the towns are very different.

# Load the age data for both towns (one column per town)
ages = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring19/math222/Examples/townies.csv")
bell = ages$bellville
skew = ages$skewton

# Histogram of each town's age distribution
hist(bell,col='gray',main='Bellville Age Distribution',xlab='Age (Years)')

hist(skew,col='gray',main='Skewton Age Distribution',xlab='Age (Years)')

  1. Write a program in R that simulates taking a sample of N residents from each town and constructs a 95% confidence interval for the population mean using the formula \(\displaystyle \bar{x} \pm z^* \frac{s}{\sqrt{N}}\), where \(s\) is the sample standard deviation and \(z^* \approx 1.96\) is the critical value for 95% confidence. Use a variable N for the sample size so that you can try different sample sizes. What are the actual population means for each town? Do your confidence intervals contain the parameter of interest?

  2. One of the assumptions behind the confidence interval for a mean is that \(\bar{x}\) has an approximately normal distribution. Is this assumption reasonable? For each town and each sample size N = 4, N = 25, and N = 100, use a for-loop to simulate 10,000 different samples and record the sample mean of each. Make normal Q-Q plots of the sample means to see whether they are normally distributed. What do you see? How does sample size affect normality? How does the skew in the population affect normality?

  3. Now simulate making many different confidence intervals for the population mean age in each town. How often do your confidence intervals contain the population parameter when N = 4, 25, or 100? How does the population skew affect the results?
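
The three steps above can be sketched roughly as follows. This is only one possible starting point, not a reference solution. To keep it runnable without a network connection it uses a made-up right-skewed population called pop (an assumption, not the townies.csv data), and the helper names mean_ci, sample_means, and coverage are hypothetical names chosen for illustration.

```r
# Hypothetical stand-in population (NOT the townies.csv data): 3600
# right-skewed "ages", capped at 100, so the sketch runs offline.
set.seed(222)
pop = pmin(round(rexp(3600, rate = 1/25)), 100)

# Confidence interval xbar +/- z* s / sqrt(N) computed from one sample x.
mean_ci = function(x, conf = 0.95) {
  z_star = qnorm(1 - (1 - conf) / 2)   # about 1.96 when conf = 0.95
  margin = z_star * sd(x) / sqrt(length(x))
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Means of `reps` simple random samples of size N (drawn without replacement).
sample_means = function(pop, N, reps = 10000) {
  means = numeric(reps)
  for (i in 1:reps) {
    means[i] = mean(sample(pop, N))
  }
  means
}

# Proportion of simulated intervals that contain the true population mean.
coverage = function(pop, N, reps = 10000) {
  mu = mean(pop)
  hits = 0
  for (i in 1:reps) {
    ci = mean_ci(sample(pop, N))
    if (ci["lower"] <= mu && mu <= ci["upper"]) hits = hits + 1
  }
  hits / reps
}

# Q-Q plot of the sample means and estimated coverage for each sample size.
for (N in c(4, 25, 100)) {
  xbars = sample_means(pop, N)
  qqnorm(xbars, main = paste("Sample means, N =", N))
  qqline(xbars)
  cat("N =", N, "coverage =", coverage(pop, N), "\n")
}
```

To apply this to the real data, replace pop with bell or skew after loading townies.csv as shown at the top of the page. The population mean is then mean(bell) or mean(skew), and the coverage proportions can be compared against the nominal 95%.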