NBA = read.csv("http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/NBASalaries2014.txt")
East = subset(NBA,conference == "eastern")
West = subset(NBA,conference == "western")
A 95% confidence interval for a difference in means should have a 95% chance of actually containing the true value of \[\mu_{eastern}-\mu_{western}.\] Since we know that the true difference is \(-\$186.17\), we can test out the t-distribution method and see how well it actually performs.
How often are t-distribution confidence intervals correct? Let’s simulate many random samples from our two populations, and make confidence intervals for \(\mu_{eastern} - \mu_{western}\) using the 2-sample t-distribution confidence interval formula: \[\bar{x}_1-\bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{N_1}+\frac{s_2^2}{N_2}}.\] In the loop below, I simulate taking samples of size \(N=20\) from each conference (so \(N_1 = N_2 = N\)), and check whether or not the true difference in population means is in the interval. The critical value \(t^*\) uses \(N-1\) degrees of freedom.
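N = 20
# True difference in population means (fixed, so compute it once)
trueGap = mean(East$salary)-mean(West$salary)
results = c()
for (i in 1:10000) {
  # Draw N salaries at random (without replacement) from each conference
  EastSample = sample(East$salary,N)
  WestSample = sample(West$salary,N)
  # Degrees of freedom and critical value for a 95% interval
  dF = N-1
  tstar = qt(0.975,dF)
  # Margin of error: t* times the standard error of the difference
  margin = tstar*sqrt(sd(EastSample)^2/N+sd(WestSample)^2/N)
  upper = mean(EastSample)-mean(WestSample) + margin
  lower = mean(EastSample)-mean(WestSample) - margin
  # TRUE if this interval captures the true difference
  containsTrueGap = lower <= trueGap & upper >= trueGap
  results = c(results,containsTrueGap)
}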
And here are the results:
sum(results)/length(results)
## [1] 0.9653
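One way to see where the formula's interval comes from is to compare it with R's built-in `t.test`, which by default uses the Welch approximation for the degrees of freedom instead of \(N-1\). The sketch below is only an illustration: the two samples are simulated right-skewed data (an assumption, not the NBA salaries), and the variable names `x`, `y`, `manual`, and `welch` are made up for this example. Because the Welch degrees of freedom are never smaller than \(N-1\), the `t.test` interval should be no wider than the one from the formula.

```r
# Hypothetical stand-in data: two right-skewed samples of size 20.
# These are simulated values, NOT the NBA salary data.
set.seed(1)
N = 20
x = rlnorm(N, meanlog = 8, sdlog = 1)
y = rlnorm(N, meanlog = 8, sdlog = 1)

# Interval from the formula, using N - 1 degrees of freedom
se = sqrt(sd(x)^2/N + sd(y)^2/N)
tstar = qt(0.975, N - 1)
manual = mean(x) - mean(y) + c(-1, 1) * tstar * se

# Welch interval from t.test (var.equal = FALSE is the default)
welch = t.test(x, y, conf.level = 0.95)
welch_ci = as.numeric(welch$conf.int)

# Compare the two intervals and the degrees of freedom used
manual_width = diff(manual)
welch_width = diff(welch_ci)
print(manual)
print(welch_ci)
print(unname(welch$parameter))  # Welch degrees of freedom
```

Both intervals are centered at the same point, `mean(x) - mean(y)`; only the critical value differs. Using the smaller \(N-1\) degrees of freedom makes the interval slightly wider, which may be part of the reason the simulated coverage comes out a little above 95%.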
Copy the code above into your own R Markdown file, and answer the following questions:
What happens if the sample size \(N\) is larger (say around 100)? Do the confidence intervals still contain the true difference about 95% of the time?
What happens if the sample size \(N\) is smaller (like around 5)?
What happens if one sample is large and the other is small? You will have to adjust the code to have two different N variables to answer this one.