2014 NBA Salaries: Confidence Intervals

NBA = read.csv("http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/NBASalaries2014.txt")
East = subset(NBA,conference == "eastern")
West = subset(NBA,conference == "western")

A 95% confidence interval for a difference in means should have a 95% chance of actually containing the true value of \[\mu_{eastern}-\mu_{western}.\] Since we have the salary data for the full population in both conferences, we know that the true difference is $-\$186.17$, so we can test out the t-distribution method and see how well it actually performs.

Simulating Confidence

How often are t-distribution confidence intervals correct? Let’s simulate many random samples from our two populations, and make confidence intervals for $\mu_{eastern} - \mu_{western}$ using the 2-sample t-distribution confidence interval formula: \[\bar{x}_1-\bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{N_1}+\frac{s_2^2}{N_2}}.\] In the loop below, I repeatedly take a sample of size $N=20$ from each conference, build the interval, and check whether or not the true difference in population means is inside it.
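Before repeating the calculation thousands of times, it may help to see the formula applied to a single pair of samples. The sketch below is only an illustration: `pop1` and `pop2` are hypothetical right-skewed populations made up for this example (they are not the NBA data), and the names `se`, `byHand`, and `welch` are my own. It also compares the hand-built interval with the one from R's built-in `t.test`, which uses the same standard error but a different degrees-of-freedom rule.

```r
# Hypothetical stand-in populations (NOT the NBA salaries): right-skewed
# values from a log-normal distribution, so the sketch runs on its own.
set.seed(1)
pop1 = rlnorm(200, meanlog = 8, sdlog = 1)
pop2 = rlnorm(200, meanlog = 8, sdlog = 1)

N = 20
x1 = sample(pop1, N)
x2 = sample(pop2, N)

# Standard error of the difference in sample means
se = sqrt(sd(x1)^2/N + sd(x2)^2/N)

# Conservative degrees of freedom: N - 1
tstar = qt(0.975, N - 1)

# Interval from the formula: (difference in means) +/- tstar * se
center = mean(x1) - mean(x2)
byHand = c(center - tstar*se, center + tstar*se)

# Built-in Welch interval: same center and standard error, but a larger
# (Satterthwaite) degrees of freedom, so it is never wider than byHand.
welch = as.numeric(t.test(x1, y = x2)$conf.int)

byHand
welch
```

Both intervals share the same center. The hand-built one is at least as wide because $N-1$ is the smallest degrees of freedom the two-sample method can use.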

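N = 20                                           # size of each sample
trueGap = mean(East$salary)-mean(West$salary)    # known population difference
results = c()
for (i in 1:10000) {
  # Draw one random sample (without replacement) from each conference
  EastSample = sample(East$salary,N)
  WestSample = sample(West$salary,N)
  # Conservative degrees of freedom: one less than the sample size
  dF = N-1
  tstar = qt(0.975,dF)
  # Point estimate and margin of error from the formula above
  gap = mean(EastSample)-mean(WestSample)
  margin = tstar*sqrt(sd(EastSample)^2/N+sd(WestSample)^2/N)
  lower = gap - margin
  upper = gap + margin
  # Record whether this interval captured the true difference
  containsTrueGap = lower <= trueGap & upper >= trueGap
  results = c(results,containsTrueGap)
}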

And here is the proportion of the 10,000 intervals that contained the true difference:

sum(results)/length(results)

## [1] 0.9653
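Is 0.9653 meaningfully different from 0.95, or could the gap just be simulation noise? A rough check, sketched below, is to treat the 10,000 intervals as independent trials and compute the standard error of the simulated proportion. The names `pHat`, `reps`, `seHat`, and `z` are my own, and the numbers are simply copied from the run above, so a fresh run will give slightly different values.

```r
# Values copied from the simulation run above
pHat = 0.9653      # observed proportion of intervals containing the true gap
reps = 10000       # number of simulated intervals

# Standard error of a proportion estimated from independent trials
seHat = sqrt(pHat*(1-pHat)/reps)

# Distance of the observed coverage from the nominal 95%, in standard errors
z = (pHat - 0.95)/seHat

seHat
z
```

The standard error comes out to roughly 0.002, so the observed coverage sits several standard errors above 95%. In this run the intervals appear to be somewhat conservative rather than just lucky. One plausible contributor is the cautious choice `dF = N-1`, which makes the intervals a little wider than necessary.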

Copy the code above into your own R markdown file, and answer the following questions:

What happens if the sample size $N$ is larger (say around 100)? Do the confidence intervals still contain the true difference about 95% of the time?
What happens if the sample size $N$ is smaller (like around 5)?
What happens if one sample is large and the other is small? You will have to adjust the code to have two different N variables to answer this one.