2014 NBA Salaries: Bootstrap Confidence Intervals

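# Load the full 2014 salary data; this plays the role of the population
NBA = read.csv("http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/NBASalaries2014.txt")
# Split the players by conference
East = subset(NBA, conference == "eastern")
West = subset(NBA, conference == "western")
# Draw a random sample of 20 salaries (without replacement) from each conference
EastSample = sample(East$salary, 20)
WestSample = sample(West$salary, 20)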

We could also use bootstrapping to estimate the difference between the two population means from these samples. We would need to look at a bootstrap distribution for the statistic \(\bar{x}_{eastern}-\bar{x}_{western}\). Here is how to generate one bootstrap statistic:

Step 1 - Bootstrap Resample from Original Samples

EastBootstrapSample = sample(EastSample, 20, replace = TRUE)
WestBootstrapSample = sample(WestSample, 20, replace = TRUE)

Note that the bootstrap samples are the same size as the original samples.
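To see what resampling with replacement does, here is a small sketch on made-up numbers. The vector ToySample and its values are hypothetical and are not part of the salary data. The resample has the same length as the original, but some values will usually appear more than once while others are left out.

```r
set.seed(3)  # fix the random draws so the run is repeatable

# Made-up values for illustration only (not real salaries)
ToySample = c(10, 20, 30, 40, 50)

# Resample with replacement, keeping the original sample size
ToyResample = sample(ToySample, length(ToySample), replace = TRUE)

print(ToyResample)
print(length(ToyResample) == length(ToySample))  # same size as the original
print(table(ToyResample))                        # how often each value was drawn
```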

Step 2 - Compute the Bootstrap Statistic

BootstrapStatistic = mean(EastBootstrapSample)-mean(WestBootstrapSample)

Step 3 - Repeat to Find Bootstrap Distribution

#This one is an exercise for you to finish!
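If the for-loop pattern is unfamiliar, the following is one possible shape for it, sketched on made-up numbers rather than on the salary samples. The names ToyEast, ToyWest, B, and ToyBootstrapStatistics are hypothetical, and the values are invented for illustration. The idea is to create an empty vector first and then fill in one bootstrap statistic per pass through the loop.

```r
set.seed(1)  # fix the random draws so the run is repeatable

# Made-up values for illustration only (not real salaries)
ToyEast = c(2.1, 0.9, 5.4, 1.3, 12.0, 3.3)
ToyWest = c(1.8, 7.5, 0.8, 4.2, 2.6, 9.1)

B = 1000                              # number of bootstrap statistics to collect
ToyBootstrapStatistics = numeric(B)   # empty vector to hold the results

for (i in 1:B) {
  # Step 1: resample each toy sample with replacement, same size as the original
  EastResample = sample(ToyEast, length(ToyEast), replace = TRUE)
  WestResample = sample(ToyWest, length(ToyWest), replace = TRUE)
  # Step 2: compute the bootstrap statistic and store it in slot i
  ToyBootstrapStatistics[i] = mean(EastResample) - mean(WestResample)
}

print(summary(ToyBootstrapStatistics))
```

Calling hist(ToyBootstrapStatistics) afterwards would show the shape of the toy bootstrap distribution.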

Make a bootstrap distribution from 5,000 bootstrap statistics using a for-loop.
Use the bootstrap distribution to find a 95% confidence interval for \(\mu_{eastern}-\mu_{western}\).
Did your bootstrap confidence interval actually contain the true value of the difference?
Extra credit. Write a computer program that draws many different real samples (not just bootstrap resamples), computes a bootstrap confidence interval from each one, and tests whether that interval contains the true value of \(\mu_{eastern}-\mu_{western}\). Are the bootstrap confidence intervals 95% accurate, that is, do about 95% of them contain the true value? What happens if the sample sizes change?
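For the confidence interval questions above, one common choice is the percentile method: take the 2.5th and 97.5th percentiles of the bootstrap distribution as the endpoints of a 95% interval. This handout does not say which method to use, so treat the percentile method as an assumption here. The sketch below again uses made-up numbers, and ToyEast, ToyWest, ToyDistribution, and HypotheticalTrueDifference are hypothetical names and values chosen only for illustration.

```r
set.seed(2)  # fix the random draws so the run is repeatable

# Made-up values for illustration only (not real salaries)
ToyEast = c(2.1, 0.9, 5.4, 1.3, 12.0, 3.3)
ToyWest = c(1.8, 7.5, 0.8, 4.2, 2.6, 9.1)

# A toy bootstrap distribution of the difference in means
# (replicate is used here only to keep the sketch short)
ToyDistribution = replicate(2000,
  mean(sample(ToyEast, length(ToyEast), replace = TRUE)) -
  mean(sample(ToyWest, length(ToyWest), replace = TRUE)))

# Percentile method: the middle 95% of the bootstrap statistics
Interval = quantile(ToyDistribution, c(0.025, 0.975))
print(Interval)

# Stand-in for a known population difference (assumed value, not computed from data)
HypotheticalTrueDifference = 0

# TRUE if the interval contains the stand-in value
Contains = unname(Interval[1] <= HypotheticalTrueDifference &
                  HypotheticalTrueDifference <= Interval[2])
print(Contains)
```

With the salary data, the stand-in value would be replaced by the difference computed from the full East and West data sets.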