The data frame NBA in the snippet below contains data on all of the NBA players from the 2014 season. The salary numbers are in millions of dollars.
NBA = read.csv("http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/NBASalaries2014.txt")
head(NBA)
##   firstname  lastname         team conference   salary
## 1       Joe   Johnson BrooklynNets    eastern 23.18079
## 2     Deron  Williams BrooklynNets    eastern 19.75446
## 3     Brook     Lopez BrooklynNets    eastern 15.71900
## 4     Kevin   Garnett BrooklynNets    eastern 12.00000
## 5   Jarrett      Jack BrooklynNets    eastern  6.30000
## 6     Mirza Teletovic BrooklynNets    eastern  3.36810
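# Split the players by conference
East = subset(NBA, conference == "eastern")
West = subset(NBA, conference == "western")
# Difference in mean salary (millions of dollars), eastern minus western
mean(East$salary) - mean(West$salary)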
## [1] -0.0001861654
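Because this is data from the whole population, there is no need to use statistical inference to learn about salaries; we can calculate the relevant parameters directly. The difference in means computed above is in millions of dollars, so multiplying by one million gives the difference in dollars: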
\[\mu_{eastern}-\mu_{western} = -\$186.17.\]
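The unit conversion behind this figure can be checked with a short sketch. The variable names `diff_millions` and `diff_dollars` are illustrative only; the value is copied from the R output above.

```r
# Difference in mean salaries as printed above, in millions of dollars
diff_millions <- -0.0001861654

# Convert from millions of dollars to dollars
diff_dollars <- diff_millions * 1e6

# Round to the nearest cent
round(diff_dollars, 2)
```

Relative to salaries measured in millions, a gap of under two hundred dollars means the two conference means are practically identical.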
Does the salary data have any outliers? How can you tell?
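One common convention, though not the only one, flags a value as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below assumes that rule; `iqr_outliers` is a hypothetical helper name, not a built-in R function, and the demonstration vector holds only the six salaries shown in the `head(NBA)` output above.

```r
# Return the values of x that fall outside the 1.5 * IQR fences
# (assumes x is a numeric vector with no missing values)
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # first and third quartiles
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

# Demonstration on the six salaries printed by head(NBA)
head_salaries <- c(23.18079, 19.75446, 15.71900, 12.00000, 6.30000, 3.36810)
iqr_outliers(head_salaries)
```

Six rows say little about the whole league, so to answer the question run `iqr_outliers(NBA$salary)` on the full data frame; `boxplot(NBA$salary)` draws points beyond the same fences by default.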
Make histograms to display the distributions of salaries for both conferences. What do you notice?
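A minimal sketch of one way to draw the two histograms side by side follows. It assumes a data frame with `conference` and `salary` columns like `NBA` above, with no missing or negative salaries; `plot_salary_hists` and its `binwidth` argument are hypothetical names chosen for illustration.

```r
# Draw eastern and western salary histograms in one row, using the same
# bin breaks so the two panels can be compared directly
plot_salary_hists <- function(df, binwidth = 2) {
  east <- df$salary[df$conference == "eastern"]
  west <- df$salary[df$conference == "western"]

  # Shared breaks from 0 up to the first multiple of binwidth >= max salary
  upper <- ceiling(max(df$salary) / binwidth) * binwidth
  breaks <- seq(0, upper, by = binwidth)

  # Two panels in one row; restore the previous layout on exit
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))

  h_east <- hist(east, breaks = breaks, main = "Eastern",
                 xlab = "Salary (millions of dollars)")
  h_west <- hist(west, breaks = breaks, main = "Western",
                 xlab = "Salary (millions of dollars)")

  # Return the histogram objects invisibly for further inspection
  invisible(list(east = h_east, west = h_west))
}
```

With the data loaded as above, `plot_salary_hists(NBA)` would produce both panels. Sharing the breaks is a design choice: it keeps the bars aligned across panels so differences in shape are not an artifact of different binning.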