Are baseball players at some positions better batters than players at other positions? For example, are outfielders better batters than catchers? The data file below contains batting statistics for 327 MLB players.
mlb = read.csv("http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/bat10.txt")
head(mlb)
##        name team position  AB   H HR RBI   OBP   AVG
## 1  I Suzuki  SEA       OF 680 214  6  43 0.359 0.315
## 2   D Jeter  NYY       IF 663 179 10  67 0.340 0.270
## 3   M Young  TEX       IF 656 186 21  91 0.330 0.284
## 4  J Pierre  CWS       OF 651 179  1  47 0.341 0.275
## 5   R Weeks  MIL       IF 651 175 29  83 0.366 0.269
## 6 M Scutaro  BOS       IF 632 174 11  56 0.333 0.275
Here the variables are:
name
team
position - either infield (IF), outfield (OF), catcher (C), or designated hitter (DH)
AB - number of opportunities at bat
H - number of hits
HR - number of home runs
RBI - number of runs batted in
OBP - on-base percentage
AVG - batting average
# Side-by-side boxplots of OBP by position (works whether position is read as a factor or as character)
boxplot(OBP~position,data=mlb)
It looks like designated hitters have a higher OBP than other players, but is the difference statistically significant?
par(mfrow=c(1,3))
aggregate(OBP~position,data=mlb,FUN=length)
## position OBP
## 1 C 39
## 2 DH 14
## 3 IF 154
## 4 OF 120
aggregate(OBP~position,data=mlb,FUN=mean)
## position OBP
## 1 C 0.3226154
## 2 DH 0.3477857
## 3 IF 0.3315260
## 4 OF 0.3342500
aggregate(OBP~position,data=mlb,FUN=sd)
## position OBP
## 1 C 0.04513175
## 2 DH 0.03603669
## 3 IF 0.03709504
## 4 OF 0.02944394
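Analysis of variance assumes that the groups have roughly equal spread, which is one reason to look at the standard deviations by group. The sketch below applies a common rule of thumb that is an assumption here, not something stated in these notes: the largest group standard deviation should be no more than about twice the smallest. The values are copied from the output above.

```r
# Group standard deviations of OBP, copied from the aggregate() output above
s <- c(C = 0.04513175, DH = 0.03603669, IF = 0.03709504, OF = 0.02944394)

# Ratio of the largest to the smallest standard deviation
sd_ratio <- max(s) / min(s)
sd_ratio

# Rule of thumb (an assumption, not part of the original notes):
# a ratio below 2 suggests the spreads are similar enough for ANOVA
sd_ratio < 2
```

The catchers are the most variable group and the outfielders the least, but by this rule of thumb the spreads are not far enough apart to rule out ANOVA.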
R has a built-in function for doing analysis of variance.
results = aov(OBP~position,data=mlb)
summary(results)
##              Df Sum Sq  Mean Sq F value Pr(>F)
## position      3 0.0076 0.002519   1.994  0.115
## Residuals   323 0.4080 0.001263
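To see where the numbers in this table come from, the following sketch rebuilds them from the group counts, means, and standard deviations printed earlier, using the standard one-way ANOVA formulas. The variable names are illustrative only, and because the inputs are rounded summaries the results should land close to, but not exactly on, the values that aov() reports.

```r
# Group summaries copied from the aggregate() output above
n    <- c(C = 39, DH = 14, IF = 154, OF = 120)
xbar <- c(C = 0.3226154, DH = 0.3477857, IF = 0.3315260, OF = 0.3342500)
s    <- c(C = 0.04513175, DH = 0.03603669, IF = 0.03709504, OF = 0.02944394)

k <- length(n)  # number of groups
N <- sum(n)     # total number of players

# The grand mean is the sample-size-weighted average of the group means
grand_mean <- sum(n * xbar) / N

# Between-group (position) and within-group (residual) sums of squares
ss_between <- sum(n * (xbar - grand_mean)^2)
ss_within  <- sum((n - 1) * s^2)

# Degrees of freedom: 3 for position, 323 for residuals
df_between <- k - 1
df_within  <- N - k

# Mean squares and the F statistic
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within

# The p-value is the upper tail area of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)
```

The F statistic is the ratio of the two mean squares, 0.002519 / 0.001263, which is about 1.994. A p-value of about 0.115 is above the conventional 0.05 level, so these data do not show a statistically significant difference in mean OBP between positions.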