5K Running Times vs. Age & Gender

The data set below contains the results for 248 runners in a 5K race held in California in 2013. The example that follows shows how to fit a multiple linear regression model in R, with finishing time (in minutes) as the response and age and gender as the predictors.

raceData = read.csv('http://people.hsc.edu/faculty-staff/blins/classes/spring18/math222/data/Talley5K2013.txt',sep='\t')
head(raceData)
##     BIB                Name            Hometown Gender AgeRank GenderRank
## 1  1538 Ricketts, Christian    Grover Beach, Ca      M       1          1
## 2  1581     Mccarty, Travis   Arroyo Grande, Ca      M       1          2
## 3  1679       Bounds, Julia    Redwood City, Ca      F       1          1
## 4 91506  Krichevsky, Daniel San Luis Obispo, Ca      M       1          3
## 5  1591          Shea, Owen             Slo, Ca      M       1          4
## 6  1542    Gillespie, Tyler   Arroyo Grande, Ca      M       2          5
##   OverallRank  Time Age
## 1           1 16.18  13
## 2           2 17.15  34
## 3           3 18.80  14
## 4           4 19.02  27
## 5           5 19.07  49
## 6           6 19.38  25

The Model

myLM = lm(Time~Age+Gender,data=raceData)
summary(myLM)
## 
## Call:
## lm(formula = Time ~ Age + Gender, data = raceData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.789  -5.440  -1.831   3.768  53.025 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.98722    1.41269  21.227  < 2e-16 ***
## Age          0.12259    0.03196   3.836 0.000159 ***
## GenderM     -5.20582    1.09495  -4.754  3.4e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.485 on 245 degrees of freedom
## Multiple R-squared:  0.1318, Adjusted R-squared:  0.1247 
## F-statistic:  18.6 on 2 and 245 DF,  p-value: 3.021e-08
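
The fitted coefficients can be read as a prediction equation: predicted time = 29.98722 + 0.12259 * Age - 5.20582 * (1 if male, 0 if female). As a minimal sketch, the helper below plugs the estimates printed in the summary into that equation. The name predictTime is hypothetical and not part of the original analysis; with the data loaded, calling predict() on myLM with a newdata data frame should give the same numbers from the fitted model itself.

```r
# Coefficient estimates copied from the summary(myLM) output above
b0    <- 29.98722   # intercept: baseline for a female runner at age 0
bAge  <- 0.12259    # additional minutes per year of age
bMale <- -5.20582   # shift for male runners relative to female runners

# Hypothetical helper: predicted 5K time in minutes.
# (gender == "M") is TRUE/FALSE, which R treats as 1/0 in arithmetic.
predictTime <- function(age, gender) {
  b0 + bAge * age + bMale * (gender == "M")
}

predictTime(30, "F")
predictTime(30, "M")
```

For a 30-year-old, this gives roughly 33.7 minutes for a woman and 28.5 minutes for a man. In other words, for two runners of the same age the model predicts the man finishes about 5.2 minutes sooner, and each additional year of age adds about 0.12 minutes.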

We should also plot these variables to see how they relate to one another and to judge whether a linear model is appropriate.

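par(mfrow=c(2,2))  # 2x2 grid of panels; only three are used
# Scatterplot of race time against age
plot(raceData$Age,raceData$Time,xlab='Age',ylab='Race Time (min)')
# Side-by-side boxplots by gender. factor() is needed in R >= 4.0,
# where read.csv leaves Gender as a character column.
plot(factor(raceData$Gender),raceData$Time,xlab='Gender',ylab='Race Time (min)')
plot(factor(raceData$Gender),raceData$Age,xlab='Gender',ylab='Age')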

Restricted Models

newLM = lm(Time~Gender,data=raceData)
summary(newLM)
## 
## Call:
## lm(formula = Time ~ Gender, data = raceData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.889  -5.790  -1.678   4.180  49.903 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  34.6892     0.7215  48.078  < 2e-16 ***
## GenderM      -5.1918     1.1250  -4.615 6.34e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.718 on 246 degrees of freedom
## Multiple R-squared:  0.07967,    Adjusted R-squared:  0.07593 
## F-statistic:  21.3 on 1 and 246 DF,  p-value: 6.34e-06
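
Dropping Age lowers R-squared from 0.1318 to 0.07967. One way to ask whether that loss is more than noise is a partial F-test comparing the restricted model with the full one. The sketch below recomputes the statistic from the rounded values printed in the two summaries, so it is only approximate, and the variable names are illustrative. With both models fitted, anova(newLM, myLM) carries out the same comparison directly.

```r
# Values copied (rounded) from the two summary() outputs above
r2Full       <- 0.1318    # Time ~ Age + Gender
r2Restricted <- 0.07967   # Time ~ Gender
dfFull       <- 245       # residual degrees of freedom, full model
nDropped     <- 1         # number of coefficients dropped (Age)

# Partial F statistic: gain in R-squared per dropped coefficient,
# relative to the unexplained variation per residual degree of freedom
fStat  <- ((r2Full - r2Restricted) / nDropped) / ((1 - r2Full) / dfFull)
pValue <- pf(fStat, nDropped, dfFull, lower.tail = FALSE)

fStat
pValue
```

Because only one coefficient is dropped, the statistic should land close to the square of the t value reported for Age in the full model (3.836 squared is about 14.7), with a p-value close to the 0.000159 shown there. That agreement suggests Age carries information about race time beyond what Gender alone explains.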