Cat Jumping

Evolutionary biologists Harris and Steudel (2002) investigated factors related to the jumping ability of domestic cats. They measured takeoff velocity with high-speed cameras as a proxy for jumping ability in 18 healthy adult cats. Several traits that might be related to takeoff velocity were also recorded: sex, relative hind limb length (hindlimb), relative extensor muscle mass (musclemass), body mass (bodymass), and fat mass relative to lean body mass (percentbodyfat).

results = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/CatJumping.txt")
results
##    sex bodymass hindlimb musclemass percentbodyfat velocity
## 1    F     3640    29.10      51.15             29    334.5
## 2    F     2670    28.55      46.05             17    387.3
## 3    M     5600    31.74      95.90             31    410.8
## 4    F     4130    26.90      55.65             39    318.6
## 5    F     3020    26.11      57.20             15    368.7
## 6    F     2660    26.69      48.67             11    358.8
## 7    F     3240    26.74      64.55             21    344.6
## 8    M     5140    27.71      78.80             35    324.6
## 9    F     3690    25.47      54.60             33    301.4
## 10   F     3620    28.18      55.50             15    331.8
## 11   F     5310    28.45      68.80             42    312.6
## 12   M     5560    28.65      79.80             37    316.8
## 13   M     3970    29.82      69.40             20    375.6
## 14   F     3770    26.66      60.25             26    372.4
## 15   F     5100    27.84      60.70             41    314.3
## 16   F     2950    27.89      55.65             25    367.5
## 17   M     7930    30.58      98.95             48    286.3
## 18   F     3550    28.06      79.25             16    352.5
  1. What is the response variable and what are the explanatory variables in this data set?

  2. The plots below show the relationship between each explanatory variable and the response variable. For each plot of a quantitative variable, comment on the (a) direction, (b) linearity, and (c) strength of the trend. Because the individual panels produced by the pairs() function are a little hard to read, each explanatory variable is plotted separately against the response variable here.

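par(mfrow=c(2,3))  # 2-by-3 grid of panels; the five plots leave the last panel empty
# read.csv() in R 4.0 and later reads sex as character, so convert it to a factor to get side-by-side boxplots
plot(factor(results$sex),results$velocity,ylab='Velocity',xlab='Sex')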
plot(results$bodymass,results$velocity,ylab='Velocity',xlab='Body Mass')
plot(results$musclemass,results$velocity,ylab='Velocity',xlab='Muscle Mass')
plot(results$percentbodyfat,results$velocity,ylab='Velocity', xlab='Percent Body Fat')
plot(results$hindlimb,results$velocity,ylab='Velocity',xlab = 'Hind Limb Length')
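
To back up a visual judgment of direction and strength, it can help to look at the correlation between each quantitative explanatory variable and velocity. The following is only a sketch: the helper name cor_with_response is made up for this handout, and it assumes the data frame is called results as above. Correlation measures linear association only, so it does not replace looking at the plots.

```r
# Hypothetical helper (not part of the original analysis): correlation of each
# numeric explanatory variable with the response variable.
cor_with_response = function(data, response = "velocity") {
  numeric_cols = names(data)[sapply(data, is.numeric)]
  predictors = setdiff(numeric_cols, response)
  sapply(predictors, function(v) cor(data[[v]], data[[response]]))
}

# Runs only if the data frame from the top of this handout has been loaded.
if (exists("results") && is.data.frame(results)) {
  pairs(results[sapply(results, is.numeric)])   # scatterplot matrix of the numeric columns
  print(round(cor_with_response(results), 2))
}
```

A negative value indicates a decreasing trend, and values closer to -1 or 1 indicate a stronger linear trend.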

I used four different methods to find a good model for predicting velocity from the other variables. Here are the results.

Forward selection for \(R^2_{adj}\)

By adding variables one at a time based on whether and how much they increased the \(R^2_{adj}\), we get a model that includes every variable except sex.

mylm = lm(velocity~percentbodyfat+musclemass+hindlimb+bodymass,data=results)
summary(mylm)
## 
## Call:
## lm(formula = velocity ~ percentbodyfat + musclemass + hindlimb + 
##     bodymass, data = results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.229 -11.119  -1.931   8.156  39.486 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)     46.000774 109.100588   0.422   0.6802  
## percentbodyfat  -0.005902   1.040256  -0.006   0.9956  
## musclemass       1.263113   0.712982   1.772   0.0999 .
## hindlimb        12.540988   4.337745   2.891   0.0126 *
## bodymass        -0.032726   0.013209  -2.478   0.0277 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.14 on 13 degrees of freedom
## Multiple R-squared:  0.7165, Adjusted R-squared:  0.6292 
## F-statistic: 8.213 on 4 and 13 DF,  p-value: 0.001565
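
The forward selection procedure described above can be sketched in R as follows. This is a minimal illustration, not the code I used to get the result above: the function names adj_r2 and forward_adj_r2 are made up for this handout, and ties or other stopping rules are not handled. At each step it adds the variable that raises \(R^2_{adj}\) the most and stops when no addition raises it.

```r
# Hypothetical helper: adjusted R^2 of the model response ~ predictors.
adj_r2 = function(data, response, predictors) {
  rhs = if (length(predictors) == 0) "1" else paste(predictors, collapse = " + ")
  summary(lm(as.formula(paste(response, "~", rhs)), data = data))$adj.r.squared
}

# Hypothetical helper: greedy forward selection on adjusted R^2.
forward_adj_r2 = function(data, response, candidates) {
  chosen = character(0)
  best = adj_r2(data, response, chosen)   # intercept-only model
  repeat {
    remaining = setdiff(candidates, chosen)
    if (length(remaining) == 0) break
    # Adjusted R^2 of each model that adds one more variable
    scores = sapply(remaining, function(v) adj_r2(data, response, c(chosen, v)))
    if (max(scores) <= best) break        # no addition helps, so stop
    chosen = c(chosen, names(which.max(scores)))
    best = max(scores)
  }
  list(variables = chosen, adj_r2 = best)
}

# Runs only if the data frame from the top of this handout has been loaded.
if (exists("results") && is.data.frame(results)) {
  print(forward_adj_r2(results, "velocity",
                       c("sex", "bodymass", "hindlimb", "musclemass", "percentbodyfat")))
}
```

The returned list holds the variables in the order they were added and the \(R^2_{adj}\) of the final model.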

Backwards elimination for \(R^2_{adj}\)

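Starting from the full model and removing variables one at a time, as long as each removal increases \(R^2_{adj}\), leads to a model with three explanatory variables: musclemass, hindlimb, and bodymass.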

mylm = lm(velocity~musclemass+hindlimb+bodymass,data=results)
summary(mylm)
## 
## Call:
## lm(formula = velocity ~ musclemass + hindlimb + bodymass, data = results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.232 -11.126  -1.947   8.163  39.491 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 45.758845  96.768264   0.473  0.64359    
## musclemass   1.265299   0.578058   2.189  0.04605 *  
## hindlimb    12.548448   3.983319   3.150  0.00709 ** 
## bodymass    -0.032792   0.006172  -5.313  0.00011 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.41 on 14 degrees of freedom
## Multiple R-squared:  0.7165, Adjusted R-squared:  0.6557 
## F-statistic: 11.79 on 3 and 14 DF,  p-value: 0.0004012
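
The two summaries above report the same multiple \(R^2\) (0.7165) but different values of \(R^2_{adj}\). A short worked calculation shows why, using the standard formula with \(n = 18\) cats and \(k\) explanatory variables. The symbols \(n\) and \(k\) are my notation, not part of the R output.

\[ R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \]

\[ k = 4:\quad 1 - (1 - 0.7165)\,\frac{17}{13} \approx 0.629 \qquad\qquad k = 3:\quad 1 - (1 - 0.7165)\,\frac{17}{14} \approx 0.656 \]

These agree with the reported 0.6292 and 0.6557 up to rounding of \(R^2\). Dropping percentbodyfat leaves \(R^2\) essentially unchanged while freeing one degree of freedom, so the penalty shrinks and \(R^2_{adj}\) goes up.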

Forward selection based on p-values

The first variable to add is percentbodyfat, followed by hindlimb. After that, none of the remaining variables is statistically significant when added, so we stop.

mylm = lm(velocity~percentbodyfat+hindlimb,data=results)
summary(mylm)
## 
## Call:
## lm(formula = velocity ~ percentbodyfat + hindlimb, data = results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.981 -11.090  -1.173  12.371  43.004 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    164.5979    99.1876   1.659 0.117779    
## percentbodyfat  -2.2978     0.5212  -4.409 0.000508 ***
## hindlimb         8.6462     3.6379   2.377 0.031211 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.77 on 15 degrees of freedom
## Multiple R-squared:  0.5818, Adjusted R-squared:  0.5261 
## F-statistic: 10.44 on 2 and 15 DF,  p-value: 0.001446
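
One way to automate forward selection by p-value is with R's add1() function, which tests each candidate variable as an addition to the current model. The sketch below is an illustration under assumptions: the function name forward_pvalue and the default cutoff alpha = 0.05 are my choices, and it uses the F test from add1(), which gives the same p-value as the t test for a variable that adds a single coefficient.

```r
# Hypothetical helper: forward selection that adds the candidate with the
# smallest p-value at each step, as long as that p-value is below alpha.
forward_pvalue = function(data, response, candidates, alpha = 0.05) {
  fit = lm(as.formula(paste(response, "~ 1")), data = data)   # intercept-only start
  chosen = character(0)
  repeat {
    remaining = setdiff(candidates, chosen)
    if (length(remaining) == 0) break
    scope = as.formula(paste("~ . +", paste(remaining, collapse = " + ")))
    tab = add1(fit, scope = scope, test = "F")
    p = setNames(tab[["Pr(>F)"]], rownames(tab))
    p = p[!is.na(p)]                      # drop the "<none>" row, which has no p-value
    best = names(p)[which.min(p)]
    if (p[[best]] >= alpha) break         # nothing significant left to add
    fit = update(fit, as.formula(paste(". ~ . +", best)))
    chosen = c(chosen, best)
  }
  fit
}

# Runs only if the data frame from the top of this handout has been loaded.
if (exists("results") && is.data.frame(results)) {
  print(summary(forward_pvalue(results, "velocity",
                               c("sex", "bodymass", "hindlimb", "musclemass", "percentbodyfat"))))
}
```

The function returns the fitted lm object, so summary() can be applied to it directly.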

Backwards elimination using p-values

Of all the model selection methods, this is the easiest. You start with the full model and, at each step, eliminate the variable with the largest p-value until all remaining variables are statistically significant (usually at the \(\alpha = 5\%\) level). I ended up with a model with three explanatory variables: musclemass, hindlimb, and bodymass.

mylm = lm(velocity~musclemass+hindlimb+bodymass,data=results)
summary(mylm)
## 
## Call:
## lm(formula = velocity ~ musclemass + hindlimb + bodymass, data = results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.232 -11.126  -1.947   8.163  39.491 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 45.758845  96.768264   0.473  0.64359    
## musclemass   1.265299   0.578058   2.189  0.04605 *  
## hindlimb    12.548448   3.983319   3.150  0.00709 ** 
## bodymass    -0.032792   0.006172  -5.313  0.00011 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.41 on 14 degrees of freedom
## Multiple R-squared:  0.7165, Adjusted R-squared:  0.6557 
## F-statistic: 11.79 on 3 and 14 DF,  p-value: 0.0004012
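
The backwards elimination steps described above can be sketched with R's drop1() function, which tests each variable in the current model as a removal. This is an illustration rather than the code I used: the function name backward_pvalue and the default cutoff alpha = 0.05 are assumptions, and the F test from drop1() matches the t test in summary() for any variable with a single coefficient.

```r
# Hypothetical helper: backwards elimination that removes the variable with the
# largest p-value at each step until every remaining p-value is below alpha.
backward_pvalue = function(data, response, candidates, alpha = 0.05) {
  rhs = paste(candidates, collapse = " + ")
  fit = lm(as.formula(paste(response, "~", rhs)), data = data)   # full model
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break      # nothing left to drop
    tab = drop1(fit, test = "F")
    p = setNames(tab[["Pr(>F)"]], rownames(tab))
    p = p[!is.na(p)]                      # drop the "<none>" row, which has no p-value
    worst = names(p)[which.max(p)]
    if (p[[worst]] < alpha) break         # everything left is significant
    fit = update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Runs only if the data frame from the top of this handout has been loaded.
if (exists("results") && is.data.frame(results)) {
  print(summary(backward_pvalue(results, "velocity",
                                c("sex", "bodymass", "hindlimb", "musclemass", "percentbodyfat"))))
}
```

Starting from the four quantitative variables, the first variable removed is percentbodyfat (p-value 0.9956 in the four-variable summary earlier), and the three remaining variables all have p-values below 0.05, so the procedure stops there.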