Logistic Regression Example

This is example 14.7 in the book.

In [25]:
A = matrix(c(0.96,1.33,1.63,2.04,2.32,50,48,46,49,50,6,16,24,42,44),ncol=3)
colnames(A)=c("concentration","number_of_insects","number_killed")
In [26]:
A
Out[26]:
  concentration number_of_insects number_killed
           0.96                50             6
           1.33                48            16
           1.63                46            24
           2.04                49            42
           2.32                50            44
In [27]:
myData = as.data.frame(A)

I have stored the data from Example 14.7 in the book as a matrix above. Then I used the command as.data.frame() to convert the matrix to a data frame.
Data frames are better for doing regression analysis. I would like to add one more column to my data frame. The cbind() command lets me do this.

In [28]:
proportion_killed = myData$number_killed/(myData$number_of_insects)
cbind(myData,proportion_killed)
Out[28]:
  concentration number_of_insects number_killed proportion_killed
1          0.96                50             6         0.1200000
2          1.33                48            16         0.3333333
3          1.63                46            24         0.5217391
4          2.04                49            42         0.8571429
5          2.32                50            44         0.8800000

Now I am going to fit a best-fit logistic model to my data. I call it myGLM because the R command is glm(), which stands for generalized linear model. There are two ways to enter the data. One way is to use the proportion_killed numbers as the y-values and concentration as the x-values (in that case the group sizes must be supplied through the weights argument). Alternatively, instead of using the proportion_killed vector as the y-values, the glm() command accepts a two-column matrix with the number of successes in the first column and the number of failures in the second. I do this below, but you can also try the other way and see what happens.

In [29]:
myGLM = glm(cbind(number_killed,number_of_insects-number_killed)~concentration,family="binomial",data=myData)
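The "other way" mentioned above can be sketched as follows, assuming myData and the proportion_killed vector from the earlier cells. The proportions serve as the response, and the group sizes are passed through glm()'s weights argument; this fit produces the same coefficient estimates as the success/failure-matrix form.

```r
# Equivalent specification: proportions as the response, group sizes
# supplied via weights=. Coefficients match myGLM above.
myGLM2 = glm(proportion_killed ~ concentration,
             family = "binomial",
             weights = number_of_insects,
             data = myData)
coef(myGLM2)
```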
In [30]:
summary(myGLM)
Out[30]:
Call:
glm(formula = cbind(number_killed, number_of_insects - number_killed) ~ 
    concentration, family = "binomial", data = myData)

Deviance Residuals: 
      1        2        3        4        5  
-0.1963   0.2099  -0.2978   0.8726  -0.7222  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.8923     0.6426  -7.613 2.67e-14 ***
concentration   3.1088     0.3879   8.015 1.11e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 96.6881  on 4  degrees of freedom
Residual deviance:  1.4542  on 3  degrees of freedom
AIC: 24.675

Number of Fisher Scoring iterations: 4
In [35]:
plot(myData$concentration, proportion_killed,
     ylab = 'Proportion Killed', xlab = 'Concentration',
     main = 'Logistic Regression')
curve(predict(myGLM, data.frame(concentration = x), type = 'response'), add = TRUE)
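A natural follow-up (my addition, not part of the book's transcript) is to use the fitted coefficients to estimate the LD50, the concentration at which the model predicts a 50% kill rate. Since the logit b0 + b1*x equals 0 exactly when p = 0.5, the estimate is x = -b0/b1. This sketch assumes myGLM from the cell above.

```r
# Estimate LD50 from the fitted coefficients: solve b0 + b1*x = 0.
b = coef(myGLM)
LD50 = -b[["(Intercept)"]] / b[["concentration"]]
LD50  # roughly 1.57, consistent with the summary coefficients above
# Sanity check: the predicted kill rate at LD50 is essentially 0.5.
predict(myGLM, data.frame(concentration = LD50), type = "response")
```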

Other Examples

Another good example of logistic regression can be found online at the Cookbook for R website. There they model an indicator variable (vs) for old (1970s) cars as a function of fuel economy (mpg). The indicator records whether the engine cylinders were arranged in a V-shape (e.g., a V6 engine) or a straight configuration (e.g., a straight 4- or 6-cylinder engine): 0 indicates a V engine and 1 indicates a straight engine (the 0 category actually also includes the RX-4, which has a rotary engine, and the Porsche 914's flat boxer engine).
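That example can be sketched in a few lines, since the mtcars data frame ships with base R. This is my own condensed version of the idea, not the Cookbook's exact code: fit vs as a binomial response with mpg as the predictor, then overlay the fitted curve the same way as above.

```r
# Logistic regression of engine shape (vs: 0 = V, 1 = straight) on
# fuel economy (mpg), using the built-in mtcars data.
carGLM = glm(vs ~ mpg, family = "binomial", data = mtcars)
summary(carGLM)
plot(mtcars$mpg, mtcars$vs, xlab = 'mpg', ylab = 'vs (engine shape)')
curve(predict(carGLM, data.frame(mpg = x), type = 'response'), add = TRUE)
```

Higher-mpg cars in this data tend to have straight engines, so the fitted curve rises with mpg.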
