Logistic Regression Example

This is example 14.7 in the book.

In [25]:
A = matrix(c(0.96,1.33,1.63,2.04,2.32,50,48,46,49,50,6,16,24,42,44),ncol=3)
colnames(A)=c("concentration","number_of_insects","number_killed")
In [26]:
A
Out[26]:
  concentration number_of_insects number_killed
           0.96                50             6
           1.33                48            16
           1.63                46            24
           2.04                49            42
           2.32                50            44
In [27]:
myData = as.data.frame(A)

I have stored the data from Example 14.7 in the book as a matrix above. Then I used the command as.data.frame() to convert the matrix to a data frame.
Data frames are better for doing regression analysis. I would like to add one more column to my data frame. The cbind() command lets me do this.

In [28]:
proportion_killed = myData$number_killed/(myData$number_of_insects)
cbind(myData,proportion_killed)
Out[28]:
  concentration number_of_insects number_killed proportion_killed
1          0.96                50             6         0.1200000
2          1.33                48            16         0.3333333
3          1.63                46            24         0.5217391
4          2.04                49            42         0.8571429
5          2.32                50            44         0.8800000

Now I am going to fit a best-fit logistic model to my data. I call it myGLM because the R command is glm(), which stands for generalized linear model. There are two ways to enter the data. One way is to use the proportion_killed numbers as the y-values and concentration as the x-values (in that case the group sizes must be supplied through the weights argument). Alternatively, instead of using the proportion_killed vector as the y-values, the glm() command accepts a two-column matrix with the number of successes in the first column and the number of failures in the second. I do this below, but you can also try the other way and see what happens.

In [29]:
myGLM = glm(cbind(number_killed,number_of_insects-number_killed)~concentration,family="binomial",data=myData)
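The "other way" mentioned above can be sketched as follows, assuming myData and the proportion_killed vector from the earlier cells. The proportions serve as the response, and the group sizes are passed through glm()'s weights argument; this fit produces the same coefficient estimates as the success/failure-matrix form.

```r
# Equivalent specification: proportions as the response, group sizes
# supplied via weights=. Coefficients match myGLM above.
myGLM2 = glm(proportion_killed ~ concentration,
             family = "binomial",
             weights = number_of_insects,
             data = myData)
coef(myGLM2)
```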
In [30]:
summary(myGLM)
Out[30]:
Call:
glm(formula = cbind(number_killed, number_of_insects - number_killed) ~ 
    concentration, family = "binomial", data = myData)

Deviance Residuals: 
      1        2        3        4        5  
-0.1963   0.2099  -0.2978   0.8726  -0.7222  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.8923     0.6426  -7.613 2.67e-14 ***
concentration   3.1088     0.3879   8.015 1.11e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 96.6881  on 4  degrees of freedom
Residual deviance:  1.4542  on 3  degrees of freedom
AIC: 24.675

Number of Fisher Scoring iterations: 4
In [35]:
plot(myData$concentration, proportion_killed,
     ylab = 'Proportion Killed', xlab = 'Concentration',
     main = 'Logistic Regression')
curve(predict(myGLM, data.frame(concentration = x), type = 'response'), add = TRUE)
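A natural follow-up (my addition, not part of the book's transcript) is to use the fitted coefficients to estimate the LD50, the concentration at which the model predicts a 50% kill rate. Since the logit b0 + b1*x equals 0 exactly when p = 0.5, the estimate is x = -b0/b1. This sketch assumes myGLM from the cell above.

```r
# Estimate LD50 from the fitted coefficients: solve b0 + b1*x = 0.
b = coef(myGLM)
LD50 = -b[["(Intercept)"]] / b[["concentration"]]
LD50  # roughly 1.57, consistent with the summary coefficients above
# Sanity check: the predicted kill rate at LD50 is essentially 0.5.
predict(myGLM, data.frame(concentration = LD50), type = "response")
```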

Other Examples

Another good example of logistic regression can be found online at the Cookbook for R website. There they model an indicator variable (vs) for old (1970s) cars as a function of fuel economy (mpg). The indicator records whether the engine cylinders were arranged in a V-shape (e.g., a V6 engine) or a straight configuration (e.g., a straight 4- or 6-cylinder engine): 0 indicates a V engine and 1 indicates a straight engine (the 0 category actually also includes the RX-4, which has a rotary engine, and the Porsche 914's flat boxer engine).
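That example can be sketched in a few lines, since the mtcars data frame ships with base R. This is my own condensed version of the idea, not the Cookbook's exact code: fit vs as a binomial response with mpg as the predictor, then overlay the fitted curve the same way as above.

```r
# Logistic regression of engine shape (vs: 0 = V, 1 = straight) on
# fuel economy (mpg), using the built-in mtcars data.
carGLM = glm(vs ~ mpg, family = "binomial", data = mtcars)
summary(carGLM)
plot(mtcars$mpg, mtcars$vs, xlab = 'mpg', ylab = 'vs (engine shape)')
curve(predict(carGLM, data.frame(mpg = x), type = 'response'), add = TRUE)
```

Higher-mpg cars in this data tend to have straight engines, so the fitted curve rises with mpg.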
