Logistic Regression Example: Aphids and Pesticides

This is example 14.7 in the book. A pesticide is tested at different concentration levels to see what proportion of aphids are killed. We will use the data to fit a logistic regression model to the data.

A = matrix(c(0.96,1.33,1.63,2.04,2.32,50,48,46,49,50,6,16,24,42,44),ncol=3)
colnames(A)=c("concentration","number_of_insects","number_killed")
A

##      concentration number_of_insects number_killed
## [1,]          0.96                50             6
## [2,]          1.33                48            16
## [3,]          1.63                46            24
## [4,]          2.04                49            42
## [5,]          2.32                50            44

I have stored the data from Example 14.7 in the book as a matrix above. Now I use the command as.data.frame() to convert the matrix to a data frame. Data frames are better for doing regression analysis.

myData = as.data.frame(A)
myData

##   concentration number_of_insects number_killed
## 1          0.96                50             6
## 2          1.33                48            16
## 3          1.63                46            24
## 4          2.04                49            42
## 5          2.32                50            44

I would like to add one more column to my data frame. The cbind() command lets me do this.

proportion_killed = myData$number_killed/(myData$number_of_insects)
myData=cbind(myData,proportion_killed)
myData

##   concentration number_of_insects number_killed proportion_killed
## 1          0.96                50             6         0.1200000
## 2          1.33                48            16         0.3333333
## 3          1.63                46            24         0.5217391
## 4          2.04                49            42         0.8571429
## 5          2.32                50            44         0.8800000

Now we are ready to carryout the logistic regression. There are two ways to enter the data. One way is to make an x and y vector of results for each individual (aphid). The x-vector would contain the insecticide concentration for that individual, while the y-vector would contain 0 or 1 (1 for killed, 0 for survived). Since are data isn’t in that format, there is an alternative. The glm() command accepts a matrix formed by using the number of successes in the first column and the number of failures as the second column. I do this below, but you can also try the other way and see what happens.

myModel = glm(cbind(number_killed,number_of_insects-number_killed)~concentration,data=myData,family="binomial")
summary(myModel)

## 
## Call:
## glm(formula = cbind(number_killed, number_of_insects - number_killed) ~ 
##     concentration, family = "binomial", data = myData)
## 
## Deviance Residuals: 
##       1        2        3        4        5  
## -0.1963   0.2099  -0.2978   0.8726  -0.7222  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -4.8923     0.6426  -7.613 2.67e-14 ***
## concentration   3.1088     0.3879   8.015 1.11e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 96.6881  on 4  degrees of freedom
## Residual deviance:  1.4542  on 3  degrees of freedom
## AIC: 24.675
## 
## Number of Fisher Scoring iterations: 4

Here is a graph of our logistic regression model.

plot(myData$concentration,proportion_killed,ylab='Proportion Killed',xlab='Concentration',main='Logistic Regression')
curve(predict(myModel,data.frame(concentration=x),type='response'),add=T)