Logistic Regression Example: Aphids and Pesticides

A study looked out whether increasing concentrations of a pesticide were more effective at killing aphids. Five groups of insects were exposed to different concentrations of the pesticide rotenone. After exposure, the researchers counted the number of aphids that were killed. The results are given in the matrix below.

A = matrix(c(0.96,1.33,1.63,2.04,2.32,50,48,46,49,50,6,16,24,42,44),ncol=3)
colnames(A)=c("concentration","number_of_insects","number_killed")
A
##      concentration number_of_insects number_killed
## [1,]          0.96                50             6
## [2,]          1.33                48            16
## [3,]          1.63                46            24
## [4,]          2.04                49            42
## [5,]          2.32                50            44

It would be more convenient if this data were an R data.frame rather than a matrix. You can use the command as.data.frame() to convert the matrix to a data frame. Data frames are better for doing regression analysis.

myData = as.data.frame(A)
myData
##   concentration number_of_insects number_killed
## 1          0.96                50             6
## 2          1.33                48            16
## 3          1.63                46            24
## 4          2.04                49            42
## 5          2.32                50            44

I would like to add one more column to my data frame. The cbind() command lets me do this.

proportion_killed = myData$number_killed/(myData$number_of_insects)
myData=cbind(myData,proportion_killed)
myData
##   concentration number_of_insects number_killed proportion_killed
## 1          0.96                50             6         0.1200000
## 2          1.33                48            16         0.3333333
## 3          1.63                46            24         0.5217391
## 4          2.04                49            42         0.8571429
## 5          2.32                50            44         0.8800000

Now we are ready to carryout the logistic regression. There are two ways to enter the data. One way is to make an x and y vector of results for each individual (aphid). The x-vector would contain the insecticide concentration for that individual, while the y-vector would contain 0 or 1 (1 for killed, 0 for survived). Since are data isn’t in that format, there is an alternative. The glm() command accepts a matrix formed by using the number of successes in the first column and the number of failures as the second column. I do this below, but you can also try the other way and see what happens.

myModel = glm(cbind(number_killed,number_of_insects-number_killed)~concentration,data=myData,family="binomial")
summary(myModel)
## 
## Call:
## glm(formula = cbind(number_killed, number_of_insects - number_killed) ~ 
##     concentration, family = "binomial", data = myData)
## 
## Deviance Residuals: 
##       1        2        3        4        5  
## -0.1963   0.2099  -0.2978   0.8726  -0.7222  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -4.8923     0.6426  -7.613 2.67e-14 ***
## concentration   3.1088     0.3879   8.015 1.11e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 96.6881  on 4  degrees of freedom
## Residual deviance:  1.4542  on 3  degrees of freedom
## AIC: 24.675
## 
## Number of Fisher Scoring iterations: 4

Here is a graph of our logistic regression model.

plot(myData$concentration,proportion_killed,ylab='Proportion Killed',xlab='Concentration',main='Logistic Regression')
curve(predict(myModel,data.frame(concentration=x),type='response'),add=T)