This is example 14.7 in the book. A pesticide is tested at different concentration levels to see what proportion of aphids are killed. We will use the data to fit a logistic regression model to the data.
A = matrix(c(0.96,1.33,1.63,2.04,2.32,50,48,46,49,50,6,16,24,42,44),ncol=3)
colnames(A)=c("concentration","number_of_insects","number_killed")
A
## concentration number_of_insects number_killed
## [1,] 0.96 50 6
## [2,] 1.33 48 16
## [3,] 1.63 46 24
## [4,] 2.04 49 42
## [5,] 2.32 50 44
I have stored the data from Example 14.7 in the book as a matrix above. Now I use the command as.data.frame() to convert the matrix to a data frame. Data frames are better for doing regression analysis.
myData = as.data.frame(A)
myData
## concentration number_of_insects number_killed
## 1 0.96 50 6
## 2 1.33 48 16
## 3 1.63 46 24
## 4 2.04 49 42
## 5 2.32 50 44
I would like to add one more column to my data frame. The cbind() command lets me do this.
proportion_killed = myData$number_killed/(myData$number_of_insects)
myData=cbind(myData,proportion_killed)
myData
## concentration number_of_insects number_killed proportion_killed
## 1 0.96 50 6 0.1200000
## 2 1.33 48 16 0.3333333
## 3 1.63 46 24 0.5217391
## 4 2.04 49 42 0.8571429
## 5 2.32 50 44 0.8800000
Now we are ready to carryout the logistic regression. There are two ways to enter the data. One way is to make an x and y vector of results for each individual (aphid). The x-vector would contain the insecticide concentration for that individual, while the y-vector would contain 0 or 1 (1 for killed, 0 for survived). Since are data isn’t in that format, there is an alternative. The glm() command accepts a matrix formed by using the number of successes in the first column and the number of failures as the second column. I do this below, but you can also try the other way and see what happens.
myModel = glm(cbind(number_killed,number_of_insects-number_killed)~concentration,data=myData,family="binomial")
summary(myModel)
##
## Call:
## glm(formula = cbind(number_killed, number_of_insects - number_killed) ~
## concentration, family = "binomial", data = myData)
##
## Deviance Residuals:
## 1 2 3 4 5
## -0.1963 0.2099 -0.2978 0.8726 -0.7222
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.8923 0.6426 -7.613 2.67e-14 ***
## concentration 3.1088 0.3879 8.015 1.11e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 96.6881 on 4 degrees of freedom
## Residual deviance: 1.4542 on 3 degrees of freedom
## AIC: 24.675
##
## Number of Fisher Scoring iterations: 4
Here is a graph of our logistic regression model.
plot(myData$concentration,proportion_killed,ylab='Proportion Killed',xlab='Concentration',main='Logistic Regression')
curve(predict(myModel,data.frame(concentration=x),type='response'),add=T)