Comparing Proportions in R

Does random drug testing make high school athletes less likely to use drugs? One study looked at student athletes at two similar high schools in Oregon, one that used random drug testing and another that did not. Students took an anonymous survey that asked whether they were using drugs. Here are the data from the study, arranged in a two-way table.

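# Rows are survey outcomes, columns are schools; byrow = TRUE fills the matrix row by row
drugData <- matrix(c(7, 27, 128, 114), ncol = 2, byrow = TRUE)
colnames(drugData) <- c('Random drug testing', 'No drug testing')
rownames(drugData) <- c('Used drugs', 'Did not use drugs')
drugData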
##                   Random drug testing No drug testing
## Used drugs                          7              27
## Did not use drugs                 128             114

A bar plot shows that a higher proportion of athletes reported using drugs at the school without random drug testing (27 of 141, about 19%) than at the school with it (7 of 135, about 5%). But is this difference statistically significant?

barplot(drugData, legend.text = TRUE)
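Because barplot() on the raw table stacks counts, the visual comparison relies on the two schools having similar sample sizes. As a minimal sketch, prop.table() with margin = 2 rescales each column to proportions so the bars are directly comparable. The name colProps is illustrative and not part of the original analysis.

```r
# Rebuild the two-way table from the text so this block runs on its own
drugData <- matrix(c(7, 27, 128, 114), ncol = 2, byrow = TRUE,
                   dimnames = list(c('Used drugs', 'Did not use drugs'),
                                   c('Random drug testing', 'No drug testing')))

# margin = 2 divides each entry by its column total, so each column sums to 1
colProps <- prop.table(drugData, margin = 2)
round(colProps, 3)

# Each bar now has height 1; the segments show the share of users and non-users
barplot(colProps, legend.text = TRUE, ylab = 'Proportion of athletes')
```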

Proportions Test

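We want to test the following hypotheses, where \(p\) is the proportion of athletes who use drugs at each school:

  • \(H_0: p_\text{testing} = p_\text{no testing}\)
  • \(H_A: p_\text{testing} < p_\text{no testing}\)

To do so, we use the R command prop.test() with alternative="less", which matches the one-sided alternative \(H_A\).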

prop.test(drugData,alternative="less")
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  drugData
## X-squared = 11.191, df = 1, p-value = 0.000411
## alternative hypothesis: less
## 95 percent confidence interval:
##  -1.0000000 -0.1805903
## sample estimates:
##    prop 1    prop 2 
## 0.2058824 0.5289256
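To see where the X-squared value and the p-value in this output come from, the statistic can be reproduced by hand. This is a sketch, assuming the usual Yates continuity correction (subtract 0.5 from each |observed - expected| before squaring) and a one-sided p-value taken from the lower tail of the standard normal at the signed square root of the statistic. The variable names are illustrative.

```r
# Two-way table from the text: rows are outcomes, columns are schools
drugData <- matrix(c(7, 27, 128, 114), ncol = 2, byrow = TRUE)

# Expected counts under H_0: (row total * column total) / grand total
expected <- outer(rowSums(drugData), colSums(drugData)) / sum(drugData)

# Chi-squared statistic with the continuity correction of 0.5
chisq <- sum((abs(drugData - expected) - 0.5)^2 / expected)

# Sample drug-use rates at each school, used only to sign the statistic
pTesting   <- drugData[1, 1] / sum(drugData[, 1])
pNoTesting <- drugData[1, 2] / sum(drugData[, 2])

# Signed z-score and lower-tail p-value for the "less" alternative
z <- sign(pTesting - pNoTesting) * sqrt(chisq)
pValue <- pnorm(z)

chisq
pValue
```

Both values should agree with the prop.test() output above up to rounding.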

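One caution when reading the output: given a matrix, prop.test() treats each row as a group, so prop 1 and prop 2 above are the shares of drug users (7/34) and of non-users (128/242) who attended the school with testing, not the drug-use rates at the two schools. For a 2-by-2 table the test statistic and p-value do not depend on which variable defines the groups, so the conclusion is unaffected.

With a p-value of about 0.0004, the result is highly significant, so we can conclude that the difference in drug use at the two high schools is unlikely to be a random fluke. However, we cannot assume that the difference is due to the drug testing. There might be lurking variables that are causing the difference. Can you think of any possible confounding variables (i.e., lurking variables that are associated with both drug use and whether a school uses random drug testing)?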
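prop.test() also accepts the success counts and group sizes directly, which makes it explicit that the two schools are the groups being compared. The sketch below is one way to do that, assuming the counts from the two-way table; the names users, totals, and result are illustrative. With the schools as groups, the reported sample estimates are the drug-use rates 7/135 and 27/141.

```r
# Counts taken from the two-way table in the text
users  <- c(testing = 7,   noTesting = 27)    # athletes who reported using drugs
totals <- c(testing = 135, noTesting = 141)   # athletes surveyed (7 + 128 and 27 + 114)

# One-sided test of H_A: p_testing < p_no testing, with the schools as the groups
result <- prop.test(x = users, n = totals, alternative = "less")

result$estimate   # sample drug-use rate at each school
result$p.value    # one-sided p-value
```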