Does Dandruff Shampoo Work?

ANOVA is often used in medical research to assess the effectiveness of several treatments for a given condition simultaneously. One such study compared different treatments for dandruff. The treatments were 1% pyrithione zinc shampoo (PyrI), the same shampoo but with instructions to shampoo two times (PyrII), 2% ketoconazole shampoo (Keto), and a placebo shampoo (Placebo). Subjects were randomly assigned to the groups: each group started with 112 volunteers, except the placebo group, which was assigned only 28. Each volunteer was examined for dandruff flakes before and after six weeks of treatment. Dandruff flaking was measured on a scale from 0 to 80. Initially, there were no significant differences between the groups. During the clinical trial, 3 subjects dropped out of the PyrII group and 6 dropped out of the Keto group. No subjects dropped out of the other two groups.

Below is a summary of the final results for each group:

##           N     Mean        SD
## Keto    106 16.02830 0.9305149
## Placebo  28 29.39286 1.5948827
## PyrI    112 17.39286 1.1418110
## PyrII   109 17.20183 1.3524999

The data can be found here:

dandruff = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring18/math222/data/dandruff.txt")

Carry out an ANOVA test for this situation. Explain what your conclusions are.
Which group looks like it is the most different from the others? Would the ANOVA test still be significant if that group were removed? Hint: use the command subset(dandruff,treat != ~~~~) to remove that group.
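
As a cross-check on the output of an ANOVA fitted to the raw data, the F statistic can also be rebuilt from the summary table alone. The following is a minimal sketch under that assumption: the numbers are copied from the table above, and `anova_from_summary` is a hypothetical helper name introduced here, not a built-in R function.

```r
# Summary statistics copied from the table above
# (same order as the table: Keto, Placebo, PyrI, PyrII).
n    <- c(Keto = 106, Placebo = 28, PyrI = 112, PyrII = 109)
xbar <- c(Keto = 16.02830, Placebo = 29.39286, PyrI = 17.39286, PyrII = 17.20183)
s    <- c(Keto = 0.9305149, Placebo = 1.5948827, PyrI = 1.1418110, PyrII = 1.3524999)

# Hypothetical helper (not part of base R): one-way ANOVA F test
# computed from group sizes, means, and standard deviations.
anova_from_summary <- function(n, xbar, s) {
  N     <- sum(n)                      # total sample size
  I     <- length(n)                   # number of groups
  grand <- sum(n * xbar) / N           # overall (grand) mean
  SSG   <- sum(n * (xbar - grand)^2)   # between-group sum of squares
  SSE   <- sum((n - 1) * s^2)          # within-group (residual) sum of squares
  MSG   <- SSG / (I - 1)               # mean square for groups
  MSE   <- SSE / (N - I)               # mean square error
  Fstat <- MSG / MSE
  p     <- pf(Fstat, I - 1, N - I, lower.tail = FALSE)
  list(F = Fstat, df1 = I - 1, df2 = N - I, MSE = MSE, p.value = p)
}

res <- anova_from_summary(n, xbar, s)
print(res)
```

Because the table is rounded, the result may differ slightly from what `aov` reports on the full data set. Passing a subset such as `n[-2]`, `xbar[-2]`, `s[-2]` reruns the calculation with the second group left out.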

Contrasts

Once an ANOVA test indicates that there are significant differences, you can dig deeper to find out what those differences might be. One approach is to look at a contrast of interest. A contrast is a linear combination of the population means \[\psi = a_1 \mu_1 + a_2 \mu_2 + \ldots + a_I \mu_I\] where the constants \(a_i\) add up to zero. A sample contrast, which we typically denote by \(C\), has the same form with sample means in place of population means: \[C = a_1 \bar{x}_1 + a_2 \bar{x}_2 + \ldots + a_I \bar{x}_I.\] For example, we might want to know how the two pyrithione zinc treatments compare with the ketoconazole treatment. Then a natural choice for the contrast would be: \[\psi = \frac{1}{2}\mu_\text{PyrI} + \frac{1}{2}\mu_\text{PyrII} - \mu_\text{Keto}\]
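
To make the definition concrete, here is a minimal sketch in R that computes a sample contrast from the group means in the summary table. It uses a different contrast from the one in the text, \(\psi = \mu_\text{PyrI} - \mu_\text{PyrII}\), chosen here only for illustration. The vector names `xbar` and `a` are assumptions introduced for this example.

```r
# Group means copied from the summary table.
xbar <- c(Keto = 16.02830, Placebo = 29.39286, PyrI = 17.39286, PyrII = 17.20183)

# Coefficients for psi = mu_PyrI - mu_PyrII.
# Groups that are not part of the comparison get coefficient 0.
a <- c(Keto = 0, Placebo = 0, PyrI = 1, PyrII = -1)

# A valid contrast needs coefficients that sum to zero.
stopifnot(isTRUE(all.equal(sum(a), 0)))

# Sample contrast: the same linear combination applied to the sample means.
# Indexing by names(a) keeps coefficients and means matched by group.
C <- sum(a * xbar[names(a)])
C
```

Changing the entries of `a` gives any other contrast, as long as they still sum to zero.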

Inference for Contrasts

To test whether a contrast is significantly different from zero, that is, to test:

\(H_0: \psi = 0\)
\(H_A: \psi \neq 0\)

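we use the test statistic: \[t = \frac{C}{SE_C}\] where \(SE_C = s_p \sqrt{\sum \frac{a_i^2}{N_i}}\) is the standard error of the contrast, \(N_i\) is the sample size of group \(i\), and \(s_p\) is the pooled standard deviation (equal to \(\sqrt{MSE}\)). This t-value has degrees of freedom equal to the degrees of freedom of the residuals in the ANOVA model (\(DFE=N-I\), where \(N\) is the total sample size and \(I\) is the number of groups). To make a confidence interval for a contrast \(\psi\), use the formula: \[C \pm t^* SE_C,\] where \(t^*\) is the critical value of the t distribution with \(DFE\) degrees of freedom for the chosen confidence level.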
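
The formulas above can be sketched in R as follows. This is an illustration under stated assumptions, not the only way to do it: it works from the rounded summary table rather than the raw data, it uses the illustrative contrast \(\psi = \mu_\text{PyrI} - \mu_\text{PyrII}\) rather than the one in the exercises, and the 95% confidence level and all variable names are choices made for this example.

```r
# Summary statistics copied from the table (order: Keto, Placebo, PyrI, PyrII).
n    <- c(Keto = 106, Placebo = 28, PyrI = 112, PyrII = 109)
xbar <- c(Keto = 16.02830, Placebo = 29.39286, PyrI = 17.39286, PyrII = 17.20183)
s    <- c(Keto = 0.9305149, Placebo = 1.5948827, PyrI = 1.1418110, PyrII = 1.3524999)

# Illustrative contrast: psi = mu_PyrI - mu_PyrII (coefficients sum to zero).
a <- c(Keto = 0, Placebo = 0, PyrI = 1, PyrII = -1)

N   <- sum(n)                             # total sample size
I   <- length(n)                          # number of groups
DFE <- N - I                              # residual degrees of freedom
s_p <- sqrt(sum((n - 1) * s^2) / DFE)     # pooled SD, equal to sqrt(MSE)

C      <- sum(a * xbar)                   # sample contrast
SE_C   <- s_p * sqrt(sum(a^2 / n))        # standard error of the contrast
t_stat <- C / SE_C                        # test statistic for H0: psi = 0

# Two-sided p-value from the t distribution with DFE degrees of freedom.
p_value <- 2 * pt(abs(t_stat), df = DFE, lower.tail = FALSE)

# 95% confidence interval: C plus or minus t* times SE_C.
t_star <- qt(0.975, df = DFE)
ci     <- C + c(-1, 1) * t_star * SE_C

c(C = C, SE = SE_C, t = t_stat, df = DFE, p = p_value)
ci
```

Note that the pooled standard deviation uses all four groups, even those whose coefficient is zero, because it comes from the full ANOVA model.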

Exercises

Find the sample contrast \(C = \frac{1}{2}\bar{x}_\text{PyrI} + \frac{1}{2}\bar{x}_\text{PyrII} - \bar{x}_\text{Keto}\) comparing the two pyrithione zinc groups with the ketoconazole group.
Is the contrast in this example statistically significant?
Make a confidence interval for the contrast and explain what it means.