Permutation Test example

The Milgram experiments were a famous set of psychological studies showing that ordinary people were capable of committing heinous acts when "ordered" to by a perceived authority figure. In the Milgram study, volunteers believed that they were administering dangerous electrical shocks to punish slow learners for answering questions incorrectly. In truth, the victims were actors, but the participants in the study didn't know that. In the end, every one of the volunteers was persuaded to push the button labelled "300 volts: XXX".

In a follow-up study, Mary Ann DiMatteo conducted a randomized controlled experiment. The participants were 37 high school teachers who didn't previously know about the Milgram study. They were asked to rate how ethical the study was on a scale from 1 (not at all ethical) to 9 (completely ethical). Some of the teachers were told (incorrectly) that many people in the Milgram study refused to push the button labelled "300 volts: XXX", others were told that many complied, and a third group of teachers was told the actual results (all complied). We want to know whether believing that some or most of the participants in the Milgram study refused to use dangerous voltage changes the teachers' views on how ethical the study was.

ethicsData = read.csv("Milgram.csv")
head(ethicsData)

	Results	Score
1	Actual	6
2	Actual	1
3	Actual	7
4	Actual	2
5	Actual	1
6	Actual	7

aggregate(Score~Results,ethicsData,mean)

	Results	Score
1	Actual	3.307692
2	Complied	3.846154
3	Refused	5.545455

It appears that the teachers who thought that many people refused to push the button rated the Milgram experiment as more ethical than those who didn't have that misconception. Now let's use a permutation test to see if the results are statistically significant. Here is the F-value for a one-way ANOVA with this data:

summary(aov(Score~Results,ethicsData))[[1]][["F value"]][[1]]

3.48768487304897
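
For reference, the F-value is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it from that definition and checks it against aov. The function name f_stat and the toy data are made up for illustration and are not part of the Milgram data set.

```r
# F statistic for a one-way layout, computed from its definition.
# `score` is a numeric vector; `group` labels each observation's group.
f_stat = function(score, group) {
    group = factor(group)
    k = nlevels(group)                  # number of groups
    n = length(score)                   # total number of observations
    groupMean = ave(score, group)       # each observation's own group mean
    ssBetween = sum((groupMean - mean(score))^2)
    ssWithin = sum((score - groupMean)^2)
    (ssBetween / (k - 1)) / (ssWithin / (n - k))
}

# Made-up toy data, NOT the scores in Milgram.csv
toyScore = c(6, 1, 7, 2, 4, 3, 5, 8, 9, 6, 7, 5)
toyGroup = rep(c("A", "B", "C"), each = 4)

f_stat(toyScore, toyGroup)
summary(aov(toyScore ~ toyGroup))[[1]][["F value"]][[1]]   # should agree
```

Applied to the real data, f_stat(ethicsData$Score, ethicsData$Results) should reproduce the F-value printed above.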

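# Build the permutation distribution: shuffle the scores across the
# groups and recompute the one-way ANOVA F-value for each shuffle.
Fvalues = numeric(10000)
for (i in 1:10000) {
    Fvalues[i] = summary(aov(sample(ethicsData$Score)~ethicsData$Results))[[1]][["F value"]][[1]]
}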

hist(Fvalues,col='gray')

The histogram above shows the permutation distribution of F-values for 10,000 resamples. Now we need to ask what proportion of these exceed the F-value we got from our data (F = 3.4877).

sum(Fvalues > 3.4877)/10000

0.0402
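
Because the permutation p-value is estimated from a finite number of random shuffles, it carries some simulation error of its own. A rough sketch of that error, assuming the usual normal approximation to a binomial proportion (the variable names here are illustrative):

```r
pHat = 0.0402   # permutation p-value reported above
B = 10000       # number of resamples used

# Monte Carlo standard error of an estimated proportion
se = sqrt(pHat * (1 - pHat) / B)

# Approximate 95% interval for the p-value a very long run would settle on
ci = c(lower = pHat - 1.96 * se, upper = pHat + 1.96 * se)
se
ci
```

The interval runs from roughly 0.036 to 0.044, so rerunning the simulation with different random shuffles would be unlikely to push the estimate across 0.05.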

The permutation p-value of 0.0402 is significant at the 0.05 level, but only just. Let's compare it with the p-value from a standard ANOVA.

summary(aov(Score~Results,ethicsData))

            Df Sum Sq Mean Sq F value Pr(>F)  
Results      2  31.84  15.919   3.488 0.0419 *
Residuals   34 155.19   4.564                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
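
The Pr(>F) entry in this table is the upper-tail area of an F distribution with 2 and 34 degrees of freedom, evaluated at the observed F-value. As a small sketch, it can be recovered directly with pf (the names Fobs and pTheory are illustrative):

```r
Fobs = 3.48768487304897   # observed F-value printed earlier

# Upper-tail probability under the F distribution with 2 and 34 df
pTheory = pf(Fobs, df1 = 2, df2 = 34, lower.tail = FALSE)
pTheory
```

This should match the 0.0419 in the table. The permutation p-value (0.0402) and the theoretical one agree closely, so for this data the two approaches lead to the same conclusion.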