Does added calcium intake reduce the blood pressure of black men? In a randomized comparative double-blind trial, 10 men were given a calcium supplement for twelve weeks and 11 others received a placebo. Blood pressure was recorded at the beginning and end of the twelve-week period for each subject, and the decrease variable used below records the drop in blood pressure over that period.
myData = read.csv("calcium.csv")   # the group and decrease columns are used below
head(myData)
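As a quick check on the design described above, we can tabulate how many subjects fall in each group (this assumes the grouping column is named group, as in the code that follows).
table(myData$group)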
Before carrying out an ANOVA F-test to see whether there was a statistically significant difference in the mean decrease between the two groups, we need to check the assumptions. One of these is the assumption of constant variance: the populations the two groups are drawn from are assumed to have the same variance. Here are the sample standard deviations of the decreases for each group.
tapply(myData$decrease, list(myData$group), sd)   # sample standard deviation of decrease within each group
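The numbers alone can be hard to judge, so a side-by-side picture of the spread in each group is also worth a glance; here is a minimal sketch using the same decrease and group columns.
boxplot(decrease ~ group, data = myData, ylab = "decrease in blood pressure")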
When $s_1^2$ and $s_2^2$ are the sample variances from independent SRSs of sizes $n_1$ and $n_2$ drawn from normal populations with the same population variance, the ratio $s_1^2/s_2^2$ has the F-distribution with $n_1-1$ and $n_2-1$ degrees of freedom. In theory, this lets us test the hypothesis $H_0:\sigma_1 = \sigma_2$: the p-value is double the one-sided probability of getting an F-value larger than the one observed (assuming $s_1 > s_2$). In practice, this test is not very robust.
s1 = sd(subset(myData, group=="Calcium")$decrease)   # sd of the decreases in the calcium group (n = 10)
s2 = sd(subset(myData, group=="Placebo")$decrease)   # sd of the decreases in the placebo group (n = 11)
F = s1^2/s2^2
F
2*(1-pf(F, 9, 10))   # two-sided p-value with 10-1 = 9 and 11-1 = 10 degrees of freedom
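For comparison, base R's var.test() function carries out this same F test for equality of two variances in a single call, and it should report the same ratio and two-sided p-value as the hand computation above.
var.test(decrease ~ group, data = myData)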
This p-value is not significant, so we don't have strong evidence against the null hypothesis. In other words, we don't have any particular reason to believe that the variances are different. Unfortunately this test is very non-robust, so even a modest deviation from normality could throw the result off badly.
If we use resampling instead, we get a whole permutation distribution of F-values, which lets us assess whether our observed F-value is extreme enough to reject the null hypothesis.
Fvalues = c()
for (i in 1:5000) {
  shuffled = sample(myData$decrease)           # randomly permute the decreases across subjects
  s1 = sd(shuffled[myData$group=="Calcium"])   # sd of the permuted values labelled Calcium
  s2 = sd(shuffled[myData$group=="Placebo"])   # sd of the permuted values labelled Placebo
  Fvalues = c(Fvalues, s1^2/s2^2)
}
2*sum(Fvalues > 2.2187)/5000   # 2.2187 is the observed ratio F computed above
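It also helps to see where the observed ratio falls in the permutation distribution; here is a minimal sketch using the Fvalues vector just built, with a vertical line at the observed value.
hist(Fvalues, breaks = 50, main = "Permutation distribution of the variance ratio", xlab = "s1^2/s2^2")
abline(v = 2.2187, col = "red", lwd = 2)   # observed ratio from the actual group labels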
The permutation method is definitely more robust than the standard F-test computation, and since the p-value obtained is still not significant, we are probably safe proceeding as though the variances are equal. In fact, however, even the permutation method for comparing variances is not widely used.
# The following function computes the two-sided p-value for a variance ratio s_1^2/s_2^2 = 4 (standard deviations differing by a factor of 2) when both groups have size n.
pvalue = function(n) {2*(1-pf(4, n-1, n-1))}
pvalue(2:40)   # p-values for group sizes 2 through 40
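We can also ask directly for the smallest group size at which this p-value drops below 0.05, using the pvalue function just defined.
n = 2:40
min(n[pvalue(n) < 0.05])   # smallest group size for which a 2-fold difference in sds is declared significant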
As you can see, by the time each group has around 11 observations (it is already borderline at 10), a factor of 2 difference in the standard deviations is statistically significant under this test. So the rule of thumb suggested by the book completely ignores the F-test for constant variance. Here is a quote from the statistician George E. P. Box on the subject of the F-test for constant variance:
To make a preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!