Comparing Variances

Does added calcium intake reduce the blood pressure of black men? In a randomized, comparative, double-blind trial, 10 men were given a calcium supplement for twelve weeks and 11 others received a placebo. Blood pressure was recorded for each subject at the beginning and end of the twelve-week period.

In [4]:
myData = read.csv("calcium.csv")
head(myData)
Out[4]:
  obs   group groupid begin end decrease
1   1 Calcium       0   107 100        7
2   2 Calcium       0   110 114       -4
3   3 Calcium       0   123 105       18
4   4 Calcium       0   129 112       17
5   5 Calcium       0   112 115       -3
6   6 Calcium       0   111 116       -5

Before carrying out an ANOVA F-test to see whether the mean decrease differs significantly between the two groups, we need to check its assumptions. One of these is constant variance: the populations being compared should have the same standard deviation. Here are the sample standard deviations of the decreases for each group.

In [5]:
tapply(myData$decrease,list(myData$group),sd)
Out[5]:
Calcium  8.743251365736
Placebo  5.86979943903925

Theory

When $s_1^2$ and $s_2^2$ are sample variances from independent SRSs of sizes $n_1$ and $n_2$ drawn from normal populations with the same population variance, the ratio $s_1^2/s_2^2$ has the F-distribution with $n_1-1$ and $n_2-1$ degrees of freedom. In theory, this lets us test the hypothesis $H_0:\sigma_1 = \sigma_2$. Labeling the groups so that $s_1 > s_2$, the two-sided p-value is double the upper-tail probability of getting an F-value at least as large as the one we observed. In practice, this test is not robust to departures from normality.
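R's built-in `var.test` carries out this F-test. As a minimal sketch on simulated normal data (the samples `x` and `y`, their sizes, and the seed are made up for illustration and are not the calcium data), the hand computation of the variance ratio and its doubled tail probability should agree with what `var.test` reports:

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(10, mean = 0, sd = 3)  # simulated sample 1 (hypothetical data)
y <- rnorm(11, mean = 0, sd = 2)  # simulated sample 2 (hypothetical data)

# Hand computation: ratio of sample variances and its two-sided p-value
Fobs  <- var(x) / var(y)
upper <- pf(Fobs, length(x) - 1, length(y) - 1, lower.tail = FALSE)
p_manual <- 2 * min(upper, 1 - upper)  # double the smaller tail

# Built-in F-test for equality of two variances
res <- var.test(x, y)

c(F_manual = Fobs, F_builtin = unname(res$statistic))
c(p_manual = p_manual, p_builtin = res$p.value)
```

Doubling the smaller of the two tails means the result does not depend on which sample is placed in the numerator.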

In [27]:
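# Sample standard deviations of the decrease in each group
s1=sd(subset(myData,group=="Calcium")$decrease)
s2=sd(subset(myData,group=="Placebo")$decrease)
# Observed ratio of variances (named Fobs because F is R's shorthand for FALSE)
Fobs = s1^2/s2^2
Fobs
# Two-sided p-value with 10-1 = 9 and 11-1 = 10 degrees of freedom
2*(1-pf(Fobs,9,10))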
Out[27]:
2.21870419231897
Out[27]:
0.230424076484427

This p-value is not significant, so we don't have strong evidence to reject the null hypothesis. In other words, we don't have any reason to believe that the variances are different. Unfortunately, this test is highly sensitive to non-normality, so even a modest departure from normality can make the p-value unreliable.
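The sensitivity to non-normality can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the group sizes 10 and 11 mirror the study, while the heavy-tailed alternative (a t-distribution with 5 degrees of freedom), the number of repetitions, and the seed are arbitrary choices. Both groups are always drawn from the same population, so the null hypothesis is true and a well-behaved test should reject about 5% of the time.

```r
set.seed(42)          # arbitrary seed, for reproducibility only
n1 <- 10; n2 <- 11    # group sizes matching the calcium study
reps <- 5000          # number of simulated data sets (arbitrary)

# Two-sided F-test p-value for equal variances
f_pvalue <- function(x, y) {
  upper <- pf(var(x) / var(y), length(x) - 1, length(y) - 1, lower.tail = FALSE)
  2 * min(upper, 1 - upper)
}

# Fraction of simulated data sets, all generated with equal variances,
# in which the test rejects at the 5% level
reject_rate <- function(draw) {
  mean(replicate(reps, f_pvalue(draw(n1), draw(n2)) < 0.05))
}

rate_normal <- reject_rate(function(n) rnorm(n))       # normal populations
rate_heavy  <- reject_rate(function(n) rt(n, df = 5))  # heavy-tailed populations

c(normal = rate_normal, heavy_tailed = rate_heavy)
```

If the test behaves as described, the rejection rate for the normal populations should sit near 0.05, while the rate for the heavy-tailed populations should be noticeably higher even though the variances are equal.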

Permutation Method Approach

If we instead shuffle the decreases among the two groups many times and recompute the ratio of variances each time, we get a whole permutation distribution of F-values, which lets us assess whether our observed F-value is extreme enough to reject the null hypothesis.

In [37]:
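# Group labels as integer codes (1 = Calcium, 2 = Placebo); this works
# whether read.csv returned the labels as characters or as factors
groupCode = as.integer(factor(myData$group))
Fvalues = numeric(5000)
for (i in 1:5000) {
    # Shuffle the decreases while keeping the group labels fixed
    shuffled = sample(myData$decrease)
    s1=sd(shuffled[groupCode==1])
    s2=sd(shuffled[groupCode==2])
    Fvalues[i] = s1^2/s2^2
}
# Doubled proportion of permuted F-values above the observed value 2.2187
2*sum(Fvalues>2.2187)/5000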
Out[37]:
0.206

The permutation method is definitely more robust than the standard F-value computation, and since the p-value obtained is still not significant, we are probably safe proceeding as though the variances are equal. In fact, however, even the permutation method for comparing variances is not widely used.
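The permutation procedure can also be packaged as a reusable function. The following is a sketch, not a standard routine: `perm_var_test` is a hypothetical name, the data are simulated with deliberately unequal spreads, and the seed and sample sizes are arbitrary. Unlike the cell above, which doubles the upper tail, it doubles the smaller of the two tails so that the answer does not depend on which group is in the numerator.

```r
# Permutation test for equal variances in two groups (illustrative helper;
# perm_var_test is a hypothetical name, not a function from any package)
perm_var_test <- function(values, group, reps = 5000) {
  code <- as.integer(factor(group))   # 1 and 2, in alphabetical order of labels
  ratio <- function(v) var(v[code == 1]) / var(v[code == 2])
  Fobs <- ratio(values)
  # Shuffle the values across the fixed group labels and recompute the ratio
  Fperm <- replicate(reps, ratio(sample(values)))
  upper <- mean(Fperm >= Fobs)
  lower <- mean(Fperm <= Fobs)
  list(F = Fobs, p.value = min(1, 2 * min(upper, lower)))
}

set.seed(7)  # arbitrary seed, for reproducibility only
# Simulated data with clearly unequal spreads (hypothetical, not the calcium data)
values <- c(rnorm(30, sd = 1), rnorm(30, sd = 5))
group  <- rep(c("A", "B"), each = 30)
result <- perm_var_test(values, group)
result
```

With the calcium data loaded, the analogous call would be `perm_var_test(myData$decrease, myData$group)`.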

In [1]:
# The following function computes the two-sided p-value of a ratio s_1^2/s_2^2 = 4 when the numerator and denominator each have n degrees of freedom (that is, both groups have size n+1).
pvalue = function(n) {2*(1-pf(4,n,n))}
pvalue(1:40)
Out[1]:
  1. 0.590334470601733
  2. 0.4
  3. 0.284756979865294
  4. 0.208
  5. 0.154377250484413
  6. 0.11584
  7. 0.0876228290414025
  8. 0.0666880000000001
  9. 0.0510032607069508
  10. 0.0391628800000001
  11. 0.0301707951655739
  12. 0.0233084108799999
  13. 0.0180500879415
  14. 0.0140071223296001
  15. 0.0108895470583548
  16. 0.00847949941964798
  17. 0.00661231730415612
  18. 0.00516292567367671
  19. 0.00403591538162695
  20. 0.00315824109833418
  21. 0.00247380221596716
  22. 0.00193939287652589
  23. 0.00152165704832696
  24. 0.00119478741738499
  25. 0.000938778615267255
  26. 0.000738096069111682
  27. 0.00058065810599528
  28. 0.00045705523940498
  29. 0.000359949762503309
  30. 0.000283612898786201
  31. 0.000223567227490395
  32. 0.000176309904056593
  33. 0.000139098044514618
  34. 0.000109782047324369
  35. 8.66759576252374e-05
  36. 6.84565080835142e-05
  37. 5.40843973193539e-05
  38. 4.27428392226759e-05
  39. 3.37895446314018e-05
  40. 2.6719163469302e-05
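Rather than scanning the list by eye, one can ask R for the first $n$ at which the p-value drops below 0.05. This short sketch repeats the `pvalue` definition so that it runs on its own; the 0.05 cutoff is the conventional significance level and is an assumption here, as is the name `first_sig`.

```r
# Two-sided p-value for a variance ratio of 4 with n and n degrees of freedom
pvalue <- function(n) 2 * (1 - pf(4, n, n))

# Smallest n in 1..40 whose p-value is below 0.05
first_sig <- which(pvalue(1:40) < 0.05)[1]
first_sig
```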

As you can see, once $n\ge 10$ (where $n$ is the degrees-of-freedom value passed to `pf`, so groups of 11 or more), a factor of 2 difference in the standard deviations is statistically significant at the 5% level under this test. So the rule of thumb suggested by the book completely ignores the F-test for constant variance. Here is a quote from the statistician George E. P. Box on the subject of the F-test for constant variance:

To make a preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!