Chicken farming is a multi-billion dollar industry, and any methods that increase the growth rate of young chicks can reduce consumer costs while increasing company profits, possibly by millions of dollars. An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Below are some summary statistics from this data set along with box plots showing the distribution of weights by feed type.
chickenData = read.csv('http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/chickwts.csv')
boxplot(weight~feed,data=chickenData,col='gray')
Below is the ANOVA table for this data. It is clear from the F-test that there are statistically significant differences between the mean weight gain for the different chicken feeds.
results = aov(weight~feed,data=chickenData)
summary(results)
## Df Sum Sq Mean Sq F value Pr(>F)
## feed 5 231129 46226 15.37 5.94e-10 ***
## Residuals 65 195556 3009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now that we know that the differences are significant, we might want to know which pairwise comparisons are statistically significant.
pairwise.t.test(chickenData$weight,chickenData$feed,p.adj='bonferroni')
##
## Pairwise comparisons using t tests with pooled SD
##
## data: chickenData$weight and chickenData$feed
##
## casein horsebean linseed meatmeal soybean
## horsebean 3.1e-08 - - - -
## linseed 0.00022 0.22833 - - -
## meatmeal 0.68350 0.00011 0.20218 - -
## soybean 0.00998 0.00487 1.00000 1.00000 -
## sunflower 1.00000 1.2e-08 9.3e-05 0.39653 0.00447
##
## P value adjustment method: bonferroni
P value adjustment method: bonferroni
Notice the argument p.adj, which is the method for adjusting significance levels to avoid type I errors. In this case, there are 15 pairwise comparison, so R has increased all of the p-values by a factor of 15 to compensate. Notice that multiplying the p-values by 15 is pretty much the same as dividing the α by 15. This way, however, if a p-value in the table above looks significant (i.e., below 0.05), then we can safely conclude that it is statistically significant.
It is also possible to construct confidence intervals for each difference above. The most common technique is known as Tukey’s Honest Significant Differences, and the command is TukeyHSD(), as shown below:
TukeyHSD(results,conf.level=0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ feed, data = chickenData)
##
## $feed
## diff lwr upr p adj
## horsebean-casein -163.383333 -232.346876 -94.41979 0.0000000
## linseed-casein -104.833333 -170.587491 -39.07918 0.0002100
## meatmeal-casein -46.674242 -113.906207 20.55772 0.3324584
## soybean-casein -77.154762 -140.517054 -13.79247 0.0083653
## sunflower-casein 5.333333 -60.420825 71.08749 0.9998902
## linseed-horsebean 58.550000 -10.413543 127.51354 0.1413329
## meatmeal-horsebean 116.709091 46.335105 187.08308 0.0001062
## soybean-horsebean 86.228571 19.541684 152.91546 0.0042167
## sunflower-horsebean 168.716667 99.753124 237.68021 0.0000000
## meatmeal-linseed 58.159091 -9.072873 125.39106 0.1276965
## soybean-linseed 27.678571 -35.683721 91.04086 0.7932853
## sunflower-linseed 110.166667 44.412509 175.92082 0.0000884
## soybean-meatmeal -30.480519 -95.375109 34.41407 0.7391356
## sunflower-meatmeal 52.007576 -15.224388 119.23954 0.2206962
## sunflower-soybean 82.488095 19.125803 145.85039 0.0038845
From this table, we can see from the bottom row that the difference between the average weights of chickens fed sunflower feed versus soybean feed will be between 19.1 ounces and 145.8 ounces (with 95%) confidence. Notice that the adjusted p-values are a little lower than the ones using the Bonferroni condition. The Tukey method is a little more complicated then the Bonferroni method, and it is a little less conservative. R does not have a built in method for making Bonferroni confidence intervals for the pairwise differences in means, but you could easily compute that yourself using the formula: \[\bar{x}_A - \bar{x}_B \pm t^{**}s_p \sqrt{\frac{1}{N_A}+\frac{1}{N_B}}\]