In the 2014 General Social Survey, respondents were asked questions about many topics, including their religion and income. Below is a two-way analysis of variance for differences in income due to two factors: region (Northeast, South, Midwest, or West) and religion (Catholic, Protestant, or Other).

myData = read.csv('RegionReligionIncome2.csv')
head(myData)
dim(myData)
Household_Income Region Religion Income_Number
1 75000 to 89999 Northeast Catholic 82500
2 150000 or over Northeast Catholic 250000
3 40000 to 49999 Northeast Protestant 45000
4 150000 or over Northeast Catholic 250000
5 Refused Northeast Catholic NA
6 150000 or over Northeast Protestant 250000
  1. 2538
  2. 4

Once again, this data has a lot of NA entries. So we will use the na.omit() command to clean things up.

myData=na.omit(myData)
head(myData)
dim(myData)
Household_Income Region Religion Income_Number
1 75000 to 89999 Northeast Catholic 82500
2 150000 or over Northeast Catholic 250000
3 40000 to 49999 Northeast Protestant 45000
4 150000 or over Northeast Catholic 250000
6 150000 or over Northeast Protestant 250000
8 75000 to 89999 Northeast Catholic 82500
  1. 2314
  2. 4
par(mfrow=c(2,2))
boxplot(Income_Number~Region,data=myData,col='gray')
boxplot(Income_Number~Religion,data=myData,col='gray')
interaction.plot(myData$Religion,myData$Region,myData$Income_Number,col=c(1,2,3,4))
interaction.plot(myData$Region,myData$Religion,myData$Income_Number,col=c(1,2,3,4))

From this interaction plot, it looks like there is an interaction between region and religion. For example, Catholics make more money than Protestants in the Northeast, but they make less money in the South and West.

table(myData$Region,myData$Religion)
            Catholic Other Protestant
  Midwest        120   155        249
  Northeast      134   145        112
  South          173   178        496
  West           121   252        179

The table command can be used to find the counts for each cell. From this we can see that most of the cells are about the same size, although there are some, like Protestants in the South that are much larger.

myAOV = aov(Income_Number ~ Region * Religion, data = myData)
summary(myAOV)
model.tables(myAOV,'means')
#?model.tables
                  Df    Sum Sq   Mean Sq F value   Pr(>F)    
Region             3 7.990e+10 2.663e+10   6.116 0.000385 ***
Religion           2 1.987e+09 9.935e+08   0.228 0.796019    
Region:Religion    6 9.215e+10 1.536e+10   3.527 0.001772 ** 
Residuals       2302 1.002e+13 4.355e+09                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1






Tables of means
Grand mean
         
68666.92 

 Region 
    Midwest Northeast South  West
      74413     77377 62802 66042
rep     524       391   847   552

 Religion 
    Catholic Other Protestant
       67826 69992      68178
rep      548   730       1036

 Region:Religion 
           Religion
Region      Catholic Other Protestant
  Midwest   70702    61629 84160     
  rep         120      155   249     
  Northeast 80685    83343 65694     
  rep         134      145   112     
  South     64236    70229 59637     
  rep         173      178   496     
  West      60202    69063 65736     
  rep         121      252   179     

Checking the Assumptions

We need to check that the residuals obey the following assumptions.

  1. Indepence
  2. Constant variance
  3. Normality

The first assumption will automatically if the data for each cell comes from a SRS of the whole population (assuming that the population is very large). We can be pretty sure that this is basically correct, since the general social survey is a very reputable survey run by people who know a lot about statistics.

To check the second assumption, we can use the tapply() function to find the standard deviation for each subgroup in our study:

tapply(myData$Income_Number,list(myData$Region,myData$Religion),sd)
Catholic Other Protestant
Midwest 60319.39 59621.43 76522.09
Northeast 73418.60 77886.68 68138.84
South 70879.39 69892.39 56320.61
West 56246.46 70059.87 56845.68

The gap between the largest and smallest sample standard deviation much less than a factor of 2, so assumption #2 is probably okay. Let us turn to assumption #3.

qqnorm(resid(myAOV))
par(mfrow=c(4,3))
qqnorm(subset(myData,Region=='Midwest' & Religion=='Catholic')$Income_Number)
qqnorm(subset(myData,Region=='Midwest' & Religion=='Other')$Income_Number)
qqnorm(subset(myData,Region=='Midwest' & Religion=='Protestant')$Income_Number)
qqnorm(subset(myData,Region=='Northeast' & Religion=='Catholic')$Income_Number)
qqnorm(subset(myData,Region=='Northeast' & Religion=='Other')$Income_Number)
qqnorm(subset(myData,Region=='Northeast' & Religion=='Protestant')$Income_Number)
qqnorm(subset(myData,Region=='South' & Religion=='Catholic')$Income_Number)
qqnorm(subset(myData,Region=='South' & Religion=='Other')$Income_Number)
qqnorm(subset(myData,Region=='South' & Religion=='Protestant')$Income_Number)
qqnorm(subset(myData,Region=='West' & Religion=='Catholic')$Income_Number)
qqnorm(subset(myData,Region=='West' & Religion=='Other')$Income_Number)
qqnorm(subset(myData,Region=='West' & Religion=='Protestant')$Income_Number)
par(mfrow=c(4,3))
hist(subset(myData,Region=='Midwest' & Religion=='Catholic')$Income_Number)
hist(subset(myData,Region=='Midwest' & Religion=='Other')$Income_Number)
hist(subset(myData,Region=='Midwest' & Religion=='Protestant')$Income_Number)
hist(subset(myData,Region=='Northeast' & Religion=='Catholic')$Income_Number)
hist(subset(myData,Region=='Northeast' & Religion=='Other')$Income_Number)
hist(subset(myData,Region=='Northeast' & Religion=='Protestant')$Income_Number)
hist(subset(myData,Region=='South' & Religion=='Catholic')$Income_Number)
hist(subset(myData,Region=='South' & Religion=='Other')$Income_Number)
hist(subset(myData,Region=='South' & Religion=='Protestant')$Income_Number)
hist(subset(myData,Region=='West' & Religion=='Catholic')$Income_Number)
hist(subset(myData,Region=='West' & Religion=='Other')$Income_Number)
hist(subset(myData,Region=='West' & Religion=='Protestant')$Income_Number)

The data in each cell is clearly not normal, but that really won't be a problem. Because the sample sizes in each cell are so large (over 100), the means for each cell will definitely have a normal distribution due to the central limit theorem. Therefore using ANOVA in this situation is probably fine.

Further Analysis

We can use the TukeyHSD() command to get confidence intervals for all of the pairwise differences in our model. There are a lot! Also, most of the differences are not statistically significant.

TukeyHSD(myAOV,conf.level=0.95)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Income_Number ~ Region * Religion, data = myData)

$Region
                        diff        lwr        upr     p adj
Northeast-Midwest   2963.431  -8373.991 14300.8518 0.9076886
South-Midwest     -11611.220 -21040.296 -2182.1439 0.0085133
West-Midwest       -8371.048 -18718.388  1976.2914 0.1599661
South-Northeast   -14574.650 -24947.259 -4202.0417 0.0017606
West-Northeast    -11334.479 -22548.353  -120.6045 0.0464494
West-South          3240.172  -6039.987 12520.3295 0.8060734

$Religion
                          diff       lwr       upr     p adj
Other-Catholic       2165.2074 -6581.981 10912.395 0.8305636
Protestant-Catholic   351.7939 -7822.724  8526.312 0.9943998
Protestant-Other    -1813.4135 -9291.815  5664.988 0.8368036

$`Region:Religion`
                                                diff          lwr         upr
Northeast:Catholic-Midwest:Catholic       9982.61816 -17149.17128 37114.40760
South:Catholic-Midwest:Catholic          -6466.53420 -32112.83534 19179.76694
West:Catholic-Midwest:Catholic          -10499.60399 -38311.44994 17312.24195
Midwest:Other-Midwest:Catholic           -9073.05108 -35322.15466 17176.05251
Northeast:Other-Midwest:Catholic         12641.02011 -14000.12663 39282.16686
South:Other-Midwest:Catholic              -473.15075 -25971.50261 25025.20111
West:Other-Midwest:Catholic              -1638.59127 -25581.95585 22304.77331
Midwest:Protestant-Midwest:Catholic      13457.55522 -10532.29235 37447.40279
Northeast:Protestant-Midwest:Catholic    -5007.88690 -33370.67760 23354.90379
South:Protestant-Midwest:Catholic       -11065.49059 -33027.06629 10896.08511
West:Protestant-Midwest:Catholic         -4966.04981 -30435.70454 20503.60491
South:Catholic-Northeast:Catholic       -16449.15236 -41291.82628  8393.52156
West:Catholic-Northeast:Catholic        -20482.22215 -47554.79979  6590.35548
Midwest:Other-Northeast:Catholic        -19055.66923 -44520.17638  6408.83791
Northeast:Other-Northeast:Catholic        2658.40196 -23210.04185 28526.84576
South:Other-Northeast:Catholic          -10455.76891 -35145.67844 14234.14062
West:Other-Northeast:Catholic           -11621.20943 -34701.72919 11459.31033
Midwest:Protestant-Northeast:Catholic     3474.93706 -19653.79986 26603.67398
Northeast:Protestant-Northeast:Catholic -14990.50506 -42628.76805 12647.75792
South:Protestant-Northeast:Catholic     -21048.10875 -42065.63697   -30.58053
West:Protestant-Northeast:Catholic      -14948.66797 -39608.93960  9711.60366
West:Catholic-South:Catholic             -4033.06979 -29616.72129 21550.58170
Midwest:Other-South:Catholic             -2606.51687 -26482.02500 21268.99125
Northeast:Other-South:Catholic           19107.55432  -5198.31224 43413.42087
South:Other-South:Catholic                5993.38345 -17054.18468 29040.95158
West:Other-South:Catholic                 4827.94293 -16486.58630 26142.47216
Midwest:Protestant-South:Catholic        19924.08942  -1442.64256 41290.82140
Northeast:Protestant-South:Catholic       1458.64730 -24722.87949 27640.17408
South:Protestant-South:Catholic          -4598.95639 -23660.31144 14462.39866
West:Protestant-South:Catholic            1500.48439 -21515.33107 24516.29984
Midwest:Other-West:Catholic               1426.55292 -24761.34315 27614.44899
Northeast:Other-West:Catholic            23140.62411  -3440.21789 49721.46611
South:Other-West:Catholic                10026.45325 -15408.88456 35461.79105
West:Other-West:Catholic                  8861.01272 -15015.23423 32737.25968
Midwest:Protestant-West:Catholic         23957.15922     34.29886 47880.01957
Northeast:Protestant-West:Catholic        5491.71709 -22814.43696 33797.87114
South:Protestant-West:Catholic            -565.88660 -22454.26865 21322.49546
West:Protestant-West:Catholic             5533.55418 -19873.01531 30940.12367
Northeast:Other-Midwest:Other            21714.07119  -3227.01591 46655.15829
South:Other-Midwest:Other                 8599.90033 -15116.61446 32316.41511
West:Other-Midwest:Other                  7434.45981 -14601.68853 29470.60814
Midwest:Protestant-Midwest:Other         22530.60630    443.96073 44617.25186
Northeast:Protestant-Midwest:Other        4065.16417 -22707.11668 30837.44502
South:Protestant-Midwest:Other           -1992.43952 -21857.43217 17872.55314
West:Protestant-Midwest:Other             4107.00126 -19578.65764 27792.66017
South:Other-Northeast:Other             -13114.17086 -37263.87758 11035.53585
West:Other-Northeast:Other              -14279.61138 -36781.32452  8222.10175
Midwest:Protestant-Northeast:Other         816.53511 -21734.63278 23367.70299
Northeast:Protestant-Northeast:Other    -17648.90702 -44805.67933  9507.86529
South:Protestant-Northeast:Other        -23706.51071 -44086.72646 -3326.29495
West:Protestant-Northeast:Other         -17607.06993 -41726.47495  6512.33509
West:Other-South:Other                   -1165.44052 -22301.72074 19970.83970
Midwest:Protestant-South:Other           13930.70597  -7258.21615 35119.62809
Northeast:Protestant-South:Other         -4534.73616 -30571.35518 21501.88287
South:Protestant-South:Other            -10592.33984 -29454.16483  8269.48514
West:Protestant-South:Other              -4492.89906 -27343.74032 18357.94219
Midwest:Protestant-West:Other            15096.14649  -4193.46679 34385.75977
Northeast:Protestant-West:Other          -3369.29563 -27885.09238 21146.50111
South:Protestant-West:Other              -9426.89932 -26126.81553  7273.01689
West:Protestant-West:Other               -3327.45854 -24429.11028 17774.19319
Northeast:Protestant-Midwest:Protestant -18465.44213 -43026.63853  6095.75428
South:Protestant-Midwest:Protestant     -24523.04581 -41289.53860 -7756.55302
West:Protestant-Midwest:Protestant      -18423.60503 -39577.98484  2730.77477
South:Protestant-Northeast:Protestant    -6057.60369 -28641.89889 16526.69152
West:Protestant-Northeast:Protestant        41.83709 -25966.67871 26050.35290
West:Protestant-South:Protestant          6099.44078 -12723.57189 24922.45345
                                            p adj
Northeast:Catholic-Midwest:Catholic     0.9887616
South:Catholic-Midwest:Catholic         0.9996202
West:Catholic-Midwest:Catholic          0.9861544
Midwest:Other-Midwest:Catholic          0.9933194
Northeast:Other-Midwest:Catholic        0.9255492
South:Other-Midwest:Catholic            1.0000000
West:Other-Midwest:Catholic             1.0000000
Midwest:Protestant-Midwest:Catholic     0.7984972
Northeast:Protestant-Midwest:Catholic   0.9999892
South:Protestant-Midwest:Catholic       0.8908251
West:Protestant-Midwest:Catholic        0.9999702
South:Catholic-Northeast:Catholic       0.5746521
West:Catholic-Northeast:Catholic        0.3562458
Midwest:Other-Northeast:Catholic        0.3739089
Northeast:Other-Northeast:Catholic      1.0000000
South:Other-Northeast:Catholic          0.9664043
West:Other-Northeast:Catholic           0.8912900
Midwest:Protestant-Northeast:Catholic   0.9999980
Northeast:Protestant-Northeast:Catholic 0.8320198
South:Protestant-Northeast:Catholic     0.0492740
West:Protestant-Northeast:Catholic      0.7051412
West:Catholic-South:Catholic            0.9999967
Midwest:Other-South:Catholic            0.9999999
Northeast:Other-South:Catholic          0.2964306
South:Other-South:Catholic              0.9994899
West:Other-South:Catholic               0.9998663
Midwest:Protestant-South:Catholic       0.0951096
Northeast:Protestant-South:Catholic     1.0000000
South:Protestant-South:Catholic         0.9997519
West:Protestant-South:Catholic          1.0000000
Midwest:Other-West:Catholic             1.0000000
Northeast:Other-West:Catholic           0.1608574
South:Other-West:Catholic               0.9805024
West:Other-West:Catholic                0.9879418
Midwest:Protestant-West:Catholic        0.0492845
Northeast:Protestant-West:Catholic      0.9999716
South:Protestant-West:Catholic          1.0000000
West:Protestant-West:Catholic           0.9999093
Northeast:Other-Midwest:Other           0.1608072
South:Other-Midwest:Other               0.9900278
West:Other-Midwest:Other                0.9945672
Midwest:Protestant-Midwest:Other        0.0407377
Northeast:Protestant-Midwest:Other      0.9999977
South:Protestant-Midwest:Other          1.0000000
West:Protestant-Midwest:Other           0.9999910
South:Other-Northeast:Other             0.8308946
West:Other-Northeast:Other              0.6401319
Midwest:Protestant-Northeast:Other      1.0000000
Northeast:Protestant-Northeast:Other    0.6039518
South:Protestant-Northeast:Other        0.0080218
West:Protestant-Northeast:Other         0.4144931
West:Other-South:Other                  1.0000000
Midwest:Protestant-South:Other          0.5858756
Northeast:Protestant-South:Other        0.9999906
South:Protestant-South:Other            0.7973502
West:Protestant-South:Other             0.9999675
Midwest:Protestant-West:Other           0.3032525
Northeast:Protestant-West:Other         0.9999992
South:Protestant-West:Other             0.7918008
West:Protestant-West:Other              0.9999967
Northeast:Protestant-Midwest:Protestant 0.3663817
South:Protestant-Midwest:Protestant     0.0001153
West:Protestant-Midwest:Protestant      0.1604259
South:Protestant-Northeast:Protestant   0.9993160
West:Protestant-Northeast:Protestant    1.0000000
West:Protestant-South:Protestant        0.9961705