Politics and Education

Below is data from the 2008 General Social Survey on education level and political party affiliation. Notice that this is not a standard data frame: each row corresponds to a group of individuals (the Count column gives the number of people in the group), not to a single individual.

myData = read.csv('http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/polpartyfull.csv')
myData
##     Education         Politics Count
## 1        None   StrongDemocrat    63
## 2        None     WeakDemocrat    45
## 3        None     NearDemocrat    34
## 4        None      Independent    87
## 5        None   NearRepublican    19
## 6        None   WeakRepublican    20
## 7        None StrongRepublican    25
## 8        None       Otherparty     2
## 9  Highschool   StrongDemocrat   185
## 10 Highschool     WeakDemocrat   183
## 11 Highschool     NearDemocrat   132
## 12 Highschool      Independent   156
## 13 Highschool   NearRepublican    78
## 14 Highschool   WeakRepublican   147
## 15 Highschool StrongRepublican    98
## 16 Highschool       Otherparty    16
## 17  JrCollege   StrongDemocrat    32
## 18  JrCollege     WeakDemocrat    30
## 19  JrCollege     NearDemocrat    19
## 20  JrCollege      Independent    22
## 21  JrCollege   NearRepublican    20
## 22  JrCollege   WeakRepublican    33
## 23  JrCollege StrongRepublican    13
## 24  JrCollege       Otherparty     4
## 25   Bachelor   StrongDemocrat    56
## 26   Bachelor     WeakDemocrat    44
## 27   Bachelor     NearDemocrat    57
## 28   Bachelor      Independent    31
## 29   Bachelor   NearRepublican    35
## 30   Bachelor   WeakRepublican    76
## 31   Bachelor StrongRepublican    43
## 32   Bachelor       Otherparty    11
## 33   Graduate   StrongDemocrat    54
## 34   Graduate     WeakDemocrat    29
## 35   Graduate     NearDemocrat    20
## 36   Graduate      Independent    26
## 37   Graduate   NearRepublican    10
## 38   Graduate   WeakRepublican    27
## 39   Graduate StrongRepublican    22
## 40   Graduate       Otherparty     5
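Because each row stands for a group, the sample size is the sum of the Count column, not the number of rows. The minimal sketch below re-enters the printed counts so that it runs without downloading the CSV. It then uses rep() to expand the grouped rows into a standard data frame with one row per person. This is one common approach, not the only one, and the name individuals is just an illustrative choice.

```r
# Rebuild the grouped data printed above (5 education levels x 8 party groups).
# expand.grid() varies its first argument fastest, matching the row order above.
myData = expand.grid(
  Politics = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Independent",
               "NearRepublican", "WeakRepublican", "StrongRepublican", "Otherparty"),
  Education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
  stringsAsFactors = FALSE
)[, c("Education", "Politics")]
myData$Count = c( 63,  45,  34,  87, 19,  20, 25,  2,
                 185, 183, 132, 156, 78, 147, 98, 16,
                  32,  30,  19,  22, 20,  33, 13,  4,
                  56,  44,  57,  31, 35,  76, 43, 11,
                  54,  29,  20,  26, 10,  27, 22,  5)

nrow(myData)        # 40: one row per group
sum(myData$Count)   # 2009: the number of people surveyed

# Standard data frame: repeat each group's row Count times (one row per person)
individuals = myData[rep(seq_len(nrow(myData)), times = myData$Count),
                     c("Education", "Politics")]
rownames(individuals) = NULL
nrow(individuals)   # 2009
```

With one row per person, table(individuals$Education, individuals$Politics) would produce the same counts that xtabs() builds directly from the grouped data.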

Making a Two-Way Table

Depending on how the data are presented, R has several ways to make two-way tables. Because each row of our data frame refers to a group of individuals, the quickest option is the R function xtabs(), which adds up the Count values for each combination of Education and Politics. To enter a two-way table by hand instead, see the Dolphin Therapy example from a couple of weeks ago.

myTable = xtabs(myData$Count~myData$Education+myData$Politics)
myTable
##                 myData$Politics
## myData$Education Independent NearDemocrat NearRepublican Otherparty
##       Bachelor            31           57             35         11
##       Graduate            26           20             10          5
##       Highschool         156          132             78         16
##       JrCollege           22           19             20          4
##       None                87           34             19          2
##                 myData$Politics
## myData$Education StrongDemocrat StrongRepublican WeakDemocrat
##       Bachelor               56               43           44
##       Graduate               54               22           29
##       Highschool            185               98          183
##       JrCollege              32               13           30
##       None                   63               25           45
##                 myData$Politics
## myData$Education WeakRepublican
##       Bachelor               76
##       Graduate               27
##       Highschool            147
##       JrCollege              33
##       None                   20
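For comparison, here is a sketch of entering the same counts by hand with matrix() and as.table(). This may differ in detail from how the Dolphin Therapy example did it. The names counts and manualTable are illustrative. Because you type the row and column names yourself, the categories appear in whatever order you list them rather than alphabetically.

```r
# Type the counts row by row (byrow = TRUE): rows are education levels,
# columns are party groups, in the order they appear in myData.
counts = matrix(c( 63,  45,  34,  87, 19,  20, 25,  2,
                  185, 183, 132, 156, 78, 147, 98, 16,
                   32,  30,  19,  22, 20,  33, 13,  4,
                   56,  44,  57,  31, 35,  76, 43, 11,
                   54,  29,  20,  26, 10,  27, 22,  5),
                nrow = 5, byrow = TRUE,
                dimnames = list(
                  Education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                  Politics = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Independent",
                               "NearRepublican", "WeakRepublican", "StrongRepublican", "Otherparty")))
manualTable = as.table(counts)
manualTable

# Row and column totals (labelled "Sum"), handy for reading the table
addmargins(manualTable)
```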

Mosaic Plots

mosaicplot(myTable, color = TRUE)

There are two problems with this plot:

  1. The labels for the categories on the y-axis are unreadable. To fix this, add the option las=1, which makes all axis labels horizontal so they no longer overlap.
  2. Both of our variables, education and political affiliation, have a natural ordering, but this doesn’t show up in the two-way table: by default R sorts the categories alphabetically (Bachelor, Graduate, Highschool, ...). The factor() function in R lets you specify the order of the categories of a categorical variable.
# Put the categories in their natural order (the default is alphabetical)
education = factor(myData$Education, ordered = TRUE,
                   levels = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"))
politics = factor(myData$Politics, ordered = TRUE,
                  levels = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Otherparty", "Independent", "NearRepublican", "WeakRepublican", "StrongRepublican"))
myTable = xtabs(myData$Count ~ education + politics)
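To see what factor() changes, here is a small illustration on a short, made-up vector of education labels (not survey data). Without levels, R sorts the categories alphabetically. With levels and ordered = TRUE, R uses the order you give, both in tables and in comparisons.

```r
# A short, made-up vector of education labels (not survey data)
x = c("None", "Graduate", "Bachelor", "Highschool")

levels(factor(x))     # "Bachelor" "Graduate" "Highschool" "None" (alphabetical default)

ed = factor(x, ordered = TRUE,
            levels = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"))
levels(ed)            # the order we specified
ed < "Bachelor"       # TRUE FALSE FALSE TRUE: comparisons follow the specified order
table(ed)             # JrCollege is kept as a category, with a count of 0
```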

Now make a new mosaic plot and use it to describe the association between education and political affiliation.
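One way to draw the improved plot is sketched below. So that it runs on its own, it types the counts directly in the natural order instead of relying on the reordered myTable from above. In the mosaic plot, each education level gets a column whose width is proportional to its size. The tile heights within a column show how that group splits across parties. The row proportions from prop.table() with margin = 1 put numbers on those tile heights, which can help when describing the association.

```r
# Counts from the two-way table, already in the natural order of both variables
counts = matrix(c( 63,  45,  34,  2,  87, 19,  20, 25,
                  185, 183, 132, 16, 156, 78, 147, 98,
                   32,  30,  19,  4,  22, 20,  33, 13,
                   56,  44,  57, 11,  31, 35,  76, 43,
                   54,  29,  20,  5,  26, 10,  27, 22),
                nrow = 5, byrow = TRUE,
                dimnames = list(
                  education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                  politics = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Otherparty",
                               "Independent", "NearRepublican", "WeakRepublican", "StrongRepublican")))
myTable = as.table(counts)

# las = 1 makes the axis labels horizontal
mosaicplot(myTable, color = TRUE, las = 1, main = "Party affiliation by education")

# Conditional distribution of party within each education level (rows sum to 1)
round(prop.table(myTable, margin = 1), 2)
```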

Chi-squared Test

Once you have a two-way table in R, the Chi-Squared Test for Association is easy: just use the chisq.test() function. As input, give it a table or matrix containing only the counts (not the row or column totals). You can also use chisq.test() for a Chi-Squared Goodness of Fit Test. In that case you enter two vectors, x and p, where x contains the observed count in each category and p contains the probability of each category under the null hypothesis (the entries of p must sum to 1).
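Here is a minimal sketch of both uses, with the counts re-entered as a plain matrix so the code runs on its own. Some expected counts in this table are small. For example, JrCollege and Otherparty has an expected count of 38 * 173 / 2009, about 3.3, so R prints a warning that the chi-squared approximation may be incorrect. The goodness-of-fit part uses a hypothetical null hypothesis chosen only for illustration: that graduates are spread evenly over the 8 party groups.

```r
# Counts only (no totals): 5 education levels x 8 party groups
counts = matrix(c( 63,  45,  34,  87, 19,  20, 25,  2,
                  185, 183, 132, 156, 78, 147, 98, 16,
                   32,  30,  19,  22, 20,  33, 13,  4,
                   56,  44,  57,  31, 35,  76, 43, 11,
                   54,  29,  20,  26, 10,  27, 22,  5),
                nrow = 5, byrow = TRUE)

# Test for association: df = (5 - 1) * (8 - 1) = 28
assocTest = chisq.test(counts)
assocTest
assocTest$expected    # expected counts if education and party were independent

# Goodness of fit: x = observed counts, p = hypothesized probabilities.
# Hypothetical null (illustration only): graduates split evenly over the 8 groups.
graduates = counts[5, ]
gofTest = chisq.test(x = graduates, p = rep(1/8, 8))
gofTest               # df = 8 - 1 = 7
```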