Below is data from the 2008 General Social Survey about education levels and party affiliation. Notice that this is not a standard data frame because each row corresponds to a group of individuals, not a single individual.
myData = read.csv('http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/polpartyfull.csv')
myData
## Education Politics Count
## 1 None StrongDemocrat 63
## 2 None WeakDemocrat 45
## 3 None NearDemocrat 34
## 4 None Independent 87
## 5 None NearRepublican 19
## 6 None WeakRepublican 20
## 7 None StrongRepublican 25
## 8 None Otherparty 2
## 9 Highschool StrongDemocrat 185
## 10 Highschool WeakDemocrat 183
## 11 Highschool NearDemocrat 132
## 12 Highschool Independent 156
## 13 Highschool NearRepublican 78
## 14 Highschool WeakRepublican 147
## 15 Highschool StrongRepublican 98
## 16 Highschool Otherparty 16
## 17 JrCollege StrongDemocrat 32
## 18 JrCollege WeakDemocrat 30
## 19 JrCollege NearDemocrat 19
## 20 JrCollege Independent 22
## 21 JrCollege NearRepublican 20
## 22 JrCollege WeakRepublican 33
## 23 JrCollege StrongRepublican 13
## 24 JrCollege Otherparty 4
## 25 Bachelor StrongDemocrat 56
## 26 Bachelor WeakDemocrat 44
## 27 Bachelor NearDemocrat 57
## 28 Bachelor Independent 31
## 29 Bachelor NearRepublican 35
## 30 Bachelor WeakRepublican 76
## 31 Bachelor StrongRepublican 43
## 32 Bachelor Otherparty 11
## 33 Graduate StrongDemocrat 54
## 34 Graduate WeakDemocrat 29
## 35 Graduate NearDemocrat 20
## 36 Graduate Independent 26
## 37 Graduate NearRepublican 10
## 38 Graduate WeakRepublican 27
## 39 Graduate StrongRepublican 22
## 40 Graduate Otherparty 5
Depending on how the data is presented, R has several ways to make two-way tables. Because each row in our data frame refers to a group of individuals, the fastest option is the R function xtabs. To manually enter the two-way table yourself, see the Dolphin Therapy example from a couple of weeks ago.
myTable = xtabs(myData$Count~myData$Education+myData$Politics)
myTable
## myData$Politics
## myData$Education Independent NearDemocrat NearRepublican Otherparty
## Bachelor 31 57 35 11
## Graduate 26 20 10 5
## Highschool 156 132 78 16
## JrCollege 22 19 20 4
## None 87 34 19 2
## myData$Politics
## myData$Education StrongDemocrat StrongRepublican WeakDemocrat
## Bachelor 56 43 44
## Graduate 54 22 29
## Highschool 185 98 183
## JrCollege 32 13 30
## None 63 25 45
## myData$Politics
## myData$Education WeakRepublican
## Bachelor 76
## Graduate 27
## Highschool 147
## JrCollege 33
## None 20
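The xtabs function also accepts a data argument, which lets the formula use bare column names and gives shorter labels in the printed table. Here is a minimal sketch on a hand-entered subset of the counts above; the names smallData and smallTable are made up for this illustration.

```r
# Hand-entered subset of the survey counts shown above (illustration only).
smallData = data.frame(
  Education = c("None", "None", "None", "Graduate", "Graduate", "Graduate"),
  Politics  = c("StrongDemocrat", "Independent", "StrongRepublican",
                "StrongDemocrat", "Independent", "StrongRepublican"),
  Count     = c(63, 87, 25, 54, 26, 22)
)

# With data=, the formula can refer to the columns by name.
smallTable = xtabs(Count ~ Education + Politics, data = smallData)
smallTable
```

The dimension labels now read Education and Politics instead of myData$Education and myData$Politics. The categories are still sorted alphabetically.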
mosaicplot(myTable,color=T)
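There are two problems with this plot:
1. The axis labels are hard to read. You can fix this by adding the option las=1. This adjusts the style of the axis labels.
2. The categories appear in alphabetical order instead of their natural order. The factor() function in R lets you order the categories of a categorical variable.
education = factor(myData$Education,levels=c("None","Highschool","JrCollege","Bachelor","Graduate"),ordered=T)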
politics = factor(myData$Politics,levels=c("StrongDemocrat","WeakDemocrat","NearDemocrat","Otherparty","Independent","NearRepublican","WeakRepublican","StrongRepublican"),ordered=T)
myTable = xtabs(myData$Count~education+politics)
Now make a new mosaic plot, and use it to describe the association between the two variables of education and politics.
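One possible approach, sketched on a hand-entered subset of the counts above (smallData, edu, pol, smallTable, and rowProps are names made up for this illustration): order the categories with factor(), rebuild the table, plot it with las=1, and then use prop.table() to compare the party breakdown within each education level.

```r
# Hand-entered subset of the survey counts shown above (illustration only).
smallData = data.frame(
  Education = c("None", "None", "None", "Graduate", "Graduate", "Graduate"),
  Politics  = c("StrongDemocrat", "Independent", "StrongRepublican",
                "StrongDemocrat", "Independent", "StrongRepublican"),
  Count     = c(63, 87, 25, 54, 26, 22)
)

# Put the categories in their natural order instead of alphabetical order.
edu = factor(smallData$Education, levels = c("None", "Graduate"), ordered = TRUE)
pol = factor(smallData$Politics,
             levels = c("StrongDemocrat", "Independent", "StrongRepublican"),
             ordered = TRUE)

smallTable = xtabs(smallData$Count ~ edu + pol)

# las = 1 draws the axis labels horizontally.
mosaicplot(smallTable, color = TRUE, las = 1)

# Row proportions: the share of each party category within an education level.
rowProps = prop.table(smallTable, margin = 1)
round(rowProps, 2)
```

If the rows of rowProps look very different from one another, that suggests an association between the two variables.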
Once you have a two-way table in R, the Chi-Squared Test for Association is super easy. Just use the chisq.test() function. As input, you should use a table or matrix with the counts (but not the row or column totals). You can also use the chisq.test() function for a Chi-Squared Goodness of Fit Test, but in that case you enter two vectors x and p, where x holds the observed count in each category, and p holds the predicted probability for each category.
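Both uses of chisq.test() can be sketched as follows. The matrix holds the survey counts from the data frame above, typed in by hand. The goodness-of-fit vectors x and p are hypothetical numbers chosen only to show the call, and the names counts, assocTest, and fitTest are made up for this illustration.

```r
# Survey counts from above, one row per education level (no totals included).
counts = matrix(
  c( 63,  45,  34,  87, 19,  20, 25,  2,
    185, 183, 132, 156, 78, 147, 98, 16,
     32,  30,  19,  22, 20,  33, 13,  4,
     56,  44,  57,  31, 35,  76, 43, 11,
     54,  29,  20,  26, 10,  27, 22,  5),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    Politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Independent",
                  "NearRepublican", "WeakRepublican", "StrongRepublican", "Otherparty")
  )
)

# Test for association. R may warn that the approximation could be
# inaccurate, because a few expected counts here are below 5.
assocTest = chisq.test(counts)
assocTest
assocTest$expected   # expected counts under independence

# Goodness of fit with HYPOTHETICAL data: observed counts x and
# predicted probabilities p (p must add up to 1).
x = c(20, 30, 50)
p = c(0.25, 0.25, 0.50)
fitTest = chisq.test(x, p = p)
fitTest
```

Note that p has to be passed by name (p = p). A second unnamed argument would be treated as y, a second variable to cross-tabulate with x.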