Below is data from the 2008 General Social Survey about education levels and party affiliation. This time we will look at ways to manipulate tables in R to easily work with data like this.
myData = read.csv('http://people.hsc.edu/faculty-staff/blins/spring17/math222/data/polpartyfull.csv')
education = factor(myData$Education,levels=c("None","Highschool","JrCollege","Bachelor","Graduate"),ordered=TRUE)
politics = factor(myData$Politics,levels=c("StrongDemocrat","WeakDemocrat","NearDemocrat","Otherparty","Independent","NearRepublican","WeakRepublican","StrongRepublican"),ordered=TRUE)
myTable = xtabs(myData$Count~education+politics)
myTable
##              politics
## education    StrongDemocrat WeakDemocrat NearDemocrat Otherparty
##   None                   63           45           34          2
##   Highschool            185          183          132         16
##   JrCollege              32           30           19          4
##   Bachelor               56           44           57         11
##   Graduate               54           29           20          5
##              politics
## education    Independent NearRepublican WeakRepublican StrongRepublican
##   None                87             19             20               25
##   Highschool         156             78            147               98
##   JrCollege           22             20             33               13
##   Bachelor            31             35             76               43
##   Graduate            26             10             27               22
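The xtabs() call above relies on the data storing one row per combination of categories, with the frequency in a Count column. As a minimal sketch of that layout, the toy data frame below shows how xtabs() turns such rows into a two-way table. The names toyData, edu, pol, and toyTable are made up for illustration, the four counts are copied from the table above, and the exact layout of the real CSV file is an assumption.

```r
# Hypothetical long-format data: one row per (Education, Politics) pair.
# The four counts are copied from the table above; the layout of the
# real CSV file is assumed, not verified.
toyData = data.frame(
  Education = c("None", "None", "Bachelor", "Bachelor"),
  Politics  = c("StrongDemocrat", "Independent", "StrongDemocrat", "Independent"),
  Count     = c(63, 87, 56, 31)
)

# Ordered factors control the order of the rows and columns in the table.
edu = factor(toyData$Education, levels = c("None", "Bachelor"), ordered = TRUE)
pol = factor(toyData$Politics, levels = c("StrongDemocrat", "Independent"), ordered = TRUE)

# The left side of the formula holds the counts to add up;
# the right side lists the classifying factors.
toyTable = xtabs(toyData$Count ~ edu + pol)
toyTable
```

Without the levels argument, factor() would sort the categories alphabetically, which is rarely the order you want for education or party strength.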
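To select a single row or column of a table in R, put its name (or its position) inside square brackets: the row goes before the comma and the column goes after it. For example, the following command selects the Bachelor row.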
myTable["Bachelor",]
##   StrongDemocrat     WeakDemocrat     NearDemocrat       Otherparty
##               56               44               57               11
##      Independent   NearRepublican   WeakRepublican StrongRepublican
##               31               35               76               43
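The same bracket syntax also selects a column or a single cell. The sketch below rebuilds a small piece of the table by hand so that it runs on its own; smallTable is a made-up name, and its six counts are copied from the output above.

```r
# Rebuild a small piece of the table so this example runs on its own
# (counts copied from the output above; smallTable is a hypothetical name).
smallTable = as.table(matrix(
  c(63, 87, 25,
    56, 31, 43),
  nrow = 2, byrow = TRUE,
  dimnames = list(education = c("None", "Bachelor"),
                  politics  = c("StrongDemocrat", "Independent", "StrongRepublican"))
))

smallTable[, "Independent"]             # one column, selected by name
smallTable[2, ]                         # one row, selected by position
smallTable["Bachelor", "Independent"]   # a single cell
```

Selecting by name is usually safer than selecting by position, because the command keeps working if rows or columns are later removed or reordered.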
In the table above, very few people identify as Otherparty. Counts this small can cause problems for the chi-square test, which relies on the expected counts not being too small. Why not remove that column from the table? The command to do this is myTable[,-4], since Otherparty is the fourth column. To remove a row instead, use myTable[-m,], where m is the number of the row to remove. Negative indexing only works with numbers, so to remove a row or column by name you need a logical test instead, for example myTable[,colnames(myTable)!="Otherparty"].
myTable = myTable[,-4]
myTable
##              politics
## education    StrongDemocrat WeakDemocrat NearDemocrat Independent
##   None                   63           45           34          87
##   Highschool            185          183          132         156
##   JrCollege              32           30           19          22
##   Bachelor               56           44           57          31
##   Graduate               54           29           20          26
##              politics
## education    NearRepublican WeakRepublican StrongRepublican
##   None                   19             20               25
##   Highschool             78            147               98
##   JrCollege              20             33               13
##   Bachelor               35             76               43
##   Graduate               10             27               22
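To see why dropping Otherparty helps, you can compare the smallest expected count before and after removing the column. The sketch below re-enters the full table by hand so that it runs without the CSV file. The names counts, fullTable, byName, byNumber, minFull, and minReduced are made up for illustration, and the cutoff of 5 is the usual rule of thumb for the chi-square approximation rather than something stated in these notes.

```r
# Full table, re-entered by hand from the output above so the example
# runs without the CSV file.
counts = matrix(
  c( 63,  45,  34,  2,  87, 19,  20, 25,
    185, 183, 132, 16, 156, 78, 147, 98,
     32,  30,  19,  4,  22, 20,  33, 13,
     56,  44,  57, 11,  31, 35,  76, 43,
     54,  29,  20,  5,  26, 10,  27, 22),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Otherparty",
                  "Independent", "NearRepublican", "WeakRepublican", "StrongRepublican")))
fullTable = as.table(counts)

# Two ways to drop the Otherparty column: by name (logical test)
# and by position (negative index).
byName   = fullTable[, colnames(fullTable) != "Otherparty"]
byNumber = fullTable[, -4]
all(byName == byNumber)

# Smallest expected count with and without the Otherparty column.
# suppressWarnings() hides the warning R gives when expected counts are small.
minFull    = min(suppressWarnings(chisq.test(fullTable))$expected)
minReduced = min(chisq.test(byName)$expected)
minFull
minReduced
```

If minFull is below 5 while minReduced is not, the reduced table is the safer input for chisq.test().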
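The cbind() function combines column vectors into a matrix, so you can use it to build a new table out of the columns of an existing one. Here we use it to collapse the seven party categories into three: the Democrat columns are added together, Independent is kept as it is, and the Republican columns are added together. There is also an rbind() function that builds a matrix out of rows.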
myTable2 = cbind(
  myTable[,"StrongDemocrat"]+myTable[,"WeakDemocrat"]+myTable[,"NearDemocrat"],
  myTable[,"Independent"],
  myTable[,"StrongRepublican"]+myTable[,"WeakRepublican"]+myTable[,"NearRepublican"]
)
myTable2
##            [,1] [,2] [,3]
## None        142   87   64
## Highschool  500  156  323
## JrCollege    81   22   66
## Bachelor    157   31  154
## Graduate    103   26   59
Notice that we lost the column names. You can assign new column names with the colnames() function:
colnames(myTable2)=c("Democrat","Independent","Republican")
myTable2
##            Democrat Independent Republican
## None            142          87         64
## Highschool      500         156        323
## JrCollege        81          22         66
## Bachelor        157          31        154
## Graduate        103          26         59
Finally, a mosaic plot gives a quick picture of how party affiliation varies with education:
mosaicplot(myTable2,color=TRUE,las=1)
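The rbind() function works the same way for rows. The sketch below re-enters the collapsed table by hand so that it runs on its own, then merges the five education levels into two groups. The names party and byDegree and the grouping itself are hypothetical choices for illustration. Naming the arguments to rbind() sets the row names directly, so no separate rownames() step is needed; cbind() accepts named arguments in the same way.

```r
# Collapsed table re-entered from the output above so the example is
# self-contained (party is a hypothetical name).
party = matrix(
  c(142,  87,  64,
    500, 156, 323,
     81,  22,  66,
    157,  31, 154,
    103,  26,  59),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                  c("Democrat", "Independent", "Republican")))

# Hypothetical grouping: no college degree vs. any college degree.
# Each named argument becomes one row of the new matrix.
byDegree = rbind(
  NoCollegeDegree = party["None", ] + party["Highschool", ],
  CollegeDegree   = party["JrCollege", ] + party["Bachelor", ] + party["Graduate", ]
)
byDegree
```

The column names carry over here because each row vector is selected by name from a matrix that already has column names.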