Below is data from the 2008 General Social Survey about education levels and party affiliation. This time we will look at ways to manipulate tables in R to easily work with data like this. First, R would normally sort the education levels alphabetically, but it is better to sort them from lowest to highest education. You do this with the factor() command in R. You can also sort political opinions from left-wing to right-wing. Here is an example.
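# Read the survey data (the code below uses its Education, Politics, and Count columns)
myData = read.csv('http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/polpartyfull.csv')
# List the levels explicitly so they are ordered by meaning, not alphabetically
education = factor(myData$Education,levels=c("None","Highschool","JrCollege","Bachelor","Graduate"),ordered=TRUE)
politics = factor(myData$Politics,levels=c("StrongDemocrat","WeakDemocrat","NearDemocrat","Otherparty","Independent","NearRepublican","WeakRepublican","StrongRepublican"),ordered=TRUE)
# Cross-tabulate the counts with education as rows and politics as columns
myTable = xtabs(myData$Count~education+politics)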
myTable
##             politics
## education    StrongDemocrat WeakDemocrat NearDemocrat Otherparty
##   None                   63           45           34          2
##   Highschool            185          183          132         16
##   JrCollege              32           30           19          4
##   Bachelor               56           44           57         11
##   Graduate               54           29           20          5
##             politics
## education    Independent NearRepublican WeakRepublican StrongRepublican
##   None                87             19             20               25
##   Highschool         156             78            147               98
##   JrCollege           22             20             33               13
##   Bachelor            31             35             76               43
##   Graduate            26             10             27               22
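To see what factor() changes, here is a minimal sketch on a small made-up vector (edu is hypothetical; only the level names are taken from the data above). It compares the default alphabetical levels with levels given explicitly.

```r
# Hypothetical toy vector; the labels match the education levels used above.
edu <- c("Graduate", "None", "Bachelor", "Highschool", "JrCollege", "None")

# Default behavior: the levels are sorted alphabetically.
default_factor <- factor(edu)
levels(default_factor)

# Explicit levels, listed from lowest to highest education.
edu_levels <- c("None", "Highschool", "JrCollege", "Bachelor", "Graduate")
ordered_factor <- factor(edu, levels = edu_levels, ordered = TRUE)
levels(ordered_factor)

# Tables follow the level order, so rows come out lowest to highest.
table(ordered_factor)

# With ordered = TRUE the levels can also be compared with < and >.
ordered_factor < "Bachelor"
```

The order of the levels argument is what controls the row and column order of any table built from the factor.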
To select a single row or column of a table in R, use the following syntax.
myTable["Bachelor",]
##   StrongDemocrat     WeakDemocrat     NearDemocrat       Otherparty
##               56               44               57               11
##      Independent   NearRepublican   WeakRepublican StrongRepublican
##               31               35               76               43
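The same bracket syntax works for columns, single cells, and several rows at once. The sketch below uses small_table, a hypothetical stand-in holding a 3-by-3 corner of the table above, so that it runs without downloading the data.

```r
# Hypothetical stand-in: a 3x3 corner of the table, with counts copied from above.
small_table <- as.table(matrix(
  c( 63,  45,  34,
    185, 183, 132,
     56,  44,  57),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "Bachelor"),
    politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat"))))

small_table["Bachelor", ]                 # one row, selected by name
small_table[, "WeakDemocrat"]             # one column, selected by name
small_table["Bachelor", "WeakDemocrat"]   # a single cell
small_table[3, 2]                         # the same cell, selected by position
small_table[c("None", "Bachelor"), ]      # several rows at once

# drop = FALSE keeps a one-row result as a table instead of a plain vector.
small_table["Bachelor", , drop = FALSE]
```

Leaving the spot before or after the comma empty means "everything" in that dimension.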
In the table above, there are very few people who identify as Otherparty. This might cause problems for the chi-square test, which relies on the expected counts not being too small. Why not remove that column of the table? The command to do this is: myTable[,-4]. If I wanted to remove a row instead, I would use the command myTable[-m,] where m is the number of the row I wanted to remove. To select a row or column you can use its name instead of its number, but a name cannot be negated with a minus sign, so removing by name needs a logical test such as myTable[,colnames(myTable)!="Otherparty"].
myTable = myTable[,-4]
myTable
##             politics
## education    StrongDemocrat WeakDemocrat NearDemocrat Independent
##   None                   63           45           34          87
##   Highschool            185          183          132         156
##   JrCollege              32           30           19          22
##   Bachelor               56           44           57          31
##   Graduate               54           29           20          26
##             politics
## education    NearRepublican WeakRepublican StrongRepublican
##   None                   19             20               25
##   Highschool             78            147               98
##   JrCollege              20             33               13
##   Bachelor               35             76               43
##   Graduate               10             27               22
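One way to check whether a column really is too sparse is to look at the expected counts under independence. The sketch below retypes the full table as a matrix called counts (a hypothetical name) and flags every column with an expected count below 5. The cutoff of 5 is a common rule of thumb that I am assuming here, not something fixed by the data. The chisq.test() function returns the same matrix in its expected component. The flagged column is then removed by name.

```r
# Counts retyped from the full table above so the example runs on its own.
counts <- matrix(
  c( 63,  45,  34,  2,  87, 19,  20, 25,
    185, 183, 132, 16, 156, 78, 147, 98,
     32,  30,  19,  4,  22, 20,  33, 13,
     56,  44,  57, 11,  31, 35,  76, 43,
     54,  29,  20,  5,  26, 10,  27, 22),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Otherparty",
                  "Independent", "NearRepublican", "WeakRepublican",
                  "StrongRepublican")))

# Expected count in each cell: (row total * column total) / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected[, "Otherparty"], 2)

# Columns with at least one expected count below 5 (assumed rule of thumb).
low_cols <- colnames(counts)[apply(expected < 5, 2, any)]
low_cols

# Remove by name with a logical index; counts[, -"Otherparty"] would be an error.
trimmed <- counts[, !(colnames(counts) %in% low_cols)]
colnames(trimmed)
```

Removing by name has the advantage that the code still works if the column order changes.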
The cbind() function lets you combine column vectors into a matrix. You can use this to combine columns from a table in R. There is also an rbind() function that builds a matrix out of rows.
myTable2 = cbind(
  myTable[,"StrongDemocrat"]+myTable[,"WeakDemocrat"]+myTable[,"NearDemocrat"],
  myTable[,"Independent"],
  myTable[,"StrongRepublican"]+myTable[,"WeakRepublican"]+myTable[,"NearRepublican"])
myTable2
##            [,1] [,2] [,3]
## None        142   87   64
## Highschool  500  156  323
## JrCollege    81   22   66
## Bachelor    157   31  154
## Graduate    103   26   59
Notice that we lost the column names. You make new column names using the colnames() function:
colnames(myTable2)=c("Democrat","Independent","Republican")
myTable2
##            Democrat Independent Republican
## None            142          87         64
## Highschool      500         156        323
## JrCollege        81          22         66
## Bachelor        157          31        154
## Graduate        103          26         59
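As an alternative sketch, cbind() and rbind() accept named arguments, which sets the column or row names in the same step, and rowSums() or colSums() can replace the chains of + signs. The matrix counts and the two-group education split below are hypothetical illustrations. The numbers are retyped from the seven-column table above.

```r
# Counts retyped from the table above (Otherparty already removed).
counts <- matrix(
  c( 63,  45,  34,  87, 19,  20, 25,
    185, 183, 132, 156, 78, 147, 98,
     32,  30,  19,  22, 20,  33, 13,
     56,  44,  57,  31, 35,  76, 43,
     54,  29,  20,  26, 10,  27, 22),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Independent",
                  "NearRepublican", "WeakRepublican", "StrongRepublican")))

dem_cols <- c("StrongDemocrat", "WeakDemocrat", "NearDemocrat")
rep_cols <- c("NearRepublican", "WeakRepublican", "StrongRepublican")

# Named arguments to cbind() become the column names of the result.
collapsed <- cbind(
  Democrat    = rowSums(counts[, dem_cols]),
  Independent = counts[, "Independent"],
  Republican  = rowSums(counts[, rep_cols]))
collapsed

# rbind() does the same for rows; this two-group split is only an illustration.
by_college <- rbind(
  NoCollege = colSums(collapsed[c("None", "Highschool"), ]),
  College   = colSums(collapsed[c("JrCollege", "Bachelor", "Graduate"), ]))
by_college
```

The collapsed matrix holds the same numbers as myTable2, so either approach can feed the steps that follow.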
# Mosaic plot of education by party; las=1 keeps the axis labels horizontal
mosaicplot(myTable2, color=TRUE, las=1)
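To read numbers off a plot like this, it can help to compute the share of each party within every education level. This is a minimal sketch using prop.table(), with the collapsed counts retyped into a hypothetical matrix called party. As far as I can tell these row proportions are what the relative tile sizes within each education group represent.

```r
# Collapsed counts retyped from the table above.
party <- matrix(
  c(142,  87,  64,
    500, 156, 323,
     81,  22,  66,
    157,  31, 154,
    103,  26,  59),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    politics  = c("Democrat", "Independent", "Republican")))

# margin = 1 divides each count by its row total, so every row sums to 1.
row_props <- prop.table(party, margin = 1)
round(row_props, 3)

# Share of all respondents in each education level (the row totals as proportions).
round(rowSums(party) / sum(party), 3)
```

Using margin = 2 instead would give the education breakdown within each party.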