Politics and Education: Working with Tables in R

Below is data from the 2008 General Social Survey on education level and party affiliation. This time we will look at ways to manipulate tables in R so that data like this is easier to work with. By default, R sorts the levels of a factor alphabetically, but it makes more sense to order education from lowest to highest. You can set the order with the levels argument of the factor() command. The same approach orders political affiliation from left-wing to right-wing. Here is an example.

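# Read the survey data (it includes Education, Politics, and Count columns)
myData = read.csv('http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/polpartyfull.csv')
# List the levels explicitly so R keeps this order instead of sorting alphabetically
education = factor(myData$Education,levels=c("None","Highschool","JrCollege","Bachelor","Graduate"),ordered=TRUE)
politics = factor(myData$Politics,levels=c("StrongDemocrat","WeakDemocrat","NearDemocrat","Otherparty","Independent","NearRepublican","WeakRepublican","StrongRepublican"),ordered=TRUE)
# Cross-tabulate the counts by education (rows) and politics (columns)
myTable = xtabs(myData$Count~education+politics)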
myTable
##             politics
## education    StrongDemocrat WeakDemocrat NearDemocrat Otherparty
##   None                   63           45           34          2
##   Highschool            185          183          132         16
##   JrCollege              32           30           19          4
##   Bachelor               56           44           57         11
##   Graduate               54           29           20          5
##             politics
## education    Independent NearRepublican WeakRepublican StrongRepublican
##   None                87             19             20               25
##   Highschool         156             78            147               98
##   JrCollege           22             20             33               13
##   Bachelor            31             35             76               43
##   Graduate            26             10             27               22
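
To see what the levels argument changes, here is a minimal sketch on a toy vector. The names edu and eduOrdered are made up for illustration and are not part of the survey data.

```r
# Toy vector of education labels (illustrative, not the survey data)
edu <- c("Highschool", "Graduate", "None", "Bachelor", "JrCollege")

# Without a levels argument, the levels come out in alphabetical order
levels(factor(edu))

# With explicit levels, R keeps the order we list: lowest to highest
eduOrdered <- factor(edu,
                     levels = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                     ordered = TRUE)
levels(eduOrdered)

# An ordered factor also supports comparisons between levels
eduOrdered >= "Bachelor"
```

Tables built from an ordered factor list their rows or columns in the level order, which is why myTable above runs from None to Graduate.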

Selecting a Single Row or Column

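To select a single row or column of a table in R, put its name in quotes inside square brackets: the row goes before the comma and the column goes after it. For example, myTable[,"Independent"] selects a column, and the command below selects the Bachelor row.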

myTable["Bachelor",]
##   StrongDemocrat     WeakDemocrat     NearDemocrat       Otherparty 
##               56               44               57               11 
##      Independent   NearRepublican   WeakRepublican StrongRepublican 
##               31               35               76               43
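
The same bracket syntax also selects columns, single cells, and several rows at once. The sketch below uses a small stand-in matrix called tab (a hypothetical name) holding three columns copied from the table above, so it runs without downloading the data.

```r
# Stand-in for myTable: three of its columns, typed in by hand
tab <- matrix(c(63, 185, 32, 56, 54,    # StrongDemocrat
                87, 156, 22, 31, 26,    # Independent
                25,  98, 13, 43, 22),   # StrongRepublican
              nrow = 5,
              dimnames = list(
                education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                politics  = c("StrongDemocrat", "Independent", "StrongRepublican")))

tab[, "Independent"]               # one column (name after the comma)
tab["Bachelor", "Independent"]     # one cell (row name, then column name)
tab[c("Bachelor", "Graduate"), ]   # several rows at once
tab[4, ]                           # a row by position instead of by name
```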

Removing Rows or Columns

In the table above, very few people identify as Otherparty. Small counts like these can make the chi-square test unreliable, so one option is to remove that column. Otherparty is the fourth column, so the command is myTable[,-4]. To remove a row instead, use myTable[-m,], where m is the number of the row to remove. Negative indices only work with numbers: a name such as "Otherparty" can select a column, but to remove a column by name you need a logical test such as myTable[,colnames(myTable)!="Otherparty"].

myTable = myTable[,-4]
myTable
##             politics
## education    StrongDemocrat WeakDemocrat NearDemocrat Independent
##   None                   63           45           34          87
##   Highschool            185          183          132         156
##   JrCollege              32           30           19          22
##   Bachelor               56           44           57          31
##   Graduate               54           29           20          26
##             politics
## education    NearRepublican WeakRepublican StrongRepublican
##   None                   19             20               25
##   Highschool             78            147               98
##   JrCollege              20             33               13
##   Bachelor               35             76               43
##   Graduate               10             27               22
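
Removing a column by position breaks if the column order ever changes, so it can be safer to remove it by name. The sketch below shows two ways to do that on a small stand-in matrix. The names tab, tabNoOther, and tabNoOther2 are hypothetical, and the counts are copied from the first table above.

```r
# Stand-in for the original table: three columns typed in by hand
tab <- matrix(c(63, 185, 32, 56, 54,    # StrongDemocrat
                 2,  16,  4, 11,  5,    # Otherparty
                87, 156, 22, 31, 26),   # Independent
              nrow = 5,
              dimnames = list(
                education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                politics  = c("StrongDemocrat", "Otherparty", "Independent")))

# Option 1: keep every column whose name is not "Otherparty"
tabNoOther <- tab[, colnames(tab) != "Otherparty"]

# Option 2: build the list of names to keep with setdiff()
tabNoOther2 <- tab[, setdiff(colnames(tab), "Otherparty")]

tabNoOther
```

Both options give the same result here. Writing tab[, -"Otherparty"] would not work, because R cannot negate a character string.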

Combining Rows or Columns

The cbind() function lets you combine column vectors into a matrix. You can use this to combine columns from a table in R. There is also an rbind() function that builds a matrix out of rows.

myTable2 = cbind(
  myTable[,"StrongDemocrat"]+myTable[,"WeakDemocrat"]+myTable[,"NearDemocrat"],
  myTable[,"Independent"],
  myTable[,"StrongRepublican"]+myTable[,"WeakRepublican"]+myTable[,"NearRepublican"]
)
myTable2
##            [,1] [,2] [,3]
## None        142   87   64
## Highschool  500  156  323
## JrCollege    81   22   66
## Bachelor    157   31  154
## Graduate    103   26   59

Notice that we lost the column names. You can assign new column names with the colnames() function:

colnames(myTable2)=c("Democrat","Independent","Republican")
myTable2
##            Democrat Independent Republican
## None            142          87         64
## Highschool      500         156        323
## JrCollege        81          22         66
## Bachelor        157          31        154
## Graduate        103          26         59
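# Mosaic plot of the combined table: col=TRUE shades the boxes in grey, las=1 keeps the axis labels horizontal
mosaicplot(myTable2,col=TRUE,las=1)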
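
The rbind() function mentioned earlier combines rows in the same way. As a sketch, the code below collapses the five education levels into two groups. The grouping and the names tab2, tab3, NoCollege, and College are made up for illustration, and the counts are typed in from the combined table above.

```r
# Stand-in for myTable2, entered row by row
tab2 <- matrix(c(142,  87,  64,
                 500, 156, 323,
                  81,  22,  66,
                 157,  31, 154,
                 103,  26,  59),
               nrow = 5, byrow = TRUE,
               dimnames = list(
                 c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
                 c("Democrat", "Independent", "Republican")))

# Naming the arguments gives the new rows their names right away
tab3 <- rbind(
  NoCollege = tab2["None", ] + tab2["Highschool", ],
  College   = tab2["JrCollege", ] + tab2["Bachelor", ] + tab2["Graduate", ]
)
tab3
```

Named arguments work in cbind() too, which is another way to set the column names while combining.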