Political Parties and Education

myData = read.csv('polpartyfull.csv')

The head() function lets us preview the first few rows of the data from the csv file without printing all of it in the notebook. This way we can quickly check that the data loaded correctly without filling the page with rows we do not need to see.

head(myData)

	Education	Politics	Count
1	None	StrongDemocrat	63
2	None	WeakDemocrat	45
3	None	NearDemocrat	34
4	None	Independent	87
5	None	NearRepublican	19
6	None	WeakRepublican	20

Notice that the rows in myData do not correspond to individual people, but rather to groups of people who all have the same educational background and political views. The Count column tells us how many people are in each group. Because of this, we cannot use the table() function to convert this data into an R table, since table() would count rows rather than people. We need to use the xtabs() function below, which has a somewhat strange input format:

myTable = xtabs(myData$Count~myData$Education+myData$Politics)
myTable

                myData$Politics
myData$Education Independent NearDemocrat NearRepublican Otherparty
      Bachelor            31           57             35         11
      Graduate            26           20             10          5
      Highschool         156          132             78         16
      JrCollege           22           19             20          4
      None                87           34             19          2
                myData$Politics
myData$Education StrongDemocrat StrongRepublican WeakDemocrat WeakRepublican
      Bachelor               56               43           44             76
      Graduate               54               22           29             27
      Highschool            185               98          183            147
      JrCollege              32               13           30             33
      None                   63               25           45             20
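The formula passed to xtabs() can be read as "counts ~ row variable + column variable". As a minimal sketch, the same call can also be written with the data= argument, which keeps the myData$ prefix out of the table labels. The data frame mini below is a hypothetical stand-in built from the six rows shown by head(), not the full csv file.

```r
# Hypothetical miniature of the data: the six rows printed by head() above.
mini = data.frame(
  Education = rep("None", 6),
  Politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat",
                "Independent", "NearRepublican", "WeakRepublican"),
  Count     = c(63, 45, 34, 87, 19, 20)
)

# Left of ~  : the column that holds the counts.
# Right of ~ : the variables that define the rows and columns of the table.
miniTable = xtabs(Count ~ Education + Politics, data = mini)
miniTable
```

With data= supplied, the dimension names of the result are simply Education and Politics.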

mosaicplot(myTable,color=T)

Note that this mosaic plot is a mess, mostly because the education levels and political affiliations are in alphabetical order rather than a meaningful one. Also, the labels on the left side overlap, so they are very hard to read. In the cells below, I will put the political affiliation and education level data in order to fix the first problem.

education = factor(myData$Education,levels=c("None","Highschool","JrCollege","Bachelor","Graduate"),ordered=T)
politics = factor(myData$Politics,levels=c("StrongDemocrat","WeakDemocrat","NearDemocrat","Otherparty","Independent","NearRepublican",
                    "WeakRepublican","StrongRepublican"),ordered=T)

The new variables, education and politics, have the same information as myData$Education and myData$Politics, respectively, but now the categorical variables have an order that will be respected when they are put into a table or a mosaic plot.
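To see what the levels= argument changes, here is a small sketch on a made-up vector of four education labels (the vector x and the name ordered_x are illustrative only). By default, factor() sorts the levels alphabetically, and this is the order that tables and plots pick up.

```r
# A few education labels in arbitrary order (illustrative values only).
x = c("Graduate", "None", "Bachelor", "Highschool")

# Default behaviour: levels are sorted alphabetically.
levels(factor(x))

# Explicit levels: the order we list is the order R keeps.
ordered_x = factor(x,
                   levels = c("None", "Highschool", "Bachelor", "Graduate"),
                   ordered = TRUE)
levels(ordered_x)

# Tables built from the factor follow the level order, not the data order.
table(ordered_x)

# Because ordered = TRUE, comparisons between levels are meaningful.
ordered_x[2] < ordered_x[1]   # is "None" below "Graduate"?
```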

myTable = xtabs(myData$Count~education+politics)
myTable

            politics
education    StrongDemocrat WeakDemocrat NearDemocrat Otherparty Independent
  None                   63           45           34          2          87
  Highschool            185          183          132         16         156
  JrCollege              32           30           19          4          22
  Bachelor               56           44           57         11          31
  Graduate               54           29           20          5          26
            politics
education    NearRepublican WeakRepublican StrongRepublican
  None                   19             20               25
  Highschool             78            147               98
  JrCollege              20             33               13
  Bachelor               35             76               43
  Graduate               10             27               22

mosaicplot(myTable,color=T,las=1)

Notice the las=1 option in the mosaicplot() command. This makes all of the plot labels horizontal, so they no longer overlap. Now we can see what is going on. It appears that as education level increases from high school dropouts to high school graduates to college graduates, the proportion of people on the more conservative end of the spectrum increases a little and the proportion of people on the Democrat end decreases, but the trend reverses when we look at people with graduate degrees. They are more likely than anyone else to be in the StrongDemocrat category. Now that we can see an apparent association between education and political views, let's do a chi-squared test to check whether the association is statistically significant.
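Before running the test, the pattern read off the mosaic plot can be checked numerically. The sketch below retypes the counts from the ordered table printed above into a matrix (so it runs without the csv file) and converts each row to proportions with prop.table(). The names counts and rowProps are my own choices for this example.

```r
# Counts copied from the ordered table above, one row per education level.
counts = matrix(
  c( 63,  45,  34,  2,  87, 19,  20, 25,
    185, 183, 132, 16, 156, 78, 147, 98,
     32,  30,  19,  4,  22, 20,  33, 13,
     56,  44,  57, 11,  31, 35,  76, 43,
     54,  29,  20,  5,  26, 10,  27, 22),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Otherparty",
                  "Independent", "NearRepublican", "WeakRepublican",
                  "StrongRepublican")))

# margin = 1 divides each row by its total, so every row sums to 1.
rowProps = prop.table(counts, margin = 1)
round(rowProps, 2)

# Which education level has the largest share of StrongDemocrat respondents?
names(which.max(rowProps[, "StrongDemocrat"]))
```

Each row of rowProps corresponds to one horizontal band of the mosaic plot, so comparing entries down a column is the numeric version of comparing tile widths by eye.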

chisq.test(myTable)

Warning message:
In chisq.test(myTable): Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  myTable
X-squared = 111.01, df = 28, p-value = 7.753e-12

Since the p-value is astronomically small, we can be very confident that the association we observed in the mosaic plot above is not a random fluke, but instead truly reflects an association between education level and political views in the population. Notice the warning message above. R issues this warning when some of the expected cell counts are small (the usual threshold is 5), and the likely culprit is the Otherparty column, where the counts are very low (two of them below 5). We can remove that column from the table and recompute the chi-squared test to make sure.

myTable[,-c(4)]

	StrongDemocrat	WeakDemocrat	NearDemocrat	Independent	NearRepublican	WeakRepublican	StrongRepublican
None	63	45	34	87	19	20	25
Highschool	185	183	132	156	78	147	98
JrCollege	32	30	19	22	20	33	13
Bachelor	56	44	57	31	35	76	43
Graduate	54	29	20	26	10	27	22

chisq.test(myTable[,-c(4)])

    Pearson's Chi-squared test

data:  myTable[, -c(4)]
X-squared = 104.63, df = 24, p-value = 4.832e-12

Now there is no warning, and the p-value is of roughly the same order of magnitude, so the conclusion does not change. Also, relatively few respondents identified with another political party (38 of 2009), so removing those people from the data is reasonable for this analysis.
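To check the explanation for the warning directly, we can look at the expected counts that chisq.test() computes. This is a sketch under the assumption that the counts are retyped from the ordered table above, so it runs on its own. The names counts, result, and expected are hypothetical names for this example.

```r
# Counts copied from the ordered table above, one row per education level.
counts = matrix(
  c( 63,  45,  34,  2,  87, 19,  20, 25,
    185, 183, 132, 16, 156, 78, 147, 98,
     32,  30,  19,  4,  22, 20,  33, 13,
     56,  44,  57, 11,  31, 35,  76, 43,
     54,  29,  20,  5,  26, 10,  27, 22),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("None", "Highschool", "JrCollege", "Bachelor", "Graduate"),
    politics  = c("StrongDemocrat", "WeakDemocrat", "NearDemocrat", "Otherparty",
                  "Independent", "NearRepublican", "WeakRepublican",
                  "StrongRepublican")))

# Run the test quietly and keep the result so we can inspect its parts.
result = suppressWarnings(chisq.test(counts))

# Expected counts under independence: (row total * column total) / grand total.
expected = result$expected
round(expected, 1)

# Row and column positions of the cells whose expected count is below 5.
which(expected < 5, arr.ind = TRUE)
```

For these counts, the only cells flagged are in the Otherparty column, which supports dropping that column before rerunning the test.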