Visualizing Data with R

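# stringsAsFactors = TRUE keeps text columns such as gender as factors (no longer the default since R 4.0)
results = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring19/math222/Examples/highbridge2018.csv", stringsAsFactors = TRUE)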

One Categorical Variable

The main way to plot one categorical variable is with a bar graph. These are easy in R: just apply the plot() function to a factor, which is R's type for categorical data. If the column was read in as plain text, wrap it in factor() first.

plot(results$gender)

By default the y-axis represents the count. If you want percentages instead, you can use the following command:

barplot(table(results$gender)/nrow(results)*100,ylab='Percent',xlab='Gender',main='Gender of Runners')
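As an alternative sketch, the built-in prop.table() function can do the division for you. The `gender` vector below is hypothetical toy data standing in for results$gender, not values from the race file.

```r
# Hypothetical toy data standing in for results$gender
gender <- factor(c("F", "M", "F", "F", "M"))

# prop.table() turns counts into proportions that sum to 1
percent <- prop.table(table(gender)) * 100
print(round(percent, 1))

# Bar graph built from the percentage table
barplot(percent, ylab = "Percent", xlab = "Gender", main = "Gender of Runners")
```

One difference to be aware of: table() drops missing values by default, so prop.table() divides by the number of non-missing entries, while dividing by nrow(results) counts every row.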

One Quantitative Variable

There are two main ways to plot a quantitative variable: histograms and boxplots.

boxplot(results$minutes)

hist(results$minutes,xlab='Time (minutes)',ylab='Count',main='Distribution of Times',col='gray')
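It can help to look at the numbers behind these two plots. The sketch below uses a hypothetical `minutes` vector (not the real race times) and an illustrative choice of 10-minute bins.

```r
# Hypothetical finishing times in minutes (not the real race data)
minutes <- c(42, 48, 51, 55, 58, 60, 63, 67, 72, 95)

# fivenum() returns min, lower hinge, median, upper hinge, max;
# the box in a boxplot is drawn from the two hinges and the median
five <- fivenum(minutes)
print(five)

# hist() with plot = FALSE returns the bins and counts without drawing;
# breaks sets the bin edges explicitly
h <- hist(minutes, breaks = seq(40, 100, by = 10), plot = FALSE)
print(h$breaks)
print(h$counts)
```

Passing the same breaks argument to a regular hist() call draws the histogram with those bin edges.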

Two Variables

For two quantitative variables, you want a scatterplot:

plot(results$age,results$minutes)

If the explanatory variable is categorical, but the response variable is quantitative, then side-by-side boxplots are very useful. When the first argument is a factor, the plot function draws side-by-side boxplots automatically, so you do not need to call the boxplot function directly.

plot(results$gender,results$minutes)
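If you prefer to call boxplot() directly, it also accepts a formula of the form response ~ explanatory. Here is a minimal sketch; the `toy` data frame is hypothetical and only stands in for results.

```r
# Hypothetical toy data frame (not the real race results)
toy <- data.frame(
  gender  = factor(c("F", "F", "F", "M", "M", "M")),
  minutes = c(50, 60, 70, 45, 55, 65)
)

# Formula interface: response ~ explanatory, one box per level of gender
bp <- boxplot(minutes ~ gender, data = toy,
              xlab = "Gender", ylab = "Time (minutes)")

# Group medians, which are the center lines of the boxes
medians <- tapply(toy$minutes, toy$gender, median)
print(medians)
```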

For two categorical variables, it is common to make a two-way table of their counts. The table() function does this.

table(results$gender,results$state)
##    
##     GA IL MA MD NC OH VA
##   F  1  1  1  0  4  1 57
##   M  0  0  0  1  1  1 49
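Raw counts can be hard to compare when the groups differ in size, so row percentages and totals are often useful. The sketch below uses hypothetical `gender` and `state` vectors, not the race data.

```r
# Hypothetical toy data (not the real race results)
gender <- factor(c("F", "F", "F", "M", "M"))
state  <- factor(c("VA", "VA", "NC", "VA", "OH"))

counts <- table(gender, state)

# margin = 1 divides each row by its row total, so every row sums to 100
row_pct <- prop.table(counts, margin = 1) * 100
print(round(row_pct, 1))

# addmargins() appends row and column totals to the table of counts
print(addmargins(counts))
```

Use margin = 2 instead to get column percentages.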

You can also make a table of counts for a single categorical variable:

table(results$state)
## 
##  GA  IL  MA  MD  NC  OH  VA 
##   1   1   1   1   5   2 106

ggplot2 Versions

The default plots in R are okay, but many people now use the ggplot2 package to make plots. These plots are a little harder to get started with, but they tend to be nicer looking and much easier to customize. Here are examples of how to use the ggplot() function from ggplot2 to create graphs like the ones above.

library(ggplot2)
ggplot(data = results, aes(x=minutes))+geom_histogram(binwidth=10,boundary=0,color='black',fill='gray')

ggplot(data = results, aes(x=gender,fill=gender))+geom_bar()

ggplot(data = results, aes(x=gender, y=minutes,fill=gender))+geom_boxplot()

ggplot(data = results, aes(x=age, y=minutes, col=gender))+geom_point()

You add details like labels, titles, and layout options to a ggplot2 plot by adding layers with the + operator.

ggplot(data = results, aes(x=age, y=minutes, col=gender))+geom_point()+labs(title="Race Times",x="Age",y="Time (minutes)")
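Because layers are added with +, you can store a plot in a variable and build it up one step at a time. The following is a sketch on a hypothetical `toy` data frame; the choice of theme_minimal() and facet_wrap() is only an illustration of layout layers, not something the examples above require.

```r
library(ggplot2)

# Hypothetical toy data frame (not the real race results)
toy <- data.frame(
  age     = c(25, 32, 41, 29, 50, 37),
  minutes = c(52, 58, 66, 49, 71, 60),
  gender  = factor(c("F", "M", "F", "M", "F", "M"))
)

# Store the base scatterplot, then add one layer at a time
p <- ggplot(data = toy, aes(x = age, y = minutes, col = gender)) +
  geom_point()
p <- p + labs(title = "Race Times", x = "Age", y = "Time (minutes)")
p <- p + theme_minimal()       # switch to a lighter built-in theme
p <- p + facet_wrap(~ gender)  # one panel per gender

# A stored plot is drawn when it is printed
print(p)
```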