Math 111 - Week 8 Notes

Mon, Mar 14

Today we talked about randomized controlled experiments, which let you establish a cause-and-effect relationship between two variables. We defined:

Major Concept: In a randomized controlled experiment, all potential lurking variables are controlled by the randomization step, which balances them across the treatment groups on average. That means if you find an association between the treatment variable and the response variable that is stronger than chance alone would explain, then you can conclude that the treatment caused the effect.
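As a rough illustration of the randomization step, here is a minimal Python sketch that shuffles a list of subjects and splits it into a treatment group and a control group. The function name `randomly_assign`, the subject numbers, and the seed are all made up for this example; a real experiment may use a different assignment scheme.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into treatment and control groups."""
    rng = random.Random(seed)   # a seed makes the shuffle repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical experiment with 20 subjects numbered 1 through 20.
treatment, control = randomly_assign(range(1, 21), seed=111)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in each group, any lurking variable (age, health, motivation, and so on) tends to be spread evenly between the two groups.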


Wed, Mar 16

Today we talked about scatterplots and correlation. We started with this workshop:

Then we introduced the correlation coefficient \(R\) which measures the strength of a linear trend in a scatterplot. The value of \(R\) is always between \(-1\) and \(+1\).
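To make \(R\) more concrete, the sketch below computes it in Python using the standard z-score formula: convert each x and y to a z-score, multiply the pairs, add them up, and divide by \(n - 1\). The function name `correlation` and the data are assumptions made for illustration; they are not from class.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient R: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    n = len(xs)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"R = {r:.3f}")
```

For this data \(R\) comes out to about 0.775, a fairly strong positive linear trend. Points that fall exactly on an upward-sloping line would give \(R = 1\).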

We showed how to use the =CORREL function in Excel to find \(R\), and also how to guess \(R\) using examples:


Fri, Mar 18

Today we talked about how to find the least squares trendline for a scatterplot. This is the line with the smallest sum of squared residuals, where a residual is the vertical distance between a point in the scatterplot and the line. The least squares trendline always passes through the point \((\bar{x},\bar{y})\) and has slope \(m = R\dfrac{s_y}{s_x}\), where \(\bar{x}\) and \(\bar{y}\) are the averages of the x- and y-values of the data, \(s_x\) and \(s_y\) are the standard deviations of the x- and y-values, and \(R\) is the correlation coefficient.
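The slope formula and the point \((\bar{x},\bar{y})\) are enough to pin down the whole line. Here is a minimal Python sketch, assuming made-up data and hypothetical function names (`least_squares_line`, `sum_squared_residuals`). It also compares the sum of squared residuals with that of a slightly different line.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares trendline."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    # Correlation coefficient R: sum of z-score products divided by n - 1.
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y
            for x, y in zip(xs, ys)) / (n - 1)
    slope = r * s_y / s_x
    # The line passes through (x_bar, y_bar), which fixes the intercept.
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up the squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
# A line with a slightly different slope should fit worse.
nudged = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"sum of squared residuals: trendline {best:.3f}, nudged line {nudged:.3f}")
```

For this data the trendline works out to \(y = 0.6x + 2.2\), and nudging the slope makes the sum of squared residuals larger.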

The trendline has two important applications:

  1. Rate of change. The slope tells you the rate of change. For every 1 unit that \(x\) increases, the \(y\)-value tends to change by the slope (an increase if the slope is positive, a decrease if it is negative).

  2. Predicting average y-values. You can also predict the average y-value for a given x-value by plugging that x-value into the trendline equation.
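Both applications can be sketched in a few lines of Python. The trendline \(y = 0.6x + 2.2\) and the function name `predict` are hypothetical, chosen only to show the arithmetic.

```python
def predict(x, slope, intercept):
    """Predicted average y-value at a given x, read off the trendline."""
    return slope * x + intercept

# Hypothetical trendline, chosen only for illustration.
slope, intercept = 0.6, 2.2

at_4 = predict(4, slope, intercept)
at_5 = predict(5, slope, intercept)
print(f"predicted average y at x = 4: {at_4:.1f}")
print(f"predicted average y at x = 5: {at_5:.1f}")
print(f"change for a 1-unit increase in x: {at_5 - at_4:.1f}")
```

The difference between the two predictions equals the slope, which is the rate-of-change interpretation from item 1.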