Today we talked about randomized controlled experiments, which let you establish a cause-and-effect relationship between two variables. We defined:
Experiments are studies where the researchers impose one or more treatments on the individuals. This is different from an observational study, which has no treatment.
A variable is controlled if it is recorded or accounted for in the study.
Randomizing is when individuals are randomly assigned to the different treatment groups.
Major Concept: In a randomized controlled experiment, all potential lurking variables are controlled by the randomization step. That means if you find an association between the treatment variable and the response variable, then you can conclude that the treatment was the cause of the effect.
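The randomization step can be sketched in a few lines of Python. This is only an illustration, not something we did in class: the function name `randomly_assign`, the volunteer labels, and the two group names are made up for the example.

```python
import random

def randomly_assign(individuals, groups=("treatment", "control"), seed=None):
    """Shuffle the individuals, then deal them out to the groups in turn."""
    rng = random.Random(seed)  # a seed makes the shuffle reproducible
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        # Dealing in turn keeps the group sizes as equal as possible.
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# Hypothetical study with 10 volunteers.
volunteers = [f"volunteer_{k}" for k in range(1, 11)]
assignment = randomly_assign(volunteers, seed=1)
for group, members in assignment.items():
    print(group, members)
```

Because chance alone decides who lands in which group, no characteristic of the volunteers can systematically line up with the treatment.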
Today we talked about scatterplots and correlation. We started with this workshop:
Then we introduced the correlation coefficient \(R\), which measures the strength of a linear trend in a scatterplot. The value of \(R\) is always between \(-1\) and \(+1\).
We showed how to use the =CORREL function in Excel to find \(R\), and also how to guess \(R\) using examples:
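For anyone who wants to check a spreadsheet answer by hand, here is a minimal Python sketch of the same correlation coefficient. The function name `correlation` and the small data set (hours studied and quiz scores) are invented for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient R for paired lists of x and y-values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of the deviations from the averages.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and quiz scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation(hours, scores)
print(f"R = {r:.3f}")  # R = 0.775
```

Points that fall exactly on an upward-sloping line, such as `correlation([1, 2, 3], [2, 4, 6])`, give \(R = 1\).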
Today we talked about how to find the least squares trendline for a scatterplot. This is the line that has the smallest sum of squared residuals, where a residual is the vertical distance between a point in the scatterplot and the line. The least squares trendline always passes through the point \((\bar{x},\bar{y})\) with slope \(m = R\dfrac{s_y}{s_x}\), where \(\bar{x}\) and \(\bar{y}\) are the averages of the x and y-values, \(s_x\) and \(s_y\) are the standard deviations of the x and y-values, and \(R\) is the correlation coefficient.
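As a rough sketch of how the slope formula turns into an actual line, the Python below computes the slope \(m = R\,s_y/s_x\) and then picks the intercept so that the line passes through \((\bar{x},\bar{y})\). The function name `least_squares_line` and the data are made up for the example, and the sample standard deviation is assumed for both \(s_x\) and \(s_y\).

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares trendline."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation coefficient R from the deviations and standard deviations.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x
    # The line passes through (x_bar, y_bar), which pins down the intercept.
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: hours studied and quiz scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(hours, scores)
print(f"y = {intercept:.1f} + {slope:.1f}x")  # y = 2.2 + 0.6x
```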
The trendline has two important applications:
Rate of change. The slope tells you the rate of change. For every 1 unit that \(x\) increases, the \(y\)-value tends to change by the slope (an increase if the slope is positive, a decrease if it is negative).
Predicting average y-values. You can also predict the average y-value for a given x-value by plugging that x-value into the trendline equation.
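Both applications can be illustrated with a short Python sketch. The slope of 0.6 and intercept of 2.2 are hypothetical values chosen for the example (say, quiz score versus hours studied), and `predict` is just an illustrative helper name.

```python
def predict(x, slope, intercept):
    """Predicted average y-value at x on the trendline y = intercept + slope * x."""
    return intercept + slope * x

# Hypothetical trendline: score = 2.2 + 0.6 * hours.
slope, intercept = 0.6, 2.2

at_4 = predict(4, slope, intercept)
at_5 = predict(5, slope, intercept)
print(f"Predicted average at x = 4: {at_4:.1f}")        # 4.6
print(f"Change when x goes from 4 to 5: {at_5 - at_4:.1f}")  # 0.6
```

Increasing \(x\) by 1 unit changes the prediction by exactly the slope, which is the rate of change described above.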