\[ \newcommand{\on}{\operatorname} \newcommand{\R}{\mathbb{R}} \]
Today I was probably a little too ambitious! We started with the t-distribution. And then I tried to briefly explain how the same ideas apply to analysis of variance (ANOVA) tables in linear regression. But we ran out of time, so we’ll pick up that thread after spring break.
If \(Z \sim \on{Norm}(0,1)\) and \(U \sim \chi^2(d)\) are independent RVs, then \[T = \frac{Z}{\sqrt{U/d}}\] has the t-distribution with \(d\) degrees of freedom.
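To see the definition in action, here is a minimal simulation sketch using only Python's standard library. It builds \(T\) directly from a standard normal \(Z\) and a \(\chi^2(d)\) variable \(U\) (generated as a sum of \(d\) squared standard normals), then compares the sample variance with \(d/(d-2)\), the known variance of the t-distribution when \(d > 2\). The helper names, the seed, the choice \(d = 10\), and the sample size are all illustrative assumptions, not part of the lecture.

```python
import math
import random

def sample_t(d, rng):
    """Draw one value of T = Z / sqrt(U/d) with Z ~ Norm(0,1) and U ~ chi^2(d)."""
    z = rng.gauss(0.0, 1.0)
    # A chi^2(d) variable is a sum of d independent squared standard normals.
    u = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
    return z / math.sqrt(u / d)

def sample_variance(values):
    """Unbiased sample variance (divides by len(values) - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
d = 10                  # degrees of freedom (illustrative choice)
draws = [sample_t(d, rng) for _ in range(100_000)]

print("sample variance:      ", sample_variance(draws))
print("theoretical d/(d - 2):", d / (d - 2))
```

The two printed numbers should be close but will not agree exactly, since the first is a random estimate.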
We did not calculate the density function of the t-distribution, but we have all of the tools to do so. Here is how you would get started. Since \(T\) is a function of two independent RVs, you can use the multivariate change of variables formula. It helps to carry \(U\) along as a second dummy variable so that the change of variables is invertible. Then we have: \[ T = \frac{Z}{\sqrt{U/d}}\] \[ U = U\] which can be inverted to get equations for \(Z\) and \(U\) as functions of \(T\) and \(U\): \[ Z = d^{-1/2} T U^{1/2}\] \[ U = U\] Then the joint density function for \((T,U)\) is: \[f_{(T,U)}(t,u) = f_{(Z,U)} (z,u) \left| \frac{\partial (z, u)}{\partial (t, u)} \right|,\] where \(z = d^{-1/2} t u^{1/2}\) and the last factor is the absolute value of the Jacobian determinant of the inverse map.
At this point we stopped, but I did offer extra credit to anyone who wants to finish the derivation of the t-distribution and explain how it works!
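If you attempt the derivation, one way to check your answer is numerically. The sketch below is an assumption-laden check, not the derivation itself: it evaluates the joint density of \((T,U)\) from the change of variables formula, using \(f_{(Z,U)} = f_Z f_U\) (by independence) and the Jacobian factor \(\partial z / \partial t = \sqrt{u/d}\), integrates out \(u\) with a simple midpoint rule, and compares the result with the standard textbook formula for the t density. The function names, the integration cutoff, the step count, and the test point \(t = 1.3\), \(d = 5\) are all arbitrary choices for illustration.

```python
import math

def norm_pdf(z):
    """Density of Norm(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def chi2_pdf(u, d):
    """Density of chi^2(d) for u > 0."""
    return u ** (d / 2 - 1) * math.exp(-u / 2) / (2 ** (d / 2) * math.gamma(d / 2))

def t_pdf_numeric(t, d, upper=200.0, steps=100_000):
    """Integrate the joint density of (T, U) over u with the midpoint rule."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        z = t * math.sqrt(u / d)     # z = d^{-1/2} t u^{1/2}
        jacobian = math.sqrt(u / d)  # dz/dt with u held fixed
        total += norm_pdf(z) * chi2_pdf(u, d) * jacobian
    return total * h

def t_pdf_closed_form(t, d):
    """Standard textbook density of the t-distribution with d degrees of freedom."""
    c = math.gamma((d + 1) / 2) / (math.sqrt(d * math.pi) * math.gamma(d / 2))
    return c * (1 + t * t / d) ** (-(d + 1) / 2)

t, d = 1.3, 5  # arbitrary test point
print("numeric integral:", t_pdf_numeric(t, d))
print("closed form:     ", t_pdf_closed_form(t, d))
```

Agreement at a few test points is only a sanity check; the extra credit still needs the integral worked out by hand.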
The second example we looked at was linear regression. In a simple linear regression model, you assume that each observed y-value is a random variable \(Y_i \sim \on{Norm}(\beta_0 + \beta_1 X_i, \sigma)\) where the numbers \(\beta_0, \beta_1,\) and \(\sigma\) are the parameters of the model. Note that \(\sigma\) represents the standard deviation of the residuals. To estimate \(\beta_0\) and \(\beta_1\), we solve the normal equations \(X^TX b = X^T y\) for \(b = (b_0, b_1)^T\), where \(X\) is the \(n \times 2\) matrix whose first column is all ones and whose second column holds the x-values. The solution is \[b = (X^TX)^{-1} X^T y.\] Since \(b_0\) and \(b_1\) come from sample data, they are statistics, and will probably not match the true values of \(\beta_0\) and \(\beta_1\) perfectly. We would also like a statistic that approximates \(\sigma\). To get that, we use analysis of variance (ANOVA). To keep this organized, it helps to use an ANOVA table like the one below:
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-value |
|---|---|---|---|---|
| Model | \(DFM = 1\) | \(SSM = \|\hat{y}-\bar{y}\|^2\) | \(MSM = SSM/DFM\) | \(F = MSM/MSE\) |
| Error | \(DFE = n-2\) | \(SSE = \|y-\hat{y}\|^2\) | \(MSE = SSE/DFE\) | |
| Total | \(DFT = n-1\) | \(SST = \|y-\bar{y}\|^2\) | \(SST/DFT\) | |
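
To make the table concrete, here is a small sketch that fills in every entry for a made-up data set. It solves the \(2 \times 2\) normal equations by Cramer's rule, assuming \(X\) has a column of ones and a column of x-values, and then computes the sums of squares, mean squares, and F-value. The data values and variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

# Entries of X^T X and X^T y when X has a column of ones and a column of x.
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Solve the 2x2 normal equations (X^T X) b = X^T y by Cramer's rule.
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

y_hat = [b0 + b1 * xi for xi in x]  # fitted values
y_bar = sy / n                      # mean of the observed y-values

# Sums of squares from the ANOVA table.
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)

# Degrees of freedom, mean squares, and the F-value.
dfm, dfe, dft = 1, n - 2, n - 1
msm, mse = ssm / dfm, sse / dfe
f_value = msm / mse

print("b0, b1:", b0, b1)
print("SSM, SSE, SST:", ssm, sse, sst)
print("MSM, MSE, F:", msm, mse, f_value)
```

Up to floating-point rounding, the printed SSM and SSE should add up to SST.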
Here are some exercises to get a better understanding of the entries in the ANOVA table.
Show that \(\hat{y}\) and \(\bar{y}\) (by which we mean the vector with all entries equal to \(\bar{y}\)) are both in the column space of \(X\).
Explain why \(y - \hat{y}\) is orthogonal to \(\hat{y} - \bar{y}\).
Show that \(SST = SSM + SSE\).
Show that \(SSE/\sigma^2\) has a \(\chi^2\) distribution with \(n-2\) degrees of freedom. This one is a little harder… we’ll come back to it after spring break.
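With all of that out of the way, we can better estimate the parameter \(\sigma\) in our regression model. A \(\chi^2(n-2)\) random variable has mean \(n-2\), so if \(SSE/\sigma^2 \sim \chi^2(n-2)\) then \(E[SSE] = (n-2)\sigma^2\). That makes the mean squared error (MSE) an unbiased estimator for \(\sigma^2\): \[MSE = \frac{\|y-\hat{y}\|^2}{n-2}.\] Taking the square root of the MSE gives an estimate of \(\sigma\).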
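The claim about the MSE can be checked by simulation. The sketch below repeatedly generates data from a regression model with known parameters, fits the least-squares line, and averages \(SSE/\sigma^2\) and the MSE over many repetitions. The true parameter values, the x-values, the seed, and the number of repetitions are all hypothetical choices. The fit uses the centered formulas \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1 \bar{x}\), which give the same solution as the normal equations for simple linear regression.

```python
import random

def fit_sse(x, y):
    """Return SSE = ||y - y_hat||^2 for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

rng = random.Random(1)               # fixed seed, chosen arbitrarily
beta0, beta1, sigma = 1.0, 2.0, 3.0  # hypothetical true parameters
x = [float(i) for i in range(1, 11)] # hypothetical x-values, n = 10
n = len(x)
reps = 20_000

sse_values = []
for _ in range(reps):
    # Each Y_i ~ Norm(beta0 + beta1 * x_i, sigma), as in the model.
    y = [rng.gauss(beta0 + beta1 * xi, sigma) for xi in x]
    sse_values.append(fit_sse(x, y))

mean_scaled_sse = sum(s / sigma ** 2 for s in sse_values) / reps
mean_mse = sum(s / (n - 2) for s in sse_values) / reps

print("average SSE / sigma^2:", mean_scaled_sse, "(compare with n - 2 =", n - 2, ")")
print("average MSE:          ", mean_mse, "(compare with sigma^2 =", sigma ** 2, ")")
```

A simulation like this does not prove that \(SSE/\sigma^2\) is \(\chi^2(n-2)\), but the averages landing near \(n-2\) and \(\sigma^2\) are consistent with it.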