Today I was probably a little too ambitious! We started with the t-distribution. And then I tried to briefly explain how the same ideas apply to analysis of variance (ANOVA) tables in linear regression. But we ran out of time, so we’ll pick up that thread after spring break.
If $Z \sim N(0,1)$ and $V \sim \chi^2(k)$ are independent RVs, then
$$T = \frac{Z}{\sqrt{V/k}}$$
has the t-distribution with $k$ degrees of freedom.
We did not calculate the density function for a t-distribution, but we have all of the tools to do so. Here is how you would get started. Since $T$ is a function of 2 independent RVs, you can use the multivariate change of variables formula. It helps to use $W = V$ as a second dummy variable so that the change of variables is invertible. Then we have:
$$T = \frac{Z}{\sqrt{V/k}}, \qquad W = V,$$
which can be inverted to get equations for $Z$ and $V$ as functions of $T$ and $W$:
$$Z = T\sqrt{W/k}, \qquad V = W.$$
Then the joint density function for $(T, W)$ is:
$$f_{T,W}(t,w) = f_Z\!\left(t\sqrt{w/k}\right)\, f_V(w)\, \left|\det \frac{\partial(z,v)}{\partial(t,w)}\right| = f_Z\!\left(t\sqrt{w/k}\right)\, f_V(w)\, \sqrt{\frac{w}{k}}.$$
At this point we stopped, but I did offer extra credit to anyone who wants to finish the derivation of the t-distribution and explain how it works!
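As a quick sanity check on the definition (separate from the derivation itself), you can simulate $Z/\sqrt{V/k}$ and compare the sample moments against known facts about the t-distribution. The sample size and the choice $k = 5$ below are arbitrary:

```python
import numpy as np

# Simulate T = Z / sqrt(V/k) with Z standard normal and V chi-squared(k).
k = 5                                # degrees of freedom (arbitrary choice)
rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
v = rng.chisquare(k, 200_000)
t = z / np.sqrt(v / k)

# A t-distribution with k > 2 d.o.f. has mean 0 and variance k/(k-2).
print(t.mean())   # should be near 0
print(t.var())    # should be near 5/3
```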
The second example we looked at was linear regression. In a simple linear regression model, you assume that each observed y-value is a random variable
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2),$$
where the numbers $\beta_0$, $\beta_1$, and $\sigma$ are the parameters of the model. Note that $\sigma$ represents the standard deviation of the residuals. To estimate $\beta_0$ and $\beta_1$, we solve the normal equations
$$X^T X \hat{\beta} = X^T y$$
for $\hat{\beta}_0$ and $\hat{\beta}_1$. Since $\hat{\beta}_0$ and $\hat{\beta}_1$ come from sample data, they are statistics, and will probably not match the true values of $\beta_0$ and $\beta_1$ perfectly. We would also like a statistic that approximates $\sigma$. To get that, we use analysis of variance (ANOVA). To keep this organized, it helps to use an ANOVA table like the one below:
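As a concrete illustration of solving the normal equations, here is a short sketch; the data values are made up for the example:

```python
import numpy as np

# Made-up sample data for the illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X: a column of ones (intercept) and a column of x-values.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X^T X beta = X^T y for beta = (b0, b1).
b0, b1 = np.linalg.solve(X.T @ X, X.T @ y)
print(b0, b1)   # intercept and slope estimates
```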
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-value |
|---|---|---|---|---|
| Model | $1$ | $SSM = \sum_i (\hat{y}_i - \bar{y})^2$ | $MSM = \frac{SSM}{1}$ | $\frac{MSM}{MSE}$ |
| Error | $n - 2$ | $SSE = \sum_i (y_i - \hat{y}_i)^2$ | $MSE = \frac{SSE}{n-2}$ | |
| Total | $n - 1$ | $SST = \sum_i (y_i - \bar{y})^2$ | | |
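To make the table entries concrete, here is a sketch that computes each cell for a small made-up dataset (the names `ssm`, `sse`, `sst` mirror the table, not any particular library):

```python
import numpy as np

# Made-up data; fit the least-squares line, then fill in the ANOVA table.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.3, 2.9, 4.4, 4.9, 6.1])
n = len(y)

b1, b0 = np.polyfit(x, y, 1)         # least-squares slope and intercept
y_hat = b0 + b1 * x                  # fitted values
y_bar = y.mean()

ssm = np.sum((y_hat - y_bar) ** 2)   # Model sum of squares (1 d.o.f.)
sse = np.sum((y - y_hat) ** 2)       # Error sum of squares (n - 2 d.o.f.)
sst = np.sum((y - y_bar) ** 2)       # Total sum of squares (n - 1 d.o.f.)

msm = ssm / 1
mse = sse / (n - 2)
f_value = msm / mse
print(ssm, sse, sst, f_value)
```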
Here are some exercises to get a better understanding of the entries in the ANOVA table.
1. Show that $\hat{y}$ and $\bar{y}\mathbf{1}$ (by which we mean the vector with all entries equal to $\bar{y}$) are both in the column space of $X$.
2. Explain why $y - \hat{y}$ is orthogonal to $\hat{y} - \bar{y}\mathbf{1}$.
3. Show that $SST = SSM + SSE$.
4. Show that $SSE/\sigma^2$ has a $\chi^2$ distribution with $n-2$ degrees of freedom. This one is a little harder… we’ll come back to it after spring break.
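Before proving them, you can check the orthogonality claim and the sum-of-squares identity numerically. Any regression dataset works; this one is made up:

```python
import numpy as np

# Check numerically that the residual vector y - y_hat is orthogonal to
# y_hat - y_bar*1, which is what makes SST = SSM + SSE a Pythagorean identity.
x = np.array([0.5, 1.5, 2.0, 3.5, 4.0, 5.5])
y = np.array([1.0, 2.2, 2.4, 4.1, 4.3, 6.0])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
y_bar = y.mean()

residual = y - y_hat               # the vector y - y_hat
model_dev = y_hat - y_bar          # the vector y_hat - y_bar * 1

dot = residual @ model_dev         # should be (numerically) zero
gap = np.sum((y - y_bar) ** 2) - (np.sum(model_dev ** 2) + np.sum(residual ** 2))
print(dot, gap)                    # SST - (SSM + SSE) should also be zero
```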
With all of that out of the way, we can better estimate the parameter $\sigma$ in our regression model. The best estimator for $\sigma^2$ is the mean squared error (MSE) given by
$$MSE = \frac{SSE}{n-2} = \frac{1}{n-2}\sum_i (y_i - \hat{y}_i)^2.$$
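As a sketch of why MSE is a reasonable estimator, one can simulate many datasets from the model with a known $\sigma$ and check that MSE averages out near $\sigma^2$. All the specific numbers here are arbitrary choices for the demo:

```python
import numpy as np

# Simulate y_i = b0 + b1*x_i + eps_i with known sigma, many times over,
# and check that MSE = SSE/(n-2) is close to sigma^2 on average.
rng = np.random.default_rng(1)
b0_true, b1_true, sigma = 2.0, 0.5, 1.5
x = np.linspace(0, 10, 30)
n = len(x)

mses = []
for _ in range(2000):
    y = b0_true + b1_true * x + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(x, y, 1)
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    mses.append(sse / (n - 2))

print(np.mean(mses))   # should be near sigma**2 = 2.25
```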