Week 8 Lecture Notes

Monday, March 2

Today I was probably a little too ambitious! We started with the t-distribution. And then I tried to briefly explain how the same ideas apply to analysis of variance (ANOVA) tables in linear regression. But we ran out of time, so we’ll pick up that thread after spring break.

Definition (t-Distribution)

If $Z \sim \operatorname{Norm}(0,1)$ and $U \sim \chi^2(d)$ are independent RVs, then $T = \frac{Z}{\sqrt{U/d}}$ has the t-distribution with $d$ degrees of freedom.
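The definition is easy to check by simulation. Here is a minimal numpy sketch (the choice $d = 10$ and the sample size are illustrative, not from the notes): it builds $T$ from an independent normal and chi-squared pair and compares its sample variance to the known value $d/(d-2)$ for a t-distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10                        # degrees of freedom (illustrative choice)
n = 200_000                   # number of simulated draws

Z = rng.standard_normal(n)            # Z ~ Norm(0, 1)
U = rng.chisquare(d, size=n)          # U ~ chi^2(d), independent of Z
T = Z / np.sqrt(U / d)                # T ~ t-distribution with d df

# For d > 2 the t-distribution has mean 0 and variance d/(d-2).
print(T.mean(), T.var())
```

The sample variance comes out close to $10/8 = 1.25$, noticeably larger than the variance of a standard normal, which reflects the heavier tails of the t-distribution.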

  1. Use the results from last week to show that if $\bar{x}$ and $s$ are the sample mean and sample standard deviation from $n$ independent observations from a normal distribution with mean $\mu$ and standard deviation $\sigma$, then $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t-distribution with $n-1$ degrees of freedom. (Note that if we used the true $\sigma$ in the denominator, the statistic would be exactly $\operatorname{Norm}(0,1)$; it is replacing $\sigma$ with $s$ that produces a t-distribution.)
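You can see the claim in the exercise numerically. The sketch below (all parameter values are illustrative) simulates many small normal samples, forms $t = (\bar{x}-\mu)/(s/\sqrt{n})$ with the sample standard deviation $s$, and checks that its variance matches $\frac{n-1}{n-3}$, the variance of a t-distribution with $n-1$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 8     # illustrative parameters
trials = 100_000

x = rng.normal(mu, sigma, size=(trials, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)             # sample standard deviation
t = (xbar - mu) / (s / np.sqrt(n))    # should be t with n-1 = 7 df

# Variance of a t-distribution with 7 df is 7/5 = 1.4.
print(t.mean(), t.var())
```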

We did not calculate the density function for a t-distribution, but we have all of the tools to do so. Here is how you would get started. Since $T$ is a function of two independent RVs, you can use the multivariate change of variables formula. It helps to use $U$ as a second dummy variable so that the change of variables is invertible. Then we have
$$T = \frac{Z}{\sqrt{U/d}}, \qquad U = U,$$
which can be inverted to get equations for $Z$ and $U$ as functions of $T$ and $U$:
$$Z = d^{-1/2} T U^{1/2}, \qquad U = U.$$
Then the joint density function for $(T,U)$ is
$$f_{(T,U)}(t,u) = f_{(Z,U)}(z,u) \left| \frac{\partial(z,u)}{\partial(t,u)} \right|.$$

  1. Show that the Jacobian determinant above is $\sqrt{u/d}$.
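If you want to double-check your hand computation, sympy can evaluate the Jacobian determinant symbolically. This sketch sets up the inverse transformation from above and verifies the determinant equals $\sqrt{u/d}$ (the positivity assumptions on $u$ and $d$ let sympy simplify the square roots).

```python
import sympy as sp

t = sp.symbols("t", real=True)
u, d = sp.symbols("u d", positive=True)

# Inverse transformation: z = d^{-1/2} t u^{1/2}, u = u
z = t * sp.sqrt(u / d)

# Jacobian of (z, u) with respect to (t, u)
J = sp.Matrix([z, u]).jacobian([t, u])
det = sp.simplify(J.det())
print(det)    # should agree with sqrt(u/d)
```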

At this point we stopped, but I did offer extra credit to anyone who wants to finish the derivation of the t-distribution's density function and explain each step!

The second example we looked at was linear regression. In a simple linear regression model, you assume that each observed y-value is a random variable $Y_i \sim \operatorname{Norm}(\beta_0 + \beta_1 X_i, \sigma)$ where the numbers $\beta_0, \beta_1,$ and $\sigma$ are the parameters of the model. Note that $\sigma$ represents the standard deviation of the residuals. To estimate $\beta_0$ and $\beta_1$, we solve the normal equations for $b_0$ and $b_1$:
$$b = (X^T X)^{-1} X^T y.$$
Since $b_0$ and $b_1$ come from sample data, they are statistics, and will probably not match the true values of $\beta_0$ and $\beta_1$ perfectly. We would also like a statistic that approximates $\sigma$. To get that, we use analysis of variance (ANOVA). To keep this organized, it helps to use an ANOVA table like the one below:
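Solving the normal equations is a one-liner in numpy. The sketch below (true parameter values are illustrative choices) simulates data from the model and recovers $b_0$ and $b_1$; note that in practice you solve the linear system $X^T X b = X^T y$ rather than explicitly inverting $X^T X$.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 3.0, 0.5   # true parameters (illustrative)
n = 50

x = rng.uniform(0, 10, n)
y = beta0 + beta1 * x + rng.normal(0, sigma, n)

# Design matrix: a column of ones for the intercept, then the x-values
X = np.column_stack([np.ones(n), x])

# Solve the normal equations X^T X b = X^T y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)    # b[0] estimates beta0, b[1] estimates beta1
```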

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-value |
|--------|--------------------|----------------|-------------|---------|
| Model  | $1$   | $SSM = \|\hat{y}-\bar{y}\|^2$ | $MSM = SSM/DFM$ | $F = MSM/MSE$ |
| Error  | $n-2$ | $SSE = \|y-\hat{y}\|^2$       | $MSE = SSE/DFE$ | |
| Total  | $n-1$ | $SST = \|y-\bar{y}\|^2$       | $SST/DFT$       | |
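Every entry in the table is a short computation once you have the fitted values $\hat{y}$. This sketch (simulated data, illustrative parameters) fills in the table for one dataset and checks the Pythagorean identity $SST = SSM + SSE$ numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, n)   # illustrative true model

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b                    # fitted values
ybar = np.full(n, y.mean())     # vector with all entries equal to ybar

SSM = np.sum((yhat - ybar) ** 2)
SSE = np.sum((y - yhat) ** 2)
SST = np.sum((y - ybar) ** 2)

MSM = SSM / 1          # DFM = 1
MSE = SSE / (n - 2)    # DFE = n - 2
F = MSM / MSE

# The model and error sums of squares partition the total.
print(SST, SSM + SSE, F)
```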

Here are some exercises to get a better understanding of the entries in the ANOVA table.

  1. Show that $\hat{y}$ and $\bar{y}$ (by which we mean the vector with all entries equal to $\bar{y}$) are both in the column space of $X$.

  2. Explain why $y - \hat{y}$ is orthogonal to $\hat{y} - \bar{y}$.

  3. Show that $SST = SSM + SSE$.

  4. Show that $SSE/\sigma^2$ has a $\chi^2$ distribution with $n-2$ degrees of freedom. This one is a little harder… we'll come back to it after spring break.

With all of that out of the way, we can better estimate the parameter $\sigma$ in our regression model. The standard unbiased estimator for $\sigma^2$ is the mean squared error (MSE) given by
$$MSE = \frac{\|y-\hat{y}\|^2}{n-2}.$$

  1. Show that the expected value of $MSE$ is $\sigma^2$, so $MSE$ is an unbiased estimator for $\sigma^2$.
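Both of the last two claims can be sanity-checked by simulation before you prove them. The sketch below (fixed illustrative design, repeated noise draws) computes $SSE$ for many datasets: $SSE/\sigma^2$ should average to $n-2$, the mean of a $\chi^2(n-2)$ distribution, and therefore $MSE = SSE/(n-2)$ should average to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 30, 2.0             # illustrative parameters
trials = 5_000
x = rng.uniform(0, 10, n)      # one fixed design, reused every trial
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix: yhat = H @ y

sse = np.empty(trials)
for i in range(trials):
    y = 1.0 + 0.5 * x + rng.normal(0, sigma, n)
    sse[i] = np.sum((y - H @ y) ** 2)   # SSE for this dataset

# SSE / sigma^2 should behave like chi^2(n-2), whose mean is n-2 = 28,
# and MSE = SSE/(n-2) should average to sigma^2 = 4.
print((sse / sigma**2).mean(), (sse / (n - 2)).mean())
```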