Week 7 Lecture Notes

Monday, February 24

Today we talked about covariance and multivariate normal distributions. We started with the following warm-up problems:

  1. Suppose that $X_1, X_2$ are i.i.d. $\operatorname{Norm}(0,1)$ random variables. What is the joint density function for the random vector $X = (X_1, X_2)^T$?

  2. Let $R = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$. Notice that $R$ is the rotation matrix for an angle of $\theta$. Let $Y = RX$, where $X$ has the joint distribution above. What is the joint density function for $Y$?

Definition (Covariance)

Suppose $X$ and $Y$ are random variables with respective means $\mu_X$ and $\mu_Y$. The covariance of $X$ and $Y$ is $$\operatorname{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big).$$

  1. Show that the following alternative formula for covariance is also valid: $$\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y).$$
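The identity can be proven in a line or two using linearity of expectation. As a quick numerical sanity check (not a proof; NumPy assumed, and the choice $Y = 2X + \text{noise}$ is arbitrary), the two formulas agree term-for-term even on a finite sample:

```python
import numpy as np

# Compare the definitional formula E[(X - mu_X)(Y - mu_Y)] with the
# alternative E[XY] - E[X]E[Y] on simulated data.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)

cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_alt)  # agree to floating-point accuracy (both near 2)
```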

Theorem (Covariance is Linear)

Suppose $X_1, X_2$, and $Y$ are random variables and $a, b$ are constants. Then $$\operatorname{Cov}(aX_1 + bX_2, Y) = a \operatorname{Cov}(X_1, Y) + b \operatorname{Cov}(X_2, Y).$$


Theorem (Covariance is Symmetric)

Suppose $X$ and $Y$ are random variables. Then $$\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X).$$


Theorem (Covariance of Independent Random Variables)

Suppose $X$ and $Y$ are independent random variables. Then $\operatorname{Cov}(X,Y) = 0$.

Notice that this is not an if-and-only-if theorem; the converse is not always true.
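A standard counterexample is $X \sim \operatorname{Norm}(0,1)$ with $Y = X^2$: then $\operatorname{Cov}(X,Y) = E(X^3) = 0$, yet $Y$ is completely determined by $X$. A short simulation (an illustration, not a proof; NumPy assumed) makes this concrete:

```python
import numpy as np

# X ~ Norm(0,1) and Y = X^2 have covariance E[X^3] = 0, but Y is a
# deterministic function of X, so they are certainly not independent.
rng = np.random.default_rng(1)
x = rng.normal(size=500_000)
y = x ** 2

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy)  # close to 0 even though Y is completely determined by X
```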

You can also talk about the covariance of random vectors.

Definition (Covariance Matrix)

Suppose that $X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}$ is a random vector with entry-wise means $\mu_X = \begin{bmatrix} \mu_{X_1} \\ \vdots \\ \mu_{X_n} \end{bmatrix}$. The covariance matrix for $X$ is $$\Sigma_X = \begin{bmatrix} \operatorname{Cov}(X_1, X_1) & \ldots & \operatorname{Cov}(X_1, X_n) \\ \vdots & & \vdots \\ \operatorname{Cov}(X_n, X_1) & \ldots & \operatorname{Cov}(X_n, X_n) \end{bmatrix}.$$

Observe that the entry in row $i$, column $j$ of the covariance matrix is the covariance of $X_i$ and $X_j$.

It is not hard to show that covariance matrices have the following linearity property:

Theorem (Linearity of Covariance Matrices)

Suppose that $X$ is a random vector and $A$ is a matrix of the right size so that $AX$ makes sense. Then $$\operatorname{Cov}(AX, AX) = A \operatorname{Cov}(X, X) A^T.$$
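Note that this identity holds for any random vector with finite variances, not just normal ones. A quick empirical check (NumPy assumed; the uniform entries and the matrix $A$ below are arbitrary illustrative choices) deliberately uses a non-normal $X$:

```python
import numpy as np

# Check Cov(AX, AX) = A Cov(X, X) A^T on a non-normal random vector.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(3, 200_000))   # 3 entries, 200k samples
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])

sigma_X = np.cov(X)                  # empirical covariance of X
sigma_AX = np.cov(A @ X)             # empirical covariance of AX

print(np.max(np.abs(sigma_AX - A @ sigma_X @ A.T)))  # tiny
```

In fact the empirical covariances satisfy the identity exactly (it is the same algebra), so the printed discrepancy is at floating-point scale.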


Definition (Multivariate Normal Distribution)

A random vector $X = (X_1, \ldots, X_n)^T$ has a multivariate normal distribution if the joint density function for $X$ is $$f_X(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2}(x-\mu_X)^T \Sigma^{-1} (x-\mu_X) \right),$$ where $\mu_X$ is the vector of entry-wise means for $X$ and $\Sigma$ is the covariance matrix of $X$ (and $|\Sigma|$ is the determinant of $\Sigma$). The parameters for a multivariate normal distribution are the vector $\mu_X$ and the covariance matrix $\Sigma$. These completely determine the joint distribution function.

If $\Sigma$ is the $n$-by-$n$ identity matrix and $\mu_X = 0$, then $X$ has the standard multivariate normal distribution: $$f_X(x) = \frac{1}{(\sqrt{2\pi})^n} e^{-\frac{1}{2}\|x\|^2}.$$
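The general density formula translates directly into code. A small sketch (NumPy assumed; `mvn_pdf` is our own helper, not a library function) also checks the standard case: with $\mu = 0$ and $\Sigma = I$ the density factors into a product of $n$ independent $\operatorname{Norm}(0,1)$ densities, which is the point of warm-up exercise 1.

```python
import numpy as np

# The general MVN density, implemented directly from the formula above.
def mvn_pdf(x, mu, sigma):
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

# Sanity check: standard MVN density = product of univariate Norm(0,1) densities.
x = np.array([0.3, -1.2])
standard = mvn_pdf(x, np.zeros(2), np.eye(2))
product = np.prod(np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi))
print(standard, product)  # equal
```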

  1. What is the covariance matrix $\Sigma_X$ for the random vector $X$ in exercise 1?

Wednesday, February 26

Today we will look at some applications of multivariate normal distributions (MVNs). Recall that an MVN distribution can be completely described by two pieces of information: the vector of means $\mu_X$ and the covariance matrix $\Sigma_X$.

Theorem (Transformations of Multivariate Normal Distributions)

If a random vector $X = (X_1, \ldots, X_n)^T$ has a multivariate normal distribution with vector of means $\mu_X$ and covariance matrix $\Sigma_X$, and $A \in \mathbb{R}^{m \times n}$ is any matrix, then the random vector $Y = AX$ has a multivariate normal distribution with mean $\mu_Y = A\mu_X$ and covariance matrix $\Sigma_Y = A \Sigma_X A^T$.

Remark: We didn't prove this theorem, but the proof when $A$ is invertible is one of this week's homework problems. The theorem is still true even if $A$ is not invertible, but that is a little harder to prove.
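The theorem is easy to see in simulation (a sketch, not a proof; NumPy assumed, and all of the numbers below are arbitrary illustrative choices): sample $X$ from an MVN, apply a non-square $A$, and compare the empirical mean and covariance of $Y = AX$ with $A\mu_X$ and $A\Sigma_X A^T$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu_X = np.array([1.0, 0.0, -1.0])
sigma_X = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -2.0]])

X = rng.multivariate_normal(mu_X, sigma_X, size=200_000)   # rows are samples
Y = X @ A.T                                                # apply A to each sample

print(Y.mean(axis=0))   # close to A @ mu_X
print(np.cov(Y.T))      # close to A @ sigma_X @ A.T
```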

  1. Suppose that $a = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $b = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$. Find the distribution of $Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$ where $Y_1 = a^T X$ and $Y_2 = b^T X$. (Hint: $Y$ is a linear transformation of $X$. What is the transformation matrix?) What is the joint density function for $Y_1$ and $Y_2$, and how can you tell that $Y_1$ and $Y_2$ are independent random variables?

Theorem (Independence and MVNs)

If $X_1, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$, and $a, b \in \mathbb{R}^n$, then $a^T X$ and $b^T X$ are independent random variables if and only if $a$ and $b$ are orthogonal.

  1. Suppose $X_1, \ldots, X_n \sim \operatorname{Norm}(\mu, \sigma)$ are i.i.d. RVs. What is the covariance matrix for the random vector $X$?

  2. Suppose that $X_1, \ldots, X_n \sim \operatorname{Norm}(\mu, \sigma)$ are i.i.d. RVs. Show that the average value $\bar{x}$ is independent of $X_i - \bar{x}$ for every $i$.
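Exercise 2 can be illustrated empirically (a check, not a proof; NumPy assumed, and $\mu$, $\sigma$, $n$ below are arbitrary): across many samples of size $n$, the correlation between $\bar{x}$ and each deviation $X_i - \bar{x}$ is close to $0$. Since the pair is jointly normal, zero correlation is what independence looks like in simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
deviations = X - xbar[:, None]      # column i holds X_i - xbar

# Correlation of xbar with each deviation, across the repetitions.
corrs = [np.corrcoef(xbar, deviations[:, i])[0, 1] for i in range(n)]
print(max(abs(c) for c in corrs))   # close to 0
```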

An immediate consequence is the following result.

Theorem (Independence of Sample Mean and Sample Variance)

If $X_1, \ldots, X_n$ is a random sample of $n$ independent observations chosen from a population with a normal distribution, then the sample mean $\bar{x} = \frac{1}{n}(X_1 + \ldots + X_n)$ and the sample variance $s^2 = \frac{1}{n-1} \sum_i (X_i - \bar{x})^2$ are independent random variables.

  1. Use what you know of MVNs to say what the distribution of $\bar{x}$ is. Do you get the same answer as what you get using MGFs?
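If you want to check your answer by simulation (NumPy assumed; the values of $\mu$, $\sigma$, and $n$ are arbitrary), draw many samples of size $n$ and look at the empirical mean and standard deviation of the resulting sample means:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 3.0, 1.5, 25, 100_000

# One sample mean per row of an (reps x n) array of Norm(mu, sigma) draws.
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(means.mean(), means.std())   # compare with your MGF / MVN answer
```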

Friday, February 28

Today we defined the $\chi^2$-distribution and we used it to explain why you divide by $n-1$ instead of $n$ in the formula for the sample variance.

Definition ($\chi^2$-Distribution)

Suppose that $X$ is a random vector whose entries $X_1, \ldots, X_n$ are independent $\operatorname{Norm}(0,1)$ random variables. The $\chi^2$-distribution with $n$ degrees of freedom is the probability distribution for $\|X\|^2$.

  1. Let $X_1, X_2$ be independent $\operatorname{Norm}(0,1)$ random variables. Let $Y = X_1^2 + X_2^2$. Use a polar coordinates change of variables to set up and calculate a double integral for the probability $P(Y \le R^2)$ for some fixed $R > 0$.
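You can check the value of your integral with a short Monte Carlo estimate (NumPy assumed; the choice of $R$ and the sample size are arbitrary, and this only simulates the probability rather than revealing the closed form):

```python
import numpy as np

rng = np.random.default_rng(6)
R = 1.5
x1 = rng.normal(size=1_000_000)
x2 = rng.normal(size=1_000_000)
y = x1**2 + x2**2                 # Y = X_1^2 + X_2^2

p_hat = np.mean(y <= R**2)        # fraction of samples with Y <= R^2
print(p_hat)                      # compare with your double integral at R = 1.5
```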

Theorem (Orthogonal Projection & $\chi^2$)

Suppose that $X$ is a random vector whose entries $X_1, \ldots, X_n$ are independent $\operatorname{Norm}(\mu,\sigma)$ random variables, and $P \in \mathbb{R}^{n \times n}$ is an orthogonal projection matrix whose nullspace contains the mean vector $(\mu, \ldots, \mu)^T$ (so that $P$ sends the mean vector to zero). Then $\frac{1}{\sigma^2} \|PX\|^2$ has a $\chi^2$ distribution with degrees of freedom equal to the dimension of $\operatorname{Range}(P)$.

We didn't try to prove this theorem, but if we have time later in the semester we might come back to it. As an application of this theorem, consider the sample variance of a collection of independent observations $X_1, \ldots, X_n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2.$$ Writing $e$ for the vector of all ones, the sum satisfies $\sum_{i=1}^n (x_i - \bar{x})^2 = \left\| \left(I - \tfrac{1}{n} ee^T\right) X \right\|^2$, and we proved in Homework 3, Problem 2 that $I - \frac{1}{n} ee^T$ is an orthogonal projection.

  1. What is the nullspace of $I - \frac{1}{n} ee^T$? Hint: show that a vector $v \in \mathbb{R}^n$ is in $\operatorname{Null}(I - \frac{1}{n} ee^T)$ if and only if all of the entries of $v$ are the same.

  2. Use the Fundamental Theorem of Linear Algebra to show that the dimension of $\operatorname{Range}(I - \frac{1}{n} ee^T)$ is $n-1$.

From this, we can see that $s^2$ is $\frac{\sigma^2}{n-1}$ multiplied by a random variable with a $\chi^2(n-1)$ distribution.

Theorem (Distribution of Sample Variance)

Suppose that $s^2$ is the sample variance of $n$ independent observations from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then $\frac{(n-1)s^2}{\sigma^2}$ has a $\chi^2$-distribution with $n-1$ degrees of freedom.
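A simulation sketch of the theorem (NumPy assumed; $\mu$, $\sigma$, and $n$ below are arbitrary choices): a $\chi^2(n-1)$ random variable has mean $n-1$ and variance $2(n-1)$, and the simulated values of $(n-1)s^2/\sigma^2$ should match both.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 2.0, 3.0, 8, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)       # sample variance (divide by n-1)
w = (n - 1) * s2 / sigma**2            # should look like chi^2(n-1)

print(w.mean(), w.var())   # close to n-1 = 7 and 2(n-1) = 14
```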

  1. Show that the expected value of $s^2$ is $\sigma^2$.