Week 5 Lecture Notes

This week we’ll continue our review of linear algebra and apply it to least squares regression.

Monday, February 10

  1. Let $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$. Express $x - \bar{x}$ as a matrix times the vector $x$.

Definition (Orthogonality)

Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $x^T y = 0$. This can be denoted $x \perp y$. Two sets of vectors $V, W \subset \mathbb{R}^n$ are orthogonal if $v^T w = 0$ for all $v \in V$ and $w \in W$.

  2. Show that for any $x \in \mathbb{R}^3$, $x - \bar{x}$ is orthogonal to the vector $e = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.

Theorem (Properties of Transposes)

Let $A$ and $B$ be two matrices. Then

  1. $(A+B)^T = A^T + B^T$,
  2. $(AB)^T = B^T A^T$.
  3. Show that for matrices $A$, $B$, and $C$, $(ABC)^T = C^T B^T A^T$.
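These transpose properties are easy to spot-check numerically. Here is a short NumPy sketch (my own code, with arbitrary random matrices, not part of the lecture) verifying both properties and the three-factor pattern from the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Property 2: (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)

# The exercise: the pattern extends to three factors, (ABC)^T = C^T B^T A^T
assert np.allclose((A @ B @ C).T, C.T @ B.T @ A.T)

# Property 1 requires matrices of the same shape
D = rng.standard_normal((2, 3))
assert np.allclose((A + D).T, A.T + D.T)

print("all transpose identities hold")
```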

From Exercise 1, we know that $x - \bar{x} = Ax$ where $$A = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.$$ We can use this to give a different solution to Exercise 2. We want to show that $x - \bar{x}$ is orthogonal to $e$. By substitution, that is the same as showing that $(Ax)^T e = 0$. This is the same as:

$$\begin{align*} (Ax)^T e &= x^T A^T e \\ &= x^T A e \quad \text{(since } A \text{ is symmetric)} \\ &= x^T 0 \quad \text{(since } Ae = 0\text{)} \\ &= 0. \end{align*}$$

  4. For $x, y \in \mathbb{R}^3$, re-write $(x - \bar{x})^T (y - \bar{y})$ using the matrix $A$. Simplify as much as you can.

We finished class today by asking the question:

  5. What if $x \in \mathbb{R}^n$ for $n \ne 3$? What matrix would you multiply $x$ by to get $x - \bar{x}$?

We were able to figure out the pattern pretty easily… but I suggested a totally different approach to solve the problem using matrix algebra. First, let $e \in \mathbb{R}^n$ denote the vector with all 1 entries, and let $I$ denote the identity matrix in $\mathbb{R}^{n \times n}$.

$$\begin{align*} x - \bar{x} &= x - e\bar{x} \quad \text{(because that's what we really mean when we write } x - \bar{x}\text{)} \\ &= x - e\left(\tfrac{1}{n} e^T x\right) \quad \text{(since } \bar{x} = \tfrac{1}{n} e^T x\text{)} \\ &= x - \tfrac{1}{n} ee^T x \quad \text{(pulling out the constant and using associativity)} \\ &= \left(I - \tfrac{1}{n} ee^T\right) x \quad \text{(factoring out } x\text{)}. \end{align*}$$

So we get a nice general formula for the matrix $A$ in any dimension: $A = I - \frac{1}{n} ee^T$.
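As a quick sanity check (a NumPy sketch of my own, with a made-up vector), we can build $A = I - \frac{1}{n} ee^T$ for $n = 5$ and confirm that $Ax$ subtracts the mean from each entry of $x$, and that $Ae = 0$:

```python
import numpy as np

n = 5
e = np.ones(n)                      # the all-ones vector
A = np.eye(n) - np.outer(e, e) / n  # A = I - (1/n) e e^T

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
centered = A @ x                    # mean of x is 6, so this is [-4, -2, 0, 2, 4]

print(centered)
print(A @ e)                        # the zero vector (up to rounding)
```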

Wednesday, February 12

Today we introduced least squares regression with a (relatively) simple in-class example. Here is the link:

In the example, we found the slope and y-intercept of a linear regression line by minimizing the sum of squared residuals. We did this two ways: first we used calculus, and then we used linear algebra. The linear algebra approach is much easier computationally!

To find the y-intercept $a$ and slope $b$ of the least squares regression line for the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, you solve the normal equation: $$X^T X \begin{bmatrix} a \\ b \end{bmatrix} = X^T y$$ where $$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{ and } \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
As long as the columns of $X$ are linearly independent, $X^T X$ is an invertible matrix and the solution of the normal equation is: $$\begin{bmatrix} a \\ b \end{bmatrix} = (X^T X)^{-1} X^T y.$$
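For example, here is a short NumPy sketch (my own made-up data points, not the in-class example) that builds $X$ and solves the normal equation directly:

```python
import numpy as np

# Made-up data points (x_i, y_i); they happen to lie exactly on y = 1 + 2x
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 5.0])

# Design matrix: a column of ones and a column of x values
X = np.column_stack([np.ones_like(xs), xs])

# Solve the normal equation X^T X [a, b]^T = X^T y
a, b = np.linalg.solve(X.T @ X, X.T @ ys)
print(a, b)   # intercept 1.0, slope 2.0
```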

Friday, February 14

Today we dug a little deeper into the linear algebra behind least squares regression. We started with the following (hard) problem.

Problem: Find coefficients $b_0$, $b_1$, and $b_2$ that give the best fit parabola $\hat{y} = b_0 + b_1 x + b_2 x^2$ for the four points $(-2,3)$, $(-1,0)$, $(1,-1)$, and $(3,2)$.

Solution: You can use the matrix approach to least squares regression. Let $$X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \end{bmatrix} = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix} \quad \text{ and } \quad y = \begin{bmatrix} 3 \\ 0 \\ -1 \\ 2 \end{bmatrix}.$$ Then we just need to solve $Xb = y$ to find the coefficients $b_0$, $b_1$, and $b_2$. Unfortunately, this system of equations has no solution. So we look for a least squares solution; that is, we try to find a vector $b$ such that $Xb = \hat{y}$, where $\hat{y}$ is the closest vector in the column space of $X$ to $y$. To do this, $\hat{y} - y$ must be orthogonal to the column space of $X$, and that leads to the normal equation:

Definition (The Normal Equation)

If $X \in \mathbb{R}^{m \times n}$ and $y \in \mathbb{R}^m$, then the least squares solution of $Xb = y$ can be found by solving $$X^T X b = X^T y.$$ Furthermore, if the columns of $X$ are linearly independent, then $X^T X$ is invertible and $$b = (X^T X)^{-1} X^T y.$$

Notice that we are using a vector $b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ instead of the vector $\begin{bmatrix} a \\ b \end{bmatrix}$ that we used last time. This notation is more convenient, especially for fitting equations with lots of coefficients.

We used Octave/Matlab to solve the least squares problem above with the following commands:

```matlab
X = [1 -2 4; 1 -1 1; 1 1 1; 1 3 9]
y = [3; 0; -1; 2]
b = inv(X'*X)*X'*y
haty = X*b
```

Notice that `X'` is $X^T$ in Octave/Matlab and `inv(X'*X)` is $(X^T X)^{-1}$.

Link to try this code in a SageCell online

  1. (Exercise) How would you adjust the matrix $X$ to find a best fit cubic polynomial of the form $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$? Try using Octave/Matlab to find the coefficients.

  2. (Exercise) Graph the solution and notice that it hits every point perfectly. What about the matrix $X$ changed when you added another column that made this possible?
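If you would rather check this numerically than by graphing, here is a NumPy sketch (my own code, not from class) fitting the cubic to the same four points. With four points and four coefficients, $X$ is square with linearly independent columns, so the residuals come out to zero:

```python
import numpy as np

xs = np.array([-2.0, -1.0, 1.0, 3.0])
ys = np.array([3.0, 0.0, -1.0, 2.0])

# Columns 1, x, x^2, x^3 -- now a square 4x4 matrix
X = np.vander(xs, 4, increasing=True)

# Solving the normal equation (same answer as solving X b = y exactly here)
b = np.linalg.solve(X.T @ X, X.T @ ys)

residuals = X @ b - ys
print(residuals)   # essentially zero: the cubic passes through all four points
```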

To understand this a little better, it helps to know about the four fundamental subspaces of a matrix and about the Fundamental Theorem of Linear Algebra.

Definition (The Four Fundamental Subspaces of a Matrix)

Every matrix $A \in \mathbb{R}^{m \times n}$ has four fundamental subspaces:

1. Column Space (aka Range): $\operatorname{Range}(A) = \{Ax : x \in \mathbb{R}^n\}$
2. Row Space: $\operatorname{Range}(A^T) = \{A^T y : y \in \mathbb{R}^m\}$
3. Null Space: $\operatorname{Null}(A) = \{x \in \mathbb{R}^n : Ax = 0\}$
4. Left Null Space: $\operatorname{Null}(A^T) = \{y \in \mathbb{R}^m : y^T A = 0\}$

The row space and null space are subspaces of the domain $\mathbb{R}^n$, and the range and left null space are subspaces of the codomain $\mathbb{R}^m$.

Important Concept: A matrix $A \in \mathbb{R}^{m \times n}$ should be understood in two ways:

  1. $A$ is a rectangular array of numbers with $m$ rows and $n$ columns. (You already know this!)
  2. $A$ is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$ that is a linear transformation.

That’s why it makes sense to call the column space the range of $A$.

  1. (Exercise) Find the range and null space for the 3-by-3 identity matrix. What about the 3-by-3 matrix $$ee^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}?$$
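A quick numerical probe can guide your answer (a NumPy sketch of my own; it doesn't replace the hand computation). Comparing the ranks of the two matrices tells you how big each range is, and multiplying $ee^T$ by a vector whose entries sum to zero hints at the null space:

```python
import numpy as np

I = np.eye(3)
E = np.ones((3, 3))   # this is e e^T for e = [1, 1, 1]^T

print(np.linalg.matrix_rank(I))   # 3
print(np.linalg.matrix_rank(E))   # 1

# A vector whose entries sum to zero is sent to zero by e e^T
v = np.array([1.0, -2.0, 1.0])
print(E @ v)                      # the zero vector
```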

We ran out of time after this example, so next week we’ll see how the four fundamental subspaces are related by the Fundamental Theorem of Linear Algebra.