Week 5 Lecture Notes

This week we’ll continue our review of linear algebra and apply it to least squares regression.

Monday, February 10

  1. Let \(x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3\). Express \(x-\bar{x}\) as a matrix times the vector \(x\).

Definition (Orthogonality)

Two vectors \(x,y \in \mathbb{R}^n\) are orthogonal if \(x^Ty = 0\). This can be denoted \(x \perp y\). Two sets of vectors \(V, W \subset \mathbb{R}^n\) are orthogonal if \(v^T w = 0\) for all \(v \in V\) and \(w \in W\).

  2. Show that for any \(x \in \mathbb{R}^3\), \(x-\bar{x}\) is orthogonal to the vector \(e = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\).

Theorem (Properties of Transposes)

Let \(A\) and \(B\) be two matrices. Then

  1. \((A+B)^T = A^T+B^T\),
  2. \((AB)^T = B^T A^T\).

  3. Show that for matrices \(A, B,\) and \(C\), \((ABC)^T = C^T B^T A^T\).
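
These transpose properties are easy to spot-check numerically. Here is a quick Octave/Matlab sketch (the matrices below are arbitrary examples chosen for illustration):

A = [1 2; 3 4]
B = [0 1; 5 -2]
(A*B)'                     % same as B'*A'
B'*A'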

From Exercise 1, we know that \(x - \bar{x} = Ax\) where \[A = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.\] We can use this to give a different solution to Exercise 2. We want to show that \(x-\bar{x}\) is orthogonal to \(e\). By substitution, that is the same as showing that \((Ax)^T e = 0\). This is the same as:

\[\begin{align*} (Ax)^T e &= x^T A^T e \\ &= x^T A e ~~~~~~ \text{(since }A \text{ is symmetric)} \\ &= x^T 0 ~~~~~~ \text{(since }Ae = 0) \\ &= 0. \end{align*}\]
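
Here is a quick Octave/Matlab spot-check of this argument (the particular vector x below is arbitrary):

A = [2/3 -1/3 -1/3; -1/3 2/3 -1/3; -1/3 -1/3 2/3]
e = [1; 1; 1]
x = [4; 7; 1]              % any vector in R^3 works here
A*x                        % should match x - mean(x)*e
x - mean(x)*e
(A*x)'*e                   % should be 0 (up to rounding error)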

  4. For \(x, y \in \mathbb{R}^3\), re-write \((x-\bar{x})^T (y- \bar{y})\) using the matrix \(A\). Simplify as much as you can.

We finished class today by asking the question:

  5. What if \(x \in \mathbb{R}^n\) for \(n \ne 3\)? What matrix would you multiply \(x\) by to get \(x- \bar{x}\)?

We were able to figure out the pattern pretty easily… but I suggested a totally different approach to solve the problem using matrix algebra. First, let \(e \in \mathbb{R}^n\) denote the vector with all 1 entries, and let \(I\) denote the identity matrix in \(\mathbb{R}^{n \times n}\).

\[\begin{align*} x-\bar{x} &= x - e \bar{x} ~~~~~~ \text{(because that's what we really mean when we write } x-\bar{x}) \\ &= x- e (\tfrac{1}{n}e^T x) ~~~~~~ \text{(since }\bar{x} = \tfrac{1}{n} e^Tx) \\ &= x - \tfrac{1}{n} ee^T x ~~~~~~ \text{(pulling out the constant and using associativity}) \\ &= \left( I - \tfrac{1}{n} ee^T \right) x ~~~~~~ \text{(factoring out }x). \\ \end{align*}\]

So we get a nice general formula for the matrix \(A\) in any dimension: \(A = I - \frac{1}{n} ee^T\).
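
Here is a minimal Octave/Matlab check of this formula in a higher dimension (the choice n = 5 and the random test vector are just for illustration):

n = 5
e = ones(n,1)
A = eye(n) - (1/n)*e*e'    % the centering matrix
x = rand(n,1)
norm(A*x - (x - mean(x)))  % should be essentially zero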

Wednesday, February 12

Today we introduced least squares regression with a (relatively) simple in-class example. Here is the link:

In the example, we found the slope and y-intercept of a linear regression line by minimizing the sum of squared residuals. We did this two ways: first we used calculus, and then we used linear algebra. The linear algebra approach is much easier computationally!

To find the y-intercept \(a\) and slope \(b\) of the least squares regression line for the points \((x_1, y_1)\), \((x_2, y_2), \ldots, (x_n, y_n)\), you solve the normal equation: \[X^T X \begin{bmatrix} a \\ b \end{bmatrix} = X^T y\] where \[X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} ~~~~ \text{ and } ~~~~ y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.\]
As long as the columns of \(X\) are linearly independent, \(X^T X\) is an invertible matrix and the solution of the normal equation is: \[\begin{bmatrix} a \\ b \end{bmatrix} = (X^T X)^{-1} X^T y.\]
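
As a quick illustration, here is a minimal Octave/Matlab sketch that sets up and solves the normal equation for a few made-up points (the data values are invented for the example):

x = [1; 2; 3; 4]           % made-up x-values
y = [2.1; 3.9; 6.2; 7.8]   % made-up y-values
X = [ones(4,1) x]          % first column of ones corresponds to the intercept a
ab = inv(X'*X)*X'*y        % ab(1) is the intercept a, ab(2) is the slope b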

Friday, February 14

Today we dug a little deeper into the linear algebra behind least squares regression. We started with the following (hard) problem.

Problem: Find coefficients \(b_0\), \(b_1\), and \(b_2\) that give the best fit parabola \(\hat{y}=b_0 + b_1 x + b_2 x^2\) for the four points \((-2,3)\), \((-1,0)\), \((1,-1)\), and \((3,2)\).

Solution: You can use the matrix approach to least squares regression. Let \[X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\1 & x_3 & x_3^2 \\1 & x_4 & x_4^2 \\ \end{bmatrix} = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\1 & 1 & 1 \\1 & 3 & 9 \\ \end{bmatrix} ~~~~ \text{ and } y = \begin{bmatrix} 3 \\ 0 \\ -1 \\ 2 \end{bmatrix}.\] Then we just need to solve \(Xb = y\) to find the coefficients \(b_0\), \(b_1\), and \(b_2\). Unfortunately, this system of equations has no solution. So we look for a least squares solution, that is, we try to find a vector \(b\) such that \(X b = \hat{y}\) where \(\hat{y}\) is the closest vector in the column space of \(X\) to \(y\). For \(\hat{y}\) to be the closest vector, \(\hat{y} - y\) must be orthogonal to the column space of \(X\), which means \(X^T(\hat{y} - y) = X^T(Xb - y) = 0\), and that leads to the normal equation:

Definition (The Normal Equation)

If \(X \in \mathbb{R}^{m \times n}\) and \(y \in \mathbb{R}^m\), then a least squares solution \(b\) of \(Xb = y\) can be found by solving \[X^T X b = X^T y.\] Furthermore, if the columns of \(X\) are linearly independent, then \(X^T X\) is invertible and \[b = (X^T X)^{-1} X^T y.\]

Notice that we are using a vector \(b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}\) instead of the vector \(\begin{bmatrix} a \\ b \end{bmatrix}\) that we used last time. This notation is more convenient, especially for fitting equations with lots of coefficients.

We solved the least squares problem above with the following Octave/Matlab commands:

X = [1 -2 4; 1 -1 1; 1 1 1; 1 3 9]   % each row has the form [1 x x^2]
y = [3; 0; -1; 2]
b = inv(X'*X)*X'*y                    % solve the normal equation for the coefficients
haty = X*b                            % the fitted values

Notice that X' is \(X^T\) in Octave/Matlab and inv(X'*X) is \((X^T X)^{-1}\).

Link to try this code in a SageCell online
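
A side note: rather than computing the inverse explicitly, Octave/Matlab's backslash operator solves the same system and is generally preferred numerically. Applying backslash directly to the overdetermined system also returns a least squares solution:

b = (X'*X) \ (X'*y)        % solves the normal equation without forming the inverse
b = X \ y                  % backslash on Xb = y gives a least squares solution directly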

  1. (Exercise) How would you adjust the matrix \(X\) to find a best fit cubic polynomial of the form \(y = b_0 + b_1 x + b_2 x^2 + b_3 x^3\)? Try using Octave/Matlab to find the coefficients.

  2. (Exercise) Graph the solution and notice that it passes through every point exactly. What changed about the matrix \(X\) when you added another column, and why does that make it possible?

To understand this a little better, it helps to know about the four fundamental subspaces of a matrix and about the Fundamental Theorem of Linear Algebra.

Definition (The Four Fundamental Subspaces of a Matrix)

Every matrix \(A \in \mathbb{R}^{m \times n}\) has four fundamental subspaces:

1. Column Space (aka Range): \(\operatorname{Range}(A) = \{Ax : x \in \mathbb{R}^n \}\)
2. Row Space: \(\operatorname{Range}(A^T) = \{A^Ty : y \in \mathbb{R}^m \}\)
3. Null Space: \(\operatorname{Null}(A) = \{x \in \mathbb{R}^n : Ax = 0 \}\)
4. Left Null Space: \(\operatorname{Null}(A^T) = \{y \in \mathbb{R}^m : y^T A = 0 \}\)

The row space and null space are subspaces of the domain \(\mathbb{R}^n\), and the range and left null space are subspaces of the codomain \(\mathbb{R}^m\).
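
In Octave/Matlab, the built-in functions orth and null return orthonormal bases for these subspaces numerically. A minimal sketch, using an arbitrary rank-1 example matrix:

A = [1 2 3; 2 4 6]         % a 2-by-3 matrix with rank 1
orth(A)                    % basis for the column space
orth(A')                   % basis for the row space
null(A)                    % basis for the null space
null(A')                   % basis for the left null space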

Important Concept: A matrix \(A \in \mathbb{R}^{m \times n}\) should be understood two ways:

  1. \(A\) is a rectangular array of numbers with \(m\) rows and \(n\) columns. (You already know this!)
  2. \(A\) is a function from \(\mathbb{R}^n \rightarrow \mathbb{R}^m\) that is a linear transformation.

That’s why it makes sense to call the column space the range of \(A\).
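
For example, a 3-by-2 matrix maps \(\mathbb{R}^2\) into \(\mathbb{R}^3\). A quick Octave/Matlab illustration (the entries are arbitrary):

A = [1 0; 2 1; 0 3]        % 3-by-2, so A maps R^2 into R^3
x = [5; -1]
A*x                        % a vector in R^3, and a combination of the columns of A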

  3. (Exercise) Find the range and null space for the 3-by-3 identity matrix. What about the 3-by-3 matrix \[e e^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}?\]

We ran out of time after this example, so next week we’ll see how the four fundamental subspaces are related by the Fundamental Theorem of Linear Algebra.