Week 5 Lecture Notes

This week we’ll continue our review of linear algebra and apply it to least squares regression.

Monday, February 10

  1. Let $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$. Express $x - \bar{x}$ as a matrix times the vector $x$.

Definition (Orthogonality)

Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $x^T y = 0$. This can be denoted $x \perp y$. Two sets of vectors $V, W \subset \mathbb{R}^n$ are orthogonal if $v^T w = 0$ for all $v \in V$ and $w \in W$.

  2. Show that for any $x \in \mathbb{R}^3$, $x - \bar{x}$ is orthogonal to the vector $e = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.

Theorem (Properties of Transposes)

Let $A$ and $B$ be two matrices. Then

  1. $(A+B)^T = A^T + B^T$,
  2. $(AB)^T = B^T A^T$.
  3. Show that for matrices $A$, $B$, and $C$, $(ABC)^T = C^T B^T A^T$.
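These transpose properties are easy to spot-check numerically. Here is a short NumPy sketch (my own code, with arbitrary random matrices, not part of the lecture) verifying both properties and the three-factor pattern from the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Property 2: (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)

# The exercise: the pattern extends to three factors, (ABC)^T = C^T B^T A^T
assert np.allclose((A @ B @ C).T, C.T @ B.T @ A.T)

# Property 1 requires matrices of the same shape
D = rng.standard_normal((2, 3))
assert np.allclose((A + D).T, A.T + D.T)

print("all transpose identities hold")
```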

From Exercise 1, we know that $x - \bar{x} = Ax$ where $$A = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.$$ We can use this to give a different solution to Exercise 2. We want to show that $x - \bar{x}$ is orthogonal to $e$. By substitution, that is the same as showing that $(Ax)^T e = 0$. This is the same as:

$$\begin{align*} (Ax)^T e &= x^T A^T e \\ &= x^T A e \quad \text{(since } A \text{ is symmetric)} \\ &= x^T 0 \quad \text{(since } Ae = 0\text{)} \\ &= 0. \end{align*}$$

  4. For $x, y \in \mathbb{R}^3$, re-write $(x - \bar{x})^T (y - \bar{y})$ using the matrix $A$. Simplify as much as you can.

We finished class today by asking the question:

  5. What if $x \in \mathbb{R}^n$ for $n \ne 3$? What matrix would you multiply $x$ by to get $x - \bar{x}$?

We were able to figure out the pattern pretty easily… but I suggested a totally different approach to solve the problem using matrix algebra. First, let $e \in \mathbb{R}^n$ denote the vector with all 1 entries, and let $I$ denote the identity matrix in $\mathbb{R}^{n \times n}$.

$$\begin{align*} x - \bar{x} &= x - e\bar{x} \quad \text{(because that's what we really mean when we write } x - \bar{x}\text{)} \\ &= x - e\left(\tfrac{1}{n} e^T x\right) \quad \text{(since } \bar{x} = \tfrac{1}{n} e^T x\text{)} \\ &= x - \tfrac{1}{n} ee^T x \quad \text{(pulling out the constant and using associativity)} \\ &= \left(I - \tfrac{1}{n} ee^T\right) x \quad \text{(factoring out } x\text{)}. \end{align*}$$

So we get a nice general formula for the matrix $A$ in any dimension: $A = I - \frac{1}{n} ee^T$.
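As a quick sanity check (a NumPy sketch of my own, with a made-up vector), we can build $A = I - \frac{1}{n} ee^T$ for $n = 5$ and confirm that $Ax$ subtracts the mean from each entry of $x$, and that $Ae = 0$:

```python
import numpy as np

n = 5
e = np.ones(n)                      # the all-ones vector
A = np.eye(n) - np.outer(e, e) / n  # A = I - (1/n) e e^T

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
centered = A @ x                    # mean of x is 6, so this is [-4, -2, 0, 2, 4]

print(centered)
print(A @ e)                        # the zero vector (up to rounding)
```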

Wednesday, February 12

Today we introduced least squares regression with a (relatively) simple in-class example. Here is the link:

In the example, we found the slope and y-intercept of a linear regression line by minimizing the sum of squared residuals. We did this two ways: first we used calculus, and then we used linear algebra. The linear algebra approach is much easier computationally!

To find the y-intercept $a$ and slope $b$ of the least squares regression line for the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, you solve the normal equation: $$X^T X \begin{bmatrix} a \\ b \end{bmatrix} = X^T y$$ where $$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{ and } \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
As long as the columns of $X$ are linearly independent, $X^T X$ is an invertible matrix and the solution of the normal equation is: $$\begin{bmatrix} a \\ b \end{bmatrix} = (X^T X)^{-1} X^T y.$$
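For example, here is a short NumPy sketch (my own made-up data points, not the in-class example) that builds $X$ and solves the normal equation directly:

```python
import numpy as np

# Made-up data points (x_i, y_i); they happen to lie exactly on y = 1 + 2x
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 5.0])

# Design matrix: a column of ones and a column of x values
X = np.column_stack([np.ones_like(xs), xs])

# Solve the normal equation X^T X [a, b]^T = X^T y
a, b = np.linalg.solve(X.T @ X, X.T @ ys)
print(a, b)   # intercept 1.0, slope 2.0
```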

Friday, February 14

Today we dug a little deeper into the linear algebra behind least squares regression. We started with the following (hard) problem.

Problem: Find coefficients $b_0$, $b_1$, and $b_2$ that give the best fit parabola $\hat{y} = b_0 + b_1 x + b_2 x^2$ for the four points $(-2,3)$, $(-1,0)$, $(1,-1)$, and $(3,2)$.

Solution: You can use the matrix approach to least squares regression. Let $$X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \end{bmatrix} = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix} \quad \text{ and } \quad y = \begin{bmatrix} 3 \\ 0 \\ -1 \\ 2 \end{bmatrix}.$$ Then we just need to solve $Xb = y$ to find the coefficients $b_0$, $b_1$, and $b_2$. Unfortunately, this system of equations has no solution. So we look for a least squares solution; that is, we try to find a vector $b$ such that $Xb = \hat{y}$, where $\hat{y}$ is the closest vector in the column space of $X$ to $y$. To do this, $\hat{y} - y$ must be orthogonal to the column space of $X$, and that leads to the normal equation:

Definition (The Normal Equation)

If $X \in \mathbb{R}^{m \times n}$ and $y \in \mathbb{R}^m$, then the least squares solution of $Xb = y$ can be found by solving $$X^T X b = X^T y.$$ Furthermore, if the columns of $X$ are linearly independent, then $X^T X$ is invertible and $$b = (X^T X)^{-1} X^T y.$$

Notice that we are using a vector $b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ instead of the vector $\begin{bmatrix} a \\ b \end{bmatrix}$ that we used last time. This notation is more convenient, especially for fitting equations with lots of coefficients.

We used Octave/Matlab to solve the least squares problem above with the following commands:

```matlab
X = [1 -2 4; 1 -1 1; 1 1 1; 1 3 9]
y = [3; 0; -1; 2]
b = inv(X'*X)*X'*y
haty = X*b
```

Notice that `X'` is $X^T$ in Octave/Matlab and `inv(X'*X)` is $(X^T X)^{-1}$.

Link to try this code in a SageCell online

  1. (Exercise) How would you adjust the matrix $X$ to find a best fit cubic polynomial of the form $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$? Try using Octave/Matlab to find the coefficients.

  2. (Exercise) Graph the solution and notice that it hits every point perfectly. What about the matrix $X$ changed when you added another column that made this possible?
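If you would rather check this numerically than by graphing, here is a NumPy sketch (my own code, not from class) fitting the cubic to the same four points. With four points and four coefficients, $X$ is square with linearly independent columns, so the residuals come out to zero:

```python
import numpy as np

xs = np.array([-2.0, -1.0, 1.0, 3.0])
ys = np.array([3.0, 0.0, -1.0, 2.0])

# Columns 1, x, x^2, x^3 -- now a square 4x4 matrix
X = np.vander(xs, 4, increasing=True)

# Solving the normal equation (same answer as solving X b = y exactly here)
b = np.linalg.solve(X.T @ X, X.T @ ys)

residuals = X @ b - ys
print(residuals)   # essentially zero: the cubic passes through all four points
```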

To understand this a little better, it helps to know about the four fundamental subspaces of a matrix and about the Fundamental Theorem of Linear Algebra.

Definition (The Four Fundamental Subspaces of a Matrix)

Every matrix $A \in \mathbb{R}^{m \times n}$ has four fundamental subspaces:

1. Column Space (aka Range): $\operatorname{Range}(A) = \{Ax : x \in \mathbb{R}^n\}$
2. Row Space: $\operatorname{Range}(A^T) = \{A^T y : y \in \mathbb{R}^m\}$
3. Null Space: $\operatorname{Null}(A) = \{x \in \mathbb{R}^n : Ax = 0\}$
4. Left Null Space: $\operatorname{Null}(A^T) = \{y \in \mathbb{R}^m : y^T A = 0\}$

The row space and null space are subspaces of the domain $\mathbb{R}^n$, and the range and left null space are subspaces of the codomain $\mathbb{R}^m$.

Important Concept: A matrix $A \in \mathbb{R}^{m \times n}$ should be understood in two ways:

  1. $A$ is a rectangular array of numbers with $m$ rows and $n$ columns. (You already know this!)
  2. $A$ is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$ that is a linear transformation.

That’s why it makes sense to call the column space the range of $A$.

  1. (Exercise) Find the range and null space for the 3-by-3 identity matrix. What about the 3-by-3 matrix $$ee^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}?$$
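A quick numerical probe can guide your answer (a NumPy sketch of my own; it doesn't replace the hand computation). Comparing the ranks of the two matrices tells you how big each range is, and multiplying $ee^T$ by a vector whose entries sum to zero hints at the null space:

```python
import numpy as np

I = np.eye(3)
E = np.ones((3, 3))   # this is e e^T for e = [1, 1, 1]^T

print(np.linalg.matrix_rank(I))   # 3
print(np.linalg.matrix_rank(E))   # 1

# A vector whose entries sum to zero is sent to zero by e e^T
v = np.array([1.0, -2.0, 1.0])
print(E @ v)                      # the zero vector
```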

We ran out of time after this example, so next week we’ll see how the four fundamental subspaces are related by the Fundamental Theorem of Linear Algebra.