Week 5 Lecture Notes

This week we’ll continue our review of linear algebra and apply it to least squares regression.

Monday, February 10

  1. Let \(x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3\). Express \(x-\bar{x}\) as a matrix times the vector \(x\).

Definition (Orthogonality)

Two vectors \(x,y \in \mathbb{R}^n\) are orthogonal if \(x^Ty = 0\). This can be denoted \(x \perp y\). Two sets of vectors \(V, W \subset \mathbb{R}^n\) are orthogonal if \(v^T w = 0\) for all \(v \in V\) and \(w \in W\).

  2. Show that for any \(x \in \mathbb{R}^3\), \(x-\bar{x}\) is orthogonal to the vector \(e = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\).

Theorem (Properties of Transposes)

Let \(A\) and \(B\) be two matrices. Then

  1. \((A+B)^T = A^T+B^T\),
  2. \((AB)^T = B^T A^T\).

  3. Show that for matrices \(A, B,\) and \(C\), \((ABC)^T = C^T B^T A^T\).
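
These transpose properties are easy to spot-check numerically. Here is a quick Octave/Matlab sketch (the matrices below are arbitrary examples chosen for illustration):

A = [1 2; 3 4]
B = [0 1; 5 -2]
(A*B)'                     % same as B'*A'
B'*A'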

From Exercise 1, we know that \(x - \bar{x} = Ax\) where \[A = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.\] We can use this to give a different solution to Exercise 2. We want to show that \(x-\bar{x}\) is orthogonal to \(e\). By substitution, that is the same as showing that \((Ax)^T e = 0\). This is the same as:

\[\begin{align*} (Ax)^T e &= x^T A^T e \\ &= x^T A e ~~~~~~ \text{(since }A \text{ is symmetric)} \\ &= x^T 0 ~~~~~~ \text{(since }Ae = 0) \\ &= 0. \end{align*}\]
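
Here is a quick Octave/Matlab spot-check of this argument (the particular vector x below is arbitrary):

A = [2/3 -1/3 -1/3; -1/3 2/3 -1/3; -1/3 -1/3 2/3]
e = [1; 1; 1]
x = [4; 7; 1]              % any vector in R^3 works here
A*x                        % should match x - mean(x)*e
x - mean(x)*e
(A*x)'*e                   % should be 0 (up to rounding error)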

  4. For \(x, y \in \mathbb{R}^3\), re-write \((x-\bar{x})^T (y- \bar{y})\) using the matrix \(A\). Simplify as much as you can.

We finished class today by asking the question:

  5. What if \(x \in \mathbb{R}^n\) for \(n \ne 3\)? What matrix would you multiply \(x\) by to get \(x- \bar{x}\)?

We were able to figure out the pattern pretty easily… but I suggested a totally different approach to solve the problem using matrix algebra. First, let \(e \in \mathbb{R}^n\) denote the vector with all 1 entries, and let \(I\) denote the identity matrix in \(\mathbb{R}^{n \times n}\).

\[\begin{align*} x-\bar{x} &= x - e \bar{x} ~~~~~~ \text{(because that's what we really mean when we write } x-\bar{x}) \\ &= x- e (\tfrac{1}{n}e^T x) ~~~~~~ \text{(since }\bar{x} = \tfrac{1}{n} e^Tx) \\ &= x - \tfrac{1}{n} ee^T x ~~~~~~ \text{(pulling out the constant and using associativity}) \\ &= \left( I - \tfrac{1}{n} ee^T \right) x ~~~~~~ \text{(factoring out }x). \\ \end{align*}\]

So we get a nice general formula for the matrix \(A\) in any dimension: \(A = I - \frac{1}{n} ee^T\).
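
Here is a minimal Octave/Matlab check of this formula in a higher dimension (the choice n = 5 and the random test vector are just for illustration):

n = 5
e = ones(n,1)
A = eye(n) - (1/n)*e*e'    % the centering matrix
x = rand(n,1)
norm(A*x - (x - mean(x)))  % should be essentially zero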

Wednesday, February 12

Today we introduced least squares regression with a (relatively) simple in-class example. Here is the link:

In the example, we found the slope and y-intercept of a linear regression line by minimizing the sum of squared residuals. We did this two ways: first we used calculus, and then we used linear algebra. The linear algebra approach is much easier computationally!

To find the y-intercept \(a\) and slope \(b\) of the least squares regression line for the points \((x_1, y_1)\), \((x_2, y_2), \ldots, (x_n, y_n)\), you solve the normal equation: \[X^T X \begin{bmatrix} a \\ b \end{bmatrix} = X^T y\] where \[X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} ~~~~ \text{ and } ~~~~ y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.\]
As long as the columns of \(X\) are linearly independent, \(X^T X\) is an invertible matrix and the solution of the normal equation is: \[\begin{bmatrix} a \\ b \end{bmatrix} = (X^T X)^{-1} X^T y.\]
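
As a quick illustration, here is a minimal Octave/Matlab sketch that sets up and solves the normal equation for a few made-up points (the data values are invented for the example):

x = [1; 2; 3; 4]           % made-up x-values
y = [2.1; 3.9; 6.2; 7.8]   % made-up y-values
X = [ones(4,1) x]          % first column of ones corresponds to the intercept a
ab = inv(X'*X)*X'*y        % ab(1) is the intercept a, ab(2) is the slope b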

Friday, February 14

Today we dug a little deeper into the linear algebra behind least squares regression. We started with the following (hard) problem.

Problem: Find coefficients \(b_0\), \(b_1\), and \(b_2\) that give the best fit parabola \(\hat{y}=b_0 + b_1 x + b_2 x^2\) for the four points \((-2,3)\), \((-1,0)\), \((1,-1)\), and \((3,2)\).

Solution: You can use the matrix approach to least squares regression. Let \[X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\1 & x_3 & x_3^2 \\1 & x_4 & x_4^2 \\ \end{bmatrix} = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\1 & 1 & 1 \\1 & 3 & 9 \\ \end{bmatrix} ~~~~ \text{ and } y = \begin{bmatrix} 3 \\ 0 \\ -1 \\ 2 \end{bmatrix}.\] Then we just need to solve \(Xb = y\) to find the coefficients \(b_0\), \(b_1\), and \(b_2\). Unfortunately, this system of equations has no solution. So we look for a least squares solution, that is, we try to find a vector \(b\) such that \(X b = \hat{y}\) where \(\hat{y}\) is the closest vector in the column space of \(X\) to \(y\). For \(\hat{y}\) to be the closest vector, \(\hat{y} - y\) must be orthogonal to the column space of \(X\), which means \(X^T(\hat{y} - y) = X^T(Xb - y) = 0\), and that leads to the normal equation:

Definition (The Normal Equation)

If \(X \in \mathbb{R}^{m \times n}\) and \(y \in \mathbb{R}^m\), then a least squares solution \(b\) of \(Xb = y\) can be found by solving \[X^T X b = X^T y.\] Furthermore, if the columns of \(X\) are linearly independent, then \(X^T X\) is invertible and \[b = (X^T X)^{-1} X^T y.\]

Notice that we are using a vector \(b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}\) instead of the vector \(\begin{bmatrix} a \\ b \end{bmatrix}\) that we used last time. This notation is more convenient, especially for fitting equations with lots of coefficients.

We solved the least squares problem above with the following Octave/Matlab commands:

X = [1 -2 4; 1 -1 1; 1 1 1; 1 3 9]   % each row has the form [1 x x^2]
y = [3; 0; -1; 2]
b = inv(X'*X)*X'*y                    % solve the normal equation for the coefficients
haty = X*b                            % the fitted values

Notice that X' is \(X^T\) in Octave/Matlab and inv(X'*X) is \((X^T X)^{-1}\).

Link to try this code in a SageCell online
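
A side note: rather than computing the inverse explicitly, Octave/Matlab's backslash operator solves the same system and is generally preferred numerically. Applying backslash directly to the overdetermined system also returns a least squares solution:

b = (X'*X) \ (X'*y)        % solves the normal equation without forming the inverse
b = X \ y                  % backslash on Xb = y gives a least squares solution directly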

  1. (Exercise) How would you adjust the matrix \(X\) to find a best fit cubic polynomial of the form \(y = b_0 + b_1 x + b_2 x^2 + b_3 x^3\)? Try using Octave/Matlab to find the coefficients.

  2. (Exercise) Graph the solution and notice that it passes through every point exactly. What changed about the matrix \(X\) when you added another column, and why does that make it possible?

To understand this a little better, it helps to know about the four fundamental subspaces of a matrix and about the Fundamental Theorem of Linear Algebra.

Definition (The Four Fundamental Subspaces of a Matrix)

Every matrix \(A \in \mathbb{R}^{m \times n}\) has four fundamental subspaces:

1. Column Space (aka Range): \(\operatorname{Range}(A) = \{Ax : x \in \mathbb{R}^n \}\)
2. Row Space: \(\operatorname{Range}(A^T) = \{A^Ty : y \in \mathbb{R}^m \}\)
3. Null Space: \(\operatorname{Null}(A) = \{x \in \mathbb{R}^n : Ax = 0 \}\)
4. Left Null Space: \(\operatorname{Null}(A^T) = \{y \in \mathbb{R}^m : y^T A = 0 \}\)

The row space and null space are subspaces of the domain \(\mathbb{R}^n\), and the range and left null space are subspaces of the codomain \(\mathbb{R}^m\).
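
In Octave/Matlab, the built-in functions orth and null return orthonormal bases for these subspaces numerically. A minimal sketch, using an arbitrary rank-1 example matrix:

A = [1 2 3; 2 4 6]         % a 2-by-3 matrix with rank 1
orth(A)                    % basis for the column space
orth(A')                   % basis for the row space
null(A)                    % basis for the null space
null(A')                   % basis for the left null space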

Important Concept: A matrix \(A \in \mathbb{R}^{m \times n}\) should be understood two ways:

  1. \(A\) is a rectangular array of numbers with \(m\) rows and \(n\) columns. (You already know this!)
  2. \(A\) is a function from \(\mathbb{R}^n \rightarrow \mathbb{R}^m\) that is a linear transformation.

That’s why it makes sense to call the column space the range of \(A\).
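
For example, a 3-by-2 matrix maps \(\mathbb{R}^2\) into \(\mathbb{R}^3\). A quick Octave/Matlab illustration (the entries are arbitrary):

A = [1 0; 2 1; 0 3]        % 3-by-2, so A maps R^2 into R^3
x = [5; -1]
A*x                        % a vector in R^3, and a combination of the columns of A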

  3. (Exercise) Find the range and null space for the 3-by-3 identity matrix. What about the 3-by-3 matrix \[e e^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}?\]

We ran out of time after this example, so next week we’ll see how the four fundamental subspaces are related by the Fundamental Theorem of Linear Algebra.