Week 6 Lecture Notes

This week we’ll continue to look at least squares regression using linear algebra.

Monday, February 17

Last week, we introduced the four fundamental subspaces of a matrix. We ended with the following exercise that we didn’t have time to finish:

  1. (Exercise) Find the range and null space of the 3-by-3 matrix \[e e^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.\] (A quick numerical check appears after this list.)

  2. (Exercise) Why is every column of \(A\) in \(\operatorname{Range}(A)\)?

  3. (Exercise) Prove that \(\operatorname{Null}(A) \perp \operatorname{Range}(A^T)\).
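
As a quick sanity check on Exercise 1, here is a short NumPy sketch (not from lecture; the variable names and tolerance are my own) that computes the rank of \(e e^T\) and a basis for its null space.

```python
import numpy as np

# The all-ones 3-by-3 matrix e e^T from Exercise 1.
e = np.ones((3, 1))
A = e @ e.T

# rank(A) = dim Range(A).
print(np.linalg.matrix_rank(A))            # 1

# A basis for Null(A): the right singular vectors whose singular values
# are (numerically) zero.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10]
print(null_basis.shape[0])                 # 2 = dim Null(A)
print(np.allclose(A @ null_basis.T, 0))    # True: each basis vector is in Null(A)
```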

Definition (Complementary Subspaces)

Two subspaces \(U, V \subseteq \mathbb{R}^n\) are called complementary if \(U \cap V = \{ 0 \}\) and \(U \cup V\) spans \(\mathbb{R}^n\), that is, every vector in \(\mathbb{R}^n\) can be written as \(u + v\) with \(u \in U\) and \(v \in V\). An equivalent condition is that \(\dim U + \dim V = n\) and \(\dim (U \cap V) = 0\). For example, in \(\mathbb{R}^3\) the \(xy\)-plane and the \(z\)-axis are complementary.


Definition (Orthogonal Complement)

The orthogonal complement of a set \(V \subseteq \mathbb{R}^n\) is the subspace \[V^{\perp} = \{ w \in \mathbb{R}^n : w^T v = 0 \text{ for all } v \in V \}.\]

Note that two subspaces \(U, V \subseteq \mathbb{R}^n\) are orthogonal complements if and only if they are both orthogonal and complementary. In particular, as long as \(V\) is a subspace, \(U = V^{\perp}\) implies \(V = U^{\perp}\); that is, \((V^{\perp})^{\perp} = V\).
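
Here is a minimal NumPy sketch (the choice of \(V\) is my own, for illustration) of computing \(V^{\perp}\): a vector \(w\) is orthogonal to every element of \(V\) exactly when it is orthogonal to each vector in a spanning set for \(V\), so \(V^{\perp}\) is the null space of a matrix whose rows span \(V\).

```python
import numpy as np

# V = span of the rows of A (an illustrative subspace of R^3).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, -1.0]])

# V-perp = Null(A): w is orthogonal to all of V iff A @ w = 0.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
perp_basis = Vt[rank:]                     # orthonormal basis for V-perp

print(perp_basis.shape[0])                 # 1, since dim V = 2 and 2 + 1 = 3
print(np.allclose(A @ perp_basis.T, 0))    # True: orthogonal to both rows of A
```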

Theorem (The Fundamental Theorem of Linear Algebra)

For any matrix \(A \in \mathbb{R}^{m \times n}\),

  1. \(\operatorname{Null}(A) = \operatorname{Range}(A^T)^\perp\).
  2. \(\operatorname{Null}(A^T) = \operatorname{Range}(A)^\perp\).

In other words, the row space and null space of \(A\) are orthogonal complements in the domain of \(A\), which is \(\mathbb{R}^n\), and the column space and left null space of \(A\) are orthogonal complements in the codomain of \(A\), which is \(\mathbb{R}^m\). Furthermore, \(\operatorname{Range}(A)\) and \(\operatorname{Range}(A^T)\) have the same dimension, which is the rank of \(A\).
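
As an illustration (not from lecture), the following NumPy sketch builds a rank-deficient matrix and checks numerically that \(\operatorname{Range}(A^T) \perp \operatorname{Null}(A)\) and that their dimensions add up to \(n\).

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 5-by-3 matrix of rank 2 (rank-deficient on purpose).
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))

# Orthonormal bases for Range(A^T) and Null(A) from the SVD of A.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))             # rank of A
row_space = Vt[:r]                     # basis for Range(A^T), in R^3
null_space = Vt[r:]                    # basis for Null(A), also in R^3

print(np.allclose(row_space @ null_space.T, 0))   # True: the subspaces are orthogonal
print(r + null_space.shape[0])                    # 3 = n, so they are complementary
```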

Applications

Application 1. In linear regression, we needed to find the vector \(\hat{y} \in \operatorname{Range}(X)\) that is closest to \(y\). Geometrically, this is the same as requiring \(\hat{y} - y\) to be orthogonal to \(\operatorname{Range}(X)\). What is the set of vectors orthogonal to \(\operatorname{Range}(X)\)?

  1. Use the Fundamental Theorem of Linear Algebra to show that \(\hat{y} - y\) is orthogonal to \(\operatorname{Range}(X)\) if and only if \(X^T ( \hat{y} - y ) = 0\).

If \(\hat{y} = Xb\), then \(X^T( \hat{y} - y) = 0\) can be rewritten as \(X^TXb = X^T y\), which is the normal equation for least squares regression.
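
Here is a minimal NumPy sketch of this (the data below are made up for illustration): solving the normal equations directly gives the same coefficients as NumPy's built-in least squares routine, and the resulting residual \(\hat{y} - y\) is orthogonal to \(\operatorname{Range}(X)\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up regression data: 20 observations, an intercept column plus
# two predictor columns.
X = np.column_stack([np.ones(20), rng.standard_normal((20, 2))])
y = rng.standard_normal(20)

# Solve the normal equations X^T X b = X^T y directly...
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# ...and compare with NumPy's least squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b_normal, b_lstsq))        # True

# The residual y_hat - y is orthogonal to Range(X): X^T (y_hat - y) = 0.
y_hat = X @ b_normal
print(np.allclose(X.T @ (y_hat - y), 0))     # True
```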

Application 2. To solve the normal equations for \(b\), it helps if \(X^T X\) is an invertible matrix. It turns out that \(X^T X\) is invertible if and only if the columns of \(X\) are linearly independent.

  1. For any \(X \in \mathbb{R}^{m \times n}\), show that \(\operatorname{Null}(X) = \operatorname{Null}(X^TX)\). Hint: To prove that \(\operatorname{Null}(X^T X) \subseteq \operatorname{Null}(X)\), it helps to prove that if \(b \in \operatorname{Null}(X^T X)\), then \(\|Xb \| = 0\), which means that \(Xb = 0\) (the key identity is spelled out below).
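
One way to fill in the step in the hint (this computation was not worked out in lecture): if \(X^T X b = 0\), then \[0 = b^T X^T X b = (Xb)^T (Xb) = \| Xb \|^2,\] so \(Xb = 0\) and \(b \in \operatorname{Null}(X)\). The reverse inclusion \(\operatorname{Null}(X) \subseteq \operatorname{Null}(X^T X)\) is immediate, since \(Xb = 0\) implies \(X^T X b = 0\).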

We ran out of time before we could answer these last two questions, but they are good linear algebra review questions (a small numerical illustration of the claim from Application 2 follows the list):

  1. Why are the columns of \(X\) linearly independent if and only if \(\operatorname{Null}(X) = \{ 0 \}\)?

  2. Why is \(X^T X\) invertible if and only if \(\operatorname{Null}(X^T X) = \{ 0 \}\)? Hint: When is \(X^T X\) onto? When is it 1-to-1?
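
To illustrate the claim from Application 2 numerically, here is a short NumPy sketch (the matrices are made up): one \(X\) has linearly independent columns, the other repeats a column, and only in the first case does \(X^T X\) have full rank, and hence an inverse.

```python
import numpy as np

rng = np.random.default_rng(2)

# X1 has linearly independent columns; X2 repeats a column, so its
# columns are dependent and Null(X2) is nontrivial.
X1 = rng.standard_normal((6, 3))
X2 = np.column_stack([X1[:, 0], X1[:, 1], X1[:, 0]])

for X in (X1, X2):
    G = X.T @ X
    # rank(X) = rank(X^T X), and G is invertible exactly when its rank is n.
    print(np.linalg.matrix_rank(X),
          np.linalg.matrix_rank(G),
          np.linalg.matrix_rank(G) == G.shape[0])
# Prints: 3 3 True   (X1: independent columns, X^T X invertible)
#         2 2 False  (X2: dependent columns, X^T X singular)
```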