Math 342 - Week 11 Notes

Mon, Apr 4

Today we introduced least squares regression.
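
As a quick reminder of the discrete setup: given data points \((x_i, y_i)\), least squares picks the line \(y = b_0 + b_1 x\) that minimizes the sum of squared residuals, which leads to the normal equations \(A^T A b = A^T y\). A minimal sketch in Python (the data below is made up purely for illustration):

import numpy as np

# Made-up data points (x_i, y_i), for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix with columns [1, x]; the normal equations are (A^T A) b = A^T y.
A = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(A.T @ A, A.T @ y)

print(b)  # b[0] is the intercept, b[1] is the slope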


Wed, Apr 6

Today we spent more time on least squares and looked at how you can apply the method to non-linear models. We did this lab in class:
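
As a sketch of the general idea (not the lab itself), a model like \(y = a e^{kx}\) can be reduced to linear least squares by taking logarithms: \(\ln y = \ln a + kx\) is linear in the unknowns \(\ln a\) and \(k\). For example, with made-up data:

import numpy as np

# Made-up data that roughly follows y = a*e^(k*x), for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.4, 14.8, 40.2, 109.0])

# Taking logs gives ln(y) = ln(a) + k*x, which is linear in ln(a) and k.
A = np.column_stack([np.ones_like(x), x])
coeffs = np.linalg.solve(A.T @ A, A.T @ np.log(y))

a, k = np.exp(coeffs[0]), coeffs[1]
print(a, k)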


Fri, Apr 8

Today we introduced the idea of continuous least squares regression. Unlike the discrete case, where we minimize the sum of squared residuals at a finite number of points, here the goal is to find a polynomial \(p(x)\) that minimizes the integral of the squared difference: \[\int_a^b (p(x)-f(x))^2 \, dx\]

Our book derives the normal equations for continuous least squares using partial derivatives. I used a different approach in class by giving a brief introduction to functional analysis (which is like linear algebra when the vectors are functions). The notes below explain the functional-analytic approach.

Definition

An inner-product space is a vector space \(V\) with an inner product \(\langle x,y \rangle\), which is a real-valued function such that for all \(x, y, z \in V\) and \(c \in \mathbb{R}\):

  1. \(\langle {x,y} \rangle = \langle {y,x} \rangle\),
  2. \(\langle {x,x} \rangle \ge 0\) and \(\langle {x,x} \rangle = 0\) if and only if \(x = 0\),
  3. \(\langle {cx,y} \rangle = c \langle {x,y} \rangle\),
  4. \(\langle {x+y,z} \rangle = \langle {x,z} \rangle + \langle {y,z} \rangle\).

The norm of a vector in an inner-product space is \(\|x\| = \langle {x,x} \rangle^{1/2}\).

Examples of inner-product spaces include \(\mathbb{R}^n\) with the dot product \(\langle x,y \rangle = \sum_{i=1}^n x_i y_i\), and \(L^2[a,b]\), the space of square-integrable functions on \([a,b]\), with \(\langle f,g \rangle = \int_a^b f(x) g(x) \, dx\).
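
For example, in \(L^2[0,1]\) the inner product of \(x\) and \(x^2\) is \(\int_0^1 x \cdot x^2 \, dx = 1/4\), which is easy to check numerically:

from scipy.integrate import quad

# L^2[0,1] inner product of f(x) = x and g(x) = x^2.
value, _ = quad(lambda x: x * x**2, 0, 1)
print(value)  # approximately 0.25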

Definition

Let \(V, W\) be inner-product spaces. If \(T:V \rightarrow W\) is a linear transformation, then the adjoint of \(T\) is a linear transformation \(T^*: W \rightarrow V\) defined by \(\langle {y,Tx} \rangle = \langle {T^*y,x} \rangle\) for every \(x \in V\) and \(y \in W\).
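
For example, if \(T\) is multiplication by a real \(m\)-by-\(n\) matrix \(A\), with the usual dot products on \(\mathbb{R}^n\) and \(\mathbb{R}^m\), then \(T^*\) is multiplication by the transpose \(A^T\), since \(\langle y, Ax \rangle = y^T A x = (A^T y)^T x = \langle A^T y, x \rangle\). A quick numerical check:

import numpy as np

# For a real matrix A and the dot product, the adjoint is A.T:
# <y, A x> equals <A.T y, x> for every x and y.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
x = rng.standard_normal(2)
y = rng.standard_normal(3)

print(np.dot(y, A @ x), np.dot(A.T @ y, x))  # the two values agree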

With the adjoint, we have the following theorem:

Theorem

Let \(V, W\) be inner-product spaces. If \(T:V \rightarrow W\) is a linear transformation then \[\operatorname{range}(T)^\perp = \operatorname{null}(T^*).\]

We want to find the best fit polynomial of degree \(n\): \[p(x) = b_0 + b_1 x + \ldots + b_n x^n.\]
This polynomial will be in \(L^2[a,b]\), and it depends linearly on the coefficient vector \(b = (b_0, \ldots, b_n) \in \mathbb{R}^{n+1}\). So there is a linear transformation \(X: \mathbb{R}^{n+1} \rightarrow L^2[a,b]\) such that \(p(x) = Xb\) (concretely, \(X\) sends the basis vector \(e_i\) to the monomial \(x^i\)). We want to find the \(p(x)\) closest to \(f(x)\) in \(L^2[a,b]\). In other words, we want to minimize \[\|p(x)-f(x)\|.\] The minimum occurs when \(p(x) - f(x)\) is orthogonal to the range of \(X\), and since the theorem above tells us that \(\operatorname{range}(X)^\perp = \operatorname{null}(X^*)\), this happens exactly when \[X^*(p(x) - f(x)) = 0.\] By replacing \(p(x)\) with \(Xb\), we get the normal equation for continuous least squares regression: \[X^*X b = X^* f.\]

Since \(X: \mathbb{R}^{n+1} \rightarrow L^2[a,b]\) and \(X^*: L^2[a,b] \rightarrow \mathbb{R}^{n+1}\), the composition \(X^*X\) is a linear transformation from \(\mathbb{R}^{n+1}\) to \(\mathbb{R}^{n+1}\). So it is an \((n+1)\)-by-\((n+1)\) matrix, and since \(Xe_i\) is the monomial \(x^i\), its \((i,j)\)-entry (for \(i, j = 0, \ldots, n\)) is \[\langle {e_i,X^*Xe_j} \rangle = \langle {Xe_i, X e_j} \rangle = \int_a^b x^{i+j} \, dx\] where \(e_0, \ldots, e_n\) are the standard basis vectors for \(\mathbb{R}^{n+1}\). The entries of \(X^*f\) are \[\langle {e_i, X^*f} \rangle = \langle {Xe_i, f} \rangle = \int_a^b x^i f(x) \, dx.\]
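
Note that these entries also have a simple closed form: \[\int_a^b x^{i+j} \, dx = \frac{b^{\,i+j+1} - a^{\,i+j+1}}{i+j+1},\] so, for instance, on \([0,2]\) the \((i,j)\)-entry of \(X^*X\) is \(2^{i+j+1}/(i+j+1)\).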

To summarize, the normal equation to find the coefficients \(b\) of the continuous least squares polynomial is

Theorem (Continuous Least Squares)

For \(f \in L^2[a,b]\), the coefficients of the continuous least squares polynomial approximation \(p(x) = b_0 + b_1 x + \ldots + b_n x^n\) can be found by solving the normal equation \[X^*X b = X^* f\] where \(X^*X\) is an \((n+1)\)-by-\((n+1)\) matrix with entries \[(X^*X)_{ij} = \int_a^b x^{i+j} \, dx,\] and \(X^*f\) is a vector in \(\mathbb{R}^{n+1}\) with entries \[(X^*f)_i = \int_a^b x^i f(x) \, dx.\]

  1. Use continuous least squares regression to find the best fit 2nd degree polynomial approximation of \(f(x) = e^x\) on \([0,2]\).

We used this Python code to solve the problem and graph the solution:

import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt

# Degree of the approximating polynomial, the function to approximate, and the interval.
n = 2
f = np.exp
left, right = 0, 2

# Build the normal equation X*X b = X*f by computing each integral numerically.
XstarX = np.array([[quad(lambda x: x**(i + j), left, right)[0] for j in range(n + 1)] for i in range(n + 1)])
Xstarf = np.array([quad(lambda x: x**i * f(x), left, right)[0] for i in range(n + 1)])

# Solve for the coefficients and build the least squares polynomial p(x).
b = np.linalg.solve(XstarX, Xstarf)
p = lambda x: sum(b[i] * x**i for i in range(n + 1))
print(b)

# Plot f and its degree-2 least squares approximation.
xvals = np.linspace(left, right, 1000)
plt.plot(xvals, f(xvals), label="f(x) = e^x")
plt.plot(xvals, p(xvals), label="least squares polynomial")
plt.legend()
plt.show()