Today we talked about using partial pivoting to avoid numerical instability when we compute the LU-decomposition. We looked at these examples:
\(A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 1 & 1 \end{pmatrix}\)
\(A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\)
The process involves permuting rows so that, at each step, the pivot is the entry with the largest absolute value remaining in its column. You keep track of the permutations by applying the same row swaps to the identity matrix and calling the result \(P\). Then you get the LU-decomposition with partial pivoting \[PA = LU.\] This fixes two problems: a zero pivot, which makes elimination impossible, and a very small pivot, which produces large multipliers in \(L\) and amplifies rounding errors in floating-point arithmetic.
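Here is a minimal NumPy sketch of the procedure (the function name `lu_partial_pivot` is just for illustration, and it assumes a nonzero pivot can be found at each step). A library routine such as `scipy.linalg.lu` computes the same factorization, but note that its convention is \(A = PLU\), so its \(P\) is the transpose of the one here.

```python
import numpy as np

def lu_partial_pivot(A):
    """Return P, L, U with P @ A = L @ U, using partial pivoting.

    A direct transcription of the process above: at each step, swap rows so
    the pivot is the largest remaining entry (in absolute value) in its
    column, and record the swaps by applying them to the identity matrix.
    """
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)                            # accumulates the row permutations
    for k in range(n - 1):
        # row index of the largest |entry| on or below the diagonal in column k
        pivot = k + np.argmax(np.abs(U[k:, k]))
        if pivot != k:
            U[[k, pivot], :] = U[[pivot, k], :]
            L[[k, pivot], :k] = L[[pivot, k], :k]   # swap multipliers already computed
            P[[k, pivot], :] = P[[pivot, k], :]
        # eliminate the entries below the pivot, storing the multipliers in L
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k + 1:] -= L[i, k] * U[k, k + 1:]
            U[i, k] = 0.0
    return P, L, U

# the second example from class
A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))             # True
```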
A norm is a function \(\|\cdot\|\) from a vector space \(V\) to \([0,\infty)\) with the following properties:
1. \(\|x\| = 0\) if and only if \(x = 0\) (definiteness),
2. \(\|cx\| = |c| \, \|x\|\) for every scalar \(c\) and every \(x \in V\) (homogeneity),
3. \(\|x + y\| \le \|x\| + \|y\|\) for all \(x, y \in V\) (the triangle inequality).
Intuitively a norm measures the length of a vector. But there are different norms and they measure length in different ways. The three most important norms on the vector space \(\mathbb{R}^n\) are:
The \(2\)-norm (also known as the Euclidean norm) is the most commonly used, and it is exactly the formula for the length of a vector using the Pythagorean theorem. \[\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}.\]
The \(1\)-norm (also known as the Manhattan norm) is \[\|x\|_1 = |x_1|+|x_2|+\ldots+|x_n|.\] This is the distance you would get if you had to navigate a city where the streets are arranged in a rectangular grid and you can’t take diagonal paths.
The \(\infty\)-norm (also known as the Maximum norm) is \[\|x\|_\infty = \max \{ |x_1|, |x_2|, \ldots, |x_n| \}.\]
The \(1\)- and \(2\)-norms are special cases of the \(p\)-norms, which for \(p \ge 1\) have the form \[\|x\|_p = \sqrt[p]{|x_1|^p + |x_2|^p + \ldots + |x_n|^p},\] and the \(\infty\)-norm is the limit of \(\|x\|_p\) as \(p \to \infty\).
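As a quick check of these formulas, `numpy.linalg.norm` computes all of them; the vector below is just a made-up example:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])   # an arbitrary example vector

# the three norms from the formulas above
print(np.linalg.norm(x, 2))       # 2-norm: sqrt(9 + 16 + 144) = 13.0
print(np.linalg.norm(x, 1))       # 1-norm: 3 + 4 + 12 = 19.0
print(np.linalg.norm(x, np.inf))  # infinity-norm: max(3, 4, 12) = 12.0

# a general p-norm, e.g. p = 3, computed two ways
p = 3
print(np.linalg.norm(x, p), (np.abs(x) ** p).sum() ** (1 / p))
```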
The set \(\mathbb{R}^{m \times n}\) of all \(m \times n\) matrices is a vector space, so it makes sense to talk about the norm of a matrix. There are many ways to define norms for matrices, but the most important for us are induced norms. For a matrix \(A \in \mathbb{R}^{m \times n}\), the induced \(p\)-norm is \[\|A\|_p = \max \{\|Ax\|_p : x \in \mathbb{R}^n, \|x\|_p=1\}.\]
Two important special cases have simple entrywise formulas: the induced \(1\)-norm is the maximum absolute column sum, \[\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|,\] and the induced \(\infty\)-norm is the maximum absolute row sum, \[\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|.\]
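Here is a small NumPy check of these induced norms, reusing the second example matrix from class (any matrix would do):

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

# induced 1-norm = maximum absolute column sum
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())        # 18.0, 18.0

# induced infinity-norm = maximum absolute row sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())    # 24.0, 24.0

# induced 2-norm has no simple entrywise formula: it is the largest singular value
print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
```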
For a matrix \(A \in \mathbb{R}^{n \times n}\), the condition number is \(\kappa(A) = \|A\| \, \|A^{-1}\|\) (using any induced norm, although \(\ell_2\) is the most common). Then for a linear system \(Ax = b\), if \(x'\) is the solution of the perturbed system \(Ax' = b'\), the relative error in the solution is bounded by \[\frac{\|x-x'\|}{\|x\|} \le \kappa(A) \frac{\|b-b'\|}{\|b\|}.\] This is because \(\|b\| = \|A x \| \le \|A\| \, \|x\|\) and \(\|x-x'\| = \|A^{-1}(b-b')\| \le \|A^{-1}\| \, \|b-b'\|\), so multiplying the two inequalities gives \[\|b\| \, \|x-x'\| \le \|A\| \, \|A^{-1}\| \, \|x\| \, \|b-b'\|.\]
Dividing both sides by \(\|b\| \, \|x\|\) gives the bound above.
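A short numerical illustration of this bound; the nearly singular \(2 \times 2\) matrix below is an arbitrary example chosen so that \(\kappa(A)\) is large:

```python
import numpy as np

# an arbitrary nearly singular matrix, so kappa(A) is large (roughly 4e4)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])                 # exact solution is x = (1, 1)

x = np.linalg.solve(A, b)

# perturb b slightly and solve again
b_pert = b + np.array([1e-6, 0.0])
x_pert = np.linalg.solve(A, b_pert)

kappa = np.linalg.cond(A, 2)                # kappa(A) = ||A|| ||A^{-1}||
rel_x = np.linalg.norm(x - x_pert) / np.linalg.norm(x)
rel_b = np.linalg.norm(b - b_pert) / np.linalg.norm(b)

print(kappa)                                # roughly 4e4
print(rel_x, "<=", kappa * rel_b)           # a tiny change in b causes a much larger change in x
```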
From the inequality above we get the following rule of thumb.
Rule of thumb. If the entries of \(A\) and \(b\) are both accurate to \(n\) significant digits and the condition number of \(A\) is \(\kappa(A) = 10^k\), then the solution of the linear system \(Ax = b\) will only be accurate to about \(n-k\) significant digits.
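As a rough illustration of the rule of thumb, here is a sketch using the Hilbert matrix, a standard ill-conditioned example: double precision carries roughly 16 significant digits and \(\kappa(A) \approx 10^{10}\), so we expect only about 6 correct digits in the computed solution.

```python
import numpy as np

n = 8
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1)                 # the 8x8 Hilbert matrix

x_true = np.ones(n)                   # pick the answer first...
b = A @ x_true                        # ...then build b, so the exact solution is known

x = np.linalg.solve(A, b)

kappa = np.linalg.cond(A, 2)          # about 1.5e10, so k is about 10
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

print(f"kappa(A)       = {kappa:.2e}")
print(f"relative error = {rel_err:.2e}")   # typically around 1e-8 to 1e-6: only 6-8 digits survive
```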