Today we talked about using partial pivoting to avoid numerical instability when we compute the LU-decomposition. We looked at these examples:
\(A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 1 & 1 \end{pmatrix}\)
\(A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\)
The process involves permuting rows so that, at each step, the pivot is the entry with the largest absolute value remaining in its column. You keep track of the permutations by applying the same row swaps to the identity matrix and calling the result \(P\). Then you get the LU-decomposition with partial pivoting \[PA = LU.\] This fixes two problems: a zero pivot, which makes elimination impossible, and a very small pivot, which produces large multipliers in \(L\) and amplifies rounding errors in floating-point arithmetic.
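Here is a minimal NumPy sketch of the procedure (the function name `lu_partial_pivot` is just for illustration, and it assumes a nonzero pivot can be found at each step). A library routine such as `scipy.linalg.lu` computes the same factorization, but note that its convention is \(A = PLU\), so its \(P\) is the transpose of the one here.

```python
import numpy as np

def lu_partial_pivot(A):
    """Return P, L, U with P @ A = L @ U, using partial pivoting.

    A direct transcription of the process above: at each step, swap rows so
    the pivot is the largest remaining entry (in absolute value) in its
    column, and record the swaps by applying them to the identity matrix.
    """
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)                            # accumulates the row permutations
    for k in range(n - 1):
        # row index of the largest |entry| on or below the diagonal in column k
        pivot = k + np.argmax(np.abs(U[k:, k]))
        if pivot != k:
            U[[k, pivot], :] = U[[pivot, k], :]
            L[[k, pivot], :k] = L[[pivot, k], :k]   # swap multipliers already computed
            P[[k, pivot], :] = P[[pivot, k], :]
        # eliminate the entries below the pivot, storing the multipliers in L
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k + 1:] -= L[i, k] * U[k, k + 1:]
            U[i, k] = 0.0
    return P, L, U

# the second example from class
A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))             # True
```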
A norm is a function \(\|\cdot\|\) from a vector space \(V\) to \([0,\infty)\) with the following properties:
1. \(\|x\| = 0\) if and only if \(x = 0\) (definiteness),
2. \(\|cx\| = |c| \, \|x\|\) for every scalar \(c\) and every \(x \in V\) (homogeneity),
3. \(\|x + y\| \le \|x\| + \|y\|\) for all \(x, y \in V\) (the triangle inequality).
Intuitively a norm measures the length of a vector. But there are different norms and they measure length in different ways. The three most important norms on the vector space \(\mathbb{R}^n\) are:
The \(2\)-norm (also known as the Euclidean norm) is the most commonly used, and it is exactly the formula for the length of a vector using the Pythagorean theorem. \[\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}.\]
The \(1\)-norm (also known as the Manhattan norm) is \[\|x\|_1 = |x_1|+|x_2|+\ldots+|x_n|.\] This is the distance you would get if you had to navigate a city where the streets are arranged in a rectangular grid and you can’t take diagonal paths.
The \(\infty\)-norm (also known as the Maximum norm) is \[\|x\|_\infty = \max \{ |x_1|, |x_2|, \ldots, |x_n| \}.\]
The \(1\)- and \(2\)-norms are special cases of the \(p\)-norms, which for \(p \ge 1\) have the form \[\|x\|_p = \sqrt[p]{|x_1|^p + |x_2|^p + \ldots + |x_n|^p},\] and the \(\infty\)-norm is the limit of \(\|x\|_p\) as \(p \to \infty\).
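As a quick check of these formulas, `numpy.linalg.norm` computes all of them; the vector below is just a made-up example:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])   # an arbitrary example vector

# the three norms from the formulas above
print(np.linalg.norm(x, 2))       # 2-norm: sqrt(9 + 16 + 144) = 13.0
print(np.linalg.norm(x, 1))       # 1-norm: 3 + 4 + 12 = 19.0
print(np.linalg.norm(x, np.inf))  # infinity-norm: max(3, 4, 12) = 12.0

# a general p-norm, e.g. p = 3, computed two ways
p = 3
print(np.linalg.norm(x, p), (np.abs(x) ** p).sum() ** (1 / p))
```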
The set \(\mathbb{R}^{m \times n}\) of all \(m \times n\) matrices is a vector space, so it makes sense to talk about the norm of a matrix. There are many ways to define norms for matrices, but the most important for us are induced norms. For a matrix \(A \in \mathbb{R}^{m \times n}\), the induced \(p\)-norm is \[\|A\|_p = \max \{\|Ax\|_p : x \in \mathbb{R}^n, \|x\|_p=1\}.\]
Two important special cases have simple entrywise formulas: the induced \(1\)-norm is the maximum absolute column sum, \[\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}|,\] and the induced \(\infty\)-norm is the maximum absolute row sum, \[\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}|.\]
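Here is a small NumPy check of these induced norms, reusing the second example matrix from class (any matrix would do):

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

# induced 1-norm = maximum absolute column sum
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())        # 18.0, 18.0

# induced infinity-norm = maximum absolute row sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())    # 24.0, 24.0

# induced 2-norm has no simple entrywise formula: it is the largest singular value
print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
```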
For a matrix \(A \in \mathbb{R}^{n \times n}\), the condition number is \(\kappa(A) = \|A\| \, \|A^{-1}\|\) (using any induced norm, although \(\ell_2\) is the most common). Then for a linear system \(Ax = b\), if \(x'\) is the solution of the perturbed system \(Ax' = b'\), the relative error in the solution is bounded by \[\frac{\|x-x'\|}{\|x\|} \le \kappa(A) \frac{\|b-b'\|}{\|b\|}.\] This is because \(\|b\| = \|A x \| \le \|A\| \, \|x\|\) and \(\|x-x'\| = \|A^{-1}(b-b')\| \le \|A^{-1}\| \, \|b-b'\|\), so multiplying the two inequalities gives \[\|b\| \, \|x-x'\| \le \|A\| \, \|A^{-1}\| \, \|x\| \, \|b-b'\|.\]
Dividing both sides by \(\|b\| \, \|x\|\) gives the bound above.
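A short numerical illustration of this bound; the nearly singular \(2 \times 2\) matrix below is an arbitrary example chosen so that \(\kappa(A)\) is large:

```python
import numpy as np

# an arbitrary nearly singular matrix, so kappa(A) is large (roughly 4e4)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])                 # exact solution is x = (1, 1)

x = np.linalg.solve(A, b)

# perturb b slightly and solve again
b_pert = b + np.array([1e-6, 0.0])
x_pert = np.linalg.solve(A, b_pert)

kappa = np.linalg.cond(A, 2)                # kappa(A) = ||A|| ||A^{-1}||
rel_x = np.linalg.norm(x - x_pert) / np.linalg.norm(x)
rel_b = np.linalg.norm(b - b_pert) / np.linalg.norm(b)

print(kappa)                                # roughly 4e4
print(rel_x, "<=", kappa * rel_b)           # a tiny change in b causes a much larger change in x
```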
From the inequality above we get the following rule of thumb.
Rule of thumb. If the entries of \(A\) and \(b\) are both accurate to \(n\) significant digits and the condition number of \(A\) is \(\kappa(A) = 10^k\), then the solution of the linear system \(Ax = b\) will only be accurate to about \(n-k\) significant digits.
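As a rough illustration of the rule of thumb, here is a sketch using the Hilbert matrix, a standard ill-conditioned example: double precision carries roughly 16 significant digits and \(\kappa(A) \approx 10^{10}\), so we expect only about 6 correct digits in the computed solution.

```python
import numpy as np

n = 8
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1)                 # the 8x8 Hilbert matrix

x_true = np.ones(n)                   # pick the answer first...
b = A @ x_true                        # ...then build b, so the exact solution is known

x = np.linalg.solve(A, b)

kappa = np.linalg.cond(A, 2)          # about 1.5e10, so k is about 10
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

print(f"kappa(A)       = {kappa:.2e}")
print(f"relative error = {rel_err:.2e}")   # typically around 1e-8 to 1e-6: only 6-8 digits survive
```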