In Section 3.4, we studied specific techniques that allow us to solve systems of linear equations in the form $Ax = b$, where $A$ is an $m \times n$ matrix and $b$ is an $m \times 1$ vector. Such systems often arise in applications to the real world. The coefficients in the system are frequently obtained from experimental data, and, in many cases, both $m$ and $n$ are so large that a computer must be used in the calculation of the solution. Thus two types of errors must be considered. First, experimental errors arise in the collection of data since no instruments can provide completely accurate measurements. Second, computers introduce roundoff errors. One might intuitively feel that small relative changes in the coefficients of the system cause small relative errors in the solution. A system that has this property is called well-conditioned; otherwise, the system is called ill-conditioned.
We now consider several examples of these types of errors, concentrating primarily on changes in b rather than on changes in the entries of A. In addition, we assume that A is a square, complex (or real), invertible matrix since this is the case most frequently encountered in applications.
Consider the system
$$x_1 + x_2 = 5$$
$$x_1 - x_2 = 1.$$
The solution to this system is
$$x = \begin{pmatrix} 3 \\ 2 \end{pmatrix}.$$
Now suppose that we change the system somewhat and consider the new system
$$x_1 + x_2 = 5$$
$$x_1 - x_2 = 1.0001.$$
This modified system has the solution
$$x' = \begin{pmatrix} 3.00005 \\ 1.99995 \end{pmatrix}.$$
We see that a change of $10^{-4}$ in one coefficient has caused a change of less than $10^{-4}$ in each coordinate of the new solution. More generally, the system
$$x_1 + x_2 = 5$$
$$x_1 - x_2 = 1 + \delta$$
has the solution
$$x' = \begin{pmatrix} 3 + \delta/2 \\ 2 - \delta/2 \end{pmatrix}.$$
Hence small changes in $b$ introduce small changes in the solution. Of course, we are really interested in relative changes, since a change in the solution of, say, 10 is considered large if the original solution has small magnitude, but small if the original solution is of a much larger order of magnitude.
We use the notation $\delta b$ to denote the vector $b' - b$, where $b$ is the vector in the original system and $b'$ is the vector in the modified system. Thus we have
$$\delta b = \begin{pmatrix} 0 \\ \delta \end{pmatrix}.$$
We now define the relative change in $b$ to be the scalar $\|\delta b\|/\|b\|$, where $\|\cdot\|$ denotes the standard norm on $\mathbb{C}^n$ (or $\mathbb{R}^n$); that is, $\|b\| = \sqrt{\langle b, b\rangle}$. Most of what follows, however, is true for any norm. Similar definitions hold for the relative change in $x$. In this example,
$$\frac{\|\delta b\|}{\|b\|} = \frac{|\delta|}{\sqrt{26}} \quad\text{and}\quad \frac{\|\delta x\|}{\|x\|} = \frac{|\delta|/\sqrt{2}}{\sqrt{13}} = \frac{|\delta|}{\sqrt{26}}.$$
Thus the relative change in x equals, coincidentally, the relative change in b; so the system is well-conditioned.
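This computation is easy to check numerically. The following sketch (using NumPy, which is of course not part of the text) solves the original and perturbed systems with $\delta = 10^{-4}$ and compares the two relative changes:

```python
import numpy as np

# The system x1 + x2 = 5, x1 - x2 = 1, with b perturbed by 0.0001
# in its second coordinate.
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
b  = np.array([5.0, 1.0])
db = np.array([0.0, 0.0001])

x  = np.linalg.solve(A, b)            # original solution (3, 2)
dx = np.linalg.solve(A, b + db) - x   # change in the solution

rel_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_x = np.linalg.norm(dx) / np.linalg.norm(x)
print(rel_b, rel_x)                   # the two relative changes agree
```

Both printed values equal $10^{-4}/\sqrt{26}$, confirming that the relative changes in $b$ and in $x$ coincide for this system.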
Consider the system
$$x_1 + x_2 = 3$$
$$x_1 + 1.00001x_2 = 3.00001,$$
which has
$$x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
as its solution. The solution to the related system
$$x_1 + x_2 = 3$$
$$x_1 + 1.00001x_2 = 3.00003$$
is
$$x' = \begin{pmatrix} 0 \\ 3 \end{pmatrix}.$$
Hence,
$$\frac{\|\delta x\|}{\|x\|} = \frac{2\sqrt{2}}{\sqrt{5}} \approx 1.26,$$
while
$$\frac{\|\delta b\|}{\|b\|} = \frac{0.00002}{\sqrt{9 + (3.00001)^2}} \approx 4.7 \times 10^{-6}.$$
Thus the relative change in $x$ is at least $2 \times 10^5$ times the relative change in $b$! This system is very ill-conditioned. Observe that the lines defined by the two equations are nearly coincident. So a small change in either line could greatly alter the point of intersection, that is, the solution to the system.
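The same numerical check can be run on an ill-conditioned system. The sketch below (NumPy again, an illustration rather than part of the text) solves the system $x_1 + x_2 = 3$, $x_1 + 1.00001x_2 = 3.00001$ and its perturbation, and computes the ratio of relative changes:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.00001]])
b  = np.array([3.0, 3.00001])
b2 = np.array([3.0, 3.00003])   # perturbed right-hand side

x  = np.linalg.solve(A, b)      # approximately (2, 1)
x2 = np.linalg.solve(A, b2)     # approximately (0, 3)

rel_b = np.linalg.norm(b2 - b) / np.linalg.norm(b)
rel_x = np.linalg.norm(x2 - x) / np.linalg.norm(x)
print(rel_x / rel_b)            # on the order of 10**5
```

A change of $2 \times 10^{-5}$ in one coordinate of $b$ moves the solution from one corner of the plane to another.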
To apply the full strength of the theory of self-adjoint matrices to the study of conditioning, we need the notion of the norm of a matrix. (See Exercises 26-30 of Section 6.1 for further results about norms.)
Let $A$ be an $n \times n$ complex (or real) matrix. Define the norm of $A$ by
$$\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|},$$
where $x \in \mathbb{C}^n$ or $x \in \mathbb{R}^n$, respectively.
Intuitively, $\|A\|$ represents the maximum magnification of a vector by the matrix $A$. The question of whether or not this maximum exists, as well as the problem of how to compute it, can be answered by the use of the so-called Rayleigh quotient.
Definition. Let $B$ be an $n \times n$ self-adjoint matrix. The Rayleigh quotient for $x \ne 0$ is defined to be the scalar
$$R(x) = \frac{\langle Bx, x\rangle}{\|x\|^2}.$$
The following result characterizes the extreme values of the Rayleigh quotient of a self-adjoint matrix.
Theorem 6.44. For a self-adjoint matrix $B \in M_{n \times n}(F)$, we have that $\max_{x \ne 0} R(x)$ is the largest eigenvalue of $B$ and $\min_{x \ne 0} R(x)$ is the smallest eigenvalue of $B$.
Proof. By Theorems 6.19 (p. 381) and 6.20 (p. 381), we may choose an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ of eigenvectors of $B$ such that $Bv_i = \lambda_i v_i$ for $1 \le i \le n$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. (Recall that by the lemma to Theorem 6.17, p. 370, the eigenvalues of $B$ are real.) Now, for $x \ne 0$, there exist scalars $a_1, a_2, \ldots, a_n$ such that
$$x = \sum_{i=1}^{n} a_i v_i.$$
Hence
$$R(x) = \frac{\langle Bx, x\rangle}{\|x\|^2}
= \frac{\left\langle \sum_{i=1}^{n} \lambda_i a_i v_i, \; \sum_{j=1}^{n} a_j v_j \right\rangle}{\|x\|^2}
= \frac{\sum_{i=1}^{n} \lambda_i |a_i|^2}{\sum_{i=1}^{n} |a_i|^2}
\le \lambda_1 \cdot \frac{\sum_{i=1}^{n} |a_i|^2}{\sum_{i=1}^{n} |a_i|^2} = \lambda_1.$$
It is easy to see that $R(v_1) = \lambda_1$, so we have demonstrated the first half of the theorem. The second half is proved similarly.
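This theorem lends itself to a numerical illustration (a NumPy sketch, not part of the text): for a random real symmetric matrix, sampled Rayleigh quotients always lie between the smallest and largest eigenvalues, and the maximum is attained at a corresponding eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
B = M + M.T                      # a real symmetric (hence self-adjoint) matrix

# Eigenvalues in increasing order; columns of V are orthonormal eigenvectors.
w, V = np.linalg.eigh(B)

def rayleigh(x):
    # Rayleigh quotient R(x) = <Bx, x> / ||x||^2
    return (x @ B @ x) / (x @ x)

samples = [rayleigh(rng.standard_normal(4)) for _ in range(1000)]

print(w[0] - 1e-9 <= min(samples) and max(samples) <= w[-1] + 1e-9)  # True
print(np.isclose(rayleigh(V[:, -1]), w[-1]))  # True: the max is attained at an eigenvector
```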
Corollary 1. For any square matrix $A$, $\|A\|$ is finite and, in fact, equals $\sqrt{\lambda}$, where $\lambda$ is the largest eigenvalue of $A^*A$.
Proof. Let $B$ be the self-adjoint matrix $A^*A$, and let $\lambda$ be the largest eigenvalue of $B$. Since, for $x \ne 0$,
$$0 \le \frac{\|Ax\|^2}{\|x\|^2} = \frac{\langle Ax, Ax\rangle}{\|x\|^2} = \frac{\langle A^*Ax, x\rangle}{\|x\|^2} = R(x),$$
it follows from Theorem 6.44 that $\|A\|^2 = \max_{x \ne 0} R(x) = \lambda$. Therefore $\|A\| = \sqrt{\lambda}$.
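Corollary 1 can likewise be verified numerically. The sketch below (illustrative NumPy code) compares $\sqrt{\lambda}$, where $\lambda$ is the largest eigenvalue of $A^*A$, with the spectral norm that NumPy computes directly; their agreement is exactly the content of the corollary.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

# Largest eigenvalue of A*A (A is real, so A* = A^T).
lam = np.linalg.eigvalsh(A.T @ A)[-1]

# numpy's matrix 2-norm is the quantity ||A|| defined in the text.
print(np.isclose(np.sqrt(lam), np.linalg.norm(A, 2)))  # True
```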
Observe that the proof of Corollary 1 shows that all the eigenvalues of A*A are nonnegative. For our next result, we need the following lemma.
Lemma. For any square matrix $A$, $\lambda$ is an eigenvalue of $A^*A$ if and only if $\lambda$ is an eigenvalue of $AA^*$.
Proof. Let $\lambda$ be an eigenvalue of $A^*A$. If $\lambda = 0$, then $A^*A$ is not invertible. Hence $A$ and $A^*$ are not invertible, so that $AA^*$ is not invertible and $\lambda = 0$ is also an eigenvalue of $AA^*$. The proof of the converse is similar.
Suppose now that $\lambda \ne 0$. Then there exists $x \ne 0$ such that $A^*Ax = \lambda x$. Apply $A$ to both sides to obtain $AA^*(Ax) = \lambda(Ax)$. Since $Ax \ne 0$ (lest $A^*Ax = \lambda x = 0$), we have that $\lambda$ is an eigenvalue of $AA^*$. The proof of the converse is left as an exercise.
Corollary 2. Let $A$ be an invertible matrix. Then $\|A^{-1}\| = \dfrac{1}{\sqrt{\lambda}}$, where $\lambda$ is the smallest eigenvalue of $A^*A$.
Proof. Recall that $\lambda$ is an eigenvalue of an invertible matrix if and only if $\lambda^{-1}$ is an eigenvalue of its inverse.
Now let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the eigenvalues of $A^*A$, which by the lemma are the eigenvalues of $AA^*$. Then, by Corollary 1, $\|A^{-1}\|^2$ equals the largest eigenvalue of $(A^{-1})^*A^{-1} = (AA^*)^{-1}$, which equals $\dfrac{1}{\lambda_n}$. Hence $\|A^{-1}\| = \dfrac{1}{\sqrt{\lambda_n}}$.
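Corollary 2 admits the same kind of numerical check. For a fixed invertible matrix (the particular matrix below is my choice for illustration; any invertible matrix works), $\|A^{-1}\|$ agrees with $1/\sqrt{\lambda_n}$ computed from the smallest eigenvalue of $A^*A$:

```python
import numpy as np

# A fixed invertible matrix (det = 5), chosen only for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])

lam_min = np.linalg.eigvalsh(A.T @ A)[0]        # smallest eigenvalue of A*A
inv_norm = np.linalg.norm(np.linalg.inv(A), 2)  # ||A^{-1}|| in the 2-norm
print(np.isclose(inv_norm, 1.0 / np.sqrt(lam_min)))  # True
```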
For many applications, it is only the largest and smallest eigenvalues that are of interest. For example, in the case of vibration problems, the smallest eigenvalue represents the lowest frequency at which vibrations can occur.
We will see the role of both of these eigenvalues in our study of conditioning.
Let
Then
The eigenvalues of $B$ are 3, 3, and 0. Therefore $\|A\| = \sqrt{3}$. For any
$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \ne 0,$$
we may compute $R(x)$ for the matrix $B$ as
$$R(x) = \frac{\langle Bx, x\rangle}{\|x\|^2}.$$
Now that we know $\|A\|$ exists for every square matrix $A$, we can make use of the inequality $\|Ax\| \le \|A\| \cdot \|x\|$, which holds for every $x$.
Assume in what follows that $A$ is invertible, that $b \ne 0$, and that $Ax = b$. For a given $\delta b$, let $\delta x$ be the vector that satisfies $A(x + \delta x) = b + \delta b$. Then $A(\delta x) = \delta b$, and so $\delta x = A^{-1}(\delta b)$. Hence
$$\|b\| = \|Ax\| \le \|A\| \cdot \|x\| \quad\text{and}\quad \|\delta x\| = \|A^{-1}(\delta b)\| \le \|A^{-1}\| \cdot \|\delta b\|.$$
Thus
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\| \cdot \|\delta b\|}{\|b\|/\|A\|} = \|A\| \cdot \|A^{-1}\| \cdot \frac{\|\delta b\|}{\|b\|}.$$
Similarly (see Exercise 9),
$$\frac{1}{\|A\| \cdot \|A^{-1}\|} \cdot \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|}.$$
The number $\|A\| \cdot \|A^{-1}\|$ is called the condition number of $A$ and is denoted cond($A$). We summarize these results in the following theorem.
Theorem 6.45. For the system $Ax = b$, where $A$ is invertible and $b \ne 0$, the following statements are true.
(a) We have
$$\frac{1}{\operatorname{cond}(A)} \cdot \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|} \le \operatorname{cond}(A) \cdot \frac{\|\delta b\|}{\|b\|}.$$
(b) If $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues, respectively, of $A^*A$, then $\operatorname{cond}(A) = \sqrt{\lambda_1/\lambda_n}$.
Proof. Statement (a) follows from the previous inequalities, and (b) follows from Corollaries 1 and 2 to Theorem 6.44.
It should be noted that the definition of cond($A$) depends on how the norm of $A$ is defined. There are many reasonable ways of defining the norm of a matrix. In fact, the only property needed to establish Theorem 6.45(a) and the two displayed inequalities preceding it is that $\|Ax\| \le \|A\| \cdot \|x\|$ for all $x$.
It is clear from Theorem 6.45(a) that $\operatorname{cond}(A) \ge 1$. It is left as an exercise to prove that $\operatorname{cond}(A) = 1$ if and only if $A$ is a scalar multiple of a unitary or orthogonal matrix. Moreover, it can be shown with some work that equality can be obtained in (a) by an appropriate choice of $b$ and $\delta b$.
We can see immediately from (a) that if cond(A) is close to 1, then a small relative error in b forces a small relative error in x. If cond(A) is large, however, then the relative error in x may be small even though the relative error in b is large, or the relative error in x may be large even though the relative error in b is small! In short, cond(A) merely indicates the potential for large relative errors.
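The bounds of Theorem 6.45(a) can be observed numerically. The sketch below (illustrative NumPy code) uses an ill-conditioned $2 \times 2$ system; the computed relative error in $x$ falls inside the interval that cond($A$) predicts:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.00001]])
cond = np.linalg.cond(A, 2)          # cond(A) = ||A|| * ||A^{-1}|| in the 2-norm

b  = np.array([3.0, 3.00001])
db = np.array([0.0, 0.00002])        # a tiny perturbation of b

x  = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

rel_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_x = np.linalg.norm(dx) / np.linalg.norm(x)

# Theorem 6.45(a): rel_b / cond <= rel_x <= cond * rel_b
print(rel_b / cond <= rel_x <= cond * rel_b)   # True
```

Here cond($A$) is on the order of $10^5$, so the interval is enormous; the observed relative error sits near its upper end, illustrating that a large condition number signals only the potential for large error.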
We have so far considered only errors in the vector $b$. If there is an error $\delta A$ in the coefficient matrix of the system $Ax = b$, the situation is more complicated. For example, $A + \delta A$ may fail to be invertible. But under the appropriate assumptions, it can be shown that a bound for the relative error in $x$ can be given in terms of cond($A$). For example, Charles Cullen (Charles G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing Co., Boston, 1994, p. 60) shows that if $A + \delta A$ is invertible and $x'$ is the solution of $(A + \delta A)x' = b$, then
$$\frac{\|x - x'\|}{\|x'\|} \le \operatorname{cond}(A) \cdot \frac{\|\delta A\|}{\|A\|}.$$
It should be mentioned that, in practice, one never computes cond($A$) from its definition, for it would be an unnecessary waste of time to compute $A^{-1}$ merely to determine its norm. In fact, if a computer is used to find $A^{-1}$, the computed inverse of $A$ in all likelihood only approximates $A^{-1}$, and the error in the computed inverse is affected by the size of cond($A$). So we are caught in a vicious circle! There are, however, some situations in which a usable approximation of cond($A$) can be found. Thus, in most cases, the estimate of the relative error in $x$ is based on an estimate of cond($A$).
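Consistent with this remark, numerical libraries obtain the condition number from the singular values of $A$ rather than from a computed inverse. For instance (an illustrative NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.00001]])

# cond(A) in the 2-norm is the ratio of extreme singular values,
# obtainable without ever forming A^{-1}.
s = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
print(np.isclose(s[0] / s[-1], np.linalg.cond(A, 2)))   # True
```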
Exercises

1. Label the following statements as true or false.
(a) If is well-conditioned, then cond(A) is small.
(b) If cond(A) is large, then is ill-conditioned.
(c) If cond(A) is small, then is well-conditioned.
(d) The norm of A equals the Rayleigh quotient.
(e) The norm of A always equals the largest eigenvalue of A.
2. Compute the norms of the following matrices.
(a)
(b)
(c)
3. Prove that if $B$ is symmetric, then $\|B\|$ is the largest eigenvalue of $B$.
4. Let $A$ and $b$ be as follows:
The eigenvalues of $A$ are approximately 84.74, 0.2007, and 0.0588.
(a) Approximate $\|A\|$, $\|A^{-1}\|$, and cond($A$). (Note Exercise 3.)
(b) Suppose that we have vectors $x$ and $\tilde{x}$ such that $Ax = b$ and $\|b - A\tilde{x}\| \le 0.001$. Use (a) to determine upper bounds for $\|\tilde{x} - x\|$ (the absolute error) and $\|\tilde{x} - x\|/\|x\|$ (the relative error).
5. Suppose that $x$ is the actual solution of $Ax = b$ and that a computer arrives at an approximate solution $\tilde{x}$. If $\operatorname{cond}(A) = 100$, $\|b\| = 1$, and $\|b - A\tilde{x}\| = 0.1$, obtain upper and lower bounds for $\|x - \tilde{x}\|/\|x\|$.
6. Let
Compute
7. Let $B$ be an invertible symmetric matrix. Prove that $\|B^{-1}\|^{-1}$ equals the smallest eigenvalue of $B$.
8. Prove that if $\lambda$ is an eigenvalue of $AA^*$, then $\lambda$ is an eigenvalue of $A^*A$. This completes the proof of the lemma to Corollary 2 to Theorem 6.44.
9. Prove that if $A$ is an invertible matrix and $Ax = b \ne 0$, then
$$\frac{1}{\|A\| \cdot \|A^{-1}\|} \cdot \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|}.$$
10. Prove the left inequality of (a) in Theorem 6.45.
11. Prove that $\operatorname{cond}(A) = 1$ if and only if $A$ is a scalar multiple of a unitary or orthogonal matrix.
12. (a) Let $A$ and $B$ be square matrices that are unitarily equivalent. Prove that $\|A\| = \|B\|$.
(b) Let $T$ be a linear operator on a finite-dimensional inner product space $V$. Define
$$\|T\| = \max_{x \ne 0} \frac{\|T(x)\|}{\|x\|}.$$
Prove that $\|T\| = \|[T]_\beta\|$, where $\beta$ is any orthonormal basis for $V$.
(c) Let $V$ be an infinite-dimensional inner product space with an orthonormal basis $\{v_1, v_2, \ldots\}$. Let $T$ be the linear operator on $V$ such that $T(v_k) = k v_k$. Prove that $\|T\|$ (defined in (b)) does not exist.
The next exercise assumes the definitions of singular value and pseudoinverse and the results of Section 6.7.
13. Let $A$ be an $n \times n$ matrix of rank $r$ with the nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. Prove each of the following results.
(a) $\|A\| = \sigma_1$.
(b) $\|A^{\dagger}\| = \dfrac{1}{\sigma_r}$.
(c) If $A$ is invertible (and hence $r = n$), then $\operatorname{cond}(A) = \dfrac{\sigma_1}{\sigma_n}$.