Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Solving Systems of Linear Equations

8.1 Linear Algebra Background

To begin with, we will consider some basic definitions from linear algebra needed for understanding of matrix computations. We will restrict our consideration to the case when matrices and vectors are real. Vectors, then, are objects of the N-dimensional Euclidean space R^N. A list of the operations defined in R^N follows.

• Addition:

• Multiplication by scalar:

In addition, there is correspondence between any pair of vectors x, y ∈ R^N and a real scalar (x, y) called the dot product. The following axioms for the dot product are postulated:

Column vectors:

form a standard basis of R^N. All other vectors are expressed in terms of the basis b_n (n=1, …, N):

The coefficients x_n are the components of a vector x and this representation is commonly written in a compact form x=(x₁, …, x_N)^T. The superscript T indicates the transposition that changes column vectors into row vectors and vice versa. The dot product of two vectors x=(x₁, …, x_N)^Tand y=(y₁ …, y_N)^T is

Two vectors x and y are orthogonal if (x, y)=0.

To work with an object, we need some way to measure it. To do so, a vector norm is introduced. A norm of a vector x is a real scalar x with the following properties:

For a vector x=(x₁, …, x_N)^T, scalar

is a norm for any p≥ 1 (the Hölder p-norm). The following norms are commonly used:

Every square matrix generates a linear operator in the Euclidean space R^N. If y is a linear function of x, then

The rules for manipulating matrices are those required by the linear mappings. Thus C= A+ B is the sum of two linear functions represented by A and B, where c_nm= a_nm+ b_nm. The product AB represents the effect of applying the mapping B, then applying the mapping A. If we have y= A(Bx)=Cx, then components of matrix C are given by the formula

The transposition of A is obtained by reflecting A about its main diagonal:

A diagonal matrix D is all zeros except for d_n on the main diagonal: D= diag(d₁, …, d_N).

Now, let us consider norms of a matrix. Scalar A is a norm of a matrix A if the following conditions are satisfied:

The following norms are commonly used:

In what follows the notation ⋅ will denote some norm given above. A matrix norm is consistent with a vector norm if Ax ≤ A ⋅ x. For example, A₁ is consistent with x₁, A₂ with x₂, and A_∞ with x_∞.

The property of matrix definiteness plays an important role in matrix computations. So, a matrix A is positive (negative) definite if (x, A)>0 (< 0) for all nonzero vectors x. If the sign of (x, A) depends on vector x, then matrix A is called indefinite.

In the following, we demonstrate how to calculate the dot product of two vectors.

Listing 8.1

>>a = [123];

>>b = [-1 04];

>> c = dot(a, b)

c =

11.0000

>> d = sum(a. *b)

d =

11.0000

You could either use the dot function to compute the dot product of two vectors or you could use the dot product operator “.*” and then apply sum to get the dot product. The latter approach is better in performance.

The norm function, as demonstrated below, could be used to find out different norm values of a vector. The second argument determines the order of norm value. In the absence of second argument, the function returns 2-norm. This function also works for a matrix argument in the similar manner.

Listing 8.2

>>a = [1 2 3];

>> norm(a)

ans =

3.7417

>> norm(a, 1)

ans =

6.0000

>> norm(a, 2)

ans =

3.7417

>> norm(a, lnf)

ans =

3.0000

8.2 Systems of Linear Equations

A system of linear equations may be represented as

(8.1)

or, in expanded form,

where A={a_nm} is a square matrix of coefficients of size N by N, x =(x₁, …, x_N)^T is a vector of unknowns of size N, and f= (ƒ₁, …, f_N)^T is a known right-hand vector of size N.

There are numerous physical systems which naturally lead to linear equations, for example, the static analysis of structures (bridges, buildings, and aircraft frames) and the design of electrical networks. Many physical systems are modeled by differential equations, which often cannot be solved analytically. As a result, the finite difference approach has to be used to approximate differential equations. This often results in a system of linear equations (if the differential equation is linear) for approximate values of the solution. Also, systems of linear equations may arise as an intermediate stage in other computational processes. Matrix inversion is also an inefficient approach (if we have A^− 1, then computation of x is easy: x= A^− 1f), because computing an inverse matrix requires about three times as much computation and twice as much storage as does solving a system of linear equations.

To adopt one or another method for the solution of system (8.1) we should take into account properties of matrix A, such as symmetry, definiteness, band structure, and sparsity. This allows us to solve a system of linear equations more efficiently. The adoption of a method also depends on the type of problem under consideration, i.e., whether the problem requires solving a single linear system Ax=f or solving a set of linear equations Ax_k= f_k, k= l, …, M, where matrix A is the same for all k.

The following code listing demonstrates the methods to calculate the inverse and determinate of a matrix. There are two ways to calculate the inverse of a matrix. You could either use the inv function or you could take the reciprocal of the matrix.

Listing 8.3

>> u = rand(3);

0.9418 0.0777 0.1260

0.2466 0.1872 0.5753

0.9774 0.5600 0.0026

>> inv(u)

ans =

1.2140 − 0.2655 − 0.0797

− 2.1196 0.4555 1.9274

0.1694 1.7039 − 0.5932

>> 1/u

ans =

1.2140 − 0.2655 − 0.0797

− 2.1196 0.4555 1.9274

0.1694 1.7039 − 0.5932

>> det(u)

ans =

− 0.2650

8.3 Types of Matrices that arise from Applications and Analysis

8.3.1 Sparse matrices

Matrices with most of the elements equal to zero are called sparse. We can define a matrix A of size N by N as sparse if the number of its nonzero elements ~ N^1 + r(γ ≤ 0.5). For example, for N= 10³ and γ=0.5, the number of nonzero elements is 31622 (the total number of elements is 10⁶).

8.3.2 Band matrices

Many applications lead to matrices that are not only sparse, but have a band pattern in their nonzero elements. Matrix A={a_nm} is called a band matrix if a_nm= 0 for |n − m| > k where l= 2 k + 1 is the band width:

8.3.3 Symmetric positive definite matrices

Many physical problems lead to symmetric positive definite (SPD) matrices. The properties of SPD matrices follow:

• symmetry:

• positive definiteness:

It is rather difficult to determine whether a matrix is positive definite. Fortunately, there are more easily verified criteria for identifying members of this important class. First, all eigenvalues of a positive definite matrix are positive. This can be easily verified with the aid of the Gerschgorin theorem (see Chapter 4). Second, if A is an N by N positive definite matrix, then it has the following properties: (1) a_nm> 0 for all n; (2) a_nna_mm>(a_nm)² for n≠ m; and (3) the element of the largest modulus must lie on the leading diagonal.

8.3.4 Triangular matrices

Two types of triangular matrices follow, both of which are easy to work with.

• Lower triangular matrix:

• Upper triangular matrix:

The following function implements the LU decomposition of a tri-diagonal matrix. This utility function finds its application in a number of physical problems.

Listing 8.4

1. function [L, U] = ludecomp(A)

2. % ludecomp function decompose a matrix into

3. % Lower matrix (L) and upper matrix (U)

4. % input: A - input square matrix

5. % output: L - lower triangular factor

6. % U - upper triangular factor

7. n = length(A);

8. L = eye(n);

9. U = zeros(n);

10. U(l,l) = A(l,l);

11. for i=2:n

12. L(i, i-1) = A(i, i-1)/A(i-1, i-t),

13. U(i-1, i) = A(i-1, i);

14. U(i, i) = A(i, i) - L(i, i-1) *A(i-1, t);

15. A(i, i) = U(i, i);

16. end;

The application of this function is demonstrated in the following listing.

Listing 8.5

>> A = [2 − 2 0 0 0; − 2 5 − 6 0 0; 0 − 6 1 6 1 2 0; 0 0 1 2 3 9 − 6; 0 0 0 − 6 1 4];

>> [L U] = ludecomp(A)

L =

1.0000 0.0000 0.0000 0.0000 0.0000

− 1.0000 1.0000 0.0000 0.0000 0.0000

0.0000 − 2.0000 1.0000 0.0000 0.0000

0.0000 0.0000 3.0000 1.0000 0.0000

0.0000 0.0000 0.0000 − 2.0000 1.0000

U =

2.0000 − 2.0000 0.0000 0.0000 0.0000

0.0000 3.0000 − 6.0000 0.0000 0.0000

0.0000 0.0000 4.0000 12.0000 0.0000

0.0000 0.0000 0.0000 3.0000 − 6.0000

0.0000 0.0000 0.0000 0.0000 2.0000

>>A=L*U

A =

2.0000 − 2.0000 0.0000 0.0000 0.0000

− 2.0000 5.0000 − 6.0000 0.0000 0.0000

0.0000 − 6.0000 16.0000 12.0000 0.0000

0.0000 0.0000 12.0000 39.0000 − 6.0000

0.0000 0.0000 0.0000 − 6.0000 14.0000

A system of linear equations Lx= f can be solved by forward substitution:

(8.2)

In an analogous way, a system of linear equations Ux= f can be solved by backward substitution:

(8.3)

The following implementation of forward substitution method is used to solve a system of equations when the coefficient matrix is a lower triangular matrix. The function takes two arguments; the lower triangular coefficient matrix and the right- hand side vector. The output vector is the solution of the systems of equation.

Listing 8.6

1. function x = fsubstt(L, f)

2. % fsubstt function solves the linear system of equations

3. % using forward substitution method Lx = f such that L

4. % is the lower triangular matrix

5. % input: L - lower triangular matrix

6. % f - right-hand side vector

7. % output: x - solution vector

8. n = length(f);

9. x = zeros(n, 1);

10. x(l) = f(l) / L(l, 1);

11. for i=2:n

12. x(i) = (f(i) – L(i, 1:i−l) *x(l :i−I)) / L(i, i);

13. end;

Say, we have the following system of equations given in a matrix form.

Since the coefficient matrix is a lower triangular matrix, forward substitution method could be applied to solve the problem, as shown in the following.

Listing 8.7

>>L = [1 6 0 0 0; 51 1 0 0; 9 7 6 0; 4 1 4 1 5 l];

>> [16; 27; 41; 81]

>> x = fsubstt(L,f)

x =

1.0000

2.0000

3.0000

4.0000

The following implementation of backward substitution method is used to solve a system of equations when the coefficient matrix is an upper triangular matrix. The function takes two arguments; the upper triangular coefficient matrix and the right-hand side vector. The output vector is the solution of the systems of equation.

Listing 8.8

1. function x = bsubstt(U, f)

2. % bsubstt function solves the linear system of equations

3. % using backward substitution method Ux = f such that

4. %U is the upper triangular matrix

5. % input: U - upper triangular matrix

6. % f - right-hand side vector

7. % output: x - solution vector

8. n = length(f);

9. x = zeros(n, l);

10. x(n) = f(n) / U(n, n);

11. for i = n-1 : -l : 1

12. x(i) = (f(i) – U(i, i+1:n) * x(i + 1:n)) / U(i, i);

13. end;

Say, we have the following system of equations given in a matrix form.

Since the coefficient matrix is an upper triangular matrix, backward substitution method could be applied to solve the problem, as shown in the following.

Listing 8.7

>> U=[16 2 3 13; 0 11 108;00 6 12;000 1];

>> f = [81; 53; 60; 3];

>> x = bsubstt(U, f)

x =

2.0000

− 1.0000

4.0000

3.0000

8.3.5 Orthogonal matrices

If A is an orthogonal matrix, then A^TA= I or A^T= A^− 1, and it is easy to solve linear system (8.1) as x= A^Tf.

8.3.6 Reducible matrices

A square matrix A is reducible if its rows and columns can be permuted to bring it into the form:

where B is a square matrix of size M by M, D is a square matrix of size L by L, C is a matrix of size M by L, and O is zero matrix of size L by M (M+L=N). This property allows us to break the problem into two smaller ones, usually with a very substantial saving in work. Let us split each of the vectors x and f into two vectors:

Then system Ax=f can be solved in two stages, first

then

8.3.7 Matrices with diagonal dominance

If A is a matrix with strong diagonal dominance, then

If A is a matrix with weak diagonal dominance, then

and at least of the inequalities is satisfied as a strict one.

8.4 Error Sources

When we solve problem (8.1) by some method, three types of error may occur. First of all, we assume that problem (8.1) is posed exactly. Thus we neglect inevitable errors connected with formulating a problem as a model of scientific process (measurement errors, insufficiently precise basic hypotheses, etc.).

A vector fin (8.1) is not given a priori but must be computed by the data of the exact problem. These computations are performed with a certain error, and instead of f we actually obtain some “perturbed” vector f+ δf. Then instead of problem (8.1) we obtain the “perturbed” problem

(8.4)

We assume this problem is also solvable, and y is the solution. Then

is the perturbation error.

Now the perturbed system (8.4) must be solved. Let us assume that we have some algorithm which produces after a finite number of operations either an exact vector y (if operations are performed error free) or some vector z such, that the error

can be estimated and the error bound can be made arbitrarily small. The quantity e_a is called the algorithm error.

In fact any algorithm contains associated floating-point operations, which are connected with errors. These errors amount to the situation that instead of z we obtain some other vector w and the quantity

is called the rounding error.

Finally, instead of the wanted solution x* we compute some vector w. It is natural to take this vector as the approximate solution to problem (8.1) and its error is

8.5 Condition Number

As we discussed before, we have an original problem Ax=f and a perturbed problem A(x+ δx)=f + δf, where δf represents the perturbation in both the right-hand side and A. The objective is to estimate how the uncertainty δf in f is transmitted to δx, that is, to estimate the quantity

(8.5)

It follows from (8.1) and (8.4) that

and

If we substitute this into the previous expression, we get

The quantity

is called the standard condition number. This number estimates the maximum magnification of the uncertainty δf. If cond(A) is relatively small, then system (8.1) is well conditioned and δx is relatively small as well. For example, if δf/f = 10^− 14 and we want δf/f = 10^− 6, then cond(A)=10⁸ is quite acceptable. If cond(A) is relatively large, then we face an ill-conditioned system and we should use some special procedures to obtain an acceptable result.

The direct calculation of the condition number is usually inefficient, as we need to compute an inverse matrix. Therefore, we must resort to estimating the condition number. numEclipse provides the cond function to estimate the condition number as demonstrated in the following code listing.

Listing 8.8

>> u = rand(3);

u =

0.9418 0.0777 0.1260

0.2466 0.1872 0.5753

0.9774 0.5600 0.0026

>> cond(u)

ans =

4.5682

8.6 Direct Methods

In this part we will consider direct methods. The distinctive feature of these methods is that, ignoring round-off errors, they produce the true solution after a finite number of steps. Let us discuss the basic principles of these methods.

8.6.1 Basic principles of direct methods

The solution is obtained after first factorizing A as a product of triangular matrices, and the form of factorization depends on properties of a matrix A. For general matrices we obtain

where L is a lower triangular matrix with unit diagonal elements, and U is an upper triangular one. For SPD matrices we use the Cholesky factorization

where L is a lower triangular matrix. Hence, a system Ax=f can be transformed into two systems of equations:

(8.6)

The systems are solved simultaneously by forward and backward substitutions, respectively (see (8.2) and (8.3)). The error sources for these methods are the perturbation errors in factors L and U and the rounding errors in solving systems (8.6). An estimation of the work for the LU-factorization yields N_ops~ N³, and the Cholesky factorization requires half as much work and storage as the method of LU- factorization.

There is another factorization that is useful. Find an orthogonal matrix Q and upper triangular matrix R so that QAx=Rx=Qf; therefore, x may be obtained by backward substitution on the vector Qf. The attraction of the orthogonal factorization lies in the fact that multiplying A or f by Q does not magnify round-off errors or uncertainties associated with the matrix A or vector f. To see this, note that

since Q^TQ=I. So the advantage of this approach is that it produces a numerically stable method. However, this method requires twice as much work as the method of LU-factorization. That is why the application of orthogonal transformations to the solution of systems of linear equations is limited to some special cases. As we will see, these transformations have also found application in the eigenvalue problem.

Now we consider the sweep method. This technique is based on a special transformation that can be applied to systems with a band matrix. We restrict ourselves to the case where the matrix of system (8.1) is a tridiagonal one. It is convenient for us to represent a system of equations with a tridiagonal matrix in the explicit form:

(8.7)

The basic idea of the sweep method is to represent a solution in the following form:

(8.8)

which is at least possible for the first equation. Let coefficients α_n− 1, and β_n − 1, be known, then we can write

After eliminating x_n− 1, we rewrite the previous expression as follows:

Therefore, coefficients α_n and β_n are calculated according to the relations

where n= 2, …, N.

Once the parameters α_n and ß_n have been computed, we can easily solve system (8.7). To begin with, let us write expression (8.8) for n=N − 1 and the last equation (8.7):

Eliminating x_N− 1, we arrive at

All other components x_n are calculated using (8.10) for n=N − 1, …, 1. The sweep method is a very efficient technique for solving systems of equations with a tridiagonal matrix because N_ops= 8 N − 6.

The following theorem describes conditions which provide stability of the sweep method.

Theorem 8.1

Assume that the coefficients of system (8.7) satisfy the conditions

(8.9)

Where at least one of the inequalities is satisfied as a strict one. Then the following inequalities hold true:

and

where n= 2, …, N. Thus, conditions (8.9) turn out to be sufficient for performing the sweep method.

The following function implements the direct method of solving a system of linear equations such that the coefficient matrix is factored into lower- and upper- triangular matrices. The function takes these matrices as arguments along with the right-hand side vector. The implementation is simple and based on the last two implementations of forward and backward substitution. In the first step, forward substitution is applied and then the backward substitution is carried out to find the solution.

Listing 8.9

1. function x = ludirect(L, U,f)

2. % ludirect function solves the linear system of equations

3. % when the L & U factors are available Ly = f, Ux = Y

4. % input: L - lower triangular factor

5. % U - upper triangular factor

6. % f - righthand side vector

7. % output: x - solution vector

8. y = fsubstt(L,f);

9. x = bsubstt(U, y);

We have the following system of equations given in a matrix form.

The following steps show how to apply the direct method to solve this problem.

Listing 8.10

>> A = [1,2, -1; 4, 3,1; 2, 2,3];

>> [L, U] = ludecomp(A);

>> f = [2; 3; 5];

>> ludirect(L, U, f)

ans =

-1.0000

2.0000

1.0000

8.6.2 Error estimation for linear systems

Estimation of the error vector e= x− x* is a delicate task, since we usually have no way of computing this vector. It is possible, at least within the limits of machine precision, to compute the residual vector r= Ax− f, which vanishes if e=o. However, a small residual does not guarantee that the error is small. Let us analyze the relationship between e and r more generally. Observe that r= Ax− Ax*=A(x− x*)=Ae, that is, e= A^− 1r. Then the ratio e/f is

But Ax^∗ ≤ A ⋅ x^∗, so we can replace the denominator on the left side and multiply through by A to obtain the estimate

This inequality asserts that the relative error is small when the relative magnitude of the residual is small and cond(A) is not too large.

8.7 Iterative Methods

Iterative methods for Ax=f are infinite methods and they produce only approximate results. They are easy to define and can be highly efficient. However, a thorough study of the problem is needed in order to choose an appropriate method.

8.7.1 Basic principles of iterative methods

Iterative methods are usually based on the following equivalent form of system (8.1):

(8.10)

where B is an iterative matrix. We construct this form by analogy with (7.1), simply adapting the same principle to system (8.1). The components of the iterative matrix are expressed in terms of the components of matrix A. Vector g is also expressed in terms of matrix A and vector f. It is often not practical to calculate B explicitly but it is implied that in principle this can be done. On the basis of form (8.10), an iterative process is constructed as follows:

1) Choose an initial vector x⁽⁰⁾.

2) Compute successively

(8.11)

(8.12)

3) Conclude the iterative process if

Then

Iterative process (8.11) is called stationary and process (8.12) is accordingly called nonstationary. How are iterations x^(k) related to the exact answer for problem (8.1)? To clear up this question, let us carry out a simple analysis. Every approximation x^(k) may be represented in the following form:

(8.13)

where e^(k) is the error vector of kth approximation. After substituting (8.13) into (8.11) and taking into account (8.10), we obtain

(8.14)

then the iterations converge and

Theorem 8.2

A necessary and sufficient condition for iterative process (8.11) to be convergent for an arbitrary initial vector x⁽⁰⁾ is

(8.15)

here s(B) = max |λ_n(B)| is the spectral radius of the matrix B.

There is another sufficient condition that provides convergence: if B < 1, then condition (8.15) is satisfied, because s(B) ≤ B for any matrix norm.

Iterative process (8.13) is connected with the algorithm and rounding errors; the perturbation error does not occur. The algorithm error is easily estimated. For simplicity, the iteration is started with x⁽⁰⁾= g, then

Hence

For B < 1 we obtain

so the smaller the spectral radius of the matrix B, the faster approximations x^(k) approach x*. As for the rounding errors, the following estimate may be obtained:

As we can see, iterative methods are very stable in terms of rounding errors.

8.7.2 Jacobi method

Let us represent a matrix A as

(8.16)

where

Now we can construct an equivalent form (8.10) in the following manner

and finally

(8.17)

Therefore, iterative matrix B and vector g for the Jacobi method is

The construction of equivalent form (8.17) consists of solving the nth equation explicitly for the nth unknown and putting everything else on the right-hand side. Because matrix B_J is easily calculated, one can estimate s(B_J) to determine if the Jacobi method is convergent. There are also sufficient conditions which allow for clarifying convergence of the Jacobi method immediately in terms of matrix A. They are: (1) A is a matrix with strong diagonal dominance, so taking into account the structure of B_j, it is easy to show that B_J_∞ < 1; and (2) A is an irreducible matrix with weak diagonal dominance.

The implementation of the Jacobi method requires two utility functions, Idu and invdiag. The function Idu decomposes a given matrix into lower triangular, diagonal and upper triangular matrices.

Listing 8.11

1. function [L, D, U] = 1du(A)

2. % 1du function decompose a matrix into L, D & U matrices

3. % such that A = L + D + U

4. % input: A - input square matrix

5. % output: L - lower triangular factor

6. % D - diagonal factor

7. % U - upper triangular factor

8. n = length(A); L = zeros(n); D = zeros(n); U = zeros(n);

9. for i=l:n

10. for j=l:n

11. if i<j

12. U(i, j) = A(i, j);

13. elseif i > j

14. L(i, j) = A(i, j);

15. else

16. D(i,j) = A(i,j);

17. end;

18. end;

19. end;

The invdiag function returns the inverse of a given diagonal matrix. It was important to implement as a separate and efficient function rather than using the built-in inv function since the diagonal matrix is sparsely populated and it should not take that much computational effort.

Listing 8.12

1. function Dn = invdiag(D)

2. % invdiag function comptes the inverse of a diagonal matrix

3. % input: D - diagonal matrix

4. % output: Dn - diagonal inverse matrix

5. n = length(D); Dn = zeros(n);

6. for i=l:n

7. Dn(i, i) = l/D(i.i);

8. end;

The following code listing implements the Jacobi method using the above utility functions. It takes three input arguments; the coefficient matrix, right-hand side vector and error tolerance.

Listing 8.13

1. function x = jacobi(A, f, tol)

2. % jacobi function solves a linear system of equations using Jacobi method

3. % input: A - input coefficient matrix

4. % f - right-hand side vector

5. % output: x - solution of linear system of equations

6. n = length(A);

7. x0=zeros(n,1);

8. 1 = eye(n);

9. [L, D, U] = ldu(A);

10. invD = invdiag(D);

11. x = (I-invD*A)*x0 + invD*f;

12. i = 0;

13. while norm(x-x0) > tol

14. i=i+l;

15. x0 = x;

16. x = (I-invD*A)*x0 + invD*f;

17. end;

18. s = sprint('Total number of iterations = %d', i);

19. disp(s)

We have the following system of equations given in a matrix form.

The following steps show how to solve it using the Jacobi method.

Listing 8.14

>> A = [2 − 10; − 1 2 − 1; 0−1 2];

>> f = [1/3; 1; -1/3];

>> x = jacobi(A,f, le-6)

Total number of iterations = 38

x =

0.6667

1.0000

0.3333

It takes 38 iterations to reach the solution with desired accuracy.

8.7.3 Gauss-Seidel method

Let us refer again to representation (8.18). We can construct an equivalent form (8.12) in a different manner:

We do not need to invert the matrices L+ D or U+ D, because we can immediately apply the iteration

(8.18)

(8.19)

The matrix L+D is a lower triangular one and U+D is an upper triangular matrix. Therefore, systems (8.18) and (8.19) are easily solved with respect to x^(k+1). Because the explicit calculation of matrix B_GS is impractical, the sufficient convergence conditions are of importance. They are: (1) A is a matrix with strong diagonal dominance; (2) A is an irreducible matrix with weak diagonal dominance; and (3) A is a SPD matrix. In this last case, if the Jacobi method converges s(B_J,)< l), then the Gauss-Seidel method has more rapid convergence because s(B_GS)=s²(B_J) for this type of matrix.

The following function implements the Gauss-Siedel’s iterative method to solve a system of linear equations. The function signature is same as the Jacobi method.

Listing 8.15

1. function x = gauss(A, f, tol)

2. % gauss function solves a linear system of equations using Gauss Seidel method

3. % input: A - input coefficient matrix

4. % f - right-hand side vector

5. % output: x - solution of linear system of eqs

6. n = length(A);

7. x0=zeros(n,1);

8. [L, D, V] = 1du(A);

9. f1 = − L *x0 + f;

10. x = bsubstt(U+D,f1);

11. i = 0;

12. while norm(x-x0) > tol

13. i = i + 1;

14. x0 = x;

15. f1 = − L *x0 + f;

16. x = bsubstt(U+D, f1);

17. end;

18. s = sprint(‘Total number of iterations = %d’, i);

19. disp(s);

In the following, we apply the Gauss-Siedel method to the problem in last example.

Listing 8.16

>> x = gauss(A, f, 1e-6)

Total number of iterations = 20

x =

0.6667

1.0000

0.3333

It shows that the Gauss-Siedel method is superior to the Jacobi method since it takes only 20 iterations to solve the same problem.

8.7.4 Method of relaxation

When employing an iterative method, the main question is how to achieve rapid convergence, or to put this in other words, how to construct an iterative matrix with the smallest s(B) possible. To do this, one should be able to exert control over the properties of iterative matrix B. This can be realized with the use of a special procedure called the method of relaxation.

To begin with, let us discuss the general scheme of this method. Consider system (8.1) with an SPD matrix A. For an arbitrary vector x^(k), which is considered as an approximate solution, we define an error function

In order to define another vector x^(k+ 1) which differs from x^(k) only in the nth component such that φ(x^(k + 1)) < φ(x^(k)), we put x^(k + 1) = x^(k) + αb_n (b_n is the basis vector, see 8.1). Then, it is easy to obtain

(8.20)

where (r^(k))_n is the nth component of the residual vector: (r^(k))_n = (Ax^(k) − f, b_n). The sum of the second and third terms in (8.20) can be made negative; we simply choose  so that

and

where ω is called the relaxation parameter. By choosing parameter a for n= l, …, N we arrive at the following iterative scheme:

(8.21)

The equivalent form (8.10) for this iterative process may be also written as

(8.22)

Now we have an iterative matrix that depends on a parameter ω. One can find such ω that provides the minimal value of s(B_JR(ω)). This parameter is called the optimal relaxation parameter and is denoted as ω_opt. The method of relaxation (8.21) is based on the Jacobi method (B_JR(1) = B_J). It is easy to see that λ(B_JR(ω)) = 1 + ω(λ(B_J) − 1). Let − λ_l ≤ λ(B_J) ≤ λ_r then the optimal relaxation parameter is

and

The next logical step is to construct the relaxation method based on the Gauss-Seidel iteration (8.18). Taking into account representation (8.16), one can transform (8.22) into the following equivalent system:

As before, we do not need to invert matrix D+L and the iterative scheme is

(8.23)

Matrix D+ ωL is a lower triangular one, therefore system (8.23) can be easily solved with respect to X^(k + 1). It was noticed in practice (in some cases it was proved) that convergence of iterative scheme (8.23) with ω< 1 (under-relaxation) is slower in comparison to ω ≥ 1 (over-relaxation). Thus, ω ≥ 1 is usually used and iterative scheme (8.23) is called the method of successive over-relaxation (SOR). There is only one procedure, proposed by David Young, to calculate the optimal relaxation parameter ω. Let us briefly consider this result.

The Jacobi matrix B_J = I − D^− 1 A = − D^− 1L − D^− 1U is consistently ordered if

for any α≠ 0. For example, block tridiagonal matrices of the form

where D_m is a square matrix of size N_m × N_m(m = 1, …, p), L_m and U_m-1 are matrices of size N_m × N_m + 1 and N_m + 1 × N_m respectively (m = 2, …, p; N₁ + … + N_p = N) are consistently ordered. Let A be an SPD consistently ordered matrix. If the Jacobi method converges (s(B_J)<1), then the optimal relaxation parameter for the SOR is calculated by the formula:

(8.24)

Calculating the spectral radius of the Jacobi matrix B_J requires an impractical amount of computation, so this is seldom done. However, relatively inexpensive rough estimates of s(B_J) can yield reasonable estimates for the optimal value of ω_opt.

As for other types of matrices, we do not know how to determine the optimal relaxation parameter. However, there are some conditions which are sufficient for the convergence of the SOR. First, if A is an SPD matrix, then the SOR converges when 0 < ω< 2. In this case the relaxation parameter given by (8.24) is still close to optimal, because ω_opt − 1 ≤ s(B_SOR) ≤ (ω_opt − 1)^1/2. Second, if A is a matrix with strong diagonal dominance, or with weak diagonal dominance and irreducible, then some *>1 exists. The SOR converges when 0 < ω<ω*.

Successive Over Relaxation or SOR method introduces a parameter to control the convergence of the iterative process, as shown in the following implementation. The signature of following function is same as previous functions except the addition of w control parameter.

Listing 8.17

1. function x = sor(A, f, w, tol)

2. %gauss function solves a linear system of equations using Gauss Seidel method

3. % input: A - input coefficient matrix

4. % f - right-hand side vector

5. % w - omega relaxation parameter

6. % tol - error tolerance

7. % output: x - solution of linear system of equations

8. n = length(A);

9. x0 = zeros(n, 1);

10. [L, D, U] = ldutA);

11. fl = (D−w*(D + U))*x0 + w*f;

12. x = fsubstt(D + w*L,fl);

13. i = 0;

14. while norm(x−x0) > tol

15. i=i+l;

16. x0 = x;

17. fl = (D−w*(D + U))*x0 + w*f;

18. x = fsubstt(D + w*L,fl);

19. end;

20. s = sprint(‘Total number of iterations = %d’, i);

21. disp(s);

We apply the SOR method to the same problem discussed in previous sections. There are a number of methods to find the optimal value of control parameter, w. For the sake of simplicity, we have arbitrarily chosen the value in the following.

Readers are encouraged to try different values to see how it impacts the convergence speed.

Listing 8.18

>>x = sor(A,f, 1.2, 1e-6)

Total number of iterations = 9

x =

0.6667

1.0000

0.3333

It takes only 9 iterations to solve the same problem which shows that SOR is the best iterative method to solve a system of linear equations.

8.7.5 Variational iterative methods

It is evident from the foregoing account that, in general, we need some information about the spectral radius of an iterative matrix in order to analyze convergence of an iterative method. Often, it is impractical to calculate or estimate s(B). There are methods wherein convergence is ensured by special parameters, which in turn depend on an approximate solution. It is clear that the realization of such methods gives rise to non-stationary iterative processes (8.12).

We shall consider an iterative scheme of the following form:

(8.25)

where τ_k ≠ 0 is an iterative parameter. Let us introduce residual r^(k) = Ax^(k) − f. Using (8.25), the residual vector r^(k+ 1) associated with the next approximation can be expressed in terms of r^(k)

Thus, we can determine parameter τ_k ≠ 0 in such a manner that r^(k + 1)₂ will be minimal. The norm of residual r^(k+ 1) is expressed as follows:

(8.26)

Condition φ′(τ_k) is the condition of the minimum of φ(τ_k), because

Then τ_κ can be calculated as

(8.27)

Iterative schemes (8.25) and (8.27) together comprise the method of minimal residual. Let us analyze convergence of this method. If we substitute (8.27) into (8.26), then

(8.28)

Both terms in the right-hand side of (8.28) are positive, therefore

for any definite matrix A and this proves convergence of iterative scheme (8.25). Now consider the rate of convergence of the method of minimal residual. Suppose a matrix A of system (8.1) has the following properties:

and

Then

with

If all eigenvalues of matrix A are real, then we can put

and

For SPD matrices one can use iterative scheme (8.25) and calculate τ_k by other means. Instead of the residual, let us consider the actual error e^(k)= x^(k)− x*. Taking into account that Ax*=f, from (8.25) we obtain

(8.29)

We have no way of calculating the actual error, but

So we can determine τ_k by minimizing the value of (e^(k + 1), Ae^(k + 1)). Using (8.29) we obtain

Condition φ′(τ_k) = 0 is the condition of the minimum of φ(τ_k), because φ″(τ_k) = (r^(k), Ar^(k)) > 0, (A is an SPD matrix). So τ_k is determined as follows:

(8.30)

Iterative scheme (8.25) with τ_k from (8.30) is called the method of steepest descent. Properties of this method are similar to the method of minimal residual.

On the basis of the method of steepest descent we can construct an iterative method with more rapid convergence. Let us modify (8.25) and write it in the following form:

(8.31)

Here d^(k) is a direction vector which points towards the solution of a linear system. In the method of steepest descent, direction vectors are residuals r^(k), but this choice of directions turns out to be a poor one for many systems. It is possible to choose better directions by computing direction vectors as

(8.32)

Parameters a_k are determined in such a way as to provide orthogonality of vectors d^(k) and Ad^(k − 1)((d^(k), Ad^(k − 1)) = 0), then

(8.33)

Using the same considerations as those for the method of steepest descent, r_k is determined so that it provides a minimal value of (e^(k + 1), Ae^(k + 1)). Then we have

(8.34)

Iterative scheme (8.31) with equations (8.32), (8.33), and (8.34) comprise the conjugate gradient method. The error of the kth approximation generated by this method obeys the following bound

where

The distinctive feature of this method is that, ignoring round-off errors, we obtain r^(k)= 0 at some k≤ N, therefore x^(k)= x*. By this means we need no more than N iterations to achieve the true solution. For large systems of equations this is not such a promising result. Nevertheless, in some cases the conjugate gradient method and its modifications produce very rapid convergence.

8.8 Exercises

The problems consist of two parts:

1) solve a system of linear equations using LU-factorization (systems 1–20),

2) solve a system of linear equations using Gauss-Seidel method (systems 21–30).

Estimate the error of approximate solution. Methodical comments: to compute the LU-factorization use the function [L,U] = ludecomp (A) ; to estimate the condition number use the function cond (A).

Set of systems of equations to solve: