Chapter 25

Numerical Methods for Linear Systems of Equations

25.1 Introduction

Until now, we have mostly restricted ourselves to finding analytical solutions to the equations we encountered. In many cases, we were only able to do so by applying certain simplifications. However, we often encountered equations that are simply unsolvable analytically. For these equations, we have to turn to numerical methods. In the following sections, we will study the most important techniques commonly encountered and discuss examples of these methods. Most of the methods can be implemented conveniently in any programming language and we will make heavy use of both Maple and C in this chapter. We will first start by discussing numerical methods for solving linear systems of equations. Solving linear systems is one of the most commonly encountered problems in numerics, and these methods therefore merit further study.

25.1.1 Advantages and Disadvantages of Numerical Methods

Obviously, there are several advantages to analytical solutions over numerical solutions. Analytical solutions are the “universal” solution which means that you can substitute any values for the variables and will immediately obtain the solution to the given scenario. We only have to find the solution in symbolic form once. It will then be valid for all values of the variables. Obviously, an analytical solution may also have certain restrictions, such as that a given variable must never be zero, or always be positive. Another common case is solutions which are made up of several cases. We will see such a case in section 27.3.1 where we study the damped harmonic oscillation. In this case, the values of several of the variables define which equation of a given set of equations represents the correct solution.

In addition, having access to the analytical solution of, e.g., a differential equation allows us to determine the influence of the individual independent variables or parameters quantitatively. As an example, if an equation is made up of several terms, each of which is preceded by a prefactor, we can easily estimate which of these terms is most important by simply checking the prefactors and the potential range of values for the independent variables. If one prefactor dominates the equation, we may be able to drop the remaining terms, thus making it easier to use the equation. Having the analytical solution at hand allows us to quantify the error we make by dropping respective terms. In many cases, a rather inexact solution may be completely sufficient as long as we know the rough dimension of the error we introduced. This can usually be done conveniently if we know the analytical solutions.

Numerical solutions have the disadvantage of not being “universal.” As we drop the symbols and convert all terms to numbers, we cannot assess the influence a change in the value of a parameter or a dependent variable will have upon the final result. We can only obtain such a correlation by running several numerical problems with slightly different values for one variable, while keeping all other values constant. By closely observing the change in the output, we may be able to establish a correlation. However, there are cases where it is somewhat simpler to quantify the influences of terms. As an example, take the simplifications we made during the derivation of the numerical schemes for the first- and second-order derivatives from the Taylor series (see section 4.2.4) where we dropped higher-order terms stating that they are “less important.” In such cases, it is rather simple to quantify the error we introduce by dropping these terms. However, if an equation contains a large number of independent variables or parameters that are dependent on some of these variables and which may therefore change over time, it is very difficult to exactly quantify the influence of each term on the final result.

In contrast to analytical solutions, which only have to be derived once, numerical solutions must generally be calculated from scratch every time. If 10 problems have to be solved, the analytical solution can be used to simply substitute the 10 sets of values for the parameters and/or independent variables. The numerical solution must be calculated for each of these problems separately, i.e., a total of 10 calculations must be done. In some cases, the numerical output of a previous run can be “reused.” In these cases, the output can be considered as being a rough estimate for a solution, which usually helps a numerical solver come up with a solution to a new problem significantly faster. In many cases, rough solutions may even be mandatory, as the solver will not find a solution at all otherwise.

Despite these obvious disadvantages, numerical solutions are often the only choice to solve a complicated equation. It is surprising how often we encounter problems in engineering mathematics and technical physics which are simply unsolvable analytically. The fundamental equation of fluid mechanics is the Navier-Stokes equation (see Eq. 11.40) which can only be solved analytically by simplifying it significantly. Only by dropping the nonlinear terms and the time-dependent terms were we able to come up with solutions for very simple channel cross-sections (see section 15.4.1). For limited cases, we were able to reconstruct transient solutions both for time and space, thus reincorporating the first-order time-dependent derivative (time-transient) and the second-order space-dependent (space-transient) terms. However, finding analytical solutions that incorporate both, let alone in more than one dimension, is almost impossible. If you recall, we have already used numerical methods to solve the equation of the pendant and sessile drop in section 24. The differential equation which describes the drop contour (see Eq. 24.10) could not be solved by any other means than a numerical scheme.

25.1.2 Concepts Commonly Used in Numerical Methods

In this section we will briefly discuss some of the concepts and terms commonly used in conjunction with numerical methods.

Numerical Scheme. In general, a numerical scheme can be thought of as an algorithmic description which (when correctly applied) will yield a solution to a mathematical problem. You can think of it as a cooking recipe: If you follow it closely, you will end up with a numerical solution to an input problem.

Solvers. In practical applications, a numerical scheme would be implemented in the form of a piece of software code. These programs are usually referred to as solvers. Solvers are computer programs which apply a numerical scheme for solving (differential) equations. Usually, they are bundled in software packages that provide the user with a suitable interface for inputting the problem and for outputting the solution in a practical way.

Iteration. A numerical solver will usually iterate a numerical scheme, coming up with better and better approximations to the solution in a step-by-step fashion. Suppose we want to find an unknown solution vector $\vec{x}$ to an equation of the form $f(\vec{x}) = 0$. In order to solve this equation, we have derived a numerical scheme that processes a first or initial guess $\vec{x}_0$ to a slightly better solution $\vec{x}_1$. The scheme could be written in the form

$$\vec{x}_{i+1} = g\left(\vec{x}_i\right)$$

where g would represent our numerical scheme. The algorithm iterates this scheme transforming the previous solution $\vec{x}_i$ into the improved solution $\vec{x}_{i+1}$. For each step we can keep track of the change in solution given by

$$\Delta\vec{x}_i = \vec{x}_{i+1} - \vec{x}_i$$

If we are approaching the correct value for $\vec{x}$ we find that $\Delta\vec{x}_i \to 0$. This means that once the change is below a certain (problem-dependent) threshold, we may consider the value as sufficiently exact. At this point we have found the solution, i.e., the correct value for $\vec{x}$. A good numerical solver or algorithm would let $\Delta\vec{x}_i$ converge rapidly to 0. If a numerical problem approaches the solution, we refer to this as convergence of the problem. If you ever find an error message along the lines of “problem is not converging,” you know that $\Delta\vec{x}_i$ does not converge to 0, i.e., the approximations for the solution are not getting better.

The main difference in numerical algorithms lies in the methodology of using $\vec{x}_i$ for obtaining the next approximation $\vec{x}_{i+1}$.
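To make this concept a little more tangible, the following minimal C sketch iterates a scheme of the form $\vec{x}_{i+1} = g(\vec{x}_i)$ until the change drops below a required accuracy. The choice g(x) = cos(x) is only a stand-in for an actual numerical scheme; the structure of the loop, however, is the same for the solvers we will discuss in this chapter.

/* Minimal sketch of the iteration concept: x_{i+1} = g(x_i) is repeated
 * until the change drops below a problem-dependent threshold epsilon.
 * Here g(x) = cos(x) only serves as a stand-in for a numerical scheme;
 * its fixed point x = cos(x) ~ 0.739085 is approached step by step. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 0.0;              /* initial guess x_0 */
    double epsilon = 1e-12;      /* required accuracy */
    int i, imax = 200;           /* safety limit on the number of iterations */

    for (i = 0; i < imax; i++) {
        double xnext = cos(x);   /* apply the scheme g */
        double dx = fabs(xnext - x);
        x = xnext;
        if (dx < epsilon) {      /* converged: change is sufficiently small */
            printf("converged after %d iterations: x = %.12f\n", i + 1, x);
            return 0;
        }
    }
    printf("not converged after %d iterations\n", imax);
    return 1;
}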

Fast, Robust, and Stable Numerical Solver. There are certain criteria which are important when comparing numerical schemes and the solvers that implement them. A fast converging solver will make $\Delta\vec{x}_i$ converge rapidly to 0. Obviously, this is advantageous because it saves processing time and thus yields the solution more rapidly.

On the other hand, speed is not the only important criterion. In many applications, it is actually significantly more important to have a robust solver. A robust solver is a solver that can start with values of $\vec{x}_0$ that are far off the correct solution and still converge to the correct solution. This is especially important because the first initial guess we make may be terribly off depending on how much we know about the actual solution. If a solver is not robust against these bad initial guesses, it may not converge to a solution at all.

The third criterion which is of importance when choosing an algorithm is the stability of the algorithm. If an algorithm is stable, it will converge to the solution steadily. As it turns out, many algorithms may catch onto a suitable solution, but (due to factors which can be influenced by the user) they do not converge steadily but get “off-track.” This is often referred to as divergence, i.e., the algorithm diverges from the actual solution and becomes unstable. Usually this results in the algorithm not finding any solution at all.

Initial Guess. As already stated, we need to provide a first initial guess to most algorithms. In our example, the value $\vec{x}_0$ is the initial guess. If this initial guess is already close to the solution, we expect a stable and fast convergence to the actual solution of our equation. In many applications, the initial guess is key to obtaining a solution at all. You can see an example of the importance of the initial guess in listing 24.2 where we (numerically) solved the ODE for the pendant drop contour. If we do not supply the solver with a good initial guess for θ, we will find that it does not converge at all.

Accuracy. Numerical solutions are never exact. There are two main reasons for this. First, the numerical scheme we use will be (to a certain degree) inexact. This is due to the fact that during the derivation of the scheme, certain terms are simply ignored. A good example of this are the schemes derived from the Taylor series (see section 4.2.4). Take, for example, the equation for the central finite difference (see Eq. 4.5) that is usually only expanded to n = 1, which yields Eq. 4.6. Obviously, the latter equation is not exact as all higher-order terms have been dropped.

The second source of inexactness is due to the fact that numerical accuracy in computer code is always dependent on the accuracy of the data type used during the calculation. In our examples, we will use 8-byte floating-point values. This corresponds to an accuracy of roughly 16 decimal places, which is sufficiently exact for most applications. However, there are several cases where numerical inaccuracies can quickly add up to significant errors.

You will see that in almost all examples, we will use a variable to force a certain given accuracy. By convention we use the symbol $\epsilon$ or the variable epsilon to indicate the required accuracy. In many cases, forcing a numerical scheme to yield slightly more accurate results can lead to significantly longer calculation times. It is therefore often required to find a tradeoff between precision and processing time.

Computational Cost. Numerical schemes vary widely in the number of operations a microprocessor has to perform in order to complete one iteration. Many schemes require several evaluations of an equation, whereas others require only one. Some schemes will require solving a system of linear equations, whereas others only require a mere addition or multiplication. The overall workload for the processor is usually referred to as the computational cost. A solver with high computational cost runs slower as it requires more work for the processor to complete one iteration. As we will see, there are many cases where an iteration would require solving a more complex equation, e.g., a linear system of equations. We will encounter such an example in section 34.2.2.3 where we need to repeatedly solve a linear system. Instead of doing this computationally rather expensive operation, we solve the equations one by one, taking into account that we end up with a solution that is mathematically not correct. However, by doing so, we obtain more accurate results for each equation. It is computationally less expensive to repeat this operation several times (until a sufficiently exact solution is obtained), instead of solving a linear system of equations.

25.1.3 A Brief Historical Overview of Numerical Methods

As you can see, there are many reasons why the study of numerical methods is required. Interestingly, the field of numerical methods is older than generally imagined. Long before the advent of modern computers, researchers sought numerical methods to solve complex equations. Obviously, these methods only became widely applicable with the introduction of “computation machines.” One of the earliest contributions to numerical methods is a paper we have already studied, i.e., the attempts to derive the contour of sessile and pendant drops by Bashforth and Adams in 1883 [1]. They introduced the famous Bashforth-Adams schemes that we will study in section 27.2.6. The numerical contributions to this paper are mostly due to Adams. The paper discusses the use of the Taylor series for approximating the next point of a solution using multiple previously calculated points.

The second seminal paper was provided by Runge1 in 1895 [2]. Runge is generally accepted as having introduced modern one-step methods which we will study in section 27.2.8. In contrast to the Bashforth-Adams scheme, Runge’s approach only uses one recent data point to predict the next.

The 20th century saw many important improvements of these early numerical schemes. One of the earliest improvements of Runge’s original method was provided by Heun2 [3]. The method in turn was improved again by Kutta3 [4]. In this paper, he derived the most famous of the Runge-Kutta methods which is the fourth-order scheme that we will study in section 27.2.8.

Predictor-corrector schemes, which we will study in section 27.2.7, were introduced in the seminal paper by Moulton4 in 1926 [5]. Predictor-corrector schemes were already briefly laid out in the paper by Adams and Bashforth but it was Moulton who formulated the first schemes for them.

More information on the development of numerical methods in the 20th century can be found in the noteworthy review by Butcher [6].

25.2 Solutions to Linear Systems of Equations

25.2.1 Introduction

When solving seemingly trivial flow problems in fluid mechanics, we are occasionally left with a system of equations that we have to solve for some unknown variables. In general, in order to solve for n independent variables, we need a set of n independent equations. In many applications, these equations are linear, i.e., they do not contain functions such as sine or cosine or exponential expressions of the unknowns. They are of the form

$$\begin{aligned} f_0(x_0, x_1, \ldots, x_n) &= 0 \\ f_1(x_0, x_1, \ldots, x_n) &= 0 \\ f_2(x_0, x_1, \ldots, x_n) &= 0 \\ &\;\;\vdots \\ f_n(x_0, x_1, \ldots, x_n) &= 0 \end{aligned}$$  (Eq. 25.1)

As an example let us assume that during solving a fluid mechanical problem, we are left with a set of equations such as

$$\begin{aligned} 3x_0 + 4x_1 + 2x_2 - 2 &= 0 \\ 7x_0 + 3x_1 + x_2 - 4 &= 0 \\ x_0 + x_1 + x_2 &= 0 \end{aligned}$$

This is a classical example of a linear system: it contains the unknowns $x_0$, $x_1$, and $x_2$ only in linear form. Solving such a system for the unknowns is simple and can be carried out by hand. However, if we have to solve for more than just three unknowns, solving the system manually becomes difficult. In such cases, one resorts to the following concept.

25.2.1.1 Matrix Notation

The method of solving such systems is always the same. We start by rewriting it such that the right-hand side contains all the constants

$$\begin{aligned} 3x_0 + 4x_1 + 2x_2 &= 2 \\ 7x_0 + 3x_1 + x_2 &= 4 \\ x_0 + x_1 + x_2 &= 0 \end{aligned}$$  (Eq. 25.2)

In the next step, we convert this to matrix notation. For this we note that we actually are looking at an unknown vector $\vec{x}$ of the form

$$\vec{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$

This vector contains the unknown variables as its entries. We now design a matrix D such that we can rewrite our original system as

$$D\,\vec{x} = \vec{b}$$  (Eq. 25.3)

where $\vec{b}$ is the vector notation of our right-hand side.

Inverse Matrix. Next we note that if we know the inverse matrix D−1 of D we could multiply both sides of Eq. 25.3 with it ending up with

$$D^{-1}D\,\vec{x} = D^{-1}\vec{b} \quad\Rightarrow\quad \vec{x} = D^{-1}\vec{b}$$  (Eq. 25.4)

where we exploit the fact that a matrix multiplied with its inverse will result in the identity matrix

$$I_n = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

As you can see, Eq. 25.4 would yield the solution to the linear system immediately. The only thing we require is the inverse of the matrix D. As it turns out, in most cases it is not necessary to directly calculate D−1 as there are methods that can do without it. In the following we will introduce two very convenient methods: the Gauß-Jordan elimination and Cramer’s rule.

25.2.2 Gauß-Jordan Elimination

25.2.2.1 Introduction

The Gauß-Jordan elimination is a convenient algorithm that can be used to solve linear systems of equations. The algorithm constructs an identity matrix from the matrix D given by the system of linear equations. The identity matrix is constructed by repeated multiplication and subtraction of the lines of the matrix such that the columns are cleared one by one, resulting in the identity matrix. During this process the solution vector $\vec{x}$ is created as the right-hand side.

We will demonstrate the Gauß-Jordan algorithm using the example we already introduced. For this we write the system in matrix form as

$$\left(\begin{array}{ccc|c} 3 & 4 & 2 & 2 \\ 7 & 3 & 1 & 4 \\ 1 & 1 & 1 & 0 \end{array}\right)$$

As a prerequisite, we require the first column in the first line to have a nonzero entry. If there were a zero in the first line, we would swap this line with any other line that has a nonzero entry in the first column. In our example, we have 3 as the first entry in this column. We now proceed with the following steps.

25.2.2.2 Dividing the First Line by the First Column Value in the First Line

We now divide the first line by the first column entry resulting in

$$\left(\begin{array}{ccc|c} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 7 & 3 & 1 & 4 \\ 1 & 1 & 1 & 0 \end{array}\right)$$

25.2.2.3 Making the First Column Entries of the Other Lines Zero

In the next step we ensure that the other entries in the first column are zero. For this we first multiply the first line by the value in the first column of line i and subtract it from line i. This yields for the second line

$$\left(\begin{array}{ccc|c} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 0 & 3-\tfrac{28}{3} & 1-\tfrac{14}{3} & 4-\tfrac{14}{3} \\ 1 & 1 & 1 & 0 \end{array}\right)$$

We do the same for the third line

$$\left(\begin{array}{ccc|c} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 0 & 3-\tfrac{28}{3} & 1-\tfrac{14}{3} & 4-\tfrac{14}{3} \\ 0 & 1-\tfrac{4}{3} & 1-\tfrac{2}{3} & 0-\tfrac{2}{3} \end{array}\right)$$

After which we have the following system left

$$\left(\begin{array}{ccc|c} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 0 & -\tfrac{19}{3} & -\tfrac{11}{3} & -\tfrac{2}{3} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \end{array}\right)$$

25.2.2.4 Dividing the Second Line by the Second Column Value in the Second Line

Remember that we want to gradually transform the matrix to the identity matrix. We already succeeded in putting the value 1 in the first line where it needs to be for the identity matrix. We now proceed by doing the same for the second line. Therefore, we need to divide the second line by the entry in the second column because this is the location where we require a 1 in the second line. Doing so we find

$$\left(\begin{array}{ccc|c} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{2}{3} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \end{array}\right)$$

25.2.2.5 Making the Second Column Entries of the Other Lines Zero

As before, we now proceed by making the other entries in the second column zero by first multiplying the second line with the value in the second column of line i and then subtracting it from line i. For the first line we find

$$\left(\begin{array}{ccc|c} 1 & 0 & \tfrac{2}{3}-\tfrac{11}{19}\cdot\tfrac{4}{3} & \tfrac{2}{3}-\tfrac{2}{19}\cdot\tfrac{4}{3} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \end{array}\right)$$

which yields

$$\left(\begin{array}{ccc|c} 1 & 0 & -\tfrac{2}{19} & \tfrac{10}{19} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \end{array}\right)$$

We do the same for the third line finding

$$\left(\begin{array}{ccc|c} 1 & 0 & -\tfrac{2}{19} & \tfrac{10}{19} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & 0 & \tfrac{1}{3}+\tfrac{1}{3}\cdot\tfrac{11}{19} & -\tfrac{2}{3}+\tfrac{1}{3}\cdot\tfrac{2}{19} \end{array}\right)$$

which after simplification becomes

$$\left(\begin{array}{ccc|c} 1 & 0 & -\tfrac{2}{19} & \tfrac{10}{19} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & 0 & \tfrac{10}{19} & -\tfrac{12}{19} \end{array}\right)$$

25.2.2.6 Dividing the Third Line by the Third Column Value in the Third Line

As you can see, the algorithm is repetitive and is simply repeated for each additional line. For the third line we again divide the line by the value in the third column because this is where we require a 1. Doing this we find

$$\left(\begin{array}{ccc|c} 1 & 0 & -\tfrac{2}{19} & \tfrac{10}{19} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & 0 & 1 & -\tfrac{12}{19}\cdot\tfrac{19}{10} \end{array}\right)$$

which is

$$\left(\begin{array}{ccc|c} 1 & 0 & -\tfrac{2}{19} & \tfrac{10}{19} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & 0 & 1 & -\tfrac{6}{5} \end{array}\right)$$

25.2.2.7 Making the Third Column Entries of the Other Lines Zero

Again we multiply the third line by the entry in the third column of line i and subtract it from line i. For the first line we find

$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \tfrac{10}{19}-\tfrac{6}{5}\cdot\tfrac{2}{19} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & 0 & 1 & -\tfrac{6}{5} \end{array}\right)$$

which is

$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \tfrac{2}{5} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{2}{19} \\ 0 & 0 & 1 & -\tfrac{6}{5} \end{array}\right)$$

For the second line we find

$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \tfrac{2}{5} \\ 0 & 1 & 0 & \tfrac{2}{19}+\tfrac{6}{5}\cdot\tfrac{11}{19} \\ 0 & 0 & 1 & -\tfrac{6}{5} \end{array}\right)$$

which is

$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \tfrac{2}{5} \\ 0 & 1 & 0 & \tfrac{4}{5} \\ 0 & 0 & 1 & -\tfrac{6}{5} \end{array}\right)$$

25.2.2.8 Solution

At this point we are finished. We have constructed the identity matrix on the left-hand side and are left with the solution vector on the right-hand side. We thus deduce that our system of linear equations is solved by

$$x_0 = \tfrac{2}{5} \qquad x_1 = \tfrac{4}{5} \qquad x_2 = -\tfrac{6}{5}$$  (Eq. 25.5)
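The elimination we just carried out by hand can be implemented in only a few lines of code. The following C sketch (an illustration, not one of the listings of this chapter) performs exactly the steps described above on the 3 × 3 example: each line is divided by its diagonal entry and the corresponding column is then cleared in all other lines. It assumes nonzero diagonal entries and does not perform any pivoting (see section 25.3.4.4).

/* Sketch of the Gauss-Jordan elimination in C (no pivoting): the matrix is
 * reduced to the identity and the right-hand side turns into the solution
 * vector. The values match the worked 3x3 example of this section. */
#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N] = {{3, 4, 2}, {7, 3, 1}, {1, 1, 1}};
    double b[N] = {2, 4, 0};
    int i, j, k;

    for (i = 0; i < N; i++) {
        /* divide line i by its diagonal entry so that a[i][i] becomes 1 */
        double pivot = a[i][i];
        for (j = 0; j < N; j++) a[i][j] /= pivot;
        b[i] /= pivot;

        /* clear column i in all other lines */
        for (k = 0; k < N; k++) {
            if (k == i) continue;
            double factor = a[k][i];
            for (j = 0; j < N; j++) a[k][j] -= factor * a[i][j];
            b[k] -= factor * b[i];
        }
    }
    /* b now holds the solution: 0.4, 0.8, -1.2 */
    for (i = 0; i < N; i++) printf("x%d = %f\n", i, b[i]);
    return 0;
}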

25.2.3 Cramer’s Rule

25.2.3.1 Introduction

Cramer’s rule1 is another convenient method for solving linear systems of equations. For this we simply need to calculate the determinants of n matrices for solving n equations for n independent unknowns. The method works in the following way. We first write out the equation system in matrix form just as we have done for the Gauß-Jordan elimination (see section 25.2.2). For this system we calculate the determinant det D. We then substitute one by one the columns of the matrix by the right-hand side vector $\vec{b}$. Replacing the first column creates a new matrix D0 for which we calculate the determinant det D0. We then substitute the second column of D with the right-hand side vector $\vec{b}$ obtaining the matrix D1 for which we calculate the determinant det D1. We repeat this process for all other columns obtaining Di and det Di for each column i. Then we find the solution to the system of linear equations as

$$x_i = \frac{\det D_i}{\det D}$$  (Eq. 25.6)

As you can see, if the matrix can be inverted, i.e., if it has a nonzero determinant, this method is very easy to carry out. In the following section we will briefly review the determinant of matrices before looking at an example.

25.2.3.2 Matrix Determinant

Several mathematical tricks of matrix calculus depend on determinants, which is why we will quickly review how they are calculated. In general, a determinant for a matrix only exists if the matrix has the same number of rows and columns, i.e., n = m. In the following we will quickly detail the determinant for the 1 × 1, 2 × 2, and 3 × 3 matrix.

1 × 1 Matrix. Here is a quick reminder of how to calculate the determinant of a matrix. The determinant of a 1 × 1 matrix is simply the only entry of the matrix

$$\det E = \left| e_{11} \right| = e_{11}$$  (Eq. 25.7)

2 × 2 Matrix. The determinant of a 2 × 2 matrix is calculated in general as

$$\det E = \begin{vmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{vmatrix} = e_{11}e_{22} - e_{12}e_{21}$$  (Eq. 25.8)

Think of this operation as crossing the matrix with a pen, first from top left to bottom right and then from top right to bottom left.

3 × 3 Matrix. The determinant of a 3 × 3 matrix is calculated as

$$\det E = \begin{vmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{vmatrix} = e_{11}e_{22}e_{33} + e_{12}e_{23}e_{31} + e_{13}e_{21}e_{32} - e_{13}e_{22}e_{31} - e_{11}e_{23}e_{32} - e_{12}e_{21}e_{33}$$  (Eq. 25.9)

Think of this again as crossing the matrix with a pen. Whenever you draw from top left to bottom right, the sign is positive. Whenever you draw from top right to bottom left, the sign is negative. Start with the top left corner and draw a line to the bottom right corner. Then move one entry to the right and repeat the process. Once this line is drawn, move again one entry to the right and repeat the process. You will hit the right edge of the matrix while doing so. In this case, simply reset the pen to the left edge of the matrix at the same height. Another possibility is to simply write out the same matrix again, just at the right side of the first. This will allow the line to continue without hitting the edge of the matrix. Once the three positive terms are found, set the pen to the top right entry and draw a line to the bottom left. This time, the sign is negative. Then move one entry to the right (which will bring you to entry e11) and repeat the process. Do this one more time and the determinant is found. This procedure is formally known as the rule of Sarrus.2
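As a small illustration, the following C sketch implements Eq. 25.8 and Eq. 25.9 directly; the function and variable names are chosen freely. Applied to the matrix D of our running example, det3 returns −10, matching the hand calculation in the following section.

/* Sketch of the 2x2 and 3x3 determinant formulas (Eq. 25.8 and Eq. 25.9,
 * rule of Sarrus) in C. */
#include <stdio.h>

double det2(double e[2][2])
{
    return e[0][0] * e[1][1] - e[0][1] * e[1][0];
}

double det3(double e[3][3])
{
    return e[0][0] * e[1][1] * e[2][2]
         + e[0][1] * e[1][2] * e[2][0]
         + e[0][2] * e[1][0] * e[2][1]
         - e[0][2] * e[1][1] * e[2][0]
         - e[0][0] * e[1][2] * e[2][1]
         - e[0][1] * e[1][0] * e[2][2];
}

int main(void)
{
    double d[3][3] = {{3, 4, 2}, {7, 3, 1}, {1, 1, 1}};
    printf("det D = %f\n", det3(d));   /* -10 */
    return 0;
}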

25.2.3.3 Example

We now apply Cramer’s rule to the example for which we have already calculated a solution using the Gauß-Jordan elimination. We start out with the following matrix D and right-hand side vector $\vec{b}$

Matrix D.

$$D = \begin{pmatrix} 3 & 4 & 2 \\ 7 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix} \qquad \vec{b} = \begin{pmatrix} 2 \\ 4 \\ 0 \end{pmatrix}$$

We first calculate the determinant of D according to Eq. 25.9 as

det D = (3 · 3 · 1) + (4 · 1 · 1) + (2 · 7 · 1) − (2 · 3 · 1) − (3 · 1 · 1) − (4 · 7 · 1) = −10

Matrix D0. We now substitute the first column by vector $\vec{b}$ obtaining D0 as

$$D_0 = \begin{pmatrix} 2 & 4 & 2 \\ 4 & 3 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

For which we find the determinant

det D0 = (2 · 3 · 1) + (4 · 1 · 0) + (2 · 4 · 1) − (2 · 3 · 0) − (2 · 1 · 1) − (4 · 4 · 1) = −4

Matrix D1. We now substitute the second column by vector $\vec{b}$ obtaining D1 as

$$D_1 = \begin{pmatrix} 3 & 2 & 2 \\ 7 & 4 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$

For which we find the determinant

det D1 = (3 · 4 · 1) + (2 · 1 · 1) + (2 · 7 · 0) − (2 · 4 · 1) − (3 · 1 · 0) − (2 · 7 · 1) = −8

Matrix D2. We now substitute the third column by vector $\vec{b}$ obtaining D2 as

$$D_2 = \begin{pmatrix} 3 & 4 & 2 \\ 7 & 3 & 4 \\ 1 & 1 & 0 \end{pmatrix}$$

For which we find the determinant

det D2 = (3 · 3 · 0) + (4 · 4 · 1) + (2 · 7 · 1) − (2 · 3 · 1) − (3 · 4 · 1) − (4 · 7 · 0) = 12

Solution. We can now write down the solution to our linear system of equations according to Eq. 25.6

$$x_0 = \frac{\det D_0}{\det D} = \frac{-4}{-10} = \frac{2}{5} \qquad x_1 = \frac{\det D_1}{\det D} = \frac{-8}{-10} = \frac{4}{5} \qquad x_2 = \frac{\det D_2}{\det D} = \frac{12}{-10} = -\frac{6}{5}$$

As you can see, these results are identical to the ones we found for the Gauß-Jordan method (see section 25.2.2.8).
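A possible C sketch of Cramer’s rule for this 3 × 3 example is shown below (again an illustration with freely chosen names): each column of D is replaced by $\vec{b}$ in turn and Eq. 25.6 is evaluated using the rule of Sarrus.

/* Sketch of Cramer's rule (Eq. 25.6) in C for the 3x3 example. */
#include <stdio.h>

static double det3(double e[3][3])
{
    return e[0][0]*e[1][1]*e[2][2] + e[0][1]*e[1][2]*e[2][0] + e[0][2]*e[1][0]*e[2][1]
         - e[0][2]*e[1][1]*e[2][0] - e[0][0]*e[1][2]*e[2][1] - e[0][1]*e[1][0]*e[2][2];
}

int main(void)
{
    double d[3][3] = {{3, 4, 2}, {7, 3, 1}, {1, 1, 1}};
    double b[3] = {2, 4, 0};
    double detd = det3(d);
    int i, j, k;

    for (i = 0; i < 3; i++) {                      /* build D_i and apply Eq. 25.6 */
        double di[3][3];
        for (j = 0; j < 3; j++)
            for (k = 0; k < 3; k++)
                di[j][k] = (k == i) ? b[j] : d[j][k];
        printf("x%d = %f\n", i, det3(di) / detd);  /* 0.4, 0.8, -1.2 */
    }
    return 0;
}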

25.2.4 Gauß-Jordan Inversion

25.2.4.1 Introduction

As we have seen, it is not necessary to directly calculate the inverse matrix D−1 to solve a system of linear equations. However, in some cases matrices must be inverted and, as it turns out, the Gauß-Jordan algorithm can be used for this task as well. While performing the Gauß-Jordan algorithm, we gradually transformed our original matrix into the identity matrix. On the way, we also rewrote the right-hand side, which yielded our solution vector. Proceeding in exactly the same way, we can also place the identity matrix on the right-hand side, which is then transformed in the process into the inverse matrix D−1. The concept is the same, we are only creating a different right-hand side.

25.2.4.2 Setup

We will demonstrate this procedure using the same system of linear equations we have worked on so far. We begin with the matrix on the left-hand side and the identity matrix on the right-hand side in which case we find

$$\left(\begin{array}{ccc|ccc} 3 & 4 & 2 & 1 & 0 & 0 \\ 7 & 3 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)$$

25.2.4.3 Dividing the First Line by the First Column Value in the First Line

As before, we first divide the first line by the first column entry resulting in

$$\left(\begin{array}{ccc|ccc} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 & 0 \\ 7 & 3 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)$$

25.2.4.4 Making the First Column Entries of the Other Lines Zero

In the next step we ensure that the other entries in the first column are zero by multiplying the first line by the value in the first column of line i and subtracting it from line i

$$\left(\begin{array}{ccc|ccc} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 & 0 \\ 0 & 3-\tfrac{28}{3} & 1-\tfrac{14}{3} & -\tfrac{7}{3} & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)$$

The same is done for the third line

$$\left(\begin{array}{ccc|ccc} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 & 0 \\ 0 & -\tfrac{19}{3} & -\tfrac{11}{3} & -\tfrac{7}{3} & 1 & 0 \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & 0 & 1 \end{array}\right)$$

As you can see, the left-hand side did not change compared to the process we performed during the Gauß-Jordan elimination. As we will see, the right-hand side will slowly develop into the inverse matrix we are seeking.

25.2.4.5 Clearing the Second Column

We now proceed by applying the same procedure as before to the second line by dividing it by the entry in the second column

$$\left(\begin{array}{ccc|ccc} 1 & \tfrac{4}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 & 0 \\ 0 & 1 & \tfrac{11}{19} & \tfrac{7}{19} & -\tfrac{3}{19} & 0 \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & 0 & 1 \end{array}\right)$$

As before we now proceed in making the other entries in the second column zero

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -\tfrac{2}{19} & \tfrac{1}{3}-\tfrac{4}{3}\cdot\tfrac{7}{19} & \tfrac{4}{3}\cdot\tfrac{3}{19} & 0 \\ 0 & 1 & \tfrac{11}{19} & \tfrac{7}{19} & -\tfrac{3}{19} & 0 \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & 0 & 1 \end{array}\right)$$

which yields

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -\tfrac{2}{19} & -\tfrac{3}{19} & \tfrac{4}{19} & 0 \\ 0 & 1 & \tfrac{11}{19} & \tfrac{7}{19} & -\tfrac{3}{19} & 0 \\ 0 & -\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{1}{3} & 0 & 1 \end{array}\right)$$

We do the same for the third line finding

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -\tfrac{2}{19} & -\tfrac{3}{19} & \tfrac{4}{19} & 0 \\ 0 & 1 & \tfrac{11}{19} & \tfrac{7}{19} & -\tfrac{3}{19} & 0 \\ 0 & 0 & \tfrac{10}{19} & -\tfrac{4}{19} & -\tfrac{1}{19} & 1 \end{array}\right)$$

25.2.4.6 Clearing the Third Column

The last thing to do is to clear the third column for which we find

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -\tfrac{2}{19} & -\tfrac{3}{19} & \tfrac{4}{19} & 0 \\ 0 & 1 & \tfrac{11}{19} & \tfrac{7}{19} & -\tfrac{3}{19} & 0 \\ 0 & 0 & 1 & -\tfrac{2}{5} & -\tfrac{1}{10} & \tfrac{19}{10} \end{array}\right)$$

Again we multiply the third line by the entry in the third column of line i and subtract it from line i. For the first line we find

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} \\ 0 & 1 & \tfrac{11}{19} & \tfrac{7}{19} & -\tfrac{3}{19} & 0 \\ 0 & 0 & 1 & -\tfrac{2}{5} & -\tfrac{1}{10} & \tfrac{19}{10} \end{array}\right)$$

For the second line we find

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} \\ 0 & 1 & 0 & \tfrac{3}{5} & -\tfrac{1}{10} & -\tfrac{11}{10} \\ 0 & 0 & 1 & -\tfrac{2}{5} & -\tfrac{1}{10} & \tfrac{19}{10} \end{array}\right)$$

25.2.4.7 Solution

At this point we are done as we have found the inverse matrix D−1 of D to be

$$D^{-1} = \begin{pmatrix} -\tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} \\ \tfrac{3}{5} & -\tfrac{1}{10} & -\tfrac{11}{10} \\ -\tfrac{2}{5} & -\tfrac{1}{10} & \tfrac{19}{10} \end{pmatrix}$$

According to Eq. 25.4 we can obtain the solution vector by simply multiplying it with the right-hand side vector $\vec{b}$ in which case we find

$$\vec{x} = D^{-1}\vec{b} = \begin{pmatrix} -\tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} \\ \tfrac{3}{5} & -\tfrac{1}{10} & -\tfrac{11}{10} \\ -\tfrac{2}{5} & -\tfrac{1}{10} & \tfrac{19}{10} \end{pmatrix} \begin{pmatrix} 2 \\ 4 \\ 0 \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{5}\cdot 2+\tfrac{1}{5}\cdot 4+\tfrac{1}{5}\cdot 0 \\ \tfrac{3}{5}\cdot 2-\tfrac{1}{10}\cdot 4-\tfrac{11}{10}\cdot 0 \\ -\tfrac{2}{5}\cdot 2-\tfrac{1}{10}\cdot 4+\tfrac{19}{10}\cdot 0 \end{pmatrix} = \begin{pmatrix} \tfrac{2}{5} \\ \tfrac{4}{5} \\ -\tfrac{6}{5} \end{pmatrix}$$

where we have derived the same solution as before (see section 25.2.2.8).
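The inversion procedure can likewise be written in a few lines of code. The following C sketch (an illustration, not one of the listings of this chapter) carries the identity matrix along as the right-hand side while reducing D to the identity, and then multiplies the resulting D−1 with $\vec{b}$ according to Eq. 25.4. As before, it assumes nonzero diagonal entries and does not use pivoting.

/* Sketch of the Gauss-Jordan inversion in C (no pivoting): the identity
 * matrix on the right-hand side turns into D^-1, which is then used to
 * compute x = D^-1 b. */
#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N]   = {{3, 4, 2}, {7, 3, 1}, {1, 1, 1}};
    double inv[N][N] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
    double b[N] = {2, 4, 0};
    int i, j, k;

    for (i = 0; i < N; i++) {
        double pivot = a[i][i];                      /* assumed to be nonzero */
        for (j = 0; j < N; j++) { a[i][j] /= pivot; inv[i][j] /= pivot; }
        for (k = 0; k < N; k++) {                    /* clear column i elsewhere */
            if (k == i) continue;
            double factor = a[k][i];
            for (j = 0; j < N; j++) {
                a[k][j]   -= factor * a[i][j];
                inv[k][j] -= factor * inv[i][j];
            }
        }
    }

    /* x = D^-1 b, compare with Eq. 25.5 */
    for (i = 0; i < N; i++) {
        double xi = 0.0;
        for (j = 0; j < N; j++) xi += inv[i][j] * b[j];
        printf("x%d = %f\n", i, xi);
    }
    return 0;
}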

25.2.5 Summary

As you can see, there are several relevant schemes for solving linear systems of equations. All of these schemes are general, i.e., they can be used for a wide range of linear systems of equations. In the following section, we will study numerical schemes which can be used to solve these systems of equations.

25.3 Numerical Solutions to Linear Systems of Equations

25.3.1 Introduction

As we have seen, there are several methods which can be used to solve a linear system of equations analytically. However, in most cases, these analytical procedures are somewhat impractical to apply directly in numerical calculations. This is why several methods have been developed that transfer the analytical concepts discussed into numerical schemes, which can be easily implemented. In this section we will discuss the most important methods which are used for such applications.

25.3.2 Jacobi Method

25.3.2.1 Introduction

The first method we will study is generally referred to as the Jacobi1 method. This method requires a decomposition of A into a matrix D containing only the entries on the downwards-diagonal and a matrix R which takes up all the rest of the matrix. The general form is

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix} + \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ a_{21} & 0 & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 \end{pmatrix} = D + R$$  (Eq. 25.10)

We can then solve the linear system $A\vec{x} = \vec{b}$ as

$$A\vec{x} = \vec{b} \quad\Rightarrow\quad (D+R)\,\vec{x} = \vec{b} \quad\Rightarrow\quad D\vec{x} = \vec{b} - R\vec{x} \quad\Rightarrow\quad \vec{x} = D^{-1}\left(\vec{b} - R\vec{x}\right)$$

which is difficult to solve, as $\vec{x}$ appears on both sides of the equation. This is why this scheme is turned into an iterative scheme according to

$$\vec{x}^{(k+1)} = D^{-1}\left(\vec{b} - R\,\vec{x}^{(k)}\right)$$  (Eq. 25.11)

where the value $\vec{x}^{(k)}$ of iteration k is used to calculate a better approximation $\vec{x}^{(k+1)}$ at iteration k + 1.

25.3.2.2 Special Matrices

Before going into details on why these matrices are required, we will quickly introduce the specific types of matrices here.

Lower-Triangular Matrix. A lower-triangular matrix is a matrix which only has nonzero entries on the downwards-diagonal and below it

$$A_{\text{lower-triangular}} = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

Strictly Lower-Triangular Matrix. A strictly lower-triangular matrix has zero entries on the downwards-diagonal and nonzero entries below it

$$A_{\text{strictly lower-triangular}} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ a_{21} & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 \end{pmatrix}$$

Upper-Triangular Matrix. Likewise, an upper-triangular matrix only has nonzero entries on the downwards-diagonal and above it

$$A_{\text{upper-triangular}} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$

Strictly Upper-Triangular Matrix. A strictly upper-triangular matrix has zero entries on the downwards-diagonal and nonzero entries above it

$$A_{\text{strictly upper-triangular}} = \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$

Unit-Upper-Triangular Matrix. A unit-upper-triangular matrix is a matrix which has 1 as entries on the downwards-diagonal and nonzero entries above it

$$A_{\text{unit-upper-triangular}} = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ 0 & 1 & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

Unit-Lower-Triangular Matrix. Likewise, a unit-lower-triangular matrix is a matrix which has 1 as all entries on the downwards-diagonal and nonzero entries below it

$$A_{\text{unit-lower-triangular}} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{pmatrix}$$

Diagonal Matrix. A diagonal matrix only has nonzero entries on the downwards-diagonal

$$A_{\text{diagonal}} = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$

Tridiagonal Matrix. A tridiagonal matrix is a matrix which only has nonzero entries on the downwards-diagonal and directly to the left and right of it

$$A_{\text{tridiagonal}} = \begin{pmatrix} a_{11} & a_{12} & 0 & \cdots & 0 \\ a_{21} & a_{22} & a_{23} & \cdots & 0 \\ 0 & a_{32} & a_{33} & \ddots & \vdots \\ \vdots & & \ddots & \ddots & a_{n-1,n} \\ 0 & 0 & \cdots & a_{n,n-1} & a_{nn} \end{pmatrix}$$

These matrices are especially relevant for simplified methods such as the Thomas algorithm (see section 25.3.8).
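One practical consequence of these special structures is that such matrices do not need to be stored in full. The following C sketch (an illustration with freely chosen names) stores a tridiagonal matrix as three arrays only and uses this compact form for a matrix-vector multiplication; the Thomas algorithm mentioned above works on exactly this kind of storage.

/* Sketch of a compact storage scheme for a tridiagonal matrix: only the
 * sub-diagonal, the downwards-diagonal, and the super-diagonal are kept. */
#include <stdio.h>

#define N 4

typedef struct {
    double lower[N];   /* a[i][i-1], lower[0] unused  */
    double diag[N];    /* a[i][i]                     */
    double upper[N];   /* a[i][i+1], upper[N-1] unused */
} Tridiagonal;

/* y = A x for a tridiagonal A stored in compact form */
static void multiply(const Tridiagonal *a, const double x[N], double y[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = a->diag[i] * x[i];
        if (i > 0)     y[i] += a->lower[i] * x[i - 1];
        if (i < N - 1) y[i] += a->upper[i] * x[i + 1];
    }
}

int main(void)
{
    Tridiagonal a = {{0, 1, 1, 1}, {2, 2, 2, 2}, {1, 1, 1, 0}};
    double x[N] = {1, 1, 1, 1}, y[N];
    multiply(&a, x, y);
    for (int i = 0; i < N; i++) printf("y%d = %f\n", i, y[i]);
    return 0;
}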

25.3.2.3 Iterative Scheme

If we carry out the matrix multiplication in the bracket of the right-hand side in Eq. 25.11 we find the following equation for row i

$$b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k)}$$  (Eq. 25.12)

Obviously, the entry dii of D−1 (the inverse of D) in row i is simply given by

$$d_{ii} = \frac{1}{a_{ii}}$$  (Eq. 25.13)

as the matrix has no other entries. Using Eq. 25.12 and Eq. 25.13 we can rewrite Eq. 25.11 such that the entry $x_i^{(k+1)}$ in line i of $\vec{x}^{(k+1)}$ is given by

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k)}\right)$$  (Eq. 25.14)

Eq. 25.14 is the iterative method used to calculate an approximate solution for $\vec{x}$. For each entry $x_i^{(k+1)}$ it requires the values of the previous approximation $\vec{x}^{(k)}$. We therefore need to first write the new approximations into an intermediary storage buffer, which we can transfer to the actual solution vector once all lines are complete. We therefore require an intermediary buffer of size n.
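The following C sketch (an illustration, not one of the listings of this chapter) implements exactly this scheme: a full sweep according to Eq. 25.14 is written into the buffer xnext before being copied back. With kmax set to 3 it reproduces the three iterations that are calculated by hand in the following example.

/* Sketch of the Jacobi iteration (Eq. 25.14) in C. Note the intermediary
 * buffer xnext: a full sweep is computed from the old values in x before
 * the new values are copied back. */
#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N] = {{3, 4, 2}, {7, 3, 1}, {1, 1, 1}};
    double b[N] = {2, 4, 0};
    double x[N] = {0, 0, 0}, xnext[N];    /* x holds x^(k), xnext the buffer */
    int i, j, k, kmax = 3;

    for (k = 0; k < kmax; k++) {
        for (i = 0; i < N; i++) {
            double summed = 0.0;
            for (j = 0; j < N; j++)
                if (j != i) summed += a[i][j] * x[j];
            xnext[i] = (b[i] - summed) / a[i][i];   /* Eq. 25.14 */
        }
        for (i = 0; i < N; i++) x[i] = xnext[i];    /* copy the buffer back */
    }
    /* after three iterations: 62/27, 124/27, 4/3 */
    for (i = 0; i < N; i++) printf("x%d = %f\n", i, x[i]);
    return 0;
}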

25.3.2.4 Example

We will now use the Jacobi method for solving our original example (Eq. 25.2) given by

$$A = \begin{pmatrix} 3 & 4 & 2 \\ 7 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad \vec{b} = \begin{pmatrix} 2 \\ 4 \\ 0 \end{pmatrix}$$

We start with the approximation

$$\vec{x}^{(0)} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

Applying Eq. 25.14 we find the following approximation for $\vec{x}^{(1)}$

$$\vec{x}^{(1)} = \begin{pmatrix} \tfrac{1}{3}\left(2-(4\cdot 0+2\cdot 0)\right) \\ \tfrac{1}{3}\left(4-(7\cdot 0+1\cdot 0)\right) \\ \tfrac{1}{1}\left(0-(1\cdot 0+1\cdot 0)\right) \end{pmatrix} = \begin{pmatrix} \tfrac{2}{3} \\ \tfrac{4}{3} \\ 0 \end{pmatrix}$$

In the second iteration we find

$$\vec{x}^{(2)} = \begin{pmatrix} \tfrac{1}{3}\left(2-(4\cdot\tfrac{4}{3}+2\cdot 0)\right) \\ \tfrac{1}{3}\left(4-(7\cdot\tfrac{2}{3}+1\cdot 0)\right) \\ \tfrac{1}{1}\left(0-(1\cdot\tfrac{2}{3}+1\cdot\tfrac{4}{3})\right) \end{pmatrix} = \begin{pmatrix} -\tfrac{10}{9} \\ -\tfrac{2}{9} \\ -2 \end{pmatrix}$$

Likewise we find for the third iteration

$$\vec{x}^{(3)} = \begin{pmatrix} \tfrac{1}{3}\left(2-(4\cdot(-\tfrac{2}{9})+2\cdot(-2))\right) \\ \tfrac{1}{3}\left(4-(7\cdot(-\tfrac{10}{9})+1\cdot(-2))\right) \\ \tfrac{1}{1}\left(0-(1\cdot(-\tfrac{10}{9})+1\cdot(-\tfrac{2}{9}))\right) \end{pmatrix} = \begin{pmatrix} \tfrac{62}{27} \\ \tfrac{124}{27} \\ \tfrac{4}{3} \end{pmatrix}$$

Here we can already see that the algorithm deviates pretty far from the solution we expect. This is one of the characteristics of the simpler numerical solvers: They tend to become numerically unstable. The Jacobi method is especially sensitive to small values on the downwards-diagonal because these values result in very large increments due to the division operation (see Eq. 25.14). The Jacobi method will fail immediately if one of these values is 0, as this obviously results in a division by zero. This is a problem which we will encounter for almost all algorithms we will discuss in this section.

Pivoting. As stated, a 0 on the downwards-diagonal of A will cause any of these algorithms to terminate due to a division by zero. There is one simple method by which these problems can be avoided, which is shuffling the order of the equations. Obviously, this is perfectly allowed as changing the order of a linear system of equations does not change the result. If we change the rows such that we collect the largest values on the downwards-diagonal, we avoid divisions by zero and we also increase the numerical stability. This method is referred to as pivoting. Pivoting is key to making numerical methods more stable. We will detail this process in section 25.3.5.8 when discussing the LU-decomposition which is one of the most commonly used methods for solving linear systems of equations.

25.3.2.5 Maple Worksheet

Before studying the methods by which we can increase the numerical stability, we will first implement the Jacobi method as a Maple worksheet in order to avoid redoing the calculation by hand. The function DoJacobi shown in listing 25.1 implements this method. It accepts the matrix m and the right-hand side b, as well as the initial guess for $\vec{x}^{(0)}$ and a maximum number of iterations kmax as parameters. The function first determines the size of the matrix given (see line 4) before creating vectors for $\vec{x}^{(k)}$ (termed xcurrent) and $\vec{x}^{(k+1)}$ (termed xnext). xcurrent is primed with the initial values given to the function. The function then performs the required number of iterations (see line 10), first creating the sum required for Eq. 25.14 in line 15. Eq. 25.14 is applied in line 18 for finding the row i in the new approximation $\vec{x}^{(k+1)}$. Once this vector is calculated completely, the data is copied into xcurrent for the next approximation (see line 21).

Listing 25.1

[Maple] Listing for the Jacobi method. A digital version of the listing can be found under the name JacobiMethod.mw in the supplementary material.

1  restart:read "Core.txt":
2
3  DoJacobi:=proc(m::Matrix,b::Vector,x0::Vector,kmax:=10) local i,j,k,xcurrent,xnext,summed,n:
4   n:=Size(m,1):
5   xcurrent:=Vector(1..n):
6   for i to n do
7    xcurrent[i]:=x0[i]:
8   end do:
9   xnext:=Vector(1..n):
10  for k to kmax do
11   for i to n do
12    summed:=0:
13    for j to n do
14     if i<>j then
15      summed:=summed+m[i,j]*xcurrent[j]
16     end if:
17    end do:
18    xnext[i]:=(b[i]-summed)/(m[i,i]):
19   end do:
20   for i to n do
21     xcurrent[i]:=evalf(xnext[i]):
22   end do:
23  end do:
24  return xcurrent:
25 end proc:
26
27 m:=Matrix([[3,4,2],[7,3,1],[1,1,1]]);b:=Vector([2,4,0]);
28 DoJacobi(m,b,Vector([0,0,0]));
29
30 #swap rows
31 m:=Matrix([[7,3,1],[3,4,2],[1,1,1]]);b:=Vector([4,2,0]);
32 DoJacobi(m,b,Vector([0,0,0]));
33
34 #make all entries on the downwards-diagonal large
35 m:=Matrix([[7,3,1],[3,4,2],[1,1,10]]);b:=Vector([4,2,0]);
36 DoJacobi(m,b,Vector([0,0,0]));

The function returns with the found approximation for $\vec{x}$. Line 27 outputs the example for which we just calculated the first iterations by hand. After 10 iterations, the output will be something like

$$\vec{x} = \begin{pmatrix} 737.2889973 \\ 804.6078680 \\ 702.4456638 \end{pmatrix}$$

which is obviously wrong as we know from our previous calculations (see Eq. 25.5).

25.3.2.6 Increasing the Numerical Stability

As already stated, we can increase the numerical stability of the Jacobi method by putting the largest values on the downwards-diagonal. In our original problem, we can achieve this by switching lines one and two. This is shown in listing 25.1, line 31. The output of this line is

$$\vec{x} = \begin{pmatrix} 0.03153459600 \\ 0.07755569200 \\ 2.127796802 \end{pmatrix}$$

which obviously is still wrong, but somewhat closer to the correct solution. The problem is that the third entry on the downwards-diagonal of the matrix is 1 and therefore still a reasonably small value. If we change this value to 10 (see listing 25.1, line 35) we end up with

$$\vec{x} = \begin{pmatrix} 0.5149275354 \\ 0.1392543222 \\ -0.06771201379 \end{pmatrix}$$

which is obviously a different solution as we changed the input equations by modifying the matrix. However, this result is very close to the correct solution of this (modified) problem, which is

$$\vec{x} = \begin{pmatrix} 0.5193370166 \\ 0.1436464088 \\ -0.06629834254 \end{pmatrix}$$

and reasonably close to the solution we found using the Jacobi method.

25.3.2.7 Summary

In this section we have studied the Jacobi method, which is one of the simplest methods for solving linear systems of equations. The simplicity of the method comes with some severe disadvantages, such as low numerical stability and (in many cases) inexact solutions, especially if the entries on the downwards-diagonal are small. We have seen that in cases where these values are reasonably large, the method yields close-to-correct results. Obviously, a method that requires us to change our equations in order to yield a solution at all is not useful. However, there are several modifications of the Jacobi method that make the scheme significantly more exact.

25.3.3 Gauß-Seidel Method

25.3.3.1 Introduction

The next method we will study is generally referred to as the Gauß-Seidel method which was originally developed by von Seidel1 in 1874 [9] following concepts introduced by Gauß. It is a modification of the classical Jacobi method. As we have seen in Eq. 25.14, the Jacobi method calculates all values $x_i^{(k+1)}$ using only the values of the previous approximation $x_j^{(k)}$, which is why we wrote these values into a buffer first (see listing 25.1, line 18) and copied them into the current estimation $x_i^{(k+1)}$ only after all values were determined (see listing 25.1, line 21).

The Gauß-Seidel method uses the values $x_j^{(k+1)}$ already calculated for j < i when determining the value $x_i^{(k+1)}$, while the old values $x_j^{(k)}$ are used for j > i. Algorithmically this means that we can directly replace the old approximations with the new approximations. The general formula is thus given by

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}\,x_j^{(k)}\right)$$  (Eq. 25.15)

As we will see, this is not only easier to implement in code, but also results in a significantly more stable numerical scheme.

25.3.3.2 Maple Worksheet

As the procedure is the same as for the Jacobi method, we will not repeat the manual example but rather implement it in code directly. The function DoGaussSeidel implements the Gauß-Seidel method (see Eq. 25.15). As you can see, the implementation is very similar to listing 25.1. However, for Gauß-Seidel we do not require an additional buffer, but store the improved values of the approximation for $\vec{x}^{(k+1)}$ directly in xcurrent (see line 17).

25.3.3.3 Example

Let us now compare the output of the Gauß-Seidel method with the output of the Jacobi method using the same example. Listing 25.2, line 23 again tries to find a solution to the unmodified system of equations. The output we receive after 10 iterations is

Listing 25.2

[Maple] Listing for the Gauß-Seidel method. This listing is part of the listing GaussSeidelSOR.mw, which can be found in the supplementary material.

1  restart:read "Core.txt":
2
3  DoGaussSeidel:=proc(m::Matrix,b::Vector,x0::Vector,kmax:=10) local i,j,k,xcurrent,summed,n:
4   n:=Size(m,1):
5   xcurrent:=Vector(1..n):
6   for i to n do
7    xcurrent[i]:=x0[i]:
8   end do:
9   for k to kmax do
10    for i to n do
11     summed:=0:
12     for j to n do
13      if i<>j then
14       summed:=summed+m[i,j]*xcurrent[j]
15      end if:
16     end do:
17     xcurrent[i]:=(b[i]-summed)/(m[i,i]):
18    end do:
19   end do:
20   return xcurrent:
21 end proc:
22
23 m:=Matrix([[3,4,2],[7,3,1],[1,1,1]]);b:=Vector([2,4,0]);
24 DoGaussSeidel(m,b,Vector([0,0,0]));
25
26 #swap rows
27 m:=Matrix([[7,3,1],[3,4,2],[1,1,1]]);b:=Vector([4,2,0]);
28 DoGaussSeidel(m,b,Vector([0,0,0]));

$$\vec{x} = \begin{pmatrix} 868.8998220 \\ 2215.424592 \\ 1346.524770 \end{pmatrix}$$

which is obviously still wrong. This is the output created by listing 25.2, line 23. Just like the Jacobi method, the Gauß-Seidel method tends to become numerically unstable if the largest values are not clustered on the downwards-diagonal. We can improve this situation if we switch the first and the second row as we did before for the Jacobi method. If we run this second example in listing 25.2, line 27 we obtain the output

$$\vec{x} = \begin{pmatrix} 0.3995408294 \\ 0.8009357720 \\ -1.200476601 \end{pmatrix}$$

which is the close-to-correct result (compare with Eq. 25.5). Even though the Gauß-Seidel method failed on the original system of equations, it gives us the correct result if we reorder the equations such that the largest values are collected on the downwards-diagonal. As already stated, this process is referred to as pivoting and is a crucial step in finding solutions to linear systems of equations. If we implemented pivoting in the Gauß-Seidel method, it would very likely be a good candidate for solving a linear system of equations.

25.3.3.4 Summary

As we have seen, the slight modification to the Jacobi method which resulted in the Gauß-Seidel method increases the numerical stability and allowed us to find (actually for the first time) a numerical solution to our original linear system of equations. On the downside, we had to reorder the equations in order to achieve numerical stability.

25.3.4 Successive Over-Relaxation

25.3.4.1 Introduction

In the next step, we will discuss a modification to the original Gauß-Seidel method which is referred to as SOR (successive over-relaxation). The concept of over-relaxation is used very frequently in numerics. Whenever a new approximation $\vec{x}^{(k+1)}$ is calculated by a numerical scheme from an old approximation $\vec{x}^{(k)}$ we usually tend to take the new value as the “better” approximation. However, in many cases, this is not necessarily so. Whenever a numerical scheme operates very close to the limit of stability, the new values may be a step in the right direction toward the actual solution. However, in many cases, these steps are too big. This may result in a numerical scheme becoming unstable almost immediately. One way around this problem is to calculate the next approximation to be somewhere in between $\vec{x}^{(k)}$ and $\vec{x}^{(k+1)}$. The fundamental equation for SOR applied to the Gauß-Seidel method is given by

$$x_i^{(k+1)} = (1-\omega)\,x_i^{(k)} + \frac{\omega}{a_{ii}}\left(b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}\,x_j^{(k)}\right)$$  (Eq. 25.16)

where we introduced the relaxation parameter ω with 0 ≤ ω ≤ 1. If you inspect Eq. 25.16, you can see that this parameter is nothing other than a percentage value weighting the contributions of the new approximation $\vec{x}^{(k+1)}$ against the last approximation $\vec{x}^{(k)}$. For ω = 0 we would obtain no contribution of the new approximation $\vec{x}^{(k+1)}$ and would remain at the last approximation $\vec{x}^{(k)}$. For ω = 1 we obtain the formula for Gauß-Seidel (see Eq. 25.15). Here, all contributions from the last approximation $\vec{x}^{(k)}$ are neglected. It can be shown that stable solutions can be derived from this scheme for all values 0 ≤ ω ≤ 2, which is the usual bound accepted by most numerical solvers for SOR.
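Before turning to the Maple implementation, the following C sketch illustrates Eq. 25.16: the update is applied in place (as in the Gauß-Seidel method) and weighted with ω. The system used here is the row-swapped example; ω and the number of iterations are chosen arbitrarily, and the code is meant as an illustration rather than a production implementation.

/* Sketch of one possible SOR implementation (Eq. 25.16) in C. In contrast
 * to the Jacobi sketch, the updates are applied in place and weighted with
 * the relaxation parameter omega. */
#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N] = {{7, 3, 1}, {3, 4, 2}, {1, 1, 1}};
    double b[N] = {4, 2, 0};
    double x[N] = {0, 0, 0};
    double omega = 0.4;
    int i, j, k, kmax = 50;

    for (k = 0; k < kmax; k++) {
        for (i = 0; i < N; i++) {
            double summed = 0.0;
            for (j = 0; j < N; j++)
                if (j != i) summed += a[i][j] * x[j];   /* uses new values for j < i, old for j > i */
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - summed) / a[i][i];  /* Eq. 25.16 */
        }
    }
    /* converges towards 0.4, 0.8, -1.2 */
    for (i = 0; i < N; i++) printf("x%d = %f\n", i, x[i]);
    return 0;
}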

25.3.4.2 Maple Worksheet

At this point we will modify the implementation of the Gauß-Seidel method to obtain the SOR scheme. The listing is shown in listing 25.3. As you can see, the implementation is pretty much identical, except for the introduction of the additional parameter omega that accepts the relaxation parameter and the adaptation of line 15 which implements Eq. 25.16.

Listing 25.3

[Maple] Listing for the SOR method. This listing is part of the listing GaussSeidelSOR.mw, which can be found in the supplementary material.

1  DoSOR:=proc(m::Matrix,b::Vector,x0::Vector,omega,kmax:=10) local i,j,k,xcurrent,summed,n:
2   n:=Size(m,1):
3   xcurrent:=Vector(1..n):
4   for i to n do
5     xcurrent[i]:=x0[i]:
6   end do:
7   for k to kmax do
8     for i to n do
9       summed:=0:
10      for j to n do
11        if i<>j then
12          summed:=summed+m[i,j]*xcurrent[j]
13        end if:
14      end do:
15      xcurrent[i]:=(1-omega)*xcurrent[i]+omega*evalf((b[i]-summed)/(m[i,i])):
16     end do:
17   end do:
18   return xcurrent:
19  end proc:
20
21  m:=Matrix([[3,4,2],[7,3,1],[1,1,1]]);b:=Vector([2,4,0]);
22  DoSOR(m,b,Vector([0,0,0]),0.4);
23
24  #swap rows
25  m:=Matrix([[7,3,1],[3,4,2],[1,1,1]]);b:=Vector([4,2,0]);
26  DoSOR(m,b,Vector([0,0,0]),0.4,50);

25.3.4.3 Example

If we now run our original example (see line 21) we will obtain the following output

$$\vec{x} = \begin{pmatrix} 0.4290448884 \\ 0.6927908981 \\ -1.087225019 \end{pmatrix}$$

which is very close to the actual solution. However, if you modify ω or increase the number of iterations, you will see that the solver is still very unstable. Therefore, guessing a good value for ω can be very tedious. However, if we run the example where we switched row one and two (see line 25), you will see that we obtain the correct solution for any value of ω, although the required number of iterations (here arbitrarily set to 50) differs. This is one important aspect of SOR methods: The choice of ω significantly influences the convergence and thus the performance of the solver. In fact, for many problems there are means of guessing a good first value for ω, but very often it is simply a question of trial-and-error. If a solver does not converge for a given ω0, the value should be modified and the solver checked for convergence until a suitable value is found.

Export to CoreFunctions.txt. At this point, we will export the function DoSOR into CoreFunctions.txt. SOR is a very handy method which we will use later on to solve some of the more difficult problems of fluid mechanics.

25.3.4.4 Pivoting

As we have seen in the previous example, we were able to obtain a stable and correct solution after swapping the first and the second rows. The reason why this yields more stable results is due to the fact that we collected the largest values on the downwards-diagonal of our matrix. As this value forms the denominator in the division operation in Eq. 25.16, we keep the increment values small. If the values on the downwards-diagonal are very small, the increments become large and the solver becomes unstable.

An Even More Challenging Problem. To further illustrate this point, let us assume a slight modification of our original problem given by

$$A = \begin{pmatrix} 0 & 4 & 2 \\ 7 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$  (Eq. 25.17)

If we try to solve this example using our function DoSOR, we would receive an error as a division by zero would be encountered. In fact, the algorithm will fail immediately if any entry on the downwards-diagonal is zero. If there is a very small value on the downwards-diagonal, the algorithm may (as we have seen) become unstable. There is one obvious way of turning Eq. 25.17 into a solvable problem: simply swapping the rows such that the 0 is moved off the diagonal. This process is referred to as pivoting.

Implementing Pivoting. As stated, we could very easily turn Eq. 25.17 into a solvable problem by simply swapping the first row with any other row, thus moving the zero away from the downwards-diagonal. In general, the algorithm will be most stable if the largest values are collected on the diagonal. Please note that we are perfectly allowed to make these swapping operations as long as we also swap the corresponding entries of the right-hand side vector $\vec{b}$ of our linear system. In these cases, all we really do is change the order of the system of equations. This will obviously not change the solution.

We require an algorithmic implementation of this “row-swapping.” This process is generally referred to as pivoting. Pivoting is applied during the processing of each row in our algorithm. In row i we look for the largest value of all entries (l, i) in column i with i ≤ l ≤ n. We determine this value to be in row l = r. This element is referred to as the pivot. We would then swap the content of row i with row r, thus placing the largest value we can find in column i into row i where it would then be located on the diagonal. We then proceed as previously. At the beginning of the next row, we perform a new pivoting operation and so on. Obviously, we need to keep track of the pivoting, i.e., the row-swapping, because we need to apply the same reordering to the right-hand side vector $\vec{b}$.

Partial and Full Pivoting. If we proceed using the described scheme, we only determine the pivot element looking at the entries in the current column i. This process is referred to as partial pivoting. We could obviously also look into all subsequent columns i < l ≤ n and repeat the pivoting there. This process is then referred to as full pivoting. Obviously, full pivoting is computationally more expensive and (in most cases) not necessary, as partial pivoting usually gives sufficient stability.

Implementing Pivoting. We will now reimplement our function DoSOR in a way that it supports pivoting. The listing is shown in listing 25.4. We implement the pivoting using pointers instead of swapping complete rows. For this we introduce the pivoting vector piv. You can think of this vector as being the row order of the matrix. It is first initialized to the sequence 1, 2, 3, i.e., the array is ordered. As the pivoting progresses, the array is shuffled. Instead of moving the entire blocks of data (which is computationally more expensive) we merely shuffle the order of the pivoting vector. As an example, if we swapped rows one and two, the pivoting vector would be changed from (1, 2, 3) to (2, 1, 3), meaning that the original first row of the matrix is now at the second row and the original second row is now first. By using such an index pointer, we only have to switch the order of these numbers. If we return the results in the order given originally, we do not necessarily have to return the pivoting vector to the user.

Listing 25.4

[Maple] Listing for the SOR method with pivoting. This listing is part of the listing GaussSeidelSOR.mw, which can be found in the supplementary material.

1 DoSORPivot:=proc(m::Matrix,b::Vector,x0::Vector,omega,kmax:=10) local i,j,k,l,xcurrent,summed,store,n,piv:
2   n:=Size(m,1):
3   xcurrent:=Vector(1..n):
4   for i to n do
5     xcurrent[i]:=x0[i]:
6   end do:
7   piv:=Vector(1..n):
8   for i to n do
9     piv[i]:=i:
10  end do:
11  for k to kmax do
12    for i to n do
13      for l from i+1 to n do
14        if abs(m[piv[l],i])>abs(m[piv[i],i]) then
15          store:=piv[l]:
16          piv[l]:=piv[i]:
17          piv[i]:=store:
18        end if:
19      end do:
20      summed:=0:
21      for j to n do
22        if i<>j then
23          summed:=summed+m[piv[i],j]*xcurrent[j]
24        end if:
25      end do:
26      xcurrent[i]:=(1-omega)*xcurrent[i]+omega*evalf((b[piv[i]]-summed)/(m[piv[i],i])):
27    end do:
28   end do:
29   return piv,xcurrent:
30 end proc:
31
32 m:=Matrix([[3,4,2],[7,3,1],[1,1,1]]);b:=Vector([2,4,0]);
33 DoSORPivot(m,b,Vector([0,0,0]),0.6,50);
34
35 m:=Matrix([[0,4,2],[7,0,1],[1,1,1]]);
36 DoSORPivot(m,b,Vector([0,0,0]),0.6,50);

In line 7 the vector is first initialized to the sequence (1, 2, . . . , n). We then proceed by working line-by-line as in listing 25.3. You will see that the listing is the same except for the fact that instead of using direct access to the respective row in the matrix m we now use the entry i of the pivoting vector piv when querying line i. The actual pivoting operation is performed first in the line loop starting from line 12. The comparison operation is performed in line 14. If the absolute value of the entry (l, i) (accessed via the pivoting vector as there may have been a previous pivoting operation) is bigger than the entry (i, i), we swap the two rows. We proceed by checking this for all rows i < l ≤ n for row i thus implementing partial pivoting.

The rest of the listing is analogous to listing 25.3. Please note that every time we have to access row i of the matrix, we have to do so via the pivoting vector as can be seen in line 23 and line 26. We also have to access the entries of b using the pivoting vector. However, the order of xcurrent is kept, therefore the pivoting operation will not be visible to the user.

If we run our example (see line 32), we will see that DoSORPivot now returns a stable and correct solution. If we output the pivoting vector using a print command, we can see that the algorithm changed the order of the rows ending up with the sequence 2, 1, 3, i.e., it swapped the first and the second lines, thus putting the largest values on the diagonal. We can put our function to an even more severe test by running our second example (see line 35). You will see that the function reports the result

$$\vec{x} = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}$$

which is the correct result. Pivoting allowed us to solve this problem which would have crashed our original implementation of SOR immediately due to a zero on the downwards-diagonal.

25.3.4.5 Summary

In this section we have introduced SOR which is a modification of the classical Gauß-Seidel method. In SOR the next approximation to the solution is derived from both the original approximation $\vec{x}^{(k)}$ and the approximation $\vec{x}^{(k+1)}$ found from the Gauß-Seidel method. The relaxation parameter ω is used to calculate a weighted average of the two values. In many cases, SOR will allow a solver to find a solution where it previously failed due to instability. However, we have seen that this modification still has issues for the known problems of having large values off the downwards-diagonal and zero values on the downwards-diagonal. We therefore implemented partial pivoting which allows swapping rows such that the large values are moved onto and the zero values off the downwards-diagonal. Using pivoting we were not only able to find a stable solution to the problem for which the regular SOR implementation failed, but also to a problem which included a zero on the downwards-diagonal. In general, it is necessary to implement pivoting as sparsely populated matrices are the most common ones encountered, thus there is a high chance of ending up with a zero somewhere on the downwards-diagonal. Pivoting comes with little computational overhead and ensures stability in almost all cases.

25.3.5 LU-Decomposition

25.3.5.1 Introduction

In this section we will discuss the LU-decomposition method. This method was originally developed by Turing in 1948 [10]. As the name suggests, this scheme requires the decomposition of the matrix A into two matrices with given properties. We have already performed a somewhat simpler matrix decomposition in Eq. 25.10 when studying the Jacobi method.

In LU-decomposition, the matrix A is decomposed into a unit-lower-triangular matrix L (therefore lii = 1) and an upper-triangular matrix U which, when multiplied, form the matrix A according to

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix} = LU$$

The LU-decomposition is helpful because it allows solving a system of linear equations as a two-step problem, transforming

$$A\vec{x} = \vec{b} \quad\Leftrightarrow\quad L\left(U\vec{x}\right) = \vec{b}$$

into the two equations

$$L\vec{y} = \vec{b} \qquad \text{(Eq. 25.18)}$$

$$U\vec{x} = \vec{y} \qquad \text{(Eq. 25.19)}$$

Forward-Substitution. Now why is this advantageous? If we look at the structure of L, we can rewrite Eq. 25.18 as

$$L\vec{y} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \vec{b}$$

from which we can derive the system of equations

$$\begin{aligned} y_1 &= b_1 \\ l_{21}\,y_1 + y_2 &= b_2 \\ &\;\;\vdots \\ \sum_{k=1}^{n} l_{nk}\,y_k &= b_n \end{aligned}$$

which we can solve line by line. Once we find yi we can insert this into the next line i + 1 thus solving yi+1 and so forth. This is why this process is referred to as forward-substitution. We thus find the general solution to yi as

$$y_i = b_i - \sum_{k=1}^{i-1} l_{ik}\,y_k \qquad \text{(Eq. 25.20)}$$

Back-Substitution. Once the vector y has been determined, we can use Eq. 25.19 to calculate x as

$$U\vec{x} = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \vec{y}$$

from which we can derive the following system of equations

$$\begin{aligned} u_{11}x_1 + u_{12}x_2 + \cdots + u_{1n}x_n &= y_1 \\ u_{22}x_2 + \cdots + u_{2n}x_n &= y_2 \\ &\;\;\vdots \\ u_{nn}x_n &= y_n \end{aligned}$$

which we solve line by line, starting from the last line and moving up to the first. This is why this scheme is referred to as back-substitution or backwards-substitution. We thus find the general solution as

$$x_i = \frac{1}{u_{ii}}\left(y_i - \sum_{k=i+1}^{n} u_{ik}\,x_k\right) \qquad \text{(Eq. 25.21)}$$

“Reusable” Solutions. As you can see, neither Eq. 25.20 nor Eq. 25.21 is difficult to implement algorithmically. However, one major advantage of this scheme is the fact that once L and U of a problem described by A have been found, Eq. 25.20 and Eq. 25.21 can be applied directly. This means that the same problem can be solved for different right-hand side vectors b and the solutions can be obtained independently, i.e., in parallel. This effectively allows “reusing” the decomposition algorithmically.
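
To make Eq. 25.20 and Eq. 25.21 a little more tangible, the following is a minimal C sketch of forward- and back-substitution. It assumes the factors L and U are already available; the matrices and right-hand sides used in main are illustrative values chosen for this sketch only and are not part of the chapter's example.

#include <stdio.h>

#define N 3

/* Forward-substitution (Eq. 25.20): solves L y = b for a unit-lower-triangular L. */
void forward_substitution(double L[N][N], const double b[N], double y[N]) {
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int k = 0; k < i; k++)
            sum += L[i][k] * y[k];
        y[i] = b[i] - sum;              /* l[i][i] = 1, so no division is required */
    }
}

/* Back-substitution (Eq. 25.21): solves U x = y for an upper-triangular U. */
void back_substitution(double U[N][N], const double y[N], double x[N]) {
    for (int i = N - 1; i >= 0; i--) {
        double sum = 0.0;
        for (int k = i + 1; k < N; k++)
            sum += U[i][k] * x[k];
        x[i] = (y[i] - sum) / U[i][i];
    }
}

int main(void) {
    /* Illustrative factors (not the chapter's example); for this L and U the
       product A = LU together with b1 has the solution x = (1, 1, 1). */
    double L[N][N] = {{1, 0, 0}, {2, 1, 0}, {1, 3, 1}};
    double U[N][N] = {{2, 1, 1}, {0, 1, 2}, {0, 0, 3}};
    double b1[N] = {4, 11, 16};
    double b2[N] = {2, 4, 0};
    double y[N], x[N];

    forward_substitution(L, b1, y);     /* step 1: L y = b */
    back_substitution(U, y, x);         /* step 2: U x = y */
    printf("x = (%g, %g, %g)\n", x[0], x[1], x[2]);

    /* The same factors are reused for a second right-hand side. */
    forward_substitution(L, b2, y);
    back_substitution(U, y, x);
    printf("x = (%g, %g, %g)\n", x[0], x[1], x[2]);
    return 0;
}

Note that the second right-hand side is solved without repeating any decomposition work, which is exactly the advantage discussed above.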

25.3.5.2 Derivation

Obviously, we have to find the unknown constants l and u of our matrices L and U in order to be able to use Eq. 25.20 and Eq. 25.21. We will carry out this derivation for a 3 × 3 matrix, but the concept can be extended to any square matrix. We find

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix} = LU \qquad \text{(Eq. 25.22)}$$

Obviously, we need to determine the missing constants. If we perform the matrix multiplication L U in Eq. 25.22 we find

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ u_{11}l_{21} & u_{12}l_{21} + u_{22} & u_{13}l_{21} + u_{23} \\ u_{11}l_{31} & u_{12}l_{31} + u_{22}l_{32} & u_{13}l_{31} + u_{23}l_{32} + u_{33} \end{pmatrix} \qquad \text{(Eq. 25.23)}$$

From Eq. 25.23 we can directly derive

$$\begin{aligned} u_{11} &= a_{11} & u_{12} &= a_{12} & u_{13} &= a_{13} \\ l_{21} &= \frac{a_{21}}{u_{11}} & u_{22} &= a_{22} - u_{12}l_{21} & u_{23} &= a_{23} - u_{13}l_{21} \\ l_{31} &= \frac{a_{31}}{u_{11}} \end{aligned} \qquad \text{(Eq. 25.24)}$$

$$l_{32} = \frac{1}{u_{22}}\left(a_{32} - u_{12}l_{31}\right) \qquad \text{(Eq. 25.25)}$$

$$u_{33} = a_{33} - u_{13}l_{31} - u_{23}l_{32} \qquad \text{(Eq. 25.26)}$$

As you can see, we proceed in a line-by-line fashion. The way we decomposed the matrix allows us to directly note down the entries of the upper-triangular matrix. We then proceed by finding the first entry on the second line, which is the entry of the lower-triangular matrix. Once we have this value, we can calculate the remaining two entries of the second line which are the two entries of the upper-triangular matrix. In the third line we first calculate the last two missing entries of the lower-triangular matrix before finding the last missing value of the upper triangular matrix. Obviously, there is a certain pattern involved here.

25.3.5.3 Calculation Pattern for uij

When inspecting Eq. 25.24 and Eq. 25.26, we see that there is a certain pattern for calculating uij which is given by

$$u_{ij} = a_{ij} - \sum_{k=1}^{i-1} u_{kj}\,l_{ik} \qquad \text{(Eq. 25.27)}$$

Analytical Derivation. This equation can also be derived analytically. Writing out the matrix multiplication, we find that

$$a_{ij} = \sum_{k=1}^{n} l_{ik}\,u_{kj} \qquad \text{(Eq. 25.28)}$$

where we use i as the line index and j as the column index. Now, as L is a lower-triangular matrix, lik = 0 for all k > i. Therefore, we can simplify Eq. 25.28 to

$$a_{ij} = \sum_{k=1}^{i} l_{ik}\,u_{kj} \qquad \text{(Eq. 25.29)}$$

In a second step we can also pull the term k = i out of the sum by noting that lii = 1, as L is a unit-lower-triangular matrix with entries of 1 on the downwards-diagonal. Therefore we rewrite Eq. 25.29 to

$$a_{ij} = \sum_{k=1}^{i-1} l_{ik}\,u_{kj} + l_{ii}u_{ij} = \sum_{k=1}^{i-1} l_{ik}\,u_{kj} + u_{ij} \qquad \text{(Eq. 25.30)}$$

which we can rewrite to

$$u_{ij} = a_{ij} - \sum_{k=1}^{i-1} l_{ik}\,u_{kj} = a_{ij} - \sum_{k=1}^{i-1} u_{kj}\,l_{ik}$$

which is exactly Eq. 25.27.

25.3.5.4 Calculation Pattern for lij

Likewise, when inspecting Eq. 25.24 and Eq. 25.25, we see that the pattern for calculating lij is

$$l_{ij} = \frac{1}{u_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} u_{kj}\,l_{ik}\right) \qquad \text{(Eq. 25.31)}$$

Again, we want to verify this formula analytically.

Analytical Derivation. Returning to Eq. 25.28 we note that ukj = 0 for all k > j, as U is an upper-triangular matrix. In this case we can rewrite Eq. 25.28 to

$$a_{ij} = \sum_{k=1}^{j} l_{ik}\,u_{kj} \qquad \text{(Eq. 25.32)}$$

Again, we pull the term k = j, which contains the downwards-diagonal entry ujj, out of the sum, in which case Eq. 25.32 becomes

$$a_{ij} = \sum_{k=1}^{j-1} l_{ik}\,u_{kj} + l_{ij}u_{jj}$$

which we can rewrite to

$$l_{ij} = \frac{1}{u_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik}\,u_{kj}\right) = \frac{1}{u_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} u_{kj}\,l_{ik}\right)$$

which is exactly Eq. 25.31.

Comparison. If we compare the scheme of the LU-decomposition with the other methods for solving linear systems of equations we have studied so far, we will note one important difference: the LU-decomposition is not an iterative scheme. In fact, once we have performed the decomposition, we can calculate the solution exactly. Obviously, this comes at the cost of more complex calculations. Iterative schemes are useful whenever a “sufficiently exact” solution is required; the LU-decomposition gives the exact solution but is computationally more expensive.

25.3.5.5 Procedure

Now that we have the equations for calculating the entries lij and uij, we can start implementing the method. However, returning to our example from which we derived Eq. 25.31 and Eq. 25.27 we note one important point: The entries of the matrices L and U are calculated interdependently. This is reflected by the loop with the index k in the equations. For a given line i we first calculate all values of lij with 1 ≤ j < i. After these values have been calculated, we proceed by calculating all values uij with i ≤ j ≤ n with n being the number of rows/columns of the matrix which is referred to as the rank. So in the first line (i = 1), we had no value lij to calculate but three values uij. In the second line (i = 2) we first calculated one value lij (l21) before calculating two values of uij (u22 and u23) and so on.

A Manual Example. We will now quickly demonstrate a manual example of this algorithm before implementing it in code. We perform the LU-decomposition on the example with which we have worked so far, taking the matrix from Eq. 25.2

$$A = \begin{pmatrix} 3 & 4 & 2 \\ 7 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix} = LU$$

We begin with the first row i = 1. For i = 1 there are no values of lij to calculate and we can immediately turn to finding the values of uij. As we have no lines above the current line (as we are operating on the first line) we can simply copy down the first row of values for U which are

$$u_{11} = a_{11} = 3 \qquad u_{12} = a_{12} = 4 \qquad u_{13} = a_{13} = 2$$

We then proceed to the second line. Here we first need to find the value of l21 according to Eq. 25.31

$$l_{21} = \frac{1}{u_{11}}\left(a_{21} - \sum_{k=1}^{0} u_{k1}\,l_{2k}\right) = \frac{a_{21}}{u_{11}} = \frac{7}{3}$$

Likewise we find the values for u22 and u23 according to Eq. 25.27 as

$$u_{22} = a_{22} - \sum_{k=1}^{1} u_{k2}\,l_{2k} = a_{22} - u_{12}l_{21} = 3 - 4\cdot\frac{7}{3} = -\frac{19}{3} \qquad u_{23} = a_{23} - \sum_{k=1}^{1} u_{k3}\,l_{2k} = a_{23} - u_{13}l_{21} = 1 - 2\cdot\frac{7}{3} = -\frac{11}{3}$$

which completes the second row to

$$A = \begin{pmatrix} 3 & 4 & 2 \\ 7 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \frac{7}{3} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} 3 & 4 & 2 \\ 0 & -\frac{19}{3} & -\frac{11}{3} \\ 0 & 0 & u_{33} \end{pmatrix} = LU$$

In the last row we first proceed by finding l31 and l32 according to Eq. 25.31

$$l_{31} = \frac{1}{u_{11}}\left(a_{31} - \sum_{k=1}^{0} u_{k1}\,l_{3k}\right) = \frac{a_{31}}{u_{11}} = \frac{1}{3} \qquad l_{32} = \frac{1}{u_{22}}\left(a_{32} - \sum_{k=1}^{1} u_{k2}\,l_{3k}\right) = \frac{a_{32} - u_{12}l_{31}}{u_{22}} = \frac{1}{19}$$

Likewise we find the last value u33 according to Eq. 25.27 as

$$u_{33} = a_{33} - \sum_{k=1}^{2} u_{k3}\,l_{3k} = a_{33} - \left(u_{13}l_{31} + u_{23}l_{32}\right) = 1 - \left(2\cdot\frac{1}{3} - \frac{11}{3}\cdot\frac{1}{19}\right) = \frac{10}{19}$$

which gives the final result

$$A = \begin{pmatrix} 3 & 4 & 2 \\ 7 & 3 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \frac{7}{3} & 1 & 0 \\ \frac{1}{3} & \frac{1}{19} & 1 \end{pmatrix} \begin{pmatrix} 3 & 4 & 2 \\ 0 & -\frac{19}{3} & -\frac{11}{3} \\ 0 & 0 & \frac{10}{19} \end{pmatrix} = LU \qquad \text{(Eq. 25.33)}$$
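
As a quick sanity check of Eq. 25.33 we can multiply out a single entry of the product, e.g., the entry (3, 3):

$$\left(LU\right)_{33} = u_{13}l_{31} + u_{23}l_{32} + u_{33} = 2\cdot\frac{1}{3} - \frac{11}{3}\cdot\frac{1}{19} + \frac{10}{19} = \frac{38 - 11 + 30}{57} = 1 = a_{33}$$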

Obviously, carrying out this example by hand is somewhat tedious, which is why we will now implement it as an algorithm.

25.3.5.6 Maple Worksheet

At this point we will write a small Maple worksheet that carries out the LU-decomposition for us. The listing is shown in listing 25.5. The LU-decomposition is implemented by the function DoLUDecomposition. It expects the matrix to decompose as its sole parameter m. Please note that this matrix must be a square matrix. The function first determines the rank of the matrix (see line 4) and initializes the unit-lower-triangular matrix L with ones along the downwards-diagonal (see lines 5 to 8). The upper-triangular matrix U is created empty (see line 9). The function then proceeds in a row-by-row fashion using the row index i (see line 10).

Listing 25.5

[Maple] Listing for LU-decomposition. This listing is part of the listing LUDecomposition.mw, which can be found in the supplementary material.

1 restart:read "Core.txt":
2
3 DoLUDecomposition:=proc(m::Matrix) local n,i,j,k,summed,L,U:
4   n:=Size(m,1):
5   L:=Matrix(n,n):
6   for i to n do
7    L[i,i]:=1:
8   end do:
9   U:=Matrix(n,n):
10   for i to n do
11    #calculate l[ij]
12    for j from 1 to i-1 do
13     summed:=0:
14     for k from 1 to j-1 do
15      summed:=summed+U[k,j]*L[i,k]:
16     end do:
17     L[i,j]:=1/U[j,j]*(m[i,j]-summed):
18   end do:
19   #calculate u[ij]
20   for j from i to n do
21    summed:=0:
22    for k from 1 to i-1 do
23        summed:=summed+U[k,j]*L[i,k]:
24    end do:
25    U[i,j]:=m[i,j]-summed:
26   end do:
27  end do:
28  return L,U;
29 end proc:
30
31 m:=Matrix([[3,4,2],[7,3,1],[1,1,1]]);
32 DoLUDecomposition(m);

The values lij are calculated first; for these, the columns 1 ≤ j ≤ i − 1 are iterated in line 12 and the summation index 1 ≤ k ≤ j − 1 in line 14. The sum summed is accumulated in line 15. This value is then used to calculate lij in line 17 according to Eq. 25.31.

Once the lij are determined, the function calculates the uij in a likewise fashion. Here, the columns are iterated for i ≤ j ≤ n in line 20 and the summation index for 1 ≤ k ≤ i − 1 in line 22. Again the sum summed is calculated, which is required for uij, calculated in line 25 according to Eq. 25.27.

Having iterated all rows, the function returns the matrices L and U in line 28. Lines 31 and 32 apply the function to our example matrix, producing the result we already found by hand in Eq. 25.33.
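
For comparison, the following is a minimal C sketch of the same row-by-row decomposition without pivoting, hard-coded for the 3 × 3 example matrix. The function and variable names are our own choices for this sketch and do not correspond to any listing in the supplementary material.

#include <stdio.h>

#define N 3

/* Row-by-row LU-decomposition (Doolittle scheme, no pivoting),
   following Eq. 25.27 for the u[i][j] and Eq. 25.31 for the l[i][j]. */
void lu_decompose(double a[N][N], double l[N][N], double u[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            l[i][j] = (i == j) ? 1.0 : 0.0;   /* unit-lower-triangular matrix L */
            u[i][j] = 0.0;
        }
    for (int i = 0; i < N; i++) {
        /* entries l[i][j] for j < i (Eq. 25.31) */
        for (int j = 0; j < i; j++) {
            double sum = 0.0;
            for (int k = 0; k < j; k++)
                sum += u[k][j] * l[i][k];
            l[i][j] = (a[i][j] - sum) / u[j][j];
        }
        /* entries u[i][j] for j >= i (Eq. 25.27) */
        for (int j = i; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < i; k++)
                sum += u[k][j] * l[i][k];
            u[i][j] = a[i][j] - sum;
        }
    }
}

int main(void) {
    double a[N][N] = {{3, 4, 2}, {7, 3, 1}, {1, 1, 1}};
    double l[N][N], u[N][N];
    lu_decompose(a, l, u);
    for (int i = 0; i < N; i++)
        printf("L: %9.4f %9.4f %9.4f   U: %9.4f %9.4f %9.4f\n",
               l[i][0], l[i][1], l[i][2], u[i][0], u[i][1], u[i][2]);
    return 0;
}

Running this sketch should reproduce the values of Eq. 25.33 in decimal form, e.g., l21 ≈ 2.3333, u22 ≈ −6.3333 and u33 ≈ 0.5263.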
