Finding a solution of an equation of the form f(x) = 0 is a frequently encountered problem. It is not as easy to find roots of such an equation as it might at first appear. Exact solutions are only possible if f(x) is a polynomial of degree n ≤ 4. By “exact solution,” we mean there is some procedure to calculate a root from the parameters of the equation (as in the quadratic equation ax² + bx + c = 0). In what follows, the term “root of an equation f(x) = 0” will be used interchangeably with “zero of f(x)”.
The root-finding procedure is based on an iterative process – in other words, on a process of successive approximations. An iterative process has four stages:
1. Root localization – that is, finding an interval that contains exactly one root.
2. Choosing an initial value, x0 – that is, letting x0 be an approximate value of the exact root x*.
3. Employing some procedure k times to obtain a set of approximations xk, starting from x0.
4. Finding out how close every approximation, xk, is to the exact solution (in terms of prescribed accuracy εp). If some approximation is in the εp-neighborhood of x*, then the iterative process is complete.
The last point will be satisfied only if xk → x* as k → ∞ – that is, the approximations xk converge to x*. That is why a great deal of attention will be given to the conditions under which an iterative process converges.
In what follows, we will consider another approach to calculating roots. An equation f(x) = 0 may be represented in the following equivalent form:
x = g(x), (7.1)
where g(x) is an iterative function. Thus, f(x*) = 0 is consistent with x* = g(x*). It turns out that form (7.1) is more appropriate for the construction of iterative algorithms. An iterative process is constructed in the following manner: we set the value of x0, and the succeeding approximations are calculated by the formula
xk+1 = g(xk), k = 0, 1, 2, … (7.2)
Figure 7.1 shows the geometrical interpretation of the convergent iterative process (7.2).
Figure 7.1 Successive approximations generated by iterative scheme (7.2): a) g′(x) > 0 and b) g′(x) < 0.
The following theorem explains conditions which provide convergence of the iterative process (7.2).
One can derive an estimate of the error of the kth approximant that depends only on two successive approximants and the Lipschitz constant, L. Because |xk+1 − xk| = |g(xk) − g(xk−1)|, we can conclude from the Lipschitz condition that |xk+1 − xk| ≤ L|xk − xk−1| for all k. Then
|x* − xk| ≤ (L/(1 − L)) |xk − xk−1|. (7.3)
This gives us a bound on the error of the kth approximant. It should be noted that if L is close to one, convergence is very slow. So, a good part of our effort will be concentrated on how to construct an iterative process which converges rapidly.
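The bound above can be checked numerically. The following sketch is in Python rather than the book's m-script, and the test problem g(x) = cos(x) on [0, 1] (fixed point x* ≈ 0.739085, Lipschitz constant L = sin 1) is our own choice of example:

```python
import math

# Numerical check of the a-posteriori error bound
#   |x* - x_k| <= (L / (1 - L)) * |x_k - x_{k-1}|
# for the contraction g(x) = cos(x) on [0, 1].
g = math.cos
L = math.sin(1.0)                     # max |g'(x)| = sin(x) on [0, 1]
x_star = 0.7390851332151607           # fixed point of cos(x), for reference

x_prev = 0.5                          # x_0
x = g(x_prev)                         # x_1
bound_holds = True
for _ in range(40):
    bound = L / (1.0 - L) * abs(x - x_prev)
    if abs(x - x_star) > bound:
        bound_holds = False
    x_prev, x = x, g(x)
```

At every step the true error indeed stays below the bound, and because L is noticeably less than one here, the iteration converges at a useful (if only linear) rate.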
There are many methods for solving an equation f(x) = 0; we will consider only the basic approaches. The iterative processes to be considered may be categorized as follows:
where β1(x), …, βn(x) are some functions.
Now, the term “rapid convergence” should be explained in more detail. To do this, we take the iterative process (7.4) as an example. To begin with, let us introduce the error of the kth approximation as ek = xk − x*; then xk = x* + ek and xk+1 = x* + ek+1. After substituting these expressions into the iterative process (7.4), we can expand g(x* + ek) in powers of ek in the neighborhood of x* (assuming that all derivatives of g(x) up to order p are continuous):
Taking into account that x* = g(x*) we arrive at
If g(n)(x*) = 0 for n = 1, …, p − 1 and g(p)(x*) ≠ 0, then the iterative function g has order p, and it follows immediately from equation (7.7) that
ek+1 = C(ek)p, C = g(p)(x*)/p!, (7.8)
where p is called the order of the sequence {xk} and C is called the asymptotic error constant. When p = 1, an iterative sequence has linear convergence to x*, while with p > 1, convergence is superlinear. It is clear that if ek is sufficiently small, then with p > 1 an iterative sequence may converge very rapidly.
Next, we need to discuss how to estimate how closely the kth approximation approaches the exact solution x*. To do this, we can use inequality (7.3). If xk ∈ [x * − εp, x * + εp], then the following conditions are fulfilled for absolute and relative error estimation, respectively:
In addition to conditions (7.9), usually another condition must be satisfied:
|f(xk)| ≤ εf.
This condition follows from the formulation of the problem. Therefore, an iterative process is finished when one of those conditions is fulfilled.
To begin with, we will consider the case g′(x*) ≠ 0. Then, as follows from equation (7.7), ek+1 ≈ g′(x*)ek, and with the proviso that
|g′(x*)| < 1, (7.10)
convergence is linear. In practice one cannot verify condition (7.10) directly, since x* is unknown, but we can require that the following condition be satisfied:
|g′(x)| < 1 for all x ∈ I. (7.11)
In this case, theoretical condition (7.10) is also satisfied. The simplest way to represent a source equation in form (7.1) is as follows:
x = x + αf(x) ≡ g(x), (7.12)
and the associated iterative scheme is
xk+1 = xk + αf(xk). (7.13)
This is the method of fixed-point iteration. In order that condition (7.11) be satisfied at every point of I, we should determine the parameter α so that |1 + αf′(x)| < 1 for all x ∈ I, since here g′(x) = 1 + αf′(x).
To be more specific, we can choose
The initial value x0 should be chosen from interval I.
Convergence of the method of fixed-point iteration is not very rapid, but there is a procedure to decrease the number of iterations using the same information. Expression (7.7) for method (7.13) can be written as
Let us consider two consecutive iterations:
After eliminating R, we can express x* from those two expressions
If we take zk + 1 as an approximation to x*, then the accuracy of this approximation is O((ek)2) – that is, the sequence {zk} has convergence of the second order, as opposed to the sequence {xk} which has linear convergence. The transformation {xk} → {zk} is called Aitken’s δ2-process.
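Aitken's δ²-process can be sketched as follows. The restarting strategy below (recomputing the accelerated value zk from three consecutive iterates and continuing the iteration from it, sometimes called Steffensen's method) is one common way to apply the transformation; the book's exact procedure may differ. Python is used for illustration, whereas the book's own listings are m-script:

```python
import math

# Aitken's delta-squared process: from three consecutive fixed-point
# iterates x_k, x_{k+1} = g(x_k), x_{k+2} = g(x_{k+1}), form
#   z_k = x_k - (x_{k+1} - x_k)**2 / (x_{k+2} - 2*x_{k+1} + x_k),
# then restart the iteration from the accelerated value z_k.
def aitken(g, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x1 = g(x)
        x2 = g(x1)
        denom = x2 - 2.0 * x1 + x
        if denom == 0.0:              # sequence has (numerically) converged
            return x2
        z = x - (x1 - x) ** 2 / denom
        if abs(z - x) < tol:
            return z
        x = z
    return x

root = aitken(math.cos, 0.5)          # fixed point of x = cos(x)
```

For x = cos(x) the plain iteration needs dozens of steps, while the accelerated sequence settles on the root after only a handful of restarts.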
There are many procedures to achieve superlinear convergence. Let the derivatives of f(x) up to order s be continuous in I. Then, in the neighborhood of xk ∈ I, f(x) can be represented as a polynomial using Taylor's formula (Figure 7.2)
where yk lies in the interval determined by xk and x. The next approximation to x* can be found from the equation
By this means, expressing xk + 1 from equation (7.16), we obtain the transformation xk →xk + 1, and this defines some iterative scheme. It is easy to show that the iterative sequence generated by (7.15) and (7.16) has the order p = s.
Next, we need to discuss how to choose an initial value x0 that initiates a convergent iterative process. Let x* ∈ I, with the derivative f(s)(x) continuous in I and f′(x)f(s)(x) ≠ 0 for all x ∈ I. When conditions
are satisfied, and min(x*, xk) < xk + 1 < max(x*, xk), then the iterative sequence {xk} generated by (7.15) and (7.16) converges monotonically to x*.
Now, let us consider the particular methods that follow from the general approach (7.15) and (7.16). When s = 2, equation (7.16) has the following form:
f(xk) + f′(xk)(xk+1 − xk) = 0.
Solving this equation with respect to xk+1, we obtain the following iterative scheme:
xk+1 = xk − f(xk)/f′(xk), (7.19)
with g(x) = x − f(x)/f′(x). This is the well-known Newton's method, and it has a simple geometrical interpretation. The function y(x) = f(xk) + f′(xk)(x − xk) is the tangent to the curve of f(x) at the point xk, and the next approximation xk+1 is the intersection point of this tangent with the x-axis. To choose an initial value x0, it is necessary to use condition (7.17) with s = 2.
When s = 3, equation (7.16) has the form
f(xk) + f′(xk)(xk+1 − xk) + (f″(xk)/2)(xk+1 − xk)² = 0. (7.20)
Solving this equation with respect to xk + 1, we obtain
To satisfy Theorem 7.1, we must add the last two terms of equation (7.21) when f′(x) > 0 and subtract them when f′(x) < 0. It should also be taken into account that the sum of the last two terms in expression (7.21) approaches zero near the root, which may result in a loss of accuracy due to cancellation. It is known that for the quadratic equation ax² + bx + c = 0, the following relation is valid:
After applying this conversion to (7.21), we finally arrive at the following iterative scheme:
An initial value x0 should be chosen from condition (7.18) with s = 3.
The polynomial Ps(xk + 1) may be reduced to a first-degree polynomial without changing the order of the iterative sequence generated. Two possibilities to modify equation (7.20) follow:
1) replace one of the (xk+1 − xk) factors in (xk+1 − xk)² by −f(xk)/f′(xk). After solving the modified equation with respect to (xk+1 − xk), we obtain the following iterative scheme:
2) replace (xk+1 − xk)² by (f(xk)/f′(xk))². Then, after solving the modified equation with respect to (xk+1 − xk), we get
These methods are more attractive in comparison to iterative scheme (7.22) because they have the same cubic convergence but computation of the square root is avoided.
The following function implements the fixed-point method to find the root of an equation f(x) = 0 that can be rewritten as x = g(x). It takes three input arguments: the first is a function handle to g(x), the iterative function associated with the f(x) whose root we seek; the second is the initial guess for the root; and the last is the error tolerance of the search, as discussed previously. The code listing is as follows.
Consider finding the root of the equation f(x) = cos(x) − x = 0. We first rewrite it as x = g(x) = cos(x) and then implement the following function.
Then we will call the fixedpoint function as follows.
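Since the listings themselves are not reproduced here, the following is a self-contained sketch of the fixedpoint function, the g(x) implementation, and the call described above. It is written in Python rather than the book's m-script; the function name and signature follow the text's description:

```python
import math

# fixedpoint(g, x0, tol): iterate x_{k+1} = g(x_k) until two successive
# approximations differ by less than tol; returns the approximate root
# and the number of iterations used.
def fixedpoint(g, x0, tol, max_iter=1000):
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

def g(x):                             # f(x) = cos(x) - x rewritten as x = g(x)
    return math.cos(x)

root, iters = fixedpoint(g, 1.0, 1e-6)
```

With x0 = 1 and a tolerance of 1e-6, this sketch reproduces the slow linear convergence discussed next: roughly 34 iterations to reach the root x* ≈ 0.739085.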
We do not show all the results, to save space. It took 34 iterations to reach an approximate solution within the desired error tolerance. Even when we tried initial values closer to the root, the performance of this method did not improve appreciably.
The following function implements the controlled fixed-point iterative method, a variation of the fixed-point method. The function signature remains almost the same except for the addition of the control parameter alpha. We do not need to transform the equation from f(x) = 0 to x = g(x), so the function handle in the first argument should point to the m-script implementation of f(x) rather than g(x). The code is listed in the following.
We apply this function to the same problem, i.e.,
First, we should implement this function using m-script as given in the following listing.
Then, to apply the controlled fixed-point method on this function, we will make the following function call from the interpreter.
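The controlled fixed-point listing and call can be sketched as follows, again in Python rather than the book's m-script. The particular choice alpha = −1/f′(x0) is our assumption, made so that |1 + alpha·f′(x)| is small near the root; the book may prescribe alpha differently:

```python
import math

# Controlled fixed-point iteration: x_{k+1} = x_k + alpha*f(x_k).
# The choice of alpha below (-1/f'(x0)) is an illustrative assumption,
# not necessarily the book's exact rule.
def controlled_fixedpoint(f, x0, alpha, tol, max_iter=1000):
    x = x0
    for k in range(1, max_iter + 1):
        x_new = x + alpha * f(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("controlled fixed-point iteration did not converge")

def f(x):                             # original equation f(x) = cos(x) - x = 0
    return math.cos(x) - x

alpha = -1.0 / (-math.sin(1.0) - 1.0)    # -1/f'(x0) with x0 = 1.0
root, iters = controlled_fixedpoint(f, 1.0, alpha, 1e-6)
```

With this alpha the sketch reaches the same root in about 6 iterations, in line with the improvement reported below.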
The result obtained is the same as in the previous section, but it takes only 6 iterations to solve the same problem. The controlled fixed-point method is an enormous improvement in performance over the plain fixed-point method.
To construct an iterative process with cubic convergence using one-point iteration, we need to employ the higher-order derivatives of f(x). This may sometimes result in much computational effort. Another way to achieve cubic convergence is to use a multi-point iterative scheme (see (7.5)). One of the procedures to construct such a scheme is based on a superposition. Let us calculate the next approximation xk + 1, in two stages (note that the first formula represents Newton’s method):
A combination of these two formulae results in the following iterative scheme:
The order of the iterative sequence is three; that is, ek+1 = C(ek)³. In some cases, a multi-point iterative scheme is more efficient than one-point iteration: only the function itself and its first derivative are employed, and again we have rapid convergence. The construction procedure for iterative scheme (7.25) enables us to adopt the same condition for choosing the initial value as for Newton's method.
By analogy with scheme (7.25), we can construct another iterative scheme:
and after the combination of these expressions we arrive at
This iterative scheme also produces the sequence with the order of convergence p = 3.
The following function implements Newton's method of root approximation. The first argument points to the implementation of the function f(x), whereas the second argument points to the implementation of its first derivative, f′(x). The third argument, x0, is the initial guess of the root, and the last argument, tol, is the error tolerance.
We use the same example as in the previous sections.
The implementation of this function was shown previously. The derivative of this function is f′(x) = −sin(x) − 1.
And the implementation of the derivative is as follows.
Then, to apply Newton’s method on this function, we will make the following function call from the interpreter.
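The Newton listing, the derivative implementation, and the call described above can be sketched in Python (the book's originals are m-script) as:

```python
import math

# Newton's method: x_{k+1} = x_k - f(x_k)/f'(x_k), stopping when two
# successive approximations differ by less than tol.
def newton(f, fprime, x0, tol, max_iter=100):
    x = x0
    for k in range(1, max_iter + 1):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Newton iteration did not converge")

def f(x):
    return math.cos(x) - x

def fprime(x):                        # f'(x) = -sin(x) - 1
    return -math.sin(x) - 1.0

root, iters = newton(f, fprime, 1.0, 1e-6)
```

From x0 = 1 this sketch reaches the root in 4 iterations, matching the quadratic convergence claimed for the method.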
Once again, the result obtained is the same as with the previous methods, and it takes only 4 iterations to reach the solution. This is the best method so far. Its only disadvantage is that the user has to implement the derivative of the function as well. Readers could try modifying the above implementation to approximate the derivative numerically; the function will take longer to run, but the user will not have to write the extra derivative implementation.
The multi-point method is implemented in the following. The function takes the same arguments as in Newton’s method.
We use the same example as in the previous sections.
To apply the multipoint method, we will make the following function call from the interpreter.
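Since the exact form of scheme (7.25) is not reproduced here, the following Python sketch uses one standard two-stage variant with cubic convergence (a Newton step followed by a correction with the derivative frozen at xk); the book's scheme may differ in detail, but it likewise needs only f and f′:

```python
import math

# A two-stage multi-point scheme with cubic convergence:
#   y_k     = x_k - f(x_k)/f'(x_k)     (a Newton step)
#   x_{k+1} = y_k - f(y_k)/f'(x_k)     (correction, same frozen derivative)
def multipoint(f, fprime, x0, tol, max_iter=100):
    x = x0
    for k in range(1, max_iter + 1):
        d = fprime(x)
        y = x - f(x) / d
        x_new = y - f(y) / d
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("multi-point iteration did not converge")

def f(x):
    return math.cos(x) - x

def fprime(x):
    return -math.sin(x) - 1.0

root, iters = multipoint(f, fprime, 1.0, 1e-6)
```

Each iteration costs two evaluations of f and one of f′, and on this example the sketch converges in about 3 iterations, slightly fewer than Newton's method.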
This method proves faster than Newton's method: the result is the same as before, and it takes even fewer iterations to reach the solution.
Despite the power of methods with superlinear convergence, their use of the derivatives f′(x) and f″(x) can be troublesome. It often requires much more effort to evaluate f′(x) and f″(x) than f(x). In addition, many applications involve functions that do not have closed-form expressions for their derivatives.
In this section we consider some modifications of one-point iterations that avoid the exact evaluation of derivatives. The first derivative of f(x) at the point xk may be approximately evaluated as
f′(xk) ≈ (f(xk) − f(xk−1)) / (xk − xk−1).
Substituting this expression into (7.19), we obtain
xk+1 = xk − f(xk)(xk − xk−1) / (f(xk) − f(xk−1)).
This is the secant method. Convergence of this iterative scheme is described by expression (7.8) with some constant C and p ≈ 1.62, which is slower than Newton's method (p = 2). However, the secant method requires just one new evaluation of f(xk) per iteration (we can reuse the value f(xk−1) from the previous iteration). To start the iterative process we need to choose two initial values, x0 and x1. To define these initial values, we can use the same condition we used for Newton's method, except that the condition should be applied to both x0 and x1:
The following implementation of the secant method requires two initial guesses for the root, unlike the previous methods.
We use the same example as in the previous sections.
The implementation of these functions is shown previously. To apply the secant method, we will make the following function call from the interpreter.
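The secant listing and call described above can be sketched in Python (the book's originals are m-script) as:

```python
import math

# Secant method:
#   x_{k+1} = x_k - f(x_k)*(x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})),
# started from two initial guesses x0 and x1; no derivative is needed.
def secant(f, x0, x1, tol, max_iter=100):
    x_prev, x = x0, x1
    f_prev, fx = f(x_prev), f(x)
    for k in range(1, max_iter + 1):
        if fx == f_prev:              # flat step: avoid division by zero
            return x, k
        x_new = x - fx * (x - x_prev) / (fx - f_prev)
        if abs(x_new - x) < tol:
            return x_new, k
        x_prev, f_prev = x, fx
        x, fx = x_new, f(x_new)
    raise RuntimeError("secant iteration did not converge")

def f(x):
    return math.cos(x) - x

root, iters = secant(f, 0.5, 1.0, 1e-6)
```

Only one new function evaluation is made per iteration; the previous value f(xk−1) is carried over, which is the method's main economy over Newton's scheme.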
The secant method took only 4 iterations to find the root; its performance on this example is comparable to that of the multi-point method.
Find the root of the equation f(x) = 0. If the equation has more than one root, find the root with the smallest magnitude. Calculate the root with the prescribed tolerance using one of the iterative methods. To solve the problem, find an interval that contains the root and an initial guess that initiates a convergent iterative process. Suggested iterative methods are