Comments about Newton's method and approximating partials

In this section, we will only consider single-variable functions. The proof for multivariable functions is similar, except that one would have to use the inverse of the Jacobian matrix instead of division by f', making the proof much more difficult to read.

Proof that Newton's method converges quadratically

Let r be a root of f(x) = 0 so that f(r) = 0.
Assume that f, f', and f'' are continuous near r and f'(r) ≠ 0.
Let's use Newton's method, so that
    x_{k+1} = x_k - f(x_k) / f'(x_k)
Define the errors e_k = r - x_k and e_{k+1} = r - x_{k+1}.
We will now expand e_k in a way that will eventually prove very useful. First, the iteration formula
    x_{k+1} = x_k - f(x_k) / f'(x_k)
Thus
    x_k = x_{k+1} + f(x_k) / f'(x_k)
Returning to e_k, we obtain
    e_k = r - x_k
        = r - x_{k+1} - f(x_k) / f'(x_k)
        = e_{k+1} - f(x_k) / f'(x_k)

Let's expand f(r) in the Lagrange form of the Taylor series
  f(r) = f(x_k + e_k)
       = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
where ξ_k is between x_k and r. But f(r) = 0, so
    0 = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
Divide by f'(x_k) to obtain
    0 = f(x_k) / f'(x_k) + e_k + f''(ξ_k) e_k^2 / [2 f'(x_k)]
Now replace e_k by its value obtained earlier.
    0 = f(x_k) / f'(x_k) + e_{k+1} - f(x_k) / f'(x_k) + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    0 = e_{k+1} + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    e_{k+1} = {-f''(ξ_k) / [2 f'(x_k)]} e_k^2
If we bound -f''(ξ_k) / [2 f'(x_k)] in an appropriate interval near r and take absolute values, we see that the convergence is quadratic provided x_0 is close enough to r.
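
To see the quadratic convergence numerically, here is a minimal JavaScript sketch (JavaScript since the discussion below mentions it). The function name newton, the test equation f(x) = x^2 - 2 with root r = √2, and the starting point x_0 = 1 are illustrative choices, not part of the proof.

    // Newton's method for one variable: x_{k+1} = x_k - f(x_k) / f'(x_k).
    function newton(f, fprime, x0, steps) {
      let x = x0;
      for (let k = 0; k < steps; k++) {
        x = x - f(x) / fprime(x);
        console.log(k + 1, x, Math.abs(Math.SQRT2 - x)); // |e_{k+1}| for this example
      }
      return x;
    }

    newton(x => x * x - 2, x => 2 * x, 1, 5);

The printed errors are roughly squared at each step (about 8e-2, 2e-3, 2e-6, 2e-12, then the limit of double precision), which is the quadratic behavior derived above.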

Consider replacing f'(x_k) by an approximation

(Again, in this section we will only consider single-variable functions. The proof for multivariable functions is similar, except that one would have to use the inverse of the Jacobian matrix instead of division by f', making the proof much more difficult to read.)

Mathematical analysis of the approximation method shows almost quadratic convergence provided f'(r) ≠ 0 at the solution. It also shows that there are serious problems if f'(r) = 0 at the solution. This analysis follows the proof that Newton's method converges quadratically, with changes where the approximation replaces f'(x_k).

Let r be a root of f(x) = 0 so that f(r) = 0.
Assume that f, f', and f'' are continuous near r and f'(r) ≠ 0.
Let's use an alternative to Newton's method, replacing f'(x_k) by the approximation
    d(x_k, Δ) = [f(x_k + Δ) - f(x_k)] / Δ
and using the iteration
    x_{k+1} = x_k - f(x_k) / d(x_k, Δ)
Define the errors e_k = r - x_k and e_{k+1} = r - x_{k+1}.

We will now expand e_k in a way that will eventually prove very useful. First, the iteration formula
    x_{k+1} = x_k - f(x_k) / d(x_k, Δ)
Thus
    x_k = x_{k+1} + f(x_k) / d(x_k, Δ)
Returning to e_k, we obtain
    e_k = r - x_k
        = r - x_{k+1} - f(x_k) / d(x_k, Δ)
        = e_{k+1} - f(x_k) / d(x_k, Δ)

Let's expand f(r) in the Lagrange form of the Taylor series
  f(r) = f(x_k + e_k)
       = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
where ξ_k is between x_k and r. But f(r) = 0, so
    0 = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
Divide by f'(x_k) to obtain
    0 = f(x_k) / f'(x_k) + e_k + f''(ξ_k) e_k^2 / [2 f'(x_k)]
Now replace e_k by its value obtained earlier.
    0 = f(x_k) / f'(x_k) + e_{k+1} - f(x_k) / d(x_k, Δ) + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    0 = e_{k+1} + f(x_k) / f'(x_k) - f(x_k) / d(x_k, Δ) + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    e_{k+1} = -f(x_k) / f'(x_k) + f(x_k) / d(x_k, Δ) + {-f''(ξ_k) / [2 f'(x_k)]} e_k^2
            = -f(x_k) [1 / f'(x_k) - 1 / d(x_k, Δ)] + {-f''(ξ_k) / [2 f'(x_k)]} e_k^2
            = Q(x_k, Δ) + C_k e_k^2
where
    Q(x_k, Δ) = -f(x_k) [1 / f'(x_k) - 1 / d(x_k, Δ)]
and
    C_k = -f''(ξ_k) / [2 f'(x_k)]
Consider Q(x_k, Δ). If Q(x_k, Δ) were zero, then the method would converge quadratically. But in general, it will not be. Let's examine [1 / f'(x_k) - 1 / d(x_k, Δ)] more closely. Because
    lim_{Δ → 0} d(x_k, Δ) = lim_{Δ → 0} [f(x_k + Δ) - f(x_k)] / Δ = f'(x_k)
we see
    lim_{Δ → 0} [1 / f'(x_k) - 1 / d(x_k, Δ)] = 0
Thus we can make Q(x_k, Δ) as close to 0 as desired by making Δ small enough. Unfortunately, in practice, we lose numerical accuracy if Δ is too small. This would be a significant problem if we were using single-precision arithmetic. But modern computers and many languages (including JavaScript) use double precision without much inefficiency. (This technique is not recommended if calculations are carried out in single precision.) Experimentally, it has been found that Δ from 10^-4 to 10^-6 often approximates the derivative very well, and the resulting convergence appears to be quadratic. (Sometimes negative values of Δ can give faster convergence.) Even smaller values have been used successfully.
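
As an illustration, here is a minimal JavaScript sketch of the iteration with f'(x_k) replaced by d(x_k, Δ). The function name approxNewton and the test equation f(x) = x^2 - 2 are illustrative choices; Δ = 10^-5 lies in the range suggested above.

    // Newton-like iteration using the forward difference
    // d(x_k, Δ) = [f(x_k + Δ) - f(x_k)] / Δ in place of f'(x_k).
    function approxNewton(f, x0, delta, steps) {
      let x = x0;
      for (let k = 0; k < steps; k++) {
        const d = (f(x + delta) - f(x)) / delta; // d(x_k, Δ) ≈ f'(x_k)
        x = x - f(x) / d;
      }
      return x;
    }

    approxNewton(x => x * x - 2, 1, 1e-5, 6); // ≈ 1.41421356... (√2)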

The expression for Q(x_k, Δ) also shows that there can be serious problems with convergence if f'(r) = 0, because both f'(x_k) and d(x_k, Δ) approach 0 as x_k approaches the solution. While the iteration patterns when f'(r) ≠ 0 normally closely resemble those of regular Newton's method, they often differ when f'(r) = 0 as the iterates get closer to r.

Newton's Method with multiple variables

Consider Newton's method for systems with multiple variables. For example, if F is a vector function consisting of 2 functions of 2 variables, then we could write

     X = ⌈x⌉      F(x, y) = ⌈f_0(x, y)⌉      E = ⌈x - r_0⌉
         ⌊y⌋                ⌊f_1(x, y)⌋          ⌊y - r_1⌋

     J(x, y) = ⌈∂f_0(x, y)/∂x   ∂f_0(x, y)/∂y⌉
               ⌊∂f_1(x, y)/∂x   ∂f_1(x, y)/∂y⌋
Then Newton's method can be carried out with
     X_{k+1} = X_k - J(x_k, y_k)^{-1} F(x_k, y_k)
Actually it is not necessary to calculate J(x_k, y_k)^{-1}. Instead one can solve
     J(x_k, y_k) Y = F(x_k, y_k)
for Y and iterate with X_{k+1} = X_k - Y.
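
A minimal JavaScript sketch of one such step for the 2-variable case follows. The name newtonStep2 is illustrative; F is assumed to return the array [f_0, f_1] and J the 2×2 array of partials, and Cramer's rule stands in for a general linear solver.

    // One Newton step: solve J(x, y) Y = F(x, y), then return X - Y.
    function newtonStep2(F, J, x, y) {
      const [f0, f1] = F(x, y);
      const [[a, b], [c, d]] = J(x, y); // assumed nonsingular near the root
      const det = a * d - b * c;
      const y0 = (f0 * d - b * f1) / det; // Cramer's rule for J Y = F
      const y1 = (a * f1 - c * f0) / det;
      return [x - y0, y - y1];            // X_{k+1} = X_k - Y
    }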

The proof that the sequence converges quadratically is much the same as the single-variable proof, with f' replaced by J. Division by f' is replaced by left multiplication by J^{-1}, f'' is replaced by the Hessian matrix H of second partials, and f''(ξ_k) e_k^2 is replaced by E_k^T H E_k. A norm such as the max norm is used in place of absolute values.

If there are more than 2 variables, the vectors and matrices are expanded in the obvious way.

Approximating Newton's method with multiple variables

For 2-dimensional systems, the partials in the Jacobian are replaced by
     ∂f_0(x_k, y_k)/∂x ≈ [f_0(x_k + Δ, y_k) - f_0(x_k, y_k)] / Δ

     ∂f_0(x_k, y_k)/∂y ≈ [f_0(x_k, y_k + Δ) - f_0(x_k, y_k)] / Δ
Likewise for f_1 when approximating Newton's method. The same concept is used for systems with more variables. Experience has shown that Δ = 0.0001 often produces quadratic-like convergence.
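
As a sketch, the approximate Jacobian for the 2-variable case might be computed as follows. The helper name approxJacobian is illustrative and pairs with the newtonStep2 sketch above; Δ = 10^-4 follows the experience just cited.

    // Forward-difference approximation of the Jacobian of F(x, y) = [f_0, f_1].
    function approxJacobian(F, x, y, delta) {
      const [f0, f1] = F(x, y);
      const [gx0, gx1] = F(x + delta, y); // step in x
      const [gy0, gy1] = F(x, y + delta); // step in y
      return [
        [(gx0 - f0) / delta, (gy0 - f0) / delta],
        [(gx1 - f1) / delta, (gy1 - f1) / delta],
      ];
    }

    // Example: one approximate Newton step for a system F.
    // const [x1, y1] = newtonStep2(F, (x, y) => approxJacobian(F, x, y, 1e-4), x0, y0);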

©James Brink, 11/12/2021