Comments about Newton's method and approximating partials

In this section, we will only consider single-variable functions. The proof for multivariable functions is similar, except that one would have to use the inverse of the Jacobian matrix instead of division by f', making the proof much more difficult to read.

Proof that Newton's method converges quadratically

Let r be a root of f(x) = 0 so that f(r) = 0.
Assume that f, f', and f'' are continuous near r and f'(r) ≠ 0.
Let's use Newton's method, so that
    x_{k+1} = x_k - f(x_k) / f'(x_k)
Define the errors e_k = r - x_k and e_{k+1} = r - x_{k+1}.
We will now expand e_k in a way that will eventually prove very useful. First, the iteration formula
    x_{k+1} = x_k - f(x_k) / f'(x_k)
Thus
    x_k = x_{k+1} + f(x_k) / f'(x_k)
Returning to e_k, we obtain
    e_k = r - x_k
        = r - x_{k+1} - f(x_k) / f'(x_k)
        = e_{k+1} - f(x_k) / f'(x_k)

Let's expand f(r) in the Lagrange form of the Taylor series
  f(r) = f(x_k + e_k)
       = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
where ξ_k is between x_k and r. But f(r) = 0, so
    0 = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
Divide by f'(x_k) to obtain
    0 = f(x_k) / f'(x_k) + e_k + f''(ξ_k) e_k^2 / [2 f'(x_k)]
Now replace e_k by its value obtained earlier.
    0 = f(x_k) / f'(x_k) + e_{k+1} - f(x_k) / f'(x_k) + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    0 = e_{k+1} + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    e_{k+1} = {-f''(ξ_k) / [2 f'(x_k)]} e_k^2
If we bound -f''(ξ_k) / [2 f'(x_k)] in an appropriate interval near r and take absolute values, we see that the convergence is quadratic provided x_0 is close enough to r.
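
To see the quadratic convergence numerically, here is a minimal JavaScript sketch (JavaScript since the discussion below mentions it). The function name newton, the test equation f(x) = x^2 - 2 with root r = √2, and the starting point x_0 = 1 are illustrative choices, not part of the proof.

    // Newton's method for one variable: x_{k+1} = x_k - f(x_k) / f'(x_k).
    function newton(f, fprime, x0, steps) {
      let x = x0;
      for (let k = 0; k < steps; k++) {
        x = x - f(x) / fprime(x);
        console.log(k + 1, x, Math.abs(Math.SQRT2 - x)); // |e_{k+1}| for this example
      }
      return x;
    }

    newton(x => x * x - 2, x => 2 * x, 1, 5);

The printed errors are roughly squared at each step (about 8e-2, 2e-3, 2e-6, 2e-12, then the limit of double precision), which is the quadratic behavior derived above.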

Consider replacing f'(x_k) by an approximation

(Again, in this section we will only consider single-variable functions. The proof for multivariable functions is similar, except that one would have to use the inverse of the Jacobian matrix instead of division by f', making the proof much more difficult to read.)

Mathematical analysis of the approximation method shows almost quadratic convergence provided f'(r) ≠ 0 at the solution. It also shows that there are serious problems if f'(r) = 0 at the solution. This analysis follows the proof that Newton's method converges quadratically, with changes where the approximation replaces f'(x_k).

Let r be a root of f(x) = 0 so that f(r) = 0.
Assume that f, f', and f'' are continuous near r and f'(r) ≠ 0.
Let's use an alternative to Newton's method, replacing f'(x_k) by the approximation
    d(x_k, Δ) = [f(x_k + Δ) - f(x_k)] / Δ
and using the iteration
    x_{k+1} = x_k - f(x_k) / d(x_k, Δ)
Define the errors e_k = r - x_k and e_{k+1} = r - x_{k+1}.

We will now expand e_k in a way that will eventually prove very useful. First, the iteration formula
    x_{k+1} = x_k - f(x_k) / d(x_k, Δ)
Thus
    x_k = x_{k+1} + f(x_k) / d(x_k, Δ)
Returning to e_k, we obtain
    e_k = r - x_k
        = r - x_{k+1} - f(x_k) / d(x_k, Δ)
        = e_{k+1} - f(x_k) / d(x_k, Δ)

Let's expand f(r) in the Lagrange form of the Taylor series
  f(r) = f(x_k + e_k)
       = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
where ξ_k is between x_k and r. But f(r) = 0, so
    0 = f(x_k) + f'(x_k) e_k + f''(ξ_k) e_k^2 / 2
Divide by f'(x_k) to obtain
    0 = f(x_k) / f'(x_k) + e_k + f''(ξ_k) e_k^2 / [2 f'(x_k)]
Now replace e_k by its value obtained earlier.
    0 = f(x_k) / f'(x_k) + e_{k+1} - f(x_k) / d(x_k, Δ) + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    0 = e_{k+1} + f(x_k) / f'(x_k) - f(x_k) / d(x_k, Δ) + f''(ξ_k) e_k^2 / [2 f'(x_k)]
    e_{k+1} = -f(x_k) / f'(x_k) + f(x_k) / d(x_k, Δ) + {-f''(ξ_k) / [2 f'(x_k)]} e_k^2
            = -f(x_k) [1 / f'(x_k) - 1 / d(x_k, Δ)] + {-f''(ξ_k) / [2 f'(x_k)]} e_k^2
            = Q(x_k, Δ) + C_k e_k^2
where
    Q(x_k, Δ) = -f(x_k) [1 / f'(x_k) - 1 / d(x_k, Δ)]
and
    C_k = -f''(ξ_k) / [2 f'(x_k)]
Consider Q(x_k, Δ). If Q(x_k, Δ) were zero, then the method would converge quadratically. But in general, it will not be. Let's examine [1 / f'(x_k) - 1 / d(x_k, Δ)] more closely. Because
    lim_{Δ → 0} d(x_k, Δ) = lim_{Δ → 0} [f(x_k + Δ) - f(x_k)] / Δ = f'(x_k)
we see
    lim_{Δ → 0} [1 / f'(x_k) - 1 / d(x_k, Δ)] = 0
Thus we can make Q(x_k, Δ) as close to 0 as desired by making Δ small enough. Unfortunately, in practice, we lose numerical accuracy if Δ is too small. This would be a significant problem if we were using single-precision arithmetic. But modern computers and many languages (including JavaScript) use double precision without much inefficiency. (This technique is not recommended if calculations are carried out in single precision.) Experimentally, it has been found that Δ from 10^-4 to 10^-6 often approximates the derivative very well, and the resulting convergence appears to be quadratic. (Sometimes negative values of Δ can give faster convergence.) Even smaller values have been used successfully.
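
As an illustration, here is a minimal JavaScript sketch of the iteration with f'(x_k) replaced by d(x_k, Δ). The function name approxNewton and the test equation f(x) = x^2 - 2 are illustrative choices; Δ = 10^-5 lies in the range suggested above.

    // Newton-like iteration using the forward difference
    // d(x_k, Δ) = [f(x_k + Δ) - f(x_k)] / Δ in place of f'(x_k).
    function approxNewton(f, x0, delta, steps) {
      let x = x0;
      for (let k = 0; k < steps; k++) {
        const d = (f(x + delta) - f(x)) / delta; // d(x_k, Δ) ≈ f'(x_k)
        x = x - f(x) / d;
      }
      return x;
    }

    approxNewton(x => x * x - 2, 1, 1e-5, 6); // ≈ 1.41421356... (√2)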

The expression for Q(x_k, Δ) also shows that there can be serious problems with convergence if f'(r) = 0, because both f'(x_k) and d(x_k, Δ) approach 0 as x_k approaches the solution. While the iteration patterns when f'(r) ≠ 0 normally closely resemble those of regular Newton's method, they often differ when f'(r) = 0 as the iterates get closer to r.

Newton's Method with multiple variables

Consider Newton's method for systems with multiple variables. For example, if F is a vector function consisting of 2 functions of 2 variables, then we could write

     X = ⌈x⌉      F(x, y) = ⌈f_0(x, y)⌉      E = ⌈x - r_0⌉
         ⌊y⌋                ⌊f_1(x, y)⌋          ⌊y - r_1⌋

     J(x, y) = ⌈∂f_0(x, y)/∂x   ∂f_0(x, y)/∂y⌉
               ⌊∂f_1(x, y)/∂x   ∂f_1(x, y)/∂y⌋
Then Newton's method can be carried out with
     X_{k+1} = X_k - J(x_k, y_k)^{-1} F(x_k, y_k)
Actually it is not necessary to calculate J(x_k, y_k)^{-1}. Instead one can solve
     J(x_k, y_k) Y = F(x_k, y_k)
for Y and iterate with X_{k+1} = X_k - Y.
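
A minimal JavaScript sketch of one such step for the 2-variable case follows. The name newtonStep2 is illustrative; F is assumed to return the array [f_0, f_1] and J the 2×2 array of partials, and Cramer's rule stands in for a general linear solver.

    // One Newton step: solve J(x, y) Y = F(x, y), then return X - Y.
    function newtonStep2(F, J, x, y) {
      const [f0, f1] = F(x, y);
      const [[a, b], [c, d]] = J(x, y); // assumed nonsingular near the root
      const det = a * d - b * c;
      const y0 = (f0 * d - b * f1) / det; // Cramer's rule for J Y = F
      const y1 = (a * f1 - c * f0) / det;
      return [x - y0, y - y1];            // X_{k+1} = X_k - Y
    }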

The proof that the sequence converges quadratically is much the same as the single-variable proof, with f' replaced by J. Division by f' is replaced by left multiplication by J^{-1}, f'' is replaced by the Hessian matrix H of second partials, and f''(ξ_k) e_k^2 is replaced by E_k^T H E_k. A norm such as the max norm is used in place of absolute values.

If there are more than 2 variables, the vectors and matrices are expanded in the obvious way.

Approximating Newton's method with multiple variables

For 2-dimensional systems, the partials in the Jacobian are replaced by
     ∂f_0(x_k, y_k)/∂x ≈ [f_0(x_k + Δ, y_k) - f_0(x_k, y_k)] / Δ

     ∂f_0(x_k, y_k)/∂y ≈ [f_0(x_k, y_k + Δ) - f_0(x_k, y_k)] / Δ
Likewise for f_1 when approximating Newton's method. The same concept is used for systems with more variables. Experience has shown that Δ = 0.0001 often produces quadratic-like convergence.
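
As a sketch, the approximate Jacobian for the 2-variable case might be computed as follows. The helper name approxJacobian is illustrative and pairs with the newtonStep2 sketch above; Δ = 10^-4 follows the experience just cited.

    // Forward-difference approximation of the Jacobian of F(x, y) = [f_0, f_1].
    function approxJacobian(F, x, y, delta) {
      const [f0, f1] = F(x, y);
      const [gx0, gx1] = F(x + delta, y); // step in x
      const [gy0, gy1] = F(x, y + delta); // step in y
      return [
        [(gx0 - f0) / delta, (gy0 - f0) / delta],
        [(gx1 - f1) / delta, (gy1 - f1) / delta],
      ];
    }

    // Example: one approximate Newton step for a system F.
    // const [x1, y1] = newtonStep2(F, (x, y) => approxJacobian(F, x, y, 1e-4), x0, y0);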

©James Brink, 11/12/2021