Efficient Halley's method for nonlinear solving

Introduction

Say we have a system of $n$ equations in $n$ unknowns, $f(x)=0$, where $f:\mathbb R^n\to\mathbb R^n$ is sufficiently smooth.

Given an initial guess $x_0$, Newton's method finds a solution by iterating

\[x_{i+1}=x_i-J(x_i)^{-1}f(x_i)\]

and this method converges quadratically.

We can make it converge faster by using higher-order derivative information. For example, Halley's method uses the iteration

\[x_{i+1}=x_i-(a_i\odot a_i)\oslash(a_i-b_i/2)\]

where the vector multiplication $\odot$ and division $\oslash$ are defined element-wise, and the terms $a_i$ and $b_i$ are defined by the linear systems $J(x_i)a_i = f(x_i)$ and $J(x_i)b_i = H(x_i)a_ia_i$.
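For intuition, consider the scalar case $n=1$: there $a_i = f(x_i)/f'(x_i)$ and $b_i = f''(x_i)\,a_i^2/f'(x_i)$, and the update above reduces to the classical Halley iteration

\[x_{i+1}=x_i-\frac{a_i^2}{a_i-b_i/2}=x_i-\frac{2f(x_i)f'(x_i)}{2f'(x_i)^2-f(x_i)f''(x_i)}.\]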

Halley's method provably converges cubically, which is faster than Newton's method. Here, we demonstrate that with TaylorDiff.jl you can compute the Hessian-vector-vector product $H(x_i)a_ia_i$ very efficiently, so that each iteration of Halley's method costs almost as little as an iteration of Newton's method.

Implementation

We first define the two iteration schemes mentioned above:

using TaylorDiff, LinearAlgebra
import ForwardDiff

function newton(f, x, p; tol = 1e-12, maxiter = 100)
    fp = Base.Fix2(f, p)  # fix the parameter p, so that fp(x) == f(x, p)
    for i in 1:maxiter
        fx = fp(x)
        error = norm(fx)
        println("Iteration $i: x = $x, f(x) = $fx, error = $error")
        error < tol && return  # converged
        J = ForwardDiff.jacobian(fp, x)  # Jacobian J(x)
        a = J \ fx                       # Newton step: solve J(x) a = f(x)
        @. x -= a
    end
end

function halley(f, x, p; tol = 1e-12, maxiter = 100)
    fp = Base.Fix2(f, p)  # fix the parameter p, so that fp(x) == f(x, p)
    for i in 1:maxiter
        fx = fp(x)
        error = norm(fx)
        println("Iteration $i: x = $x, f(x) = $fx, error = $error")
        error < tol && return  # converged
        J = ForwardDiff.jacobian(fp, x)
        a = J \ fx                           # solve J(x) a = f(x)
        hvvp = derivative(fp, x, a, Val(2))  # Hessian-vector-vector product H(x) a a
        b = J \ hvvp                         # solve J(x) b = H(x) a a
        @. x -= (a * a) / (a - b / 2)        # element-wise Halley update
    end
end
halley (generic function with 1 method)

Note that in Halley's method, the Hessian-vector-vector product is computed with derivative(fp, x, a, Val(2)). Asymptotically, this is guaranteed to take only about 2x the time of evaluating fp(x) itself.
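To see concretely what this call computes, here is a small illustrative check (the function g, the point x0, and the direction v below are made up for this example): derivative(g, x0, v, Val(2)) is the second directional derivative $\frac{d^2}{dt^2}g(x_0+tv)\big|_{t=0}=H(x_0)vv$, which should match contracting each component's Hessian (obtained with ForwardDiff.hessian) with v twice, up to floating-point error.

using TaylorDiff
import ForwardDiff

g(x) = [x[1]^2 + sin(x[2]), exp(x[1]) * x[2]]   # made-up test function
x0 = [1.0, 2.0]                                  # made-up point
v = [0.5, -1.0]                                  # made-up direction

# Taylor-mode second directional derivative, i.e. H(x0) v v
hvvp_taylor = derivative(g, x0, v, Val(2))

# Reference: contract each component's Hessian with v twice
hvvp_ref = [v' * ForwardDiff.hessian(x -> g(x)[k], x0) * v for k in 1:2]

hvvp_taylor ≈ hvvp_ref   # expected to hold up to floating-point error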

Now we define a test function:

f(x, p) = x .* x - p
f (generic function with 1 method)
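For reference, the exact root of this system is $x = \sqrt{p}$ taken element-wise, so with $p = [2., 2.]$ both methods should converge to $[\sqrt 2, \sqrt 2] \approx [1.4142135623730951, 1.4142135623730951]$, which can be checked directly:

p = [2.0, 2.0]
x_exact = sqrt.(p)    # element-wise square root: [1.4142135623730951, 1.4142135623730951]
f(x_exact, p)         # ≈ [0.0, 0.0] up to floating-point roundoff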

Newton's method takes 6 iterations to converge:

newton(f, [1., 1.], [2., 2.])
Iteration 1: x = [1.0, 1.0], f(x) = [-1.0, -1.0], error = 1.4142135623730951
Iteration 2: x = [1.5, 1.5], f(x) = [0.25, 0.25], error = 0.3535533905932738
Iteration 3: x = [1.4166666666666667, 1.4166666666666667], f(x) = [0.006944444444444642, 0.006944444444444642], error = 0.009820927516480105
Iteration 4: x = [1.4142156862745099, 1.4142156862745099], f(x) = [6.007304882871267e-6, 6.007304882871267e-6], error = 8.495612038666664e-6
Iteration 5: x = [1.4142135623746899, 1.4142135623746899], f(x) = [4.510614104447086e-12, 4.510614104447086e-12], error = 6.378971641140442e-12
Iteration 6: x = [1.4142135623730951, 1.4142135623730951], f(x) = [4.440892098500626e-16, 4.440892098500626e-16], error = 6.280369834735101e-16

Halley's method, by contrast, takes only 4 iterations to converge:

halley(f, [1., 1.], [2., 2.])
Iteration 1: x = [1.0, 1.0], f(x) = [-1.0, -1.0], error = 1.4142135623730951
Iteration 2: x = [1.4, 1.4], f(x) = [-0.04000000000000026, -0.04000000000000026], error = 0.05656854249492417
Iteration 3: x = [1.4142131979695431, 1.4142131979695431], f(x) = [-1.0306887576749801e-6, -1.0306887576749801e-6], error = 1.4576140196894333e-6
Iteration 4: x = [1.414213562373095, 1.414213562373095], f(x) = [-4.440892098500626e-16, -4.440892098500626e-16], error = 6.280369834735101e-16
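
Finally, to check the per-iteration cost claim empirically, one can time a single function evaluation against the extra work that Halley's method does on top of Newton's. The sketch below assumes the BenchmarkTools package is available; exact timings will of course depend on the machine and the problem size.

using BenchmarkTools

x = [1.0, 1.0]; p = [2.0, 2.0]
fp = Base.Fix2(f, p)
a = ForwardDiff.jacobian(fp, x) \ fp(x)

@btime $fp($x)                           # plain function evaluation
@btime ForwardDiff.jacobian($fp, $x)     # Jacobian, needed by both methods
@btime derivative($fp, $x, $a, Val(2))   # Hessian-vector-vector product, Halley only

The only per-iteration work that halley adds on top of newton is the derivative call and a second linear solve against the same Jacobian, which is why its iterations stay nearly as cheap as Newton's.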