Efficient Halley's method for nonlinear solving
Introduction
Say we have a system of $n$ equations in $n$ unknowns, $f(x)=0$, where $f:\mathbb R^n\to\mathbb R^n$ is sufficiently smooth.
Given an initial guess $x_0$, Newton's method finds a solution by iterating
\[x_{i+1}=x_i-J(x_i)^{-1}f(x_i)\]
and this method converges quadratically.
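This update rule comes from linearizing $f$ around the current iterate and solving the resulting linear model for the next one:
\[f(x_{i+1})\approx f(x_i)+J(x_i)(x_{i+1}-x_i)=0.\]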
We can make it converge faster by using higher-order derivative information. For example, Halley's method iterates as
\[x_{i+1}=x_i-(a_i\odot a_i)\oslash(a_i-b_i/2)\]
where the vector multiplication and division $\odot,\oslash$ are defined element-wise, and the terms $a_i$ and $b_i$ are obtained by solving the linear systems $J(x_i)a_i = f(x_i)$ and $J(x_i)b_i = H(x_i)a_ia_i$.
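For a single equation ($n=1$), one can check that this element-wise update reduces to the classical scalar form of Halley's method:
\[x_{i+1}=x_i-\frac{2f(x_i)f'(x_i)}{2f'(x_i)^2-f(x_i)f''(x_i)}.\]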
Halley's method converges cubically, which is faster than Newton's method. Here, we demonstrate that with TaylorDiff.jl you can compute the Hessian-vector-vector product $H(x_i)a_ia_i$ very efficiently, so that one iteration of Halley's method is almost as cheap as one iteration of Newton's method.
Implementation
We first define the two iteration schemes mentioned above:
using TaylorDiff, LinearAlgebra
import ForwardDiff
function newton(f, x, p; tol = 1e-12, maxiter = 100)
    fp = Base.Fix2(f, p)  # fix the parameter, so fp(x) == f(x, p)
    for i in 1:maxiter
        fx = fp(x)
        error = norm(fx)
        println("Iteration $i: x = $x, f(x) = $fx, error = $error")
        error < tol && return
        J = ForwardDiff.jacobian(fp, x)  # Jacobian J(x_i)
        a = J \ fx                       # solve J(x_i) a = f(x_i)
        @. x -= a
    end
end
function halley(f, x, p; tol = 1e-12, maxiter = 100)
    fp = Base.Fix2(f, p)  # fix the parameter, so fp(x) == f(x, p)
    for i in 1:maxiter
        fx = fp(x)
        error = norm(fx)
        println("Iteration $i: x = $x, f(x) = $fx, error = $error")
        error < tol && return
        J = ForwardDiff.jacobian(fp, x)      # Jacobian J(x_i)
        a = J \ fx                           # solve J(x_i) a = f(x_i)
        hvvp = derivative(fp, x, a, Val(2))  # Hessian-vector-vector product H(x_i) a a
        b = J \ hvvp                         # solve J(x_i) b = H(x_i) a a
        @. x -= (a * a) / (a - b / 2)        # element-wise Halley update
    end
end
halley (generic function with 1 method)
Note that in Halley's method, the Hessian-vector-vector product is computed with derivative(fp, x, a, Val(2)). It is guaranteed that, asymptotically, this takes only about 2x more time than evaluating fp(x) itself.
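As a quick sanity check, this can be compared against the same quantity computed with nested ForwardDiff derivatives along the direction, since $H(x)aa=\left.\tfrac{d^2}{dt^2}f(x+ta)\right|_{t=0}$. Below is a minimal sketch with a hypothetical function g and arbitrary point and direction (not part of the solver code above):

g(x) = x .* x .- 2.0                   # example function of the same kind as below, with p fixed to 2
x0 = [1.5, 1.5]                        # arbitrary evaluation point
v = [0.1, 0.2]                         # arbitrary direction
hvvp_taylor = derivative(g, x0, v, Val(2))
hvvp_forward = ForwardDiff.derivative(t -> ForwardDiff.derivative(s -> g(x0 .+ s .* v), t), 0.0)
hvvp_taylor ≈ hvvp_forward             # expected to hold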
Now we define a test function:
f(x, p) = x .* x - p
f (generic function with 1 method)
With $p = [2, 2]$, the root is $[\sqrt 2, \sqrt 2]$. Newton's method takes 6 iterations to converge:
newton(f, [1., 1.], [2., 2.])
Iteration 1: x = [1.0, 1.0], f(x) = [-1.0, -1.0], error = 1.4142135623730951
Iteration 2: x = [1.5, 1.5], f(x) = [0.25, 0.25], error = 0.3535533905932738
Iteration 3: x = [1.4166666666666667, 1.4166666666666667], f(x) = [0.006944444444444642, 0.006944444444444642], error = 0.009820927516480105
Iteration 4: x = [1.4142156862745099, 1.4142156862745099], f(x) = [6.007304882871267e-6, 6.007304882871267e-6], error = 8.495612038666664e-6
Iteration 5: x = [1.4142135623746899, 1.4142135623746899], f(x) = [4.510614104447086e-12, 4.510614104447086e-12], error = 6.378971641140442e-12
Iteration 6: x = [1.4142135623730951, 1.4142135623730951], f(x) = [4.440892098500626e-16, 4.440892098500626e-16], error = 6.280369834735101e-16
While Halley's method takes only 4 iterations to converge:
halley(f, [1., 1.], [2., 2.])
Iteration 1: x = [1.0, 1.0], f(x) = [-1.0, -1.0], error = 1.4142135623730951
Iteration 2: x = [1.4, 1.4], f(x) = [-0.04000000000000026, -0.04000000000000026], error = 0.05656854249492417
Iteration 3: x = [1.4142131979695431, 1.4142131979695431], f(x) = [-1.0306887576749801e-6, -1.0306887576749801e-6], error = 1.4576140196894333e-6
Iteration 4: x = [1.414213562373095, 1.414213562373095], f(x) = [-4.440892098500626e-16, -4.440892098500626e-16], error = 6.280369834735101e-16
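To check the per-iteration cost claim empirically, one could benchmark the Hessian-vector-vector product against a plain function evaluation. Below is a minimal sketch assuming BenchmarkTools is available; the point and direction are arbitrary, and on such a tiny problem the timings are dominated by overhead, so the ratio only becomes meaningful for larger systems.

using BenchmarkTools
fp = Base.Fix2(f, [2.0, 2.0])
x = [1.5, 1.5]
a = [0.1, 0.2]
@btime $fp($x)                          # baseline: one function evaluation
@btime derivative($fp, $x, $a, Val(2))  # Hessian-vector-vector product via TaylorDiff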