Advanced tutorial
We present contexts and sparsity handling with DifferentiationInterface.jl.
using BenchmarkTools
using DifferentiationInterface
import ForwardDiff, Zygote
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings
Contexts
Assume you want differentiate a multi-argument function with respect to the first argument.
f_multiarg(x, c) = c * sum(abs2, x)
The first way, which works with every backend, is to create a closure:
f_singlearg(c) = x -> f_multiarg(x, c)
Let's see it in action:
backend = AutoForwardDiff()
x = float.(1:3)
gradient(f_singlearg(10), backend, x)
3-element Vector{Float64}:
20.0
40.0
60.0
However, for performance reasons, it is sometimes preferrable to avoid closures and pass all arguments to the original function. We can do this by wrapping c
into a Constant
and giving this constant to the gradient
operator.
gradient(f_multiarg, backend, x, Constant(10))
3-element Vector{Float64}:
20.0
40.0
60.0
Preparation also works in this case, even if the constant changes before execution:
prep_other_constant = prepare_gradient(f_multiarg, backend, x, Constant(-1))
gradient(f_multiarg, prep_other_constant, backend, x, Constant(10))
3-element Vector{Float64}:
20.0
40.0
60.0
For additional arguments which act as mutated buffers, the Cache
wrapper is the appropriate choice instead of Constant
.
Sparsity
Sparse AD is very useful when Jacobian or Hessian matrices have a lot of zeros. So let us write functions that satisfy this property.
f_sparse_vector(x::AbstractVector) = diff(x .^ 2) + diff(reverse(x .^ 2))
f_sparse_scalar(x::AbstractVector) = sum(f_sparse_vector(x) .^ 2)
Dense backends
When we use the jacobian
or hessian
operator with a dense backend, we get a dense matrix with plenty of zeros.
x = float.(1:8);
8-element Vector{Float64}:
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
dense_first_order_backend = AutoForwardDiff()
J_dense = jacobian(f_sparse_vector, dense_first_order_backend, x)
7×8 Matrix{Float64}:
-2.0 4.0 0.0 0.0 0.0 0.0 14.0 -16.0
0.0 -4.0 6.0 0.0 0.0 12.0 -14.0 0.0
0.0 0.0 -6.0 8.0 10.0 -12.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 6.0 -8.0 -10.0 12.0 0.0 0.0
0.0 4.0 -6.0 0.0 0.0 -12.0 14.0 0.0
2.0 -4.0 0.0 0.0 0.0 0.0 -14.0 16.0
dense_second_order_backend = SecondOrder(AutoForwardDiff(), AutoZygote())
H_dense = hessian(f_sparse_scalar, dense_second_order_backend, x)
8×8 Matrix{Float64}:
112.0 -32.0 0.0 0.0 0.0 0.0 -112.0 128.0
-32.0 96.0 -96.0 0.0 0.0 -192.0 448.0 -256.0
0.0 -96.0 256.0 -192.0 -240.0 576.0 -336.0 0.0
0.0 0.0 -192.0 224.0 320.0 -384.0 0.0 0.0
0.0 0.0 -240.0 320.0 368.0 -480.0 0.0 0.0
0.0 -192.0 576.0 -384.0 -480.0 1120.0 -672.0 0.0
-112.0 448.0 -336.0 0.0 0.0 -672.0 1536.0 -896.0
128.0 -256.0 0.0 0.0 0.0 0.0 -896.0 1120.0
The results are correct but the procedure is very slow. By using a sparse backend, we can get the runtime to increase with the number of nonzero elements, instead of the total number of elements.
Sparse backends
Recipe to create a sparse backend: combine a dense backend, a sparsity detector and a compatible coloring algorithm inside AutoSparse
. The following are reasonable defaults:
sparse_first_order_backend = AutoSparse(
dense_first_order_backend;
sparsity_detector=TracerSparsityDetector(),
coloring_algorithm=GreedyColoringAlgorithm(),
)
sparse_second_order_backend = AutoSparse(
dense_second_order_backend;
sparsity_detector=TracerSparsityDetector(),
coloring_algorithm=GreedyColoringAlgorithm(),
)
Now the resulting matrices are sparse:
jacobian(f_sparse_vector, sparse_first_order_backend, x)
7×8 SparseArrays.SparseMatrixCSC{Float64, Int64} with 26 stored entries:
-2.0 4.0 ⋅ ⋅ ⋅ ⋅ 14.0 -16.0
⋅ -4.0 6.0 ⋅ ⋅ 12.0 -14.0 ⋅
⋅ ⋅ -6.0 8.0 10.0 -12.0 ⋅ ⋅
⋅ ⋅ ⋅ 0.0 0.0 ⋅ ⋅ ⋅
⋅ ⋅ 6.0 -8.0 -10.0 12.0 ⋅ ⋅
⋅ 4.0 -6.0 ⋅ ⋅ -12.0 14.0 ⋅
2.0 -4.0 ⋅ ⋅ ⋅ ⋅ -14.0 16.0
hessian(f_sparse_scalar, sparse_second_order_backend, x)
8×8 SparseArrays.SparseMatrixCSC{Float64, Int64} with 40 stored entries:
112.0 -32.0 ⋅ ⋅ ⋅ ⋅ -112.0 128.0
-32.0 96.0 -96.0 ⋅ ⋅ -192.0 448.0 -256.0
⋅ -96.0 256.0 -192.0 -240.0 576.0 -336.0 ⋅
⋅ ⋅ -192.0 224.0 320.0 -384.0 ⋅ ⋅
⋅ ⋅ -240.0 320.0 368.0 -480.0 ⋅ ⋅
⋅ -192.0 576.0 -384.0 -480.0 1120.0 -672.0 ⋅
-112.0 448.0 -336.0 ⋅ ⋅ -672.0 1536.0 -896.0
128.0 -256.0 ⋅ ⋅ ⋅ ⋅ -896.0 1120.0
Sparse preparation
In the examples above, we didn't use preparation. Sparse preparation is more costly than dense preparation, but it is even more essential. Indeed, once preparation is done, sparse differentiation is much faster than dense differentiation, because it makes fewer calls to the underlying function.
Some result analysis functions from SparseMatrixColorings.jl can help you figure out what the preparation contains. First, it records the sparsity pattern itself (the one returned by the detector).
jac_prep = prepare_jacobian(f_sparse_vector, sparse_first_order_backend, x)
sparsity_pattern(jac_prep)
7×8 SparseArrays.SparseMatrixCSC{Bool, Int64} with 26 stored entries:
1 1 ⋅ ⋅ ⋅ ⋅ 1 1
⋅ 1 1 ⋅ ⋅ 1 1 ⋅
⋅ ⋅ 1 1 1 1 ⋅ ⋅
⋅ ⋅ ⋅ 1 1 ⋅ ⋅ ⋅
⋅ ⋅ 1 1 1 1 ⋅ ⋅
⋅ 1 1 ⋅ ⋅ 1 1 ⋅
1 1 ⋅ ⋅ ⋅ ⋅ 1 1
In forward mode, each column of the sparsity pattern gets a color.
column_colors(jac_prep)
8-element Vector{Int64}:
1
2
1
2
3
4
3
4
And the colors in turn define non-overlapping groups (for Jacobians at least, Hessians are a bit more complicated).
column_groups(jac_prep)
4-element Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}:
[1, 3]
[2, 4]
[5, 7]
[6, 8]
Sparsity speedup
When preparation is used, the speedup due to sparsity becomes very visible in large dimensions.
xbig = rand(1000)
jac_prep_dense = prepare_jacobian(f_sparse_vector, dense_first_order_backend, zero(xbig))
@benchmark jacobian($f_sparse_vector, $jac_prep_dense, $dense_first_order_backend, $xbig)
BenchmarkTools.Trial: 436 samples with 1 evaluation.
Range (min … max): 4.645 ms … 163.316 ms ┊ GC (min … max): 9.04% … 96.90%
Time (median): 5.256 ms ┊ GC (median): 15.82%
Time (mean ± σ): 11.678 ms ± 26.198 ms ┊ GC (mean ± σ): 48.62% ± 19.91%
█▂
██▇█▆▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▁▁▁▁▁▆ ▆
4.65 ms Histogram: log(frequency) by time 160 ms <
Memory estimate: 57.63 MiB, allocs estimate: 1515.
jac_prep_sparse = prepare_jacobian(f_sparse_vector, sparse_first_order_backend, zero(xbig))
@benchmark jacobian($f_sparse_vector, $jac_prep_sparse, $sparse_first_order_backend, $xbig)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 21.530 μs … 4.192 ms ┊ GC (min … max): 0.00% … 98.76%
Time (median): 27.461 μs ┊ GC (median): 0.00%
Time (mean ± σ): 36.130 μs ± 80.223 μs ┊ GC (mean ± σ): 12.69% ± 7.31%
█▇▆▆
▅████▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▂▂▃▂ ▃
21.5 μs Histogram: frequency by time 139 μs <
Memory estimate: 305.31 KiB, allocs estimate: 27.
Better memory use can be achieved by pre-allocating the matrix from the preparation result (so that it has the correct structure).
jac_buffer = similar(sparsity_pattern(jac_prep_sparse), eltype(xbig))
@benchmark jacobian!($f_sparse_vector, $jac_buffer, $jac_prep_sparse, $sparse_first_order_backend, $xbig)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 18.414 μs … 4.334 ms ┊ GC (min … max): 0.00% … 98.82%
Time (median): 23.869 μs ┊ GC (median): 0.00%
Time (mean ± σ): 31.194 μs ± 81.547 μs ┊ GC (mean ± σ): 10.42% ± 5.96%
▅▆▆█▆▃▂▁▁ ▂▃▂ ▂
█████████▆▅▇▆▆▅▅▅▅▆▄▅▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇███ █
18.4 μs Histogram: log(frequency) by time 110 μs <
Memory estimate: 234.75 KiB, allocs estimate: 18.
And for optimal speed, one should write non-allocating and type-stable functions.
function f_sparse_vector!(y::AbstractVector, x::AbstractVector)
n = length(x)
for i in eachindex(y)
y[i] = abs2(x[i + 1]) - abs2(x[i]) + abs2(x[n - i]) - abs2(x[n - i + 1])
end
return nothing
end
ybig = zeros(length(xbig) - 1)
f_sparse_vector!(ybig, xbig)
ybig ≈ f_sparse_vector(xbig)
true
In this case, the sparse Jacobian should also become non-allocating (for our specific choice of backend).
jac_prep_sparse_nonallocating = prepare_jacobian(f_sparse_vector!, zero(ybig), sparse_first_order_backend, zero(xbig))
jac_buffer = similar(sparsity_pattern(jac_prep_sparse_nonallocating), eltype(xbig))
@benchmark jacobian!($f_sparse_vector!, $ybig, $jac_buffer, $jac_prep_sparse_nonallocating, $sparse_first_order_backend, $xbig)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 12.864 μs … 33.142 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.135 μs ┊ GC (median): 0.00%
Time (mean ± σ): 13.222 μs ± 832.350 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁█▇
▂▄███▆▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▂▁▁▁▁▁▂▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▂▂▂▂▂▂ ▂
12.9 μs Histogram: frequency by time 17.1 μs <
Memory estimate: 0 bytes, allocs estimate: 0.