Operators
If some of these concepts are unfamiliar, take a look at the book *The Elements of Differentiable Programming* (Blondel and Roulet, 2024).
List of operators
Given a function `f(x) = y`, there are several differentiation operators available. The terminology depends on:
- the type and shape of the input `x`
- the type and shape of the output `y`
- the order of differentiation
Below we list and describe all the operators we support.
The package is thoroughly tested with inputs and outputs of the following types: `Float64`, `Vector{Float64}` and `Matrix{Float64}`. We also expect it to work on most kinds of `Number` and `AbstractArray` variables. Beyond that, you are in uncharted territory. We deliberately keep the type annotations minimal, so passing more complex objects or custom structs might work in some cases, but we make no guarantees about that yet.
High-level operators
These operators are computed using only the input `x`.
| operator | order | input `x` | output `y` | operator result type | operator result shape |
|---|---|---|---|---|---|
| `derivative` | 1 | `Number` | `Any` | similar to `y` | `size(y)` |
| `second_derivative` | 2 | `Number` | `Any` | similar to `y` | `size(y)` |
| `gradient` | 1 | `Any` | `Number` | similar to `x` | `size(x)` |
| `jacobian` | 1 | `AbstractArray` | `AbstractArray` | `AbstractMatrix` | `(length(y), length(x))` |
| `hessian` | 2 | `AbstractArray` | `Number` | `AbstractMatrix` | `(length(x), length(x))` |
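As an illustration, here is a minimal sketch of calling these operators, assuming the ForwardDiff.jl backend package is loaded (the functions `f` and `g` and the values are ours, purely illustrative; exact signatures may vary across package versions):

```julia
using DifferentiationInterface
import ForwardDiff  # loading the backend package enables AutoForwardDiff

backend = AutoForwardDiff()

f(x) = sum(abs2, x)  # Vector input, Number output
g(x) = x .^ 2        # Vector input, Vector output

x = [1.0, 2.0, 3.0]
gradient(f, backend, x)  # result similar to x, here [2.0, 4.0, 6.0]
jacobian(g, backend, x)  # matrix of shape (length(g(x)), length(x)), here 3×3
hessian(f, backend, x)   # matrix of shape (length(x), length(x))
```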
Low-level operators
These operators are computed using the input `x` and another argument `t` of type `NTuple`, which contains one or more tangents. You can think of tangents as perturbations propagated through the function; they live either in the same space as `x` or in the same space as `y`.
| operator | order | input `x` | output `y` | element type of `t` | operator result type | operator result shape |
|---|---|---|---|---|---|---|
| `pushforward` (JVP) | 1 | `Any` | `Any` | similar to `x` | similar to `y` | `size(y)` |
| `pullback` (VJP) | 1 | `Any` | `Any` | similar to `y` | similar to `x` | `size(x)` |
| `hvp` | 2 | `Any` | `Number` | similar to `x` | similar to `x` | `size(x)` |
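As a sketch (same illustrative assumptions as above), the tangents are passed as a tuple, and a tuple of results comes back:

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()
g(x) = x .^ 2
x = [1.0, 2.0, 3.0]

dx = [1.0, 0.0, 0.0]                    # tangent living in the space of x
ty = pushforward(g, backend, x, (dx,))  # JVP: tuple similar to y, here ([2.0, 0.0, 0.0],)

dy = [0.0, 1.0, 0.0]                    # tangent living in the space of y
tx = pullback(g, backend, x, (dy,))     # VJP: tuple similar to x, here ([0.0, 4.0, 0.0],)
```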
Variants
Several variants of each operator are defined:
- out-of-place operators return a new derivative object
- in-place operators (whose names end with a bang `!`) mutate the provided derivative object
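For instance, here is a hedged sketch contrasting the two variants of `gradient` (same illustrative assumptions as above):

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()
f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

grad = gradient(f, backend, x)    # out-of-place: allocates and returns a new vector
buffer = similar(x)
gradient!(f, buffer, backend, x)  # in-place: overwrites buffer with the gradient
```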
Mutation and signatures
Two kinds of functions are supported:
- out-of-place functions `f(x) = y`
- in-place functions `f!(y, x) = nothing`
In-place functions only work with `pushforward`, `pullback`, `derivative` and `jacobian`. The other operators (`hvp`, `gradient` and `hessian`) require scalar outputs, so it makes no sense to mutate the number `y`.
This results in various operator signatures (the necessary arguments and their order):
| function signature | out-of-place operator (returns `result`) | in-place operator (mutates `result`) |
|---|---|---|
| out-of-place function `f` | `op(f, backend, x, [t])` | `op!(f, result, backend, x, [t])` |
| in-place function `f!` | `op(f!, y, backend, x, [t])` | `op!(f!, y, result, backend, x, [t])` |
The positional arguments between `f`/`f!` and `backend` are always mutated, regardless of the bang `!` in the operator name. In particular, for in-place functions `f!(y, x)`, every variant of every operator will mutate `y`.
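To illustrate, here is a sketch of an in-place function differentiated with `jacobian` (the function `g!` is ours, purely illustrative; same backend assumptions as above). Note that `y` is mutated even though the operator has no bang:

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()

function g!(y, x)  # in-place function: fills y, returns nothing
    y .= x .^ 2
    return nothing
end

x = [1.0, 2.0, 3.0]
y = zeros(3)
J = jacobian(g!, y, backend, x)  # y now holds g(x) = [1.0, 4.0, 9.0]; J is 3×3
```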
Preparation
Principle
In many cases, AD can be accelerated if the function has been called at least once (e.g. to record a tape) or if some cache objects are pre-allocated. This preparation procedure is backend-specific, but we expose a common syntax to achieve it.
| operator | preparation (different point) | preparation (same point) |
|---|---|---|
| `derivative` | `prepare_derivative` | - |
| `gradient` | `prepare_gradient` | - |
| `jacobian` | `prepare_jacobian` | - |
| `second_derivative` | `prepare_second_derivative` | - |
| `hessian` | `prepare_hessian` | - |
| `pushforward` | `prepare_pushforward` | `prepare_pushforward_same_point` |
| `pullback` | `prepare_pullback` | `prepare_pullback_same_point` |
| `hvp` | `prepare_hvp` | `prepare_hvp_same_point` |
In addition, the preparation syntax depends on the number of arguments accepted by the function.
| function signature | preparation signature |
|---|---|
| out-of-place function | `prepare_op(f, backend, x, [t])` |
| in-place function | `prepare_op(f!, y, backend, x, [t])` |
Preparation creates an object called `prep`, which contains the necessary information to speed up an operator and its variants. The idea is that you prepare only once (which can be costly) and then call the operator several times while reusing the same `prep`.
```julia
op(f, backend, x, [t])        # slow because it includes preparation
op(f, prep, backend, x, [t])  # fast because it skips preparation
```
The `prep` object is the last argument before `backend`, and it is always mutated, regardless of the bang `!` in the operator name.
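As a concrete sketch with `gradient` (same illustrative assumptions as above):

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()
f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

prep = prepare_gradient(f, backend, x)       # potentially costly, done once
gradient(f, prep, backend, x)                # fast, reuses prep
gradient(f, prep, backend, [4.0, 5.0, 6.0])  # also fine: same type and size as x
```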
Reusing preparation
It is not always safe to reuse the results of preparation. For different-point preparation, the output `prep` of

```julia
prepare_op(f, [y], backend, x, [t, contexts...])
```

can be reused in subsequent calls to

```julia
op(f, prep, [other_y], backend, other_x, [other_t, other_contexts...])
```
provided that the following conditions all hold:
- `f` and `backend` remain the same
- `other_x` has the same type and size as `x`
- `other_y` has the same type and size as `y`
- `other_t` has the same type and size as `t`
- all the elements of `other_contexts` have the same type and size as the corresponding elements of `contexts`
For same-point preparation, the same rules hold with two modifications:
- `other_x` must be equal to `x`
- any element of `other_contexts` with type `Constant` must be equal to the corresponding element of `contexts`
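Here is a sketch of same-point preparation with `pushforward` (same illustrative assumptions as above): the point `x` must stay fixed across calls, while the tangent may change:

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()
g(x) = x .^ 2
x = [1.0, 2.0, 3.0]

prep = prepare_pushforward_same_point(g, backend, x, ([1.0, 0.0, 0.0],))
pushforward(g, prep, backend, x, ([1.0, 0.0, 0.0],))  # same point x, same tangent
pushforward(g, prep, backend, x, ([0.0, 1.0, 0.0],))  # same point x, new tangent
```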
Reusing preparation with different types or sizes may work with some backends and error with others, so it is not allowed by the API of DifferentiationInterface.
These rules hold for the majority of backends, but there are some exceptions. The most important exception is ReverseDiff and its taping mechanism, which is sensitive to control flow inside the function.