ReverseDiff API
Gradients of f(x::AbstractArray{<:Real}...)::Real
ReverseDiff.gradient — Function

ReverseDiff.gradient(f, input, cfg::GradientConfig = GradientConfig(input))

If input is an AbstractArray, assume f has the form f(::AbstractArray{<:Real})::Real and return ∇f(input).

If input is a tuple of AbstractArrays, assume f has the form f(::AbstractArray{<:Real}...)::Real (such that it can be called as f(input...)) and return a Tuple where the ith element is the gradient of f w.r.t. input[i].

Note that cfg can be preallocated and reused for subsequent calls.

If possible, it is highly recommended to use ReverseDiff.GradientTape to prerecord f. Otherwise, this method will have to re-record f's execution trace for every subsequent call.
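A minimal sketch of both call forms (the functions f and g and the inputs here are illustrative, not part of the API):

using ReverseDiff

f(x) = sum(abs2.(x))                         # f(::AbstractArray{<:Real})::Real
x = rand(3)
ReverseDiff.gradient(f, x)                   # returns ∇f(x) == 2 .* x

cfg = ReverseDiff.GradientConfig(x)          # preallocate the config...
ReverseDiff.gradient(f, x, cfg)              # ...and reuse it across calls

g(a, b) = sum(a .* b)                        # tuple input -> tuple of gradients
ReverseDiff.gradient(g, (rand(3), rand(3)))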
ReverseDiff.gradient! — Function

ReverseDiff.gradient!(result, f, input, cfg::GradientConfig = GradientConfig(input))

Returns result. This method is exactly like ReverseDiff.gradient(f, input, cfg), except it stores the resulting gradient(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value f(input) (or f(input...), if isa(input, Tuple)) will be stored in it as well.
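For illustration, a minimal sketch of gradient! with a plain array and with a DiffResults.DiffResult (the function and input are hypothetical):

using ReverseDiff, DiffResults

f(x) = sum(abs2.(x))
x = rand(3)

grad = similar(x)
ReverseDiff.gradient!(grad, f, x)            # writes ∇f(x) into grad

result = DiffResults.GradientResult(x)       # holds the primal value + gradient
result = ReverseDiff.gradient!(result, f, x)
DiffResults.value(result)                    # f(x)
DiffResults.gradient(result)                 # ∇f(x)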
ReverseDiff.gradient!(tape::Union{GradientTape,CompiledGradient}, input)

If input is an AbstractArray, assume tape represents a function of the form f(::AbstractArray)::Real and return ∇f(input).

If input is a tuple of AbstractArrays, assume tape represents a function of the form f(::AbstractArray...)::Real and return a Tuple where the ith element is the gradient of f w.r.t. input[i].
ReverseDiff.gradient!(result, tape::Union{GradientTape,CompiledGradient}, input)

Returns result. This method is exactly like ReverseDiff.gradient!(tape, input), except it stores the resulting gradient(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value f(input) (or f(input...), if isa(input, Tuple)) will be stored in it as well.
Jacobians of f(x::AbstractArray{<:Real}...)::AbstractArray{<:Real}
ReverseDiff.jacobian — Function

ReverseDiff.jacobian(f, input, cfg::JacobianConfig = JacobianConfig(input))

If input is an AbstractArray, assume f has the form f(::AbstractArray{<:Real})::AbstractArray{<:Real} and return J(f)(input).

If input is a tuple of AbstractArrays, assume f has the form f(::AbstractArray{<:Real}...)::AbstractArray{<:Real} (such that it can be called as f(input...)) and return a Tuple where the ith element is the Jacobian of f w.r.t. input[i].

Note that cfg can be preallocated and reused for subsequent calls.

If possible, it is highly recommended to use ReverseDiff.JacobianTape to prerecord f. Otherwise, this method will have to re-record f's execution trace for every subsequent call.
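A minimal sketch, assuming an elementwise square (the function and input are illustrative):

using ReverseDiff

f(x) = x .^ 2                       # f(::AbstractArray)::AbstractArray
x = rand(3)
J = ReverseDiff.jacobian(f, x)      # 3×3 matrix with J[i, i] == 2 * x[i]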
ReverseDiff.jacobian(f!, output, input, cfg::JacobianConfig = JacobianConfig(output, input))

Exactly like ReverseDiff.jacobian(f, input, cfg), except the target function has the form f!(output::AbstractArray{<:Real}, input::AbstractArray{<:Real}...).
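A minimal sketch of the mutating form; f! here is a hypothetical function that fills its first argument:

using ReverseDiff

f!(y, x) = (y .= 2 .* x .+ 1; nothing)    # mutates its output argument
x = rand(3)
y = similar(x)
J = ReverseDiff.jacobian(f!, y, x)        # 3×3 matrix with 2.0 on the diagonal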
ReverseDiff.jacobian! — Function

ReverseDiff.jacobian!(result, f, input, cfg::JacobianConfig = JacobianConfig(input))

Returns result. This method is exactly like ReverseDiff.jacobian(f, input, cfg), except it stores the resulting Jacobian(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value f(input) (or f(input...), if isa(input, Tuple)) will be stored in it as well.
ReverseDiff.jacobian!(result, f!, output, input, cfg::JacobianConfig = JacobianConfig(output, input))

Exactly like ReverseDiff.jacobian!(result, f, input, cfg), except the target function has the form f!(output::AbstractArray{<:Real}, input::AbstractArray{<:Real}...).
ReverseDiff.jacobian!(tape::Union{JacobianTape,CompiledJacobian}, input)

If input is an AbstractArray, assume tape represents a function of the form f(::AbstractArray{<:Real})::AbstractArray{<:Real} or f!(::AbstractArray{<:Real}, ::AbstractArray{<:Real}) and return tape's Jacobian w.r.t. input.

If input is a tuple of AbstractArrays, assume tape represents a function of the form f(::AbstractArray{<:Real}...)::AbstractArray{<:Real} or f!(::AbstractArray{<:Real}, ::AbstractArray{<:Real}...) and return a Tuple where the ith element is tape's Jacobian w.r.t. input[i].

Note that if tape represents a function of the form f!(output, input...), you can only execute tape with new input values. There is no way to re-run tape with new output values; since f! can mutate output, there exists no stable "hook" for loading new output values into the tape.
ReverseDiff.jacobian!(result, tape::Union{JacobianTape,CompiledJacobian}, input)

Returns result. This method is exactly like ReverseDiff.jacobian!(tape, input), except it stores the resulting Jacobian(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value of the target function will be stored in it as well.
Hessians of f(x::AbstractArray{<:Real})::Real
ReverseDiff.hessian — Function

ReverseDiff.hessian(f, input::AbstractArray, cfg::HessianConfig = HessianConfig(input))

Given f(input::AbstractArray{<:Real})::Real, return f's Hessian w.r.t. the given input.

Note that cfg can be preallocated and reused for subsequent calls.

If possible, it is highly recommended to use ReverseDiff.HessianTape to prerecord f. Otherwise, this method will have to re-record f's execution trace for every subsequent call.
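A minimal sketch (the function and input are illustrative):

using ReverseDiff

f(x) = sum(x .^ 3)                  # f(::AbstractArray{<:Real})::Real
x = rand(2)
H = ReverseDiff.hessian(f, x)       # 2×2 diagonal matrix with H[i, i] == 6 * x[i]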
ReverseDiff.hessian! — Function

ReverseDiff.hessian!(result::AbstractArray, f, input::AbstractArray, cfg::HessianConfig = HessianConfig(input))
ReverseDiff.hessian!(result::DiffResult, f, input::AbstractArray, cfg::HessianConfig = HessianConfig(result, input))

Returns result. This method is exactly like ReverseDiff.hessian(f, input, cfg), except it stores the resulting Hessian in result rather than allocating new memory.

If result is a DiffResults.DiffResult, the primal value f(input) and the gradient ∇f(input) will be stored in it along with the Hessian H(f)(input).
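For illustration, a sketch using DiffResults to retrieve the value, gradient, and Hessian from a single hessian! call (the function and input are hypothetical):

using ReverseDiff, DiffResults

f(x) = sum(x .^ 3)
x = rand(2)
result = DiffResults.HessianResult(x)        # storage for value + gradient + Hessian
result = ReverseDiff.hessian!(result, f, x)
DiffResults.value(result)                    # f(x)
DiffResults.gradient(result)                 # ∇f(x)
DiffResults.hessian(result)                  # H(f)(x)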
ReverseDiff.hessian!(tape::Union{HessianTape,CompiledHessian}, input)

Assuming tape represents a function of the form f(::AbstractArray{<:Real})::Real, return the Hessian H(f)(input).
ReverseDiff.hessian!(result::AbstractArray, tape::Union{HessianTape,CompiledHessian}, input)
ReverseDiff.hessian!(result::DiffResult, tape::Union{HessianTape,CompiledHessian}, input)

Returns result. This method is exactly like ReverseDiff.hessian!(tape, input), except it stores the resulting Hessian in result rather than allocating new memory.

If result is a DiffResults.DiffResult, the primal value f(input) and the gradient ∇f(input) will be stored in it along with the Hessian H(f)(input).
The AbstractTape API
ReverseDiff works by recording the target function's execution trace to a "tape", then running the tape forwards and backwards to propagate new input values and derivative information.
In many cases, it is the recording phase of this process that consumes the most time and memory, while the forward and reverse execution passes are often fast and non-allocating. Luckily, ReverseDiff provides the AbstractTape family of types, which enable the user to pre-record a reusable tape for a given function and differentiation operation.

Note that pre-recording a tape can only capture the execution trace of the target function with the given input values. Therefore, re-running the tape (even with new input values) will only execute the paths that were recorded using the original input values. In other words, the tape cannot re-enact any branching behavior that depends on the input values (a sketch of this pitfall follows below). You can guarantee your own safety in this regard by never using the AbstractTape API with functions that contain control flow based on the input values.
Similarly to the branching issue, a tape is not guaranteed to capture any side-effects caused or depended on by the target function.
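To make the branching caveat concrete, here is a sketch of how a tape can silently return incorrect results (f is a hypothetical function with input-dependent control flow):

using ReverseDiff

f(x) = x[1] > 0 ? sum(x) : sum(abs2.(x))

tape = ReverseDiff.GradientTape(f, [1.0, 2.0])   # records only the x[1] > 0 branch
ReverseDiff.gradient!(tape, [1.0, 2.0])          # correct: [1.0, 1.0]
ReverseDiff.gradient!(tape, [-1.0, 2.0])         # silently wrong: still [1.0, 1.0],
                                                 # not the [-2.0, 4.0] of the other branch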
ReverseDiff.GradientTape — Type

ReverseDiff.GradientTape(f, input, cfg::GradientConfig = GradientConfig(input))

Return a GradientTape instance containing a pre-recorded execution trace of f at the given input.

This GradientTape can then be passed to ReverseDiff.gradient! to take gradients of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.gradient for a description of acceptable types for input.
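A minimal sketch of the prerecord-then-reuse workflow (the function and inputs are illustrative):

using ReverseDiff

f(x) = sum(abs2.(x))
tape = ReverseDiff.GradientTape(f, rand(3))   # record the trace once

x = rand(3)                                   # same shape/eltype as the seed input
ReverseDiff.gradient!(tape, x)                # replay the tape; no re-recording

grad = similar(x)
ReverseDiff.gradient!(grad, tape, x)          # in-place variant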
ReverseDiff.JacobianTape — Type

ReverseDiff.JacobianTape(f, input, cfg::JacobianConfig = JacobianConfig(input))

Return a JacobianTape instance containing a pre-recorded execution trace of f at the given input.

This JacobianTape can then be passed to ReverseDiff.jacobian! to take Jacobians of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.jacobian for a description of acceptable types for input.
ReverseDiff.JacobianTape(f!, output, input, cfg::JacobianConfig = JacobianConfig(output, input))

Return a JacobianTape instance containing a pre-recorded execution trace of f! at the given output and input.

This JacobianTape can then be passed to ReverseDiff.jacobian! to take Jacobians of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.jacobian for a description of acceptable types for input.
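A sketch covering both the f(input) and f!(output, input) forms (the functions here are hypothetical):

using ReverseDiff

f(x) = x .^ 2
tape = ReverseDiff.JacobianTape(f, rand(3))
ReverseDiff.jacobian!(tape, rand(3))            # Jacobian at a new input

f!(y, x) = (y .= 2 .* x; nothing)
tape! = ReverseDiff.JacobianTape(f!, rand(3), rand(3))
ReverseDiff.jacobian!(tape!, rand(3))           # only input may vary (see the note above)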
ReverseDiff.HessianTape — Type

ReverseDiff.HessianTape(f, input, cfg::HessianConfig = HessianConfig(input))

Return a HessianTape instance containing a pre-recorded execution trace of f at the given input.

This HessianTape can then be passed to ReverseDiff.hessian! to take Hessians of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.hessian for a description of acceptable types for input.
ReverseDiff.compile — Function

ReverseDiff.compile(t::AbstractTape)

Return a fully compiled representation of t of type CompiledTape. This object can be passed to any API methods that accept t (e.g. gradient!(result, t, input)).

In many cases, compiling t can significantly speed up execution time. Note that the longer the tape, the more time compilation may take. Very long tapes (i.e. when length(t) is on the order of 10000 elements) can take a very long time to compile.
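A sketch of the compile-then-replay workflow (the function and sizes are illustrative):

using ReverseDiff

f(x) = sum(abs2.(x))
tape = ReverseDiff.GradientTape(f, rand(100))
ctape = ReverseDiff.compile(tape)                  # pay the compilation cost once

grad = Vector{Float64}(undef, 100)
for _ in 1:1000
    ReverseDiff.gradient!(grad, ctape, rand(100))  # fast, non-allocating replays
end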
The AbstractConfig API

For the sake of convenience and performance, all "extra" information used by ReverseDiff's API methods is bundled up in the ReverseDiff.AbstractConfig family of types. These types allow the user to easily feed several different parameters to ReverseDiff's API methods, such as work buffers and tape configurations.
ReverseDiff's basic API methods will allocate these types automatically by default, but you can reduce memory usage and improve performance if you preallocate them yourself.
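For illustration, a sketch of preallocating a GradientConfig and reusing it (the function and inputs are hypothetical):

using ReverseDiff

f(x) = sum(abs2.(x))
x = rand(3)

cfg = ReverseDiff.GradientConfig(x)       # tape + work buffers allocated once
for _ in 1:100
    ReverseDiff.gradient(f, rand(3), cfg) # reused on every call
end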
ReverseDiff.GradientConfig — Type

ReverseDiff.GradientConfig(input, tp::InstructionTape = InstructionTape())

Return a GradientConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.gradient/ReverseDiff.gradient! methods.

Note that input is only used for type and shape information; it is not stored or modified in any way. It is assumed that the element type of input is the same as the element type of the target function's output.

See ReverseDiff.gradient for a description of acceptable types for input.
ReverseDiff.GradientConfig(input, ::Type{D}, tp::InstructionTape = InstructionTape())

Like GradientConfig(input, tp), except the provided type D is assumed to be the element type of the target function's output.
ReverseDiff.JacobianConfig — Type

ReverseDiff.JacobianConfig(input, tp::InstructionTape = InstructionTape())

Return a JacobianConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.jacobian/ReverseDiff.jacobian! methods.

Note that input is only used for type and shape information; it is not stored or modified in any way. It is assumed that the element type of input is the same as the element type of the target function's output.

See ReverseDiff.jacobian for a description of acceptable types for input.
ReverseDiff.JacobianConfig(input, ::Type{D}, tp::InstructionTape = InstructionTape())

Like JacobianConfig(input, tp), except the provided type D is assumed to be the element type of the target function's output.
ReverseDiff.JacobianConfig(output::AbstractArray, input, tp::InstructionTape = InstructionTape())

Return a JacobianConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.jacobian/ReverseDiff.jacobian! methods. This method assumes the target function has the form f!(output, input).

Note that input and output are only used for type and shape information; they are not stored or modified in any way.

See ReverseDiff.jacobian for a description of acceptable types for input.
ReverseDiff.JacobianConfig(result::DiffResults.DiffResult, input, tp::InstructionTape = InstructionTape())

A convenience method for JacobianConfig(DiffResults.value(result), input, tp).
ReverseDiff.HessianConfig — Type

ReverseDiff.HessianConfig(input::AbstractArray, gtp::InstructionTape = InstructionTape(), jtp::InstructionTape = InstructionTape())

Return a HessianConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.hessian/ReverseDiff.hessian! methods. gtp is the tape used for the inner gradient calculation, while jtp is used for the outer Jacobian calculation.

Note that input is only used for type and shape information; it is not stored or modified in any way. It is assumed that the element type of input is the same as the element type of the target function's output.
ReverseDiff.HessianConfig(input::AbstractArray, ::Type{D}, gtp::InstructionTape = InstructionTape(), jtp::InstructionTape = InstructionTape())

Like HessianConfig(input, gtp, jtp), except the provided type D is assumed to be the element type of the target function's output.
ReverseDiff.HessianConfig(result::DiffResults.DiffResult, input::AbstractArray, gtp::InstructionTape = InstructionTape(), jtp::InstructionTape = InstructionTape())

Like HessianConfig(input, gtp, jtp), but utilizes result along with input to construct work buffers.

Note that result and input are only used for type and shape information; they are not stored or modified in any way.
Optimization Annotations
ReverseDiff.@forward — Macro

ReverseDiff.@forward(f)(args::Real...)
ReverseDiff.@forward f(args::Real...) = ...
ReverseDiff.@forward f = (args::Real...) -> ...

Declare that the given function should be differentiated using forward mode automatic differentiation. Note that the macro can be used at either the definition site or at the call site of f. Currently, only length(args) <= 2 is supported. Note that, if f is defined within another function g, f should not close over any differentiable input of g. By using this macro, you are providing a guarantee that this property holds true.
This macro can be very beneficial for performance when intermediate functions in your computation are low-dimensional scalar functions, because it minimizes the number of instructions that must be recorded to the tape. For example, take the function sigmoid(n) = 1.0 / (1.0 + exp(-n)). Normally, using ReverseDiff to differentiate this function would require recording 4 instructions (-, exp, +, and /). However, if we apply the @forward macro, only one instruction will be recorded (sigmoid). The sigmoid function will then be differentiated using ForwardDiff's Dual number type.
This is also beneficial for higher-order elementwise function application. ReverseDiff overloads map/broadcast to dispatch on @forward-applied functions. For example, map(@forward(f), x) will usually be more performant than map(f, x).

ReverseDiff overloads many Base scalar functions to behave as @forward functions by default. A full list is given by DiffRules.diffrules().
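A minimal sketch of the definition-site form, following the sigmoid example above:

using ReverseDiff

# The whole sigmoid call is recorded as a single tape instruction and
# differentiated internally with ForwardDiff's Dual numbers.
ReverseDiff.@forward sigmoid(n::Real) = 1.0 / (1.0 + exp(-n))

ReverseDiff.gradient(x -> sum(sigmoid.(x)), rand(3))
ReverseDiff.gradient(x -> sum(map(sigmoid, x)), rand(3))   # map dispatch also applies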
ReverseDiff.@skip — Macro

ReverseDiff.@skip(f)(args::Real...)
ReverseDiff.@skip f(args::Real...) = ...
ReverseDiff.@skip f = (args::Real...) -> ...

Declare that the given function should be skipped during the instruction-recording phase of differentiation. Note that the macro can be used at either the definition site or at the call site of f. Note that, if f is defined within another function g, f should not close over any differentiable input of g. By using this macro, you are providing a guarantee that this property holds true.

ReverseDiff overloads many Base scalar functions to behave as @skip functions by default. A full list is given by ReverseDiff.SKIPPED_UNARY_SCALAR_FUNCS and ReverseDiff.SKIPPED_BINARY_SCALAR_FUNCS.
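For illustration, a hypothetical sketch; step_size is an invented helper whose output should be treated as a constant on the tape:

using ReverseDiff

# step_size receives plain (untracked) values, so its output contributes
# no derivative information to the tape.
ReverseDiff.@skip step_size(g::Real) = g > 1.0 ? 0.1 : 0.5

f(x) = step_size(x[1]) * sum(x)
ReverseDiff.gradient(f, rand(3))   # no derivative flows through step_size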
ChainRules integration
ReverseDiff.@grad_from_chainrules — Macro

@grad_from_chainrules f(args...; kwargs...)

The @grad_from_chainrules macro provides a way to import adjoints (rrules) defined in ChainRules to ReverseDiff. One must provide a method signature to import the corresponding rrule. In the provided method signature, replace the types of the arguments with respect to which one wants to differentiate with ReverseDiff.TrackedReal and ReverseDiff.TrackedArray, respectively. For example, we can import the rrule of f(x::Real, y::Array) as below:

ReverseDiff.@grad_from_chainrules f(x::TrackedReal, y::TrackedArray)
ReverseDiff.@grad_from_chainrules f(x::TrackedReal, y::Array)
ReverseDiff.@grad_from_chainrules f(x::Real, y::TrackedArray)
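As an end-to-end sketch (myscale and its rrule are hypothetical, not part of any package):

using ReverseDiff, ChainRulesCore

myscale(a::Real, x::AbstractArray) = a .* x

function ChainRulesCore.rrule(::typeof(myscale), a::Real, x::AbstractArray)
    y = myscale(a, x)
    function myscale_pullback(ȳ)
        # ∂(a .* x)/∂a = x and ∂(a .* x)/∂x = a
        return ChainRulesCore.NoTangent(), sum(ȳ .* x), a .* ȳ
    end
    return y, myscale_pullback
end

# Import the rrule for the case where both arguments are tracked:
ReverseDiff.@grad_from_chainrules myscale(a::TrackedReal, x::TrackedArray)

ReverseDiff.gradient((a, x) -> sum(myscale(a[1], x)), ([2.0], rand(3)))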