ReverseDiff API
Gradients of f(x::AbstractArray{<:Real}...)::Real
ReverseDiff.gradient — Function

ReverseDiff.gradient(f, input, cfg::GradientConfig = GradientConfig(input))

If input is an AbstractArray, assume f has the form f(::AbstractArray{<:Real})::Real and return ∇f(input).

If input is a tuple of AbstractArrays, assume f has the form f(::AbstractArray{<:Real}...)::Real (such that it can be called as f(input...)) and return a Tuple where the ith element is the gradient of f w.r.t. input[i].

Note that cfg can be preallocated and reused for subsequent calls.

If possible, it is highly recommended to use ReverseDiff.GradientTape to prerecord f. Otherwise, this method will have to re-record f's execution trace for every subsequent call.
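A minimal sketch of both call forms (the functions f and g and the inputs here are illustrative, not part of the API):

using ReverseDiff

f(x) = sum(abs2.(x))                         # f(::AbstractArray{<:Real})::Real
x = rand(3)
ReverseDiff.gradient(f, x)                   # returns ∇f(x) == 2 .* x

cfg = ReverseDiff.GradientConfig(x)          # preallocate the config...
ReverseDiff.gradient(f, x, cfg)              # ...and reuse it across calls

g(a, b) = sum(a .* b)                        # tuple input -> tuple of gradients
ReverseDiff.gradient(g, (rand(3), rand(3)))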
ReverseDiff.gradient! — Function

ReverseDiff.gradient!(result, f, input, cfg::GradientConfig = GradientConfig(input))

Returns result. This method is exactly like ReverseDiff.gradient(f, input, cfg), except it stores the resulting gradient(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value f(input) (or f(input...), if isa(input, Tuple)) will be stored in it as well.
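For illustration, a minimal sketch of gradient! with a plain array and with a DiffResults.DiffResult (the function and input are hypothetical):

using ReverseDiff, DiffResults

f(x) = sum(abs2.(x))
x = rand(3)

grad = similar(x)
ReverseDiff.gradient!(grad, f, x)            # writes ∇f(x) into grad

result = DiffResults.GradientResult(x)       # holds the primal value + gradient
result = ReverseDiff.gradient!(result, f, x)
DiffResults.value(result)                    # f(x)
DiffResults.gradient(result)                 # ∇f(x)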
ReverseDiff.gradient!(tape::Union{GradientTape,CompiledGradient}, input)

If input is an AbstractArray, assume tape represents a function of the form f(::AbstractArray)::Real and return ∇f(input).

If input is a tuple of AbstractArrays, assume tape represents a function of the form f(::AbstractArray...)::Real and return a Tuple where the ith element is the gradient of f w.r.t. input[i].
ReverseDiff.gradient!(result, tape::Union{GradientTape,CompiledGradient}, input)

Returns result. This method is exactly like ReverseDiff.gradient!(tape, input), except it stores the resulting gradient(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value f(input) (or f(input...), if isa(input, Tuple)) will be stored in it as well.
Jacobians of f(x::AbstractArray{<:Real}...)::AbstractArray{<:Real}
ReverseDiff.jacobian — Function

ReverseDiff.jacobian(f, input, cfg::JacobianConfig = JacobianConfig(input))

If input is an AbstractArray, assume f has the form f(::AbstractArray{<:Real})::AbstractArray{<:Real} and return J(f)(input).

If input is a tuple of AbstractArrays, assume f has the form f(::AbstractArray{<:Real}...)::AbstractArray{<:Real} (such that it can be called as f(input...)) and return a Tuple where the ith element is the Jacobian of f w.r.t. input[i].

Note that cfg can be preallocated and reused for subsequent calls.

If possible, it is highly recommended to use ReverseDiff.JacobianTape to prerecord f. Otherwise, this method will have to re-record f's execution trace for every subsequent call.
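A minimal sketch, assuming an elementwise square (the function and input are illustrative):

using ReverseDiff

f(x) = x .^ 2                       # f(::AbstractArray)::AbstractArray
x = rand(3)
J = ReverseDiff.jacobian(f, x)      # 3×3 matrix with J[i, i] == 2 * x[i]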
ReverseDiff.jacobian(f!, output, input, cfg::JacobianConfig = JacobianConfig(output, input))

Exactly like ReverseDiff.jacobian(f, input, cfg), except the target function has the form f!(output::AbstractArray{<:Real}, input::AbstractArray{<:Real}...).
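A minimal sketch of the mutating form; f! here is a hypothetical function that fills its first argument:

using ReverseDiff

f!(y, x) = (y .= 2 .* x .+ 1; nothing)    # mutates its output argument
x = rand(3)
y = similar(x)
J = ReverseDiff.jacobian(f!, y, x)        # 3×3 matrix with 2.0 on the diagonal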
ReverseDiff.jacobian! — Function

ReverseDiff.jacobian!(result, f, input, cfg::JacobianConfig = JacobianConfig(input))

Returns result. This method is exactly like ReverseDiff.jacobian(f, input, cfg), except it stores the resulting Jacobian(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value f(input) (or f(input...), if isa(input, Tuple)) will be stored in it as well.
ReverseDiff.jacobian!(result, f!, output, input, cfg::JacobianConfig = JacobianConfig(output, input))

Exactly like ReverseDiff.jacobian!(result, f, input, cfg), except the target function has the form f!(output::AbstractArray{<:Real}, input::AbstractArray{<:Real}...).
ReverseDiff.jacobian!(tape::Union{JacobianTape,CompiledJacobian}, input)

If input is an AbstractArray, assume tape represents a function of the form f(::AbstractArray{<:Real})::AbstractArray{<:Real} or f!(::AbstractArray{<:Real}, ::AbstractArray{<:Real}) and return tape's Jacobian w.r.t. input.

If input is a tuple of AbstractArrays, assume tape represents a function of the form f(::AbstractArray{<:Real}...)::AbstractArray{<:Real} or f!(::AbstractArray{<:Real}, ::AbstractArray{<:Real}...) and return a Tuple where the ith element is tape's Jacobian w.r.t. input[i].

Note that if tape represents a function of the form f!(output, input...), you can only execute tape with new input values. There is no way to re-run tape with new output values; since f! can mutate output, there exists no stable "hook" for loading new output values into the tape.
ReverseDiff.jacobian!(result, tape::Union{JacobianTape,CompiledJacobian}, input)

Returns result. This method is exactly like ReverseDiff.jacobian!(tape, input), except it stores the resulting Jacobian(s) in result rather than allocating new memory.

result can be an AbstractArray or a Tuple of AbstractArrays. The result (or any of its elements, if isa(result, Tuple)) can also be a DiffResults.DiffResult, in which case the primal value of the target function will be stored in it as well.
Hessians of f(x::AbstractArray{<:Real})::Real
ReverseDiff.hessian — Function

ReverseDiff.hessian(f, input::AbstractArray, cfg::HessianConfig = HessianConfig(input))

Given f(input::AbstractArray{<:Real})::Real, return f's Hessian w.r.t. the given input.

Note that cfg can be preallocated and reused for subsequent calls.

If possible, it is highly recommended to use ReverseDiff.HessianTape to prerecord f. Otherwise, this method will have to re-record f's execution trace for every subsequent call.
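A minimal sketch (the function and input are illustrative):

using ReverseDiff

f(x) = sum(x .^ 3)                  # f(::AbstractArray{<:Real})::Real
x = rand(2)
H = ReverseDiff.hessian(f, x)       # 2×2 diagonal matrix with H[i, i] == 6 * x[i]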
ReverseDiff.hessian! — Function

ReverseDiff.hessian!(result::AbstractArray, f, input::AbstractArray, cfg::HessianConfig = HessianConfig(input))
ReverseDiff.hessian!(result::DiffResult, f, input::AbstractArray, cfg::HessianConfig = HessianConfig(result, input))

Returns result. This method is exactly like ReverseDiff.hessian(f, input, cfg), except it stores the resulting Hessian in result rather than allocating new memory.

If result is a DiffResults.DiffResult, the primal value f(input) and the gradient ∇f(input) will be stored in it along with the Hessian H(f)(input).
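For illustration, a sketch using DiffResults to retrieve the value, gradient, and Hessian from a single hessian! call (the function and input are hypothetical):

using ReverseDiff, DiffResults

f(x) = sum(x .^ 3)
x = rand(2)
result = DiffResults.HessianResult(x)        # storage for value + gradient + Hessian
result = ReverseDiff.hessian!(result, f, x)
DiffResults.value(result)                    # f(x)
DiffResults.gradient(result)                 # ∇f(x)
DiffResults.hessian(result)                  # H(f)(x)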
ReverseDiff.hessian!(tape::Union{HessianTape,CompiledHessian}, input)

Assuming tape represents a function of the form f(::AbstractArray{<:Real})::Real, return the Hessian H(f)(input).
ReverseDiff.hessian!(result::AbstractArray, tape::Union{HessianTape,CompiledHessian}, input)
ReverseDiff.hessian!(result::DiffResult, tape::Union{HessianTape,CompiledHessian}, input)

Returns result. This method is exactly like ReverseDiff.hessian!(tape, input), except it stores the resulting Hessian in result rather than allocating new memory.

If result is a DiffResults.DiffResult, the primal value f(input) and the gradient ∇f(input) will be stored in it along with the Hessian H(f)(input).
The AbstractTape API
ReverseDiff works by recording the target function's execution trace to a "tape", then running the tape forwards and backwards to propagate new input values and derivative information.
In many cases, it is the recording phase of this process that consumes the most time and memory, while the forward and reverse execution passes are often fast and non-allocating. Luckily, ReverseDiff provides the AbstractTape family of types, which enable the user to pre-record a reusable tape for a given function and differentiation operation.

Note that pre-recording a tape can only capture the execution trace of the target function with the given input values. Therefore, re-running the tape (even with new input values) will only execute the paths that were recorded using the original input values. In other words, the tape cannot re-enact any branching behavior that depends on the input values (a sketch of this pitfall follows below). You can guarantee your own safety in this regard by never using the AbstractTape API with functions that contain control flow based on the input values.
Similarly to the branching issue, a tape is not guaranteed to capture any side-effects caused or depended on by the target function.
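To make the branching caveat concrete, here is a sketch of how a tape can silently return incorrect results (f is a hypothetical function with input-dependent control flow):

using ReverseDiff

f(x) = x[1] > 0 ? sum(x) : sum(abs2.(x))

tape = ReverseDiff.GradientTape(f, [1.0, 2.0])   # records only the x[1] > 0 branch
ReverseDiff.gradient!(tape, [1.0, 2.0])          # correct: [1.0, 1.0]
ReverseDiff.gradient!(tape, [-1.0, 2.0])         # silently wrong: still [1.0, 1.0],
                                                 # not the [-2.0, 4.0] of the other branch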
ReverseDiff.GradientTape — Type

ReverseDiff.GradientTape(f, input, cfg::GradientConfig = GradientConfig(input))

Return a GradientTape instance containing a pre-recorded execution trace of f at the given input.

This GradientTape can then be passed to ReverseDiff.gradient! to take gradients of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.gradient for a description of acceptable types for input.
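A minimal sketch of the prerecord-then-reuse workflow (the function and inputs are illustrative):

using ReverseDiff

f(x) = sum(abs2.(x))
tape = ReverseDiff.GradientTape(f, rand(3))   # record the trace once

x = rand(3)                                   # same shape/eltype as the seed input
ReverseDiff.gradient!(tape, x)                # replay the tape; no re-recording

grad = similar(x)
ReverseDiff.gradient!(grad, tape, x)          # in-place variant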
ReverseDiff.JacobianTape — Type

ReverseDiff.JacobianTape(f, input, cfg::JacobianConfig = JacobianConfig(input))

Return a JacobianTape instance containing a pre-recorded execution trace of f at the given input.

This JacobianTape can then be passed to ReverseDiff.jacobian! to take Jacobians of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.jacobian for a description of acceptable types for input.
ReverseDiff.JacobianTape(f!, output, input, cfg::JacobianConfig = JacobianConfig(output, input))

Return a JacobianTape instance containing a pre-recorded execution trace of f! at the given output and input.

This JacobianTape can then be passed to ReverseDiff.jacobian! to take Jacobians of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.jacobian for a description of acceptable types for input.
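A sketch covering both the f(input) and f!(output, input) forms (the functions here are hypothetical):

using ReverseDiff

f(x) = x .^ 2
tape = ReverseDiff.JacobianTape(f, rand(3))
ReverseDiff.jacobian!(tape, rand(3))            # Jacobian at a new input

f!(y, x) = (y .= 2 .* x; nothing)
tape! = ReverseDiff.JacobianTape(f!, rand(3), rand(3))
ReverseDiff.jacobian!(tape!, rand(3))           # only input may vary (see the note above)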
ReverseDiff.HessianTape — Type

ReverseDiff.HessianTape(f, input, cfg::HessianConfig = HessianConfig(input))

Return a HessianTape instance containing a pre-recorded execution trace of f at the given input.

This HessianTape can then be passed to ReverseDiff.hessian! to take Hessians of the execution trace with new input values. Note that these new values must have the same element type and shape as input.

See ReverseDiff.hessian for a description of acceptable types for input.
ReverseDiff.compile — Function

ReverseDiff.compile(t::AbstractTape)

Return a fully compiled representation of t of type CompiledTape. This object can be passed to any API methods that accept t (e.g. gradient!(result, t, input)).

In many cases, compiling t can significantly speed up execution time. Note that the longer the tape, the more time compilation may take. Very long tapes (i.e. when length(t) is on the order of 10000 elements) can take a very long time to compile.
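A sketch of the compile-then-replay workflow (the function and sizes are illustrative):

using ReverseDiff

f(x) = sum(abs2.(x))
tape = ReverseDiff.GradientTape(f, rand(100))
ctape = ReverseDiff.compile(tape)                  # pay the compilation cost once

grad = Vector{Float64}(undef, 100)
for _ in 1:1000
    ReverseDiff.gradient!(grad, ctape, rand(100))  # fast, non-allocating replays
end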
The AbstractConfig API

For the sake of convenience and performance, all "extra" information used by ReverseDiff's API methods is bundled up in the ReverseDiff.AbstractConfig family of types. These types allow the user to easily feed several different parameters to ReverseDiff's API methods, such as work buffers and tape configurations.
ReverseDiff's basic API methods will allocate these types automatically by default, but you can reduce memory usage and improve performance if you preallocate them yourself.
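For illustration, a sketch of preallocating a GradientConfig and reusing it (the function and inputs are hypothetical):

using ReverseDiff

f(x) = sum(abs2.(x))
x = rand(3)

cfg = ReverseDiff.GradientConfig(x)       # tape + work buffers allocated once
for _ in 1:100
    ReverseDiff.gradient(f, rand(3), cfg) # reused on every call
end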
ReverseDiff.GradientConfig — Type

ReverseDiff.GradientConfig(input, tp::InstructionTape = InstructionTape())

Return a GradientConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.gradient/ReverseDiff.gradient! methods.

Note that input is only used for type and shape information; it is not stored or modified in any way. It is assumed that the element type of input is the same as the element type of the target function's output.

See ReverseDiff.gradient for a description of acceptable types for input.
ReverseDiff.GradientConfig(input, ::Type{D}, tp::InstructionTape = InstructionTape())

Like GradientConfig(input, tp), except the provided type D is assumed to be the element type of the target function's output.
ReverseDiff.JacobianConfig — Type

ReverseDiff.JacobianConfig(input, tp::InstructionTape = InstructionTape())

Return a JacobianConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.jacobian/ReverseDiff.jacobian! methods.

Note that input is only used for type and shape information; it is not stored or modified in any way. It is assumed that the element type of input is the same as the element type of the target function's output.

See ReverseDiff.jacobian for a description of acceptable types for input.
ReverseDiff.JacobianConfig(input, ::Type{D}, tp::InstructionTape = InstructionTape())

Like JacobianConfig(input, tp), except the provided type D is assumed to be the element type of the target function's output.
ReverseDiff.JacobianConfig(output::AbstractArray, input, tp::InstructionTape = InstructionTape())

Return a JacobianConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.jacobian/ReverseDiff.jacobian! methods. This method assumes the target function has the form f!(output, input).

Note that input and output are only used for type and shape information; they are not stored or modified in any way.

See ReverseDiff.jacobian for a description of acceptable types for input.
ReverseDiff.JacobianConfig(result::DiffResults.DiffResult, input, tp::InstructionTape = InstructionTape())

A convenience method for JacobianConfig(DiffResults.value(result), input, tp).
ReverseDiff.HessianConfig — Type

ReverseDiff.HessianConfig(input::AbstractArray, gtp::InstructionTape = InstructionTape(), jtp::InstructionTape = InstructionTape())

Return a HessianConfig instance containing the preallocated tape and work buffers used by the ReverseDiff.hessian/ReverseDiff.hessian! methods. gtp is the tape used for the inner gradient calculation, while jtp is used for the outer Jacobian calculation.

Note that input is only used for type and shape information; it is not stored or modified in any way. It is assumed that the element type of input is the same as the element type of the target function's output.
ReverseDiff.HessianConfig(input::AbstractArray, ::Type{D}, gtp::InstructionTape = InstructionTape(), jtp::InstructionTape = InstructionTape())

Like HessianConfig(input, gtp, jtp), except the provided type D is assumed to be the element type of the target function's output.
ReverseDiff.HessianConfig(result::DiffResults.DiffResult, input::AbstractArray, gtp::InstructionTape = InstructionTape(), jtp::InstructionTape = InstructionTape())

Like HessianConfig(input, gtp, jtp), but utilizes result along with input to construct work buffers.

Note that result and input are only used for type and shape information; they are not stored or modified in any way.
Optimization Annotations
ReverseDiff.@forward — Macro

ReverseDiff.@forward(f)(args::Real...)
ReverseDiff.@forward f(args::Real...) = ...
ReverseDiff.@forward f = (args::Real...) -> ...

Declare that the given function should be differentiated using forward mode automatic differentiation. Note that the macro can be used at either the definition site or at the call site of f. Currently, only length(args) <= 2 is supported. Note that, if f is defined within another function g, f should not close over any differentiable input of g. By using this macro, you are providing a guarantee that this property holds true.
This macro can be very beneficial for performance when intermediate functions in your computation are low-dimensional scalar functions, because it minimizes the number of instructions that must be recorded to the tape. For example, take the function sigmoid(n) = 1.0 / (1.0 + exp(-n)). Normally, using ReverseDiff to differentiate this function would require recording 4 instructions (-, exp, +, and /). However, if we apply the @forward macro, only one instruction will be recorded (sigmoid). The sigmoid function will then be differentiated using ForwardDiff's Dual number type.
This is also beneficial for higher-order elementwise function application. ReverseDiff overloads map/broadcast to dispatch on @forward-applied functions. For example, map(@forward(f), x) will usually be more performant than map(f, x).

ReverseDiff overloads many Base scalar functions to behave as @forward functions by default. A full list is given by DiffRules.diffrules().
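A minimal sketch of the definition-site form, following the sigmoid example above:

using ReverseDiff

# The whole sigmoid call is recorded as a single tape instruction and
# differentiated internally with ForwardDiff's Dual numbers.
ReverseDiff.@forward sigmoid(n::Real) = 1.0 / (1.0 + exp(-n))

ReverseDiff.gradient(x -> sum(sigmoid.(x)), rand(3))
ReverseDiff.gradient(x -> sum(map(sigmoid, x)), rand(3))   # map dispatch also applies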
ReverseDiff.@skip — Macro

ReverseDiff.@skip(f)(args::Real...)
ReverseDiff.@skip f(args::Real...) = ...
ReverseDiff.@skip f = (args::Real...) -> ...

Declare that the given function should be skipped during the instruction-recording phase of differentiation. Note that the macro can be used at either the definition site or at the call site of f. Note that, if f is defined within another function g, f should not close over any differentiable input of g. By using this macro, you are providing a guarantee that this property holds true.

ReverseDiff overloads many Base scalar functions to behave as @skip functions by default. A full list is given by ReverseDiff.SKIPPED_UNARY_SCALAR_FUNCS and ReverseDiff.SKIPPED_BINARY_SCALAR_FUNCS.
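For illustration, a hypothetical sketch; step_size is an invented helper whose output should be treated as a constant on the tape:

using ReverseDiff

# step_size receives plain (untracked) values, so its output contributes
# no derivative information to the tape.
ReverseDiff.@skip step_size(g::Real) = g > 1.0 ? 0.1 : 0.5

f(x) = step_size(x[1]) * sum(x)
ReverseDiff.gradient(f, rand(3))   # no derivative flows through step_size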
ChainRules integration
ReverseDiff.@grad_from_chainrules — Macro

@grad_from_chainrules f(args...; kwargs...)

The @grad_from_chainrules macro provides a way to import adjoints (rrules) defined in ChainRules to ReverseDiff. One must provide a method signature to import the corresponding rrule. In the provided method signature, replace the types of the arguments with respect to which one wants to differentiate with ReverseDiff.TrackedReal and ReverseDiff.TrackedArray, respectively. For example, we can import the rrule of f(x::Real, y::Array) as below:

ReverseDiff.@grad_from_chainrules f(x::TrackedReal, y::TrackedArray)
ReverseDiff.@grad_from_chainrules f(x::TrackedReal, y::Array)
ReverseDiff.@grad_from_chainrules f(x::Real, y::TrackedArray)
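As an end-to-end sketch (myscale and its rrule are hypothetical, not part of any package):

using ReverseDiff, ChainRulesCore

myscale(a::Real, x::AbstractArray) = a .* x

function ChainRulesCore.rrule(::typeof(myscale), a::Real, x::AbstractArray)
    y = myscale(a, x)
    function myscale_pullback(ȳ)
        # ∂(a .* x)/∂a = x and ∂(a .* x)/∂x = a
        return ChainRulesCore.NoTangent(), sum(ȳ .* x), a .* ȳ
    end
    return y, myscale_pullback
end

# Import the rrule for the case where both arguments are tracked:
ReverseDiff.@grad_from_chainrules myscale(a::TrackedReal, x::TrackedArray)

ReverseDiff.gradient((a, x) -> sum(myscale(a[1], x)), ([2.0], rand(3)))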