API Documentation
Rules
ChainRulesCore.frule — Method: frule((Δf, Δx...), f, x...)
Expressing the output of f(x...) as Ω, return the tuple:
(Ω, ΔΩ)
The second return value is the differential w.r.t. the output.
If no method matching frule((Δf, Δx...), f, x...) has been defined, then return nothing.
Examples:
unary input, unary output scalar function:
julia> dself = NO_FIELDS;
julia> x = rand()
0.8236475079774124
julia> sinx, Δsinx = frule((dself, 1), sin, x)
(0.7336293678134624, 0.6795498147167869)
julia> sinx == sin(x)
true
julia> Δsinx == cos(x)
true
unary input, binary output scalar function:
julia> sincosx, Δsincosx = frule((dself, 1), sincos, x);
julia> sincosx == sincos(x)
true
julia> Δsincosx == (cos(x), -sin(x))
true
See also: rrule, @scalar_rule
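Beyond the examples above, the nothing-return convention can be exercised directly. A minimal sketch (plus_one is a hypothetical function with no rule defined):

```julia
using ChainRulesCore

# No frule has been defined for this function, so frule returns nothing,
# signalling the caller (typically an AD system) to fall back to its own machinery.
plus_one(x) = x + 1

result = frule((NO_FIELDS, 1.0), plus_one, 2.0)
result === nothing  # true: no rule exists for plus_one
```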
ChainRulesCore.rrule — Method: rrule(f, x...)
Expressing x as the tuple (x₁, x₂, ...) and the output tuple of f(x...) as Ω, return the tuple:
(Ω, (Ω̄₁, Ω̄₂, ...) -> (s̄elf, x̄₁, x̄₂, ...))
The second return value is the propagation rule, or pullback. It takes in differentials (Ω̄₁, Ω̄₂, ...) corresponding to the outputs, and returns the differentials (x̄₁, x̄₂, ...) with respect to the inputs, together with s̄elf, the differential with respect to the internal values of the function itself (for closures).
If no method matching rrule(f, xs...) has been defined, then return nothing.
Examples:
unary input, unary output scalar function:
julia> x = rand();
julia> sinx, sin_pullback = rrule(sin, x);
julia> sinx == sin(x)
true
julia> sin_pullback(1) == (NO_FIELDS, cos(x))
true
binary input, unary output scalar function:
julia> x, y = rand(2);
julia> hypotxy, hypot_pullback = rrule(hypot, x, y);
julia> hypotxy == hypot(x, y)
true
julia> hypot_pullback(1) == (NO_FIELDS, (x / hypot(x, y)), (y / hypot(x, y)))
true
See also: frule, @scalar_rule
Rule Definition Tools
ChainRulesCore.@scalar_rule — Macro
@scalar_rule(f(x₁, x₂, ...),
             @setup(statement₁, statement₂, ...),
             (∂f₁_∂x₁, ∂f₁_∂x₂, ...),
             (∂f₂_∂x₁, ∂f₂_∂x₂, ...),
             ...)
A convenience macro that generates simple scalar forward or reverse rules using the provided partial derivatives. Specifically, it generates the corresponding methods for frule and rrule:
function ChainRulesCore.frule((NO_FIELDS, Δx₁, Δx₂, ...), ::typeof(f), x₁::Number, x₂::Number, ...)
    Ω = f(x₁, x₂, ...)
    $(statement₁, statement₂, ...)
    return Ω, (
        (∂f₁_∂x₁ * Δx₁ + ∂f₁_∂x₂ * Δx₂ + ...),
        (∂f₂_∂x₁ * Δx₁ + ∂f₂_∂x₂ * Δx₂ + ...),
        ...
    )
end

function ChainRulesCore.rrule(::typeof(f), x₁::Number, x₂::Number, ...)
    Ω = f(x₁, x₂, ...)
    $(statement₁, statement₂, ...)
    return Ω, ((ΔΩ₁, ΔΩ₂, ...)) -> (
        NO_FIELDS,
        ∂f₁_∂x₁ * ΔΩ₁ + ∂f₂_∂x₁ * ΔΩ₂ + ...,
        ∂f₁_∂x₂ * ΔΩ₁ + ∂f₂_∂x₂ * ΔΩ₂ + ...,
        ...
    )
end
If no type constraints in f(x₁, x₂, ...) within the call to @scalar_rule are provided, each parameter in the resulting frule/rrule definition is given a type constraint of Number. Constraints may also be explicitly provided to override the Number constraint, e.g. f(x₁::Complex, x₂), which will constrain x₁ to Complex and x₂ to Number.
At present this does not support defining rules for closures/functors. Thus in reverse mode, the first returned partial, representing the derivative with respect to the function itself, is always NO_FIELDS. And in forward mode, the first input tangent (the one corresponding to the function itself) is always ignored.
The result of f(x₁, x₂, ...) is automatically bound to Ω. This allows the primal result to be conveniently referenced (as Ω) within the derivative/setup expressions.
The @setup argument can be elided if no setup code is needed. In other words:
@scalar_rule(f(x₁, x₂, ...),
(∂f₁_∂x₁, ∂f₁_∂x₂, ...),
(∂f₂_∂x₁, ∂f₂_∂x₂, ...),
...)
is equivalent to:
@scalar_rule(f(x₁, x₂, ...),
@setup(nothing),
(∂f₁_∂x₁, ∂f₁_∂x₂, ...),
(∂f₂_∂x₁, ∂f₂_∂x₂, ...),
...)
For examples, see ChainRules' rulesets directory.
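As a concrete sketch (cube is a hypothetical function, not part of the package), a rule for x³ could be defined and then queried through both frule and rrule:

```julia
using ChainRulesCore

cube(x) = x^3
@scalar_rule(cube(x), 3 * x^2)   # single output, single partial ∂f/∂x = 3x²

# The macro generated both methods:
Ω, ΔΩ = frule((NO_FIELDS, 1.0), cube, 2.0)   # primal value and forward tangent
Ω2, cube_pullback = rrule(cube, 2.0)         # primal value and pullback
```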
Differentials
ChainRulesCore.AbstractZero — Type: AbstractZero <: AbstractDifferential
This is the supertype of the zero-like differential types. If an AD system encounters a propagator that takes only subtypes of AbstractZero as input, it can stop performing any AD operations: all propagators are linear functions, so the final result will be zero.
All AbstractZero subtypes are singleton types. There are two of them: Zero() and DoesNotExist().
ChainRulesCore.DoesNotExist — Type: DoesNotExist() <: AbstractZero
This differential indicates that the derivative does not exist. It is the differential for a primal type that is not differentiable, such as an Integer, or a Boolean (when not being used as a representation of a value that would normally be a floating point number). The only valid way to perturb such values is to not change them at all. As such, DoesNotExist is functionally identical to Zero(), but provides additional semantic information.
If you are adding this differential to a primal, then something is wrong. An optimization package making use of this might like to check for such a case.
!!! note: This does not indicate that the derivative is not implemented, but rather that mathematically it is not defined.
This mostly shows up as the derivative with respect to dimension, index, or size arguments.
function rrule(::typeof(fill), x, len::Int)
    y = fill(x, len)
    fill_pullback(ȳ) = (NO_FIELDS, @thunk(sum(ȳ)), DoesNotExist())
    return y, fill_pullback
end
ChainRulesCore.Zero — Type: Zero() <: AbstractZero
The additive identity for differentials. This is basically the same as 0. A derivative of Zero() does not propagate through the primal function.
ChainRulesCore.One — Type: One()
The differential which is the multiplicative identity. Basically, this represents 1.
ChainRulesCore.NO_FIELDS — Constant: NO_FIELDS
Constant for the reverse-mode derivative with respect to a structure that has no fields. The most notable use for this is for the reverse-mode derivative with respect to the function itself, when that function is not a closure.
ChainRulesCore.Composite — Type: Composite{P, T} <: AbstractDifferential
This type represents the differential for a struct, NamedTuple, or Tuple. P is the corresponding primal type that this is a differential for.
Composite{P} should have fields (technically properties) that match a subset of the fields of the primal type, and each should be a differential type matching the primal type of that field. Fields of P that are not present in the Composite are treated as Zero().
T is an implementation detail representing the backing data structure. For a Tuple it will be a Tuple, and for everything else it will be a NamedTuple. It should not be passed in by the user.
For Composites of Tuples, iterate and getindex are overloaded to behave similarly to those of a tuple. For Composites of structs, getproperty is overloaded to allow accessing values via comp.fieldname. Any fields not explicitly present in the Composite are treated as being set to Zero(). To make a Composite have all the fields of the primal, the canonicalize function is provided.
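A brief sketch of both backings (Point is a hypothetical struct used only for illustration):

```julia
using ChainRulesCore

# Differential for a Tuple primal: backed by a Tuple, supports getindex/iterate.
dt = Composite{Tuple{Float64,Float64}}(1.0, 2.0)
dt[1]   # 1.0

# Differential for a struct primal: backed by a NamedTuple of field differentials.
struct Point
    x::Float64
    y::Float64
end
dp = Composite{Point}(x=1.5)   # the absent `y` field is treated as Zero()
dp.x    # 1.5
```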
ChainRulesCore.canonicalize — Method: canonicalize(comp::Composite{P}) -> Composite{P}
Return the canonical Composite for the primal type P. The property names of the returned Composite match the field names of the primal, and all fields of P not present in the input comp are explicitly set to Zero().
ChainRulesCore.InplaceableThunk — Type: InplaceableThunk(val::Thunk, add!::Function)
A wrapper for a Thunk that allows it to define an in-place add! function.
add! should be defined such that ithunk.add!(Δ) = Δ .+= ithunk.val, but it should do this more efficiently than simply doing so directly. (Otherwise one can just use a normal Thunk.)
Most operations on an InplaceableThunk treat it just like a normal Thunk, and destroy its in-place capability.
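A sketch under stated assumptions: a hypothetical gradient of x through y = A * x, where the in-place form accumulates with 5-argument mul! to avoid allocating the intermediate A' * ȳ.

```julia
using ChainRulesCore
using LinearAlgebra

A, ȳ = rand(3, 3), rand(3)
ithunk = InplaceableThunk(
    @thunk(A' * ȳ),                    # the plain deferred value
    Δ -> mul!(Δ, A', ȳ, true, true),   # Δ .+= A' * ȳ, in place, no temporary
)
```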
ChainRulesCore.Thunk — Type: Thunk(()->v)
A thunk is a deferred computation. It wraps a zero-argument closure that, when invoked, returns a differential. @thunk(v) is a macro that expands into Thunk(()->v).
Calling a thunk calls the wrapped closure. If you are unsure whether you have a Thunk, call unthunk, which is a no-op when the argument is not a Thunk. If you need to unthunk recursively, call extern, which also externs the differential that the closure returns.
julia> t = @thunk(@thunk(3))
Thunk(var"#4#6"())
julia> extern(t)
3
julia> t()
Thunk(var"#5#7"())
julia> t()()
3
When to @thunk?
When writing rrules (and to a lesser extent frules), it is important to @thunk appropriately. Propagation rules that return multiple derivatives may not have all derivatives used. By @thunking the work required for each derivative, only what is needed is computed.
How do thunks prevent work?
If we have res = pullback(...) = (@thunk(f(x)), @thunk(g(x))), then if we do dx + res[1], only f(x) is evaluated, not g(x). Also, if we do Zero() * res[1], the result is Zero() and f(x) is not evaluated.
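This laziness can be observed directly with a side effect; a minimal sketch:

```julia
using ChainRulesCore

calls = Ref(0)
t = @thunk begin
    calls[] += 1   # records when the deferred work actually runs
    3.0
end

calls[]        # 0: creating the thunk did no work
Zero() * t     # Zero(): the zero short-circuits without evaluating the thunk
unthunk(t)     # 3.0: forcing the thunk runs the closure
calls[]        # 1
```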
So why not thunk everything?
@thunk creates a closure over the expression, which (effectively) creates a struct with a field for each variable used in the expression, plus a call overload. Do not use @thunk if this would be equal to or more work than actually evaluating the expression itself.
For more details see the manual section on using thunks effectively.
ChainRulesCore.unthunk — Method: unthunk(x)
On AbstractThunks this removes one layer of thunking. On any other type, it is the identity operation.
In contrast to extern, this is not recursive.
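The one-layer behaviour can be seen on a nested thunk; a minimal sketch:

```julia
using ChainRulesCore

t = @thunk(@thunk(2 + 2))
inner = unthunk(t)     # removes one layer; still a thunk
unthunk(inner)         # 4
unthunk(4)             # identity on non-thunk arguments: 4
```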
ChainRulesCore.@thunk — Macro: @thunk expr
Define a Thunk wrapping the expr, to lazily defer its evaluation.
ChainRulesCore.extern — Method: extern(x)
Makes a best-effort attempt to convert a differential into a primal value. This is not always a well-defined operation, for two reasons:
- It may not be possible to determine the primal type for a given differential. For example, Zero is a valid differential for any primal.
- The primal type might not be a vector space, and thus might not be a valid differential type. For example, if the primal type is DateTime, it is not a valid differential type, as two DateTimes cannot be added (fun fact: Millisecond is a differential for DateTime).
Where it is defined, the operation of extern for a primal type P should be extern(x) = zero(P) + x.
Because of its limitations, extern should only really be used for testing. It can be useful, if you know what you are getting out, as it recursively removes thunks and otherwise makes outputs more consistent with finite differencing.
The more useful action in general is to call +, or in the case of a Thunk to call unthunk.
extern may return an alias (not necessarily a copy) to data wrapped by x, such that mutating extern(x) might mutate x itself.
Internal
The subtypes of AbstractDifferential define a custom "algebra" for chain rule evaluation that attempts to factor various features like complex derivative support, broadcast fusion, zero-elision, etc. into nicely separated parts.
All subtypes of AbstractDifferential implement the following operations:
+(a, b): linearly combine differential a and differential b
*(a, b): multiply the differential b by the scaling factor a
Base.conj(x): complex conjugate of the differential x
Base.zero(x) = Zero(): a zero
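The identities behave as the names suggest; a minimal sketch:

```julia
using ChainRulesCore

Zero() + 3.0    # 3.0: additive identity
One() * 3.0     # 3.0: multiplicative identity
conj(Zero())    # Zero()

# Scaling by Zero() short-circuits: the thunk below is never evaluated.
Zero() * @thunk(error("never run"))   # Zero()
```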
In general, a differential type is the type of a derivative of a value. The type of the value is, by contrast, called the primal type. Differential types correspond to primal types, although the relation is not one-to-one. Subtypes of AbstractDifferential are not the only differential types. In fact, for the most common primal types, such as Real or AbstractArray{Real}, the differential type is the same as the primal type.
In a circular definition: the most important property of a differential is that it should be able to be added (by defining +) to another differential of the same primal type. That allows gradients to be accumulated.
It generally also should be able to be added to a primal to give back another primal, as this facilitates gradient descent.
ChainRulesCore.debug_mode — Function: debug_mode() -> Bool
Determines whether ChainRulesCore is in debug_mode. Defaults to false, but if the user redefines it to return true, then extra information will be shown when errors occur.
Enable via:
ChainRulesCore.debug_mode() = true