\(\newcommand{\W}[1]{ \; #1 \; }\) \(\newcommand{\R}[1]{ {\rm #1} }\) \(\newcommand{\B}[1]{ {\bf #1} }\) \(\newcommand{\D}[2]{ \frac{\partial #1}{\partial #2} }\) \(\newcommand{\DD}[3]{ \frac{\partial^2 #1}{\partial #2 \partial #3} }\) \(\newcommand{\Dpow}[2]{ \frac{\partial^{#1}}{\partial {#2}^{#1}} }\) \(\newcommand{\dpow}[2]{ \frac{ {\rm d}^{#1}}{{\rm d}\, {#2}^{#1}} }\)
reverse_identity
An Important Reverse Mode Identity
The theorem and proof below restate the results on page 236 of Evaluating Derivatives.
Notation
Given a function \(f(u, v)\) where \(u \in \B{R}^n\), we use the notation
\[
\D{f}{u} (u, v) = \left[ \D{f}{u_1} (u, v) , \cdots , \D{f}{u_n} (u, v) \right]
\]
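Reverse Sweep
When using reverse mode we are given a function \(F : \B{R}^n \rightarrow \B{R}^m\), a matrix of Taylor coefficients \(x \in \B{R}^{n \times p}\), and a weight vector \(w \in \B{R}^m\). We define the functions \(X : \B{R} \times \B{R}^{n \times p} \rightarrow \B{R}^n\), \(W : \B{R} \times \B{R}^{n \times p} \rightarrow \B{R}\), and \(W_j : \B{R}^{n \times p} \rightarrow \B{R}\) by
\[
\begin{array}{rcl}
X(t, x) & = & x^{(0)} + x^{(1)} t + \cdots + x^{(p-1)} t^{p-1} \\
W(t, x) & = & w_0 F_0 [ X(t, x) ] + \cdots + w_{m-1} F_{m-1} [ X(t, x) ] \\
W_j (x) & = & \frac{1}{j!} \Dpow{j}{t} W(0, x)
\end{array}
\]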
where \(x^{(j)}\) is the \(j\)-th column of \(x \in \B{R}^{n \times p}\). The theorem below implies that, for \(i \leq j < p\),
\[
\D{ W_j }{ x^{(i)} } (x) = \D{ W_{j-i} }{ x^{(0)} } (x)
\]
A general reverse sweep calculates the values
\[
\D{ W_{p-1} }{ x^{(i)} } (x) \hspace{1cm} (i = 0 , \ldots , p-1)
\]
But the return values for a reverse sweep are specified in terms of the more useful values
\[
\D{ W_j }{ x^{(0)} } (x) \hspace{1cm} (j = 0 , \ldots , p-1)
\]
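Theorem
Suppose that \(F : \B{R}^n \rightarrow \B{R}^m\) is a \(p\) times continuously differentiable function. Define the functions \(Z : \B{R} \times \B{R}^{n \times p} \rightarrow \B{R}^n\), \(Y : \B{R} \times \B{R}^{n \times p }\rightarrow \B{R}^m\), and \(y^{(j)} : \B{R}^{n \times p }\rightarrow \B{R}^m\) by
\[
\begin{array}{rcl}
Z(t, x) & = & x^{(0)} + x^{(1)} t + \cdots + x^{(p-1)} t^{p-1} \\
Y(t, x) & = & F [ Z(t, x) ] \\
y^{(j)} (x) & = & \frac{1}{j!} \Dpow{j}{t} Y(0, x)
\end{array}
\]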
where \(x^{(j)}\) denotes the \(j\)-th column of \(x \in \B{R}^{n \times p}\). It follows that for all \(i, j\) such that \(i \leq j < p\),
\[
\D{ y^{(j)} }{ x^{(i)} } (x) = \D{ y^{(j-i)} }{ x^{(0)} } (x)
\]
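Proof
It follows from the definitions, with \(F^{(1)}\) denoting the Jacobian of \(F\), that
\[
\begin{array}{rcl}
\D{ y^{(j)} }{ x^{(i)} } (x)
& = & \frac{1}{j!} \D{}{ x^{(i)} } \left[ \Dpow{j}{t} Y(t, x) \right]_{t=0} \\
& = & \frac{1}{j!} \left[ \Dpow{j}{t} \D{}{ x^{(i)} } Y(t, x) \right]_{t=0} \\
& = & \frac{1}{j!} \left[ \Dpow{j}{t} \left( t^i F^{(1)} [ Z(t, x) ] \right) \right]_{t=0}
\end{array}
\]
For \(k > i\), the \(k\)-th partial of \(t^i\) with respect to \(t\) is zero. Thus, by the Leibniz rule, the \(j\)-th partial with respect to \(t\) is given by
\[
\begin{array}{rcl}
\Dpow{j}{t} \left( t^i F^{(1)} [ Z(t, x) ] \right)
& = & \sum_{k=0}^i \binom{j}{k} \frac{i!}{(i-k)!} t^{i-k} \Dpow{j-k}{t} F^{(1)} [ Z(t, x) ] \\
\left[ \Dpow{j}{t} \left( t^i F^{(1)} [ Z(t, x) ] \right) \right]_{t=0}
& = & \binom{j}{i} i! \left[ \Dpow{j-i}{t} F^{(1)} [ Z(t, x) ] \right]_{t=0} \\
& = & \frac{j!}{(j-i)!} \left[ \Dpow{j-i}{t} F^{(1)} [ Z(t, x) ] \right]_{t=0} \\
\D{ y^{(j)} }{ x^{(i)} } (x)
& = & \frac{1}{(j-i)!} \left[ \Dpow{j-i}{t} F^{(1)} [ Z(t, x) ] \right]_{t=0}
\end{array}
\]
Applying this formula to the case where \(j\) is replaced by \(j - i\) and \(i\) is replaced by zero, we obtain
\[
\D{ y^{(j-i)} }{ x^{(0)} } (x)
= \frac{1}{(j-i)!} \left[ \Dpow{j-i}{t} F^{(1)} [ Z(t, x) ] \right]_{t=0}
= \D{ y^{(j)} }{ x^{(i)} } (x)
\]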
which completes the proof.
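The identity can be checked numerically on a small case. The sketch below is only an illustration, not part of any library: it assumes a hypothetical example function \(F(u) = (u_0 u_1 , \; u_0^2 + u_1)\) with \(n = m = 2\) and \(p = 3\), computes the Taylor coefficients \(y^{(j)}\) with truncated polynomial arithmetic, and compares the partials of \(y^{(j)}\) with respect to \(x^{(i)}\) against those of \(y^{(j-i)}\) with respect to \(x^{(0)}\). The helper names `taylor_mul`, `forward`, `partial`, and `identity_holds` are made up for this example. Because this particular \(F\) is quadratic, central differences in exact rational arithmetic give the partials without error; for a general \(F\) they would only be approximations.

```python
from fractions import Fraction


def taylor_mul(a, b):
    """Truncated product of two Taylor coefficient lists of equal length p."""
    p = len(a)
    return [sum(a[k] * b[j - k] for k in range(j + 1)) for j in range(p)]


def forward(x):
    """Return the coefficients y^{(0)}, ..., y^{(p-1)} of Y(t) = F[Z(t, x)].

    x[r][j] is component r of the column x^{(j)}. The function
    F(u) = (u_0 * u_1, u_0 * u_0 + u_1) is a hypothetical example choice.
    The result is indexed as y[c][j], component c of y^{(j)}.
    """
    u0, u1 = x
    y0 = taylor_mul(u0, u1)
    y1 = [s + c for s, c in zip(taylor_mul(u0, u0), u1)]
    return [y0, y1]


def partial(x, j, i, r, h=Fraction(1, 8)):
    """Central difference of y^{(j)} with respect to component r of x^{(i)}.

    Exact for this example because every y^{(j)} is at most quadratic in x.
    """
    xp = [row[:] for row in x]
    xm = [row[:] for row in x]
    xp[r][i] += h
    xm[r][i] -= h
    yp, ym = forward(xp), forward(xm)
    return [(yp[c][j] - ym[c][j]) / (2 * h) for c in range(2)]


def identity_holds(x):
    """Check d y^{(j)} / d x^{(i)} == d y^{(j-i)} / d x^{(0)} for all i <= j < p."""
    p = len(x[0])
    return all(
        partial(x, j, i, r) == partial(x, j - i, 0, r)
        for j in range(p)
        for i in range(j + 1)
        for r in range(2)
    )


# Arbitrary Taylor coefficients: row r holds x_r^{(0)}, x_r^{(1)}, x_r^{(2)}.
x = [[Fraction(v) for v in (2, -1, 3)], [Fraction(v) for v in (1, 4, -2)]]
print(identity_holds(x))  # prints True
```

Exact rational arithmetic is used here so that the comparison can be an equality test rather than a tolerance check; with floating point and a non-polynomial \(F\), the same comparison would need a tolerance.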