reverse_theory


The Theory of Reverse Mode

Taylor Notation

In Taylor notation, each variable corresponds to a function of a single argument, which we denote by \(t\) (see Section 10.2 of Evaluating Derivatives). Here and below \(X(t)\), \(Y(t)\), and \(Z(t)\) are scalar valued functions and the corresponding p-th order Taylor coefficient row vectors are \(x\), \(y\), and \(z\); i.e.,

\begin{eqnarray} X(t) & = & x^{(0)} + x^{(1)} * t + \cdots + x^{(p)} * t^p + O( t^{p+1} ) \\ Y(t) & = & y^{(0)} + y^{(1)} * t + \cdots + y^{(p)} * t^p + O( t^{p+1} ) \\ Z(t) & = & z^{(0)} + z^{(1)} * t + \cdots + z^{(p)} * t^p + O( t^{p+1} ) \end{eqnarray}

For the purposes of this discussion, we are given the p-th order Taylor coefficient row vectors \(x\), \(y\), and \(z\). In addition, we are given the partial derivatives of a scalar valued function

\[G ( z^{(j)} , \ldots , z^{(0)}, x, y)\]

We need to compute the partial derivatives of the scalar valued function

\[H ( z^{(j-1)} , \ldots , z^{(0)}, x, y) = G ( z^{(j)}, z^{(j-1)} , \ldots , z^{(0)}, x , y )\]

where \(z^{(j)}\) is expressed as a function of the \((j-1)\)-th order Taylor coefficient row vector for \(Z\) and the vectors \(x\), \(y\); i.e., \(z^{(j)}\) above is a shorthand for

\[z^{(j)} ( z^{(j-1)} , \ldots , z^{(0)}, x, y )\]

If we do not provide a formula for a partial derivative of \(H\), then that partial derivative has the same value as for the function \(G\).
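
Every formula below is an instance of the chain rule applied to the definition of \(H\). As a sketch in the notation above, let \(u\) stand for any one argument of \(H\), i.e., one of \(z^{(j-1)} , \ldots , z^{(0)}\) or a component of \(x\) or \(y\) (the symbol \(u\) is introduced here only for illustration):

\[\D{H}{u} = \D{G}{u} + \D{G}{ z^{(j)} } \D{ z^{(j)} }{u}\]

When \(z^{(j)}\) does not depend on \(u\), the second term is zero, which is why such a partial of \(H\) has the same value as for \(G\).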

Binary Operators

Addition

The forward mode formula for Addition is

\[z^{(j)} = x^{(j)} + y^{(j)}\]

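Only \(x^{(j)}\) and \(y^{(j)}\) enter this formula. It follows that for \(l = 0 , \ldots , j-1\)

\begin{eqnarray} \D{H}{ x^{(j)} } & = & \D{G}{ x^{(j)} } + \D{G}{ z^{(j)} } \\ \D{H}{ y^{(j)} } & = & \D{G}{ y^{(j)} } + \D{G}{ z^{(j)} } \\ \D{H}{ z^{(l)} } & = & \D{G}{ z^{(l)} } \end{eqnarray}

Because \(z^{(k)}\) depends only on \(x^{(k)}\) and \(y^{(k)}\), repeating this step for each order \(k = j , \ldots , 0\) adds \(\D{G}{ z^{(k)} }\) to the partials with respect to \(x^{(k)}\) and \(y^{(k)}\).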
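
As a minimal sketch of the addition step (not CppAD's implementation), the update can be written on plain Python lists. The names `reverse_add`, `reverse_add_sweep`, `px`, `py`, and `pz` are assumptions made for illustration: `px[k]`, `py[k]`, and `pz[k]` hold the partials of \(G\) with respect to \(x^{(k)}\), \(y^{(k)}\), and \(z^{(k)}\), and `px`, `py` are updated in place to hold the partials of \(H\).

```python
def reverse_add(j, px, py, pz):
    """One reverse step for z = x + y that removes the dependence on z^(j)."""
    px[j] += pz[j]  # d z^(j) / d x^(j) = 1
    py[j] += pz[j]  # d z^(j) / d y^(j) = 1


def reverse_add_sweep(px, py, pz):
    """Apply the step for j = p, ..., 0 so that only x and y remain."""
    for j in reversed(range(len(pz))):
        reverse_add(j, px, py, pz)


# Hypothetical example: G = 1*z^(0) + 2*z^(1) + 3*z^(2),
# with no direct dependence on x or y.
px, py, pz = [0.0] * 3, [0.0] * 3, [1.0, 2.0, 3.0]
reverse_add_sweep(px, py, pz)
# px and py are now both [1.0, 2.0, 3.0]
```

The partials with respect to the lower order coefficients of \(z\) are left untouched, because \(z^{(j)}\) does not depend on them for addition.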

Subtraction

The forward mode formula for Subtraction is

\[z^{(j)} = x^{(j)} - y^{(j)}\]

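Only \(x^{(j)}\) and \(y^{(j)}\) enter this formula, with coefficients \(+1\) and \(-1\) respectively. It follows that

\begin{eqnarray} \D{H}{ x^{(j)} } & = & \D{G}{ x^{(j)} } + \D{G}{ z^{(j)} } \\ \D{H}{ y^{(j)} } & = & \D{G}{ y^{(j)} } - \D{G}{ z^{(j)} } \end{eqnarray}

Repeating this step for each order \(k = j , \ldots , 0\) adds \(\D{G}{ z^{(k)} }\) to the partial with respect to \(x^{(k)}\) and subtracts it from the partial with respect to \(y^{(k)}\).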
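
The sign pattern for subtraction follows from \(z^{(j)}\) having derivative \(+1\) with respect to \(x^{(j)}\) and \(-1\) with respect to \(y^{(j)}\). The sketch below illustrates it under the assumption that \(G\) is a weighted sum of the coefficients of \(z\). The names `forward_sub`, `reverse_sub`, and the weight vector `w` are hypothetical and not part of CppAD.

```python
def forward_sub(x, y):
    """Taylor coefficients of Z = X - Y from those of X and Y."""
    return [xk - yk for xk, yk in zip(x, y)]


def reverse_sub(j, px, py, pz):
    """One reverse step for z = x - y that removes the dependence on z^(j)."""
    px[j] += pz[j]  # d z^(j) / d x^(j) = +1
    py[j] -= pz[j]  # d z^(j) / d y^(j) = -1


# Hypothetical example: G = w[0]*z^(0) + w[1]*z^(1).
w = [0.5, -2.0]
px, py, pz = [0.0, 0.0], [0.0, 0.0], list(w)
for j in (1, 0):
    reverse_sub(j, px, py, pz)
# px is now [0.5, -2.0] and py is now [-0.5, 2.0]
```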

Multiplication

The forward mode formula for Multiplication is

\[z^{(j)} = \sum_{k=0}^j x^{(j-k)} * y^{(k)}\]

It follows that for \(k = 0 , \ldots , j\)

\begin{eqnarray} \D{H}{ x^{(j-k)} } & = & \D{G}{ x^{(j-k)} } + \D{G}{ z^{(j)} } y^{(k)} \\ \D{H}{ y^{(k)} } & = & \D{G}{ y^{(k)} } + \D{G}{ z^{(j)} } x^{(j-k)} \end{eqnarray}
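
For multiplication, each term \(x^{(j-k)} * y^{(k)}\) of the forward formula contributes \(\D{G}{ z^{(j)} } y^{(k)}\) to the partial with respect to \(x^{(j-k)}\) and \(\D{G}{ z^{(j)} } x^{(j-k)}\) to the partial with respect to \(y^{(k)}\). A minimal sketch on plain Python lists follows. The function names and the `px`, `py`, `pz` lists are assumptions for illustration, not CppAD's API.

```python
def forward_mul(x, y):
    """Taylor coefficients of Z = X * Y (a truncated convolution)."""
    return [sum(x[j - k] * y[k] for k in range(j + 1)) for j in range(len(x))]


def reverse_mul(j, x, y, px, py, pz):
    """One reverse step for z = x * y that removes the dependence on z^(j)."""
    for k in range(j + 1):
        px[j - k] += pz[j] * y[k]      # d z^(j) / d x^(j-k) = y^(k)
        py[k] += pz[j] * x[j - k]      # d z^(j) / d y^(k)   = x^(j-k)


# Hypothetical example: G = z^(1) = x^(1) * y^(0) + x^(0) * y^(1).
x, y = [2.0, 3.0], [5.0, 7.0]
px, py, pz = [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]
for j in (1, 0):
    reverse_mul(j, x, y, px, py, pz)
# px is now [7.0, 5.0], i.e. [y^(1), y^(0)]
# py is now [3.0, 2.0], i.e. [x^(1), x^(0)]
```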

Division

The forward mode formula for Division is

\[z^{(j)} = \frac{1}{y^{(0)}} \left( x^{(j)} - \sum_{k=1}^j z^{(j-k)} y^{(k)} \right)\]

It follows that for \(k = 1 , \ldots , j\)

\begin{eqnarray} \D{H}{ x^{(j)} } & = & \D{G}{ x^{(j)} } + \D{G}{ z^{(j)} } \frac{1}{y^{(0)}} \\ \D{H}{ z^{(j-k)} } & = & \D{G}{ z^{(j-k)} } - \D{G}{ z^{(j)} } \frac{1}{y^{(0)}} y^{(k)} \\ \D{H}{ y^{(k)} } & = & \D{G}{ y^{(k)} } - \D{G}{ z^{(j)} } \frac{1}{y^{(0)}} z^{(j-k)} \\ \D{H}{ y^{(0)} } & = & \D{G}{ y^{(0)} } - \D{G}{ z^{(j)} } \frac{1}{y^{(0)}} \frac{1}{y^{(0)}} \left( x^{(j)} - \sum_{k=1}^j z^{(j-k)} y^{(k)} \right) \\ & = & \D{G}{ y^{(0)} } - \D{G}{ z^{(j)} } \frac{1}{y^{(0)}} z^{(j)} \end{eqnarray}
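
Division differs from the other binary operators in that \(z^{(j)}\) depends on the lower order coefficients of \(z\), so the partials with respect to \(z^{(j-k)}\) change as well and the steps must run from the highest order down. The sketch below mirrors the formulas above on plain Python lists. The function names and the `px`, `py`, `pz` lists are assumptions for illustration, not CppAD's API.

```python
def forward_div(x, y):
    """Taylor coefficients of Z = X / Y, computed order by order."""
    z = []
    for j in range(len(x)):
        s = sum(z[j - k] * y[k] for k in range(1, j + 1))
        z.append((x[j] - s) / y[0])
    return z


def reverse_div(j, y, z, px, py, pz):
    """One reverse step for z = x / y that removes the dependence on z^(j)."""
    scale = pz[j] / y[0]               # dG/dz^(j) * 1 / y^(0)
    px[j] += scale
    for k in range(1, j + 1):
        pz[j - k] -= scale * y[k]      # z^(j) depends on z^(j-k)
        py[k] -= scale * z[j - k]
    py[0] -= scale * z[j]


# Hypothetical example: G = z^(1), with x = (6, 4) and y = (2, 1).
x, y = [6.0, 4.0], [2.0, 1.0]
z = forward_div(x, y)                  # [3.0, 0.5]
px, py, pz = [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]
for j in (1, 0):                       # highest order first
    reverse_div(j, y, z, px, py, pz)
# px is now [-0.25, 0.5] and py is now [0.5, -1.5]
```

These values agree with differentiating \(z^{(1)} = x^{(1)} / y^{(0)} - x^{(0)} y^{(1)} / ( y^{(0)} )^2\) directly.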

Standard Math Functions

The standard math functions have only one argument. Hence we are given the partial derivatives of a scalar valued function

\[G ( z^{(j)} , \ldots , z^{(0)}, x)\]

We need to compute the partial derivatives of the scalar valued function

\[H ( z^{(j-1)} , \ldots , z^{(0)}, x) = G ( z^{(j)}, z^{(j-1)} , \ldots , z^{(0)}, x)\]

where \(z^{(j)}\) is expressed as a function of the \((j-1)\)-th order Taylor coefficient row vector for \(Z\) and the vector \(x\); i.e., \(z^{(j)}\) above is a shorthand for

\[z^{(j)} ( z^{(j-1)} , \ldots , z^{(0)}, x )\]
