\(\newcommand{\W}[1]{ \; #1 \; }\) \(\newcommand{\R}[1]{ {\rm #1} }\) \(\newcommand{\B}[1]{ {\bf #1} }\) \(\newcommand{\D}[2]{ \frac{\partial #1}{\partial #2} }\) \(\newcommand{\DD}[3]{ \frac{\partial^2 #1}{\partial #2 \partial #3} }\) \(\newcommand{\Dpow}[2]{ \frac{\partial^{#1}}{\partial {#2}^{#1}} }\) \(\newcommand{\dpow}[2]{ \frac{ {\rm d}^{#1}}{{\rm d}\, {#2}^{#1}} }\)
AD Theory for Cholesky Factorization
Reference
See section 3.6 of Sebastian F. Walter's Ph.D. thesis, Structured Higher-Order Algorithmic Differentiation in the Forward and Reverse Mode with Application in Optimum Experimental Design, Humboldt-Universität zu Berlin, 2011.
Notation
Cholesky Factor
We are given a positive definite symmetric matrix \(A \in \B{R}^{n \times n}\) and a Cholesky factorization
\[ A = L L^\R{T} \]
where \(L \in \B{R}^{n \times n}\) is lower triangular.
Taylor Coefficient
The matrix \(A\) is a function of a scalar argument \(t\). For \(k = 0 , \ldots , K\), we use \(A_k\) for the corresponding Taylor coefficients; i.e.,
\[ A(t) = \sum_{k=0}^{K} A_k t^k + o( t^K ) \]
where \(o( t^K ) / t^K \rightarrow 0\) as \(t \rightarrow 0\). We use a similar notation for \(L(t)\).
Lower Triangular Part
For a square matrix \(C\), \(\R{lower} (C)\) is the lower triangular part of \(C\) (including the diagonal), \(\R{diag} (C)\) is the diagonal matrix with the same diagonal as \(C\), and
\[ \R{low} (C) = \R{lower} (C) - \frac{1}{2} \R{diag} (C) \]
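For reference, the following minimal NumPy sketch implements these three operators (the code and its function names are illustrative, not part of the original presentation):

```python
import numpy as np

def lower(C):
    # lower triangular part of C, including the diagonal
    return np.tril(C)

def diag(C):
    # diagonal matrix with the same diagonal as C
    return np.diag(np.diag(C))

def low(C):
    # low(C) = lower(C) - diag(C) / 2
    return lower(C) - 0.5 * diag(C)
```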
Forward Mode
For Taylor coefficient order \(k = 0 , \ldots , K\), the coefficients \(A_k \in \B{R}^{n \times n}\) satisfy the equation
\[ A_k = \sum_{\ell=0}^{k} L_\ell L_{k-\ell}^\R{T} \]
In the case where \(k = 0\),
\[ A_0 = L_0 L_0^\R{T} \]
The value of \(L_0\) can be computed using the Cholesky factorization. In the case where \(k > 0\),
\[ A_k = L_k L_0^\R{T} + L_0 L_k^\R{T} + B_k \]
where
\[ B_k = \sum_{\ell=1}^{k-1} L_\ell L_{k-\ell}^\R{T} \]
Note that \(B_k\) is defined in terms of Taylor coefficients of \(L(t)\) that have order less than \(k\). We also note that
\[ L_0^{-1} ( A_k - B_k ) L_0^{-\R{T}} = L_0^{-1} L_k + L_k^\R{T} L_0^{-\R{T}} \]
The first matrix on the right hand side is lower triangular, the second is upper triangular, and the diagonals are equal. It follows that
\[ L_0^{-1} L_k = \R{low} \left[ L_0^{-1} ( A_k - B_k ) L_0^{-\R{T}} \right] \]
and hence
\[ L_k = L_0 \, \R{low} \left[ L_0^{-1} ( A_k - B_k ) L_0^{-\R{T}} \right] \]
This expresses \(L_k\) in terms of the Taylor coefficients of \(A(t)\) and the lower order coefficients of \(L(t)\).
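The recurrence can be checked numerically. The following hedged NumPy sketch (the helper low and the list layout A[0], ..., A[K] are assumptions of the sketch, not CppAD's implementation) computes \(L_0\) by an ordinary Cholesky factorization, each \(L_k\) from the formula above, and then verifies the defining relation \(A_k = \sum_{\ell=0}^{k} L_\ell L_{k-\ell}^\R{T}\):

```python
import numpy as np

def low(C):
    # low(C) = lower(C) - diag(C) / 2
    return np.tril(C) - 0.5 * np.diag(np.diag(C))

def cholesky_forward(A):
    # Taylor coefficients L_0 .. L_K of L(t) from those A_0 .. A_K of A(t)
    L = [np.linalg.cholesky(A[0])]               # A_0 = L_0 L_0^T
    L0_inv = np.linalg.inv(L[0])
    for k in range(1, len(A)):
        # B_k involves only coefficients of order less than k
        B_k = sum(L[l] @ L[k - l].T for l in range(1, k))
        L.append(L[0] @ low(L0_inv @ (A[k] - B_k) @ L0_inv.T))
    return L

# self-check: the computed L_k reproduce A_k = sum_{ell=0}^k L_ell L_{k-ell}^T
rng = np.random.default_rng(0)
n, K = 4, 3
X = rng.standard_normal((n, n))
A = [X @ X.T + n * np.eye(n)]                    # A_0 positive definite
for k in range(K):
    Y = rng.standard_normal((n, n))
    A.append(Y + Y.T)                            # higher coefficients symmetric
L = cholesky_forward(A)
for k in range(K + 1):
    assert np.allclose(sum(L[l] @ L[k - l].T for l in range(k + 1)), A[k])
```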
Lemma 1
We use the notation \(\dot{C}\) for the derivative of a matrix valued function \(C(s)\) with respect to a scalar argument \(s\). We use the notation \(\bar{S}\) and \(\bar{L}\) for the partial derivatives of a scalar valued function \(\bar{F}( S, L )\) with respect to a symmetric matrix \(S\) and a lower triangular matrix \(L\). Define the scalar valued function
\[ \hat{F} ( S ) = \bar{F} [ S , \hat{L} (S) ] \]
We use \(\hat{S}\) for the total derivative of \(\hat{F}\) with respect to \(S\). Suppose that \(\hat{L} ( S )\) is such that
\[ \dot{L} = L_0 \, \R{low} ( L_0^{-1} \dot{S} L_0^{-\R{T}} ) \]
for any \(S(s)\). It follows that
\[ \hat{S} = \bar{S} + \frac{1}{2} ( M + M^\R{T} ) \]
where
\[ M = L_0^{-\R{T}} \, \R{low} ( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1} \]
Proof
Differentiating \(\hat{F}[ S(s) ]\) with respect to \(s\) and using the hypothesis for \(\dot{L}\),
\[ \D{ \hat{F} }{ s } = \R{tr} ( \bar{S}^\R{T} \dot{S} ) + \R{tr} ( \bar{L}^\R{T} \dot{L} ) = \R{tr} ( \bar{S}^\R{T} \dot{S} ) + \R{tr} \left[ \bar{L}^\R{T} L_0 \, \R{low} ( L_0^{-1} \dot{S} L_0^{-\R{T}} ) \right] \]
For any square matrices \(C\) and \(D\), \(\R{tr} [ C \, \R{low} (D) ] = \R{tr} [ \R{low} ( C^\R{T} )^\R{T} D ]\); applying this identity and the cyclic property of the trace,
\[ \D{ \hat{F} }{ s } = \R{tr} ( \bar{S}^\R{T} \dot{S} ) + \R{tr} \left[ L_0^{-\R{T}} \R{low} ( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1} \dot{S} \right] = \R{tr} ( \bar{S}^\R{T} \dot{S} ) + \R{tr} ( M \dot{S} ) \]
We now consider the \((i, j)\) component function, for a symmetric matrix \(S(s)\), defined by
\[ S_{p, q} (s) = \left\{ \begin{array}{ll} s & \R{if} \; (p, q) = (i, j) \\ s & \R{if} \; (p, q) = (j, i) \\ 0 & \R{otherwise} \end{array} \right. \]
For this choice (taking \(i \neq j\); the case \(i = j\) is similar), \(\D{ \hat{F} }{ s } = \bar{S}_{i,j} + \bar{S}_{j,i} + M_{i,j} + M_{j,i}\).
This shows that the formula in the lemma is correct for \(\hat{S}_{i,j}\) and \(\hat{S}_{j,i}\). This completes the proof because the component \((i, j)\) was arbitrary.
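The lemma can also be verified numerically. In the sketch below (our construction, not part of the proof), \(\bar{F}\) is taken linear so that \(\bar{S}\) and \(\bar{L}\) are constant weight matrices, \(\hat{L}(S) = L_0 \, \R{low}( L_0^{-1} S L_0^{-\R{T}} )\) satisfies the hypothesis exactly, and the claimed total derivative is compared with a finite difference along a random symmetric direction:

```python
import numpy as np

def low(C):
    # low(C) = lower(C) - diag(C) / 2
    return np.tril(C) - 0.5 * np.diag(np.diag(C))

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))
L0 = np.linalg.cholesky(X @ X.T + n * np.eye(n))
L0_inv = np.linalg.inv(L0)

S_bar = rng.standard_normal((n, n))              # partial of bar{F} w.r.t. S
L_bar = np.tril(rng.standard_normal((n, n)))     # partial of bar{F} w.r.t. L

def F_hat(S):
    # bar{F}(S, L) = tr(S_bar^T S) + tr(L_bar^T L) with L = hat{L}(S);
    # this hat{L} satisfies dot{L} = L_0 low(L_0^{-1} dot{S} L_0^{-T})
    L = L0 @ low(L0_inv @ S @ L0_inv.T)
    return np.sum(S_bar * S) + np.sum(L_bar * L)

# total derivative claimed by Lemma 1
M = L0_inv.T @ low(L0.T @ L_bar).T @ L0_inv
S_hat = S_bar + 0.5 * (M + M.T)

# finite difference along a random symmetric direction V
S = rng.standard_normal((n, n)); S = S + S.T
V = rng.standard_normal((n, n)); V = V + V.T
eps = 1e-6
fd = (F_hat(S + eps * V) - F_hat(S - eps * V)) / (2.0 * eps)
assert abs(fd - np.sum(S_hat * V)) < 1e-7 * max(1.0, abs(fd))
```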
Lemma 2
We use the same assumptions as in Lemma 1 except that the matrix \(S\) is lower triangular (instead of symmetric). It follows that
\[ \hat{S} = \bar{S} + \R{lower} ( M^\R{T} ) \]
where
\[ M = L_0^{-\R{T}} \, \R{low} ( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1} \]
The proof of this lemma is identical to the proof of Lemma 1, except that the component function is defined by
\[ S_{p, q} (s) = \left\{ \begin{array}{ll} s & \R{if} \; (p, q) = (i, j) \\ 0 & \R{otherwise} \end{array} \right. \]
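Continuing the numeric sketch above for Lemma 1 (again our construction), the lower triangular variant can be checked the same way, with \(S\) and its perturbation lower triangular:

```python
# continuation of the Lemma 1 sketch: S now lower triangular
S_bar = np.tril(S_bar)                           # partial w.r.t. a lower triangular S

def F_hat2(S):
    L = L0 @ low(L0_inv @ S @ L0_inv.T)
    return np.sum(S_bar * S) + np.sum(L_bar * L)

S_hat = S_bar + np.tril(M.T)                     # claim of Lemma 2
S = np.tril(rng.standard_normal((n, n)))
V = np.tril(rng.standard_normal((n, n)))         # lower triangular direction
fd = (F_hat2(S + eps * V) - F_hat2(S - eps * V)) / (2.0 * eps)
assert abs(fd - np.sum(S_hat * V)) < 1e-7 * max(1.0, abs(fd))
```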
Reverse Mode
k Equal To 0
For the case \(k = 0\), differentiating \(A_0 = L_0 L_0^\R{T}\) gives
\[ \dot{L}_0 = L_0 \, \R{low} ( L_0^{-1} \dot{A}_0 L_0^{-\R{T}} ) \]
It follows from Lemma 1 that
\[ \bar{A}_0 \stackrel{+}{=} \frac{1}{2} ( M + M^\R{T} ) \]
where
\[ M = L_0^{-\R{T}} \, \R{low} ( L_0^\R{T} \bar{L}_0 )^\R{T} L_0^{-1} \]
and \(\bar{A}_0\) denotes the partial with respect to \(A_0\): the value on the right hand side is the partial before, and the value on the left hand side is the partial after, \(L_0\) is removed from the scalar function dependency.
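As a numeric check (our construction, not part of the reference): for a scalar function that depends on \(A_0\) only through \(L_0\), the update above, started from \(\bar{A}_0 = 0\), must reproduce the finite difference derivative along any symmetric direction.

```python
import numpy as np

def low(C):
    # low(C) = lower(C) - diag(C) / 2
    return np.tril(C) - 0.5 * np.diag(np.diag(C))

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n))
A0 = X @ X.T + n * np.eye(n)                     # positive definite
L_bar = np.tril(rng.standard_normal((n, n)))     # partial of F w.r.t. L_0

def F(A0):
    # scalar function that depends on A_0 only through L_0 = chol(A_0)
    return np.sum(L_bar * np.linalg.cholesky(A0))

L0 = np.linalg.cholesky(A0)
L0_inv = np.linalg.inv(L0)
M = L0_inv.T @ low(L0.T @ L_bar).T @ L0_inv
A0_bar = 0.5 * (M + M.T)                         # the update above, from bar{A}_0 = 0

V = rng.standard_normal((n, n)); V = V + V.T     # symmetric direction
eps = 1e-6
fd = (F(A0 + eps * V) - F(A0 - eps * V)) / (2.0 * eps)
assert abs(fd - np.sum(A0_bar * V)) < 1e-6 * max(1.0, abs(fd))
```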
k Greater Than 0
In the case where \(k > 0\),
\[ A_k = L_k L_0^\R{T} + L_0 L_k^\R{T} + B_k \]
where \(B_k\) is defined in terms of Taylor coefficients of \(L(t)\) that have order less than \(k\). It follows that
\[ L_k = L_0 \, \R{low} \left[ L_0^{-1} ( A_k - B_k ) L_0^{-\R{T}} \right] \]
The matrix \(A_k\) is symmetric; it follows from Lemma 1 that
\[ \bar{A}_k \stackrel{+}{=} \frac{1}{2} ( M + M^\R{T} ) \]
where
\[ M = L_0^{-\R{T}} \, \R{low} ( L_0^\R{T} \bar{L}_k )^\R{T} L_0^{-1} \]
The matrix \(B_k\) is also symmetric, and it enters the formula for \(L_k\) through \(A_k - B_k\); hence
\[ \bar{B}_k = - \frac{1}{2} ( M + M^\R{T} ) \]
We define the symmetric matrix \(C_k (s)\) by its derivative
\[ \dot{C}_k = \dot{L}_0 L_k^\R{T} + L_k \dot{L}_0^\R{T} \]
so that differentiating the equation for \(A_k\) above yields \(\dot{L}_k = L_0 \, \R{low} [ L_0^{-1} ( \dot{A}_k - \dot{B}_k - \dot{C}_k ) L_0^{-\R{T}} ]\),
and remove the dependency on \(C_k\) with
\[ \bar{C}_k = - \frac{1}{2} ( M + M^\R{T} ) \]
Thus, removing \(C_k\) from the dependency results in the following update to \(\bar{L}_0\):
\[ \bar{L}_0 \stackrel{+}{=} \R{lower} \left[ ( \bar{C}_k + \bar{C}_k^\R{T} ) L_k \right] \]
which is the same as
\[ \bar{L}_0 \stackrel{+}{=} - \R{lower} \left[ ( M + M^\R{T} ) L_k \right] \]
We still need to remove \(B_k\) from the dependency. It follows from its definition that
\[ \dot{B}_k = \sum_{\ell=1}^{k-1} \left( \dot{L}_\ell L_{k-\ell}^\R{T} + L_\ell \dot{L}_{k-\ell}^\R{T} \right) \]
We now use the fact that \(\bar{B}_k\) is symmetric to conclude
\[ \R{tr} ( \bar{B}_k^\R{T} \dot{B}_k ) = 2 \sum_{\ell=1}^{k-1} \R{tr} \left[ ( \bar{B}_k L_{k-\ell} )^\R{T} \dot{L}_\ell \right] \]
Each of the \(\dot{L}_\ell\) matrices is lower triangular. Thus, removing \(B_k\) from the dependency results in the following update for \(\ell = 1 , \ldots , k-1\):
\[ \bar{L}_\ell \stackrel{+}{=} 2 \, \R{lower} ( \bar{B}_k L_{k-\ell} ) \]
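Putting the updates for \(k > 0\) together with the \(k = 0\) case gives a complete reverse sweep. The sketch below is an illustrative NumPy packaging of the updates above (not CppAD's implementation); it repeats the forward sweep helpers so it runs standalone, processes \(k = K , \ldots , 1\) and then \(k = 0\), and checks the resulting \(\bar{A}_k\) against finite differences:

```python
import numpy as np

def low(C):
    # low(C) = lower(C) - diag(C) / 2
    return np.tril(C) - 0.5 * np.diag(np.diag(C))

def cholesky_forward(A):
    # Taylor coefficients L_0 .. L_K of L(t) from those of A(t)
    L = [np.linalg.cholesky(A[0])]
    L0_inv = np.linalg.inv(L[0])
    for k in range(1, len(A)):
        B_k = sum(L[l] @ L[k - l].T for l in range(1, k))
        L.append(L[0] @ low(L0_inv @ (A[k] - B_k) @ L0_inv.T))
    return L

def cholesky_reverse(L, L_bar):
    # map partials w.r.t. L_0 .. L_K into partials w.r.t. A_0 .. A_K
    K = len(L) - 1
    L_bar = [lb.copy() for lb in L_bar]
    A_bar = [np.zeros_like(lb) for lb in L_bar]
    L0_inv = np.linalg.inv(L[0])
    for k in range(K, 0, -1):
        M = L0_inv.T @ low(L[0].T @ L_bar[k]).T @ L0_inv
        A_bar[k] += 0.5 * (M + M.T)              # remove A_k (Lemma 1)
        B_bar = -0.5 * (M + M.T)                 # bar{B}_k = bar{C}_k
        L_bar[0] -= np.tril((M + M.T) @ L[k])    # remove C_k: update bar{L}_0
        for l in range(1, k):                    # remove B_k: update bar{L}_ell
            L_bar[l] += 2.0 * np.tril(B_bar @ L[k - l])
    M = L0_inv.T @ low(L[0].T @ L_bar[0]).T @ L0_inv
    A_bar[0] += 0.5 * (M + M.T)                  # the k = 0 case
    return A_bar

# finite difference check of bar{A}_k in the symmetric sense
rng = np.random.default_rng(3)
n, K = 4, 2
X = rng.standard_normal((n, n))
A = [X @ X.T + n * np.eye(n)]                    # A_0 positive definite
for k in range(K):
    Y = rng.standard_normal((n, n))
    A.append(Y + Y.T)                            # higher coefficients symmetric
W = [np.tril(rng.standard_normal((n, n))) for _ in range(K + 1)]

def G(A):
    # scalar function of the factor coefficients; bar{L}_k = W[k]
    return sum(np.sum(w * l) for w, l in zip(W, cholesky_forward(A)))

A_bar = cholesky_reverse(cholesky_forward(A), W)
eps = 1e-6
for k in range(K + 1):
    V = rng.standard_normal((n, n))
    V = V + V.T                                  # symmetric direction
    Ap = [a.copy() for a in A]; Ap[k] += eps * V
    Am = [a.copy() for a in A]; Am[k] -= eps * V
    fd = (G(Ap) - G(Am)) / (2.0 * eps)
    assert abs(fd - np.sum(A_bar[k] * V)) < 1e-6 * max(1.0, abs(fd))
```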