Partial matrix inverse for efficient computation of posterior predictive variances

It is becoming increasingly common in large-scale environmental applications exhibiting various scales of interaction to let the precision matrix Q be large (n > 10,000) but sparse (Lindgren, 2011). While the large dimension allows for considerable flexibility in representing spatio-temporal fields, the sparsity enables the use of numerical methods with favourable computational and memory complexities. One of these methods is the set of so-called Takahashi equations (Campbell, 1995), which recursively compute a partial matrix inverse (primarily to obtain the marginal variances on its diagonal) using only the sparse Cholesky factor of the precision matrix. This recursive update scheme is an integral component of several inference schemes, including INLA (Rue, 2009).
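
To fix ideas, the following is a minimal, dense illustration (my own toy code in Python/NumPy, not a production routine) of the recursion underlying the Takahashi equations. A sparse implementation visits only the entries lying on the sparsity pattern of the Cholesky factor and its transpose, which is what makes the method feasible at the scales discussed here.

```python
import numpy as np

def takahashi_dense(Q):
    """Dense, toy-scale illustration of the Takahashi recursion: recover
    Sigma = Q^{-1} from the lower Cholesky factor L of Q, sweeping from the
    bottom-right corner upwards."""
    n = Q.shape[0]
    L = np.linalg.cholesky(Q)                    # Q = L L^T, L lower triangular
    Sigma = np.zeros((n, n))
    for i in range(n - 1, -1, -1):               # rows, bottom to top
        for j in range(n - 1, i - 1, -1):        # columns j >= i, right to left
            s = 1.0 / L[i, i] ** 2 if i == j else 0.0
            s -= (L[i + 1:, i] @ Sigma[i + 1:, j]) / L[i, i]
            Sigma[i, j] = Sigma[j, i] = s        # exploit symmetry
    return Sigma

# sanity check on a small random SPD matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5 * np.eye(5)
assert np.allclose(takahashi_dense(Q), np.linalg.inv(Q))
```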

Recently, I came across an area in my work which could make extensive use of a property of the incomplete inverse produced by the Takahashi equations to speed up computations. Unfortunately, the application for which this was intended did not materialise in practice (other problems emerged in the bigger picture), but the observation deserves attention, hence this post.

The item of interest is the computation of the quantity \textrm{diag}(C\Sigma^*C^T), where \Sigma^* = (C^TQ_oC + Q)^{-1}, C is the incidence matrix (from the observation model z = Cx + e) and Q_o is the observation precision matrix, which is assumed to be diagonal. This quantity is needed, for example, if we wish to compute (i) second-order statistics of parameters appearing in a fine-scale component (here absorbed in e), such as in the stochastic volatility model of Katzfuss and Cressie (2012), or (ii) simply the posterior predictive variance at each observation location.

A useful identity allows us to re-write \textrm{diag}(C\Sigma^*C^T) as 

\textrm{diag}(C\Sigma^* C^T) \equiv [(C\Sigma^*)\circ C]\mathbf{1}~~~~ (1)

where \circ denotes the Hadamard (element-wise) product and \mathbf{1} is a vector of ones. Although this might at first seem a trivial quantity to compute, in the large systems we are concerned with C \in \mathbb{R}^{m \times n} can have m in the millions and n in the tens (or hundreds) of thousands. A standard forward-backward solve for C\Sigma^* is thus intractable, even with the use of fill-in-reducing permutations, and even though both C and Q^* = \Sigma^{*-1} are sparse and the Cholesky factor of Q^* is easily computed.
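
As a quick sanity check of identity (1), here is a toy-scale snippet with made-up dimensions and randomly generated C, Q and Q_o (purely illustrative, and of course at realistic sizes the left-hand side could not be computed this way):

```python
import numpy as np
import scipy.sparse as sp

m, n = 200, 50                                   # toy sizes; in practice m ~ 1e6, n ~ 1e4-1e5
C = sp.random(m, n, density=0.02, format="csr", random_state=1)   # sparse incidence matrix
Q = sp.random(n, n, density=0.01, random_state=2)
Q = Q + Q.T + n * sp.eye(n)                      # symmetric, diagonally dominant => SPD precision
Qo = sp.eye(m)                                   # diagonal observation precision

Qstar = (C.T @ Qo @ C + Q).toarray()             # posterior precision Q* (dense only because n is tiny)
Sigma_star = np.linalg.inv(Qstar)                # full posterior covariance Sigma*

Cd = C.toarray()
lhs = np.diag(Cd @ Sigma_star @ Cd.T)            # diag(C Sigma* C^T), brute force
rhs = ((Cd @ Sigma_star) * Cd).sum(axis=1)       # [(C Sigma*) o C] 1
assert np.allclose(lhs, rhs)
```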

We deal with this computational bottleneck by first noting that, since C is usually very sparse, in effect only a few elements of \Sigma^* are needed on the right-hand side of (1). Denote by \Sigma^{**} the covariance matrix with all unnecessary elements zeroed out. We can relate \Sigma^{**} to \Sigma^* using the zeroes function, which we define as

zeroes(\Sigma)_{ij} = \left\{\begin{array}{ll} 0 & \Sigma_{ij} = 0\\ 1 & \Sigma_{ij} \ne 0 \end{array}\right.

Then zeroes(\Sigma^{**}) \le zeroes(\Sigma^*) denotes that each element which is non-zero in \Sigma^{**} is also non-zero in \Sigma^* (but not necessarily vice-versa). For this problem it can then be shown, by expanding the left-hand side of (1), that zeroes(\Sigma^{**}) is given by

zeroes(\Sigma^{**}) \equiv zeroes(C^T C)~~~~~(2)

The right-hand side of (2) implies that the only elements of the full posterior covariance matrix \Sigma^* that are needed for computing (1) are those which are non-zero in C^T C. This property is not of much help if we do not know \Sigma^* in the first place. However, we can exploit the fact that the elements of \Sigma^* that we can easily compute, using the Takahashi equations, are a superset of those in \Sigma^{**}.
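
Continuing the toy snippet above, we can check claim (2) numerically: zeroing out every element of \Sigma^* that falls outside the sparsity pattern of C^T C leaves the right-hand side of (1) unchanged.

```python
# Sigma** : Sigma* with every entry outside the pattern of C^T C set to zero
pattern = (C.T @ C).toarray() != 0               # zeroes(C^T C) as a boolean mask
Sigma_2star = np.where(pattern, Sigma_star, 0.0)

rhs_masked = ((Cd @ Sigma_2star) * Cd).sum(axis=1)
assert np.allclose(rhs_masked, rhs)              # identical to using the full Sigma*
```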

To see this, note that since Q^{*} = C^TQ_oC + Q and Q_o is diagonal, the sparsity pattern of C^TQ_oC is that of C^TC, and adding Q can only introduce further non-zeros (barring fortuitous cancellation). We can therefore assert that

zeroes(C^T C) \le zeroes(Q^*)~~~~~(3)

Further, by Corollary 2.2 in Rue (2005),

zeroes(Q^*_L) \le zeroes(L^*)~~~~~~(4)

where the subscript L denotes the lower triangular part of the matrix and L^{*} is the lower Cholesky factor of Q^*. This is of significance as the Takahashi equations, when evaluating the marginal variances, compute all values of \Sigma^* at the locations where the Cholesky factor is non-zero (Rue, 2009). Denote this partial matrix inverse as \Sigma^{***}. Then we can assert that

zeroes(L^*) \le zeroes(\Sigma_L^{***})~~~~~~~(5)

The key result follows by combining deductions (2)–(5) and noting that \Sigma^{***} is symmetric to obtain the inequality

zeroes(\Sigma^{**}) \le zeroes(\Sigma^{***})

Hence all the elements needed for the multiplication in (1) are already available in \Sigma^{***}, which is relatively easy to compute using fast linear algebraic routines. This means that we can substitute \Sigma^* with \Sigma^{***} or, even better, with \Sigma^{***} \circ zeroes(C^T C), and effectively replace a huge forward-backward solve with a sparse matrix multiplication. A notable advantage is that this operation can be parallelised with minimal effort, resulting in significant computational savings (in our implementation we made use of the OpenBLAS library; see the previous post). As an indication of achievable speeds, for the problem in the current study, evaluating (1) via the partial matrix inverse required only 0.3 s on a high-end desktop with 32 GB of memory, despite the Cholesky factor being on the order of 0.5 GB. Not enough memory was available to compute (1) directly.
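
The following sketch puts the pieces together, again at toy scale and reusing the objects from the snippets above. Since no sparse Takahashi routine is used here, \Sigma^{***} is emulated by masking the exact inverse to the sparsity pattern of L^* + L^{*T}; in a real application those entries would come directly from the Takahashi equations applied to the Cholesky factor of Q^*.

```python
# Emulate the partial matrix inverse Sigma***: in practice these entries come
# from the Takahashi equations; here we simply mask the exact inverse to the
# (symmetrised) sparsity pattern of the Cholesky factor of Q*.
L_star = np.linalg.cholesky(Qstar)                       # lower Cholesky factor L*
chol_pattern = (L_star != 0) | (L_star != 0).T           # pattern of L* + L*^T
Sigma_3star = np.where(chol_pattern, Sigma_star, 0.0)    # stand-in for Sigma***

# Restrict further to zeroes(C^T C) and evaluate (1)
Sigma_needed = np.where(pattern, Sigma_3star, 0.0)       # Sigma*** o zeroes(C^T C)
pred_var = ((Cd @ Sigma_needed) * Cd).sum(axis=1)
assert np.allclose(pred_var, lhs)                        # matches diag(C Sigma* C^T)
```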

Thanks go to Jonathan Rougier for considerable help in organising the ideas for this post.