It is becoming increasingly common in large-scale environmental applications exhibiting various scales of interactions to let the precision matrix $Q$ of the latent field be large (with dimension $n$ possibly in the millions) but sparse (Lindgren, 2011). While the large dimension allows for considerable flexibility in representing spatio-temporal fields, the sparsity permits the use of numerical methods with favourable computational and memory complexities. One of these methods is the set of so-called Takahashi equations (Campbell, 1995), which recursively compute a partial matrix inverse (primarily to obtain the marginal variances on the diagonal) using only the sparse Cholesky factor of the precision matrix. This recursive update scheme is an integral component of several inference schemes, including INLA (Rue, 2009).
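To fix ideas, below is a minimal dense sketch of the Takahashi recursion, written purely for illustration (it is not the implementation used in practice): it recovers the inverse from the lower Cholesky factor by sweeping from the bottom-right corner upwards. In a genuine sparse implementation only the entries matching the non-zero pattern of $L + L^\top$ are visited and stored, which is what yields the partial matrix inverse discussed below; the sizes and matrices here are hypothetical toys.

```python
import numpy as np

def takahashi_dense(L):
    """Dense toy version of the Takahashi recursion: recover Sigma = (L L')^{-1}
    from the lower Cholesky factor L by sweeping from the bottom-right corner
    upwards.  A sparse implementation would only visit (and store) the entries
    where L + L' is non-zero."""
    n = L.shape[0]
    S = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i - 1, -1):                    # columns j >= i
            s = 1.0 / L[i, i] ** 2 if i == j else 0.0
            s -= (L[i + 1:, i] @ S[i + 1:, j]) / L[i, i]     # - (1/L_ii) * sum_{k>i} L_ki S_kj
            S[i, j] = S[j, i] = s
    return S

# quick check on a small random SPD precision matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Q = A @ A.T + 6 * np.eye(6)
L = np.linalg.cholesky(Q)
print(np.allclose(takahashi_dense(L), np.linalg.inv(Q)))     # True
```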
Recently, I came across one area in my work that could make extensive use of a property of the incomplete inverse produced by the Takahashi equations to speed up computations. Unfortunately, the application for which this was intended did not materialise in practice (because other problems emerged in the bigger picture), but the realisation deserves attention, hence this post.
The item of interest is the computation of the quantity
$$\mathrm{tr}\left(Q_o C \Sigma C^\top\right),$$
where $\Sigma \equiv Q^{-1}$ is the posterior covariance matrix, $C$ is the incidence matrix (from the observation model $\mathbf{z} = C\mathbf{x} + \mathbf{e}$, with $\mathbf{z}$ the vector of $m$ observations) and $Q_o$ is the observation precision matrix, which is assumed to be diagonal. This quantity would be needed, for example, if we need to compute (i) second-order statistics concerning parameters appearing in a fine-scale component (here absorbed in $\mathbf{e}$), such as the stochastic volatility model of Katzfuss and Cressie (2012), or (ii) simply the posterior predictive variance at each observation location.
A useful identity allows us to re-write this quantity as
$$\mathrm{tr}\left(Q_o C \Sigma C^\top\right) = \mathbf{1}^\top\left[\left(C^\top Q_o C\right) \circ \Sigma\right]\mathbf{1}, \tag{1}$$
where $\circ$ denotes the Hadamard (element-wise) product and $\mathbf{1}$ is a vector of ones. Although this might at first seem a trivial quantity to compute, in the large systems we are concerned with, $n$ can be in the millions and $m$ in the tens (or hundreds) of thousands. A standard forward-backward solve for $\Sigma C^\top = Q^{-1}C^\top$ (one solve per column of $C^\top$) is thus intractable, even though fill-in-reducing permutations can be employed, both $Q$ and $C$ are sparse, and the Cholesky factor of $Q$ is easily computed.
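As a quick numerical sanity check of (1), the sketch below builds a small sparse precision matrix, incidence matrix and diagonal observation precision (all hypothetical toy stand-ins, nowhere near the sizes quoted above) and confirms that the trace and Hadamard-product expressions agree.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
n, m = 200, 40                            # toy sizes only

# sparse SPD precision matrix Q, and an incidence matrix C in which each
# observation averages two neighbouring latent variables (hypothetical choices)
Q = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
idx = rng.choice(n - 1, m, replace=False)
C = sp.csr_matrix((np.full(2 * m, 0.5),
                   (np.repeat(np.arange(m), 2),
                    np.ravel(np.column_stack((idx, idx + 1))))), shape=(m, n))
Qo = sp.diags(rng.uniform(0.5, 2.0, m))   # diagonal observation precision

Sigma = np.linalg.inv(Q.toarray())        # dense inverse, affordable only at toy scale

# left-hand side of (1): tr(Qo C Sigma C')
lhs = np.trace(Qo.toarray() @ C.toarray() @ Sigma @ C.toarray().T)
# right-hand side of (1): 1' [(C' Qo C) o Sigma] 1
rhs = np.sum((C.T @ Qo @ C).toarray() * Sigma)
print(np.isclose(lhs, rhs))               # True
```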
We deal with this computational bottleneck by first noting that since $C^\top Q_o C$ is usually very sparse, in effect only a few elements of $\Sigma$ are needed on the right-hand side of (1). Denote the covariance matrix with all unnecessary elements zeroed out as $\tilde\Sigma$. We can relate $\tilde\Sigma$ to $\Sigma$ using the zeroes function $Z(\cdot)$, which we define as
$$Z(A) \equiv \{(i,j) : A_{ij} = 0\}.$$
Then $Z(A) \subseteq Z(B)$ implies that each element which is non-zero in $B$ is also non-zero in $A$ (but not vice-versa). For this problem it can then be shown, using the left-hand side of (1), that the quantity of interest is also given by
$$\mathrm{tr}\left(Q_o C \Sigma C^\top\right) = \mathbf{1}^\top\left[\left(C^\top Q_o C\right) \circ \tilde\Sigma\right]\mathbf{1}, \tag{2}$$
where $\tilde\Sigma$ agrees with $\Sigma$ wherever $C^\top Q_o C$ is non-zero and is zero elsewhere, so that $Z(C^\top Q_o C) \subseteq Z(\tilde\Sigma)$. The right-hand side of (2) implies that the only elements of the full posterior covariance matrix that are needed for computing (1) are those which are non-zero in $C^\top Q_o C$. This property is not of much help if we do not know $\Sigma$ in the first place. However, we can exploit the fact that the elements of $\Sigma$ that we can easily obtain, using the Takahashi equations, are a superset of those appearing in $\tilde\Sigma$.
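The reduction from (1) to (2) is easy to verify numerically. The short sketch below (with the same kind of hypothetical toy matrices as before) zeroes out every element of $\Sigma$ not touched by the Hadamard product and confirms that the value is unchanged.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
n, m = 150, 30
Q = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
idx = rng.choice(n - 1, m, replace=False)
C = sp.csr_matrix((np.full(2 * m, 0.5),
                   (np.repeat(np.arange(m), 2),
                    np.ravel(np.column_stack((idx, idx + 1))))), shape=(m, n))
Qo = sp.diags(rng.uniform(0.5, 2.0, m))

Sigma = np.linalg.inv(Q.toarray())
CQoC = (C.T @ Qo @ C).toarray()

# Sigma_tilde: keep only the elements of Sigma where C'QoC is non-zero
Sigma_tilde = np.where(CQoC != 0, Sigma, 0.0)

print(np.isclose(np.sum(CQoC * Sigma),          # RHS of (1)
                 np.sum(CQoC * Sigma_tilde)))   # RHS of (2) -- True
```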
To see why the elements provided by the Takahashi equations form such a superset, note that since the posterior precision matrix is the sum of the prior precision matrix $Q_p$ and $C^\top Q_o C$, that is, $Q = Q_p + C^\top Q_o C$, we can assert the following (barring fortuitous cancellations):
$$Z(Q) \subseteq Z\left(C^\top Q_o C\right). \tag{3}$$
Further, by Corollary 2.2 in Rue (2005),
$$Z(L) \subseteq Z\left(Q_L\right), \tag{4}$$
where the subscript $L$ denotes the lower triangular part of the matrix and $L$ is the lower Cholesky factor of $Q$. This is of significance as the Takahashi equations compute all values of $\Sigma$ for which the Cholesky factor is non-zero when evaluating the marginal variances (Rue, 2009). Denote this partial matrix inverse, with all remaining elements set to zero, as $\Sigma^*$; by construction $\Sigma^*$ agrees with $\Sigma$ on every element $(i,j) \notin Z(L + L^\top)$, and
$$Z\left(L + L^\top\right) \subseteq Z(\Sigma^*). \tag{5}$$
The key result follows by combining deductions (2)–(5) and noting that $Q$ is symmetric, to obtain the inequality
$$Z\left(L + L^\top\right) \subseteq Z\big(\tilde\Sigma\big).$$
In words, $\tilde\Sigma$ is zero wherever $L + L^\top$ is zero, so every element of $\Sigma$ needed to form $\tilde\Sigma$ lies within the set of elements returned by the Takahashi equations.
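These pattern inclusions can be checked on a toy example. The sketch below (again with hypothetical stand-in matrices, and a dense Cholesky factorisation standing in for a sparse one) forms the posterior precision matrix $Q = Q_p + C^\top Q_o C$ and confirms that the non-zeros of $C^\top Q_o C$ lie within the pattern of $Q$, which in turn lies within the pattern of $L + L^\top$.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(3)
n, m = 120, 40

Qp = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")  # prior precision
i1, i2 = rng.choice(n, (2, m))        # each observation averages two random latent variables
C = sp.csr_matrix((np.full(2 * m, 0.5),
                   (np.repeat(np.arange(m), 2),
                    np.ravel(np.column_stack((i1, i2))))), shape=(m, n))
Qo = sp.diags(rng.uniform(0.5, 2.0, m))

CQoC = (C.T @ Qo @ C).toarray()
Q = Qp.toarray() + CQoC               # posterior precision
L = np.linalg.cholesky(Q)             # dense stand-in for a sparse Cholesky factor

needed = CQoC != 0                    # elements of Sigma needed in (2)
prec   = Q != 0                       # pattern of Q
avail  = (np.abs(L) + np.abs(L.T)) != 0   # pattern of L + L' (what the Takahashi equations cover)

print(not needed[~prec].any())        # True: non-zeros of C'QoC lie within the pattern of Q, cf. (3)
print(not prec[~avail].any())         # True: pattern of Q lies within the pattern of L + L', cf. (4)
```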
Hence all the elements needed for the multiplication in (1) are already available in $\Sigma^*$, which is relatively easy to compute using fast linear algebraic routines. This means that we can substitute $\Sigma$ in (1) with $\Sigma^*$ or, even better, with $\tilde\Sigma$, and effectively replace a huge forward-backward solve by a sparse (element-wise) matrix multiplication. A notable advantage is that this operation can be parallelised with minimal effort, resulting in significant computational savings (in our implementation we made use of the OpenBLAS library, see previous post). As an indication of achievable speeds, for the problem in the current study, using the partial matrix inverse for the evaluation of (1) required only 0.3 s on a high-end desktop with 32 GB of memory, despite the Cholesky factor being on the order of 0.5 GB. Not enough memory was available to compute (1) directly.
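To illustrate the end result, the final sketch below puts the pieces together on a toy problem: it forms the posterior precision matrix, masks the (here dense) inverse to the pattern of $L + L^\top$ as a stand-in for the output of the Takahashi equations, and evaluates (1) using only those retained elements, checking the result against a direct computation. All matrices and sizes are hypothetical; at the scales quoted above the dense operations would of course be replaced by their sparse counterparts.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(4)
n, m = 300, 60

# hypothetical prior precision, incidence matrix and observation precision
Qp = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
i1, i2 = rng.choice(n, (2, m))
C = sp.csr_matrix((np.full(2 * m, 0.5),
                   (np.repeat(np.arange(m), 2),
                    np.ravel(np.column_stack((i1, i2))))), shape=(m, n))
Qo = sp.diags(rng.uniform(0.5, 2.0, m))

CQoC = (C.T @ Qo @ C).toarray()
Q = Qp.toarray() + CQoC                        # posterior precision
L = np.linalg.cholesky(Q)                      # stand-in for a sparse Cholesky factor

# stand-in for the Takahashi partial inverse: Sigma masked to the pattern of L + L'
Sigma = np.linalg.inv(Q)
Sigma_star = np.where((np.abs(L) + np.abs(L.T)) != 0, Sigma, 0.0)

# evaluate (1) using only the partial inverse: an element-wise multiply-and-sum
approx = np.sum(CQoC * Sigma_star)
# direct evaluation of tr(Qo C Sigma C') for comparison
exact = np.trace(Qo.toarray() @ C.toarray() @ Sigma @ C.toarray().T)
print(np.isclose(approx, exact))               # True: the partial inverse suffices
```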
Thanks go to Jonathan Rougier for considerable help in organising the ideas for this post.