Multi-Scale Process Modelling and Distributed Computation for Spatial Data

I recently published a paper, joint with Jonathan Rougier, on modelling, and computing with, multi-scale spatial processes (click here for the arXiv version). A multi-scale spatial process is, simply put, a superposition of processes with differing spatial scales, where each component process has its own ‘spectral signature,’ or ‘frequency bandwidth.’ The first process captures coarse scales. The second process captures finer scales, and so on.

Global map of sea-surface temperature anomaly produced from approx. 1,000,000 data points, using a spatial process model with two scales. Left panel: Prediction (posterior mean). Right panel: Uncertainty (posterior standard error).

Computing with multi-scale spatial processes is not easy. A straightforward way to proceed would be to introduce some notion of independence at the finer scales, by first tiling up the process and then separately computing with variables and parameters associated with these tiles. While this approach brings with it an ease for modelling nonstationarity, it does not yield a realistic, global, model; indeed the process at the finer scale is deemed to be discontinuous at the boundaries of the tiles. The main aim of this work was to come up with a computational strategy to avoid the independence assumption, while allowing one to model nonstationarity at the finer scales.

The strategy we converged on is the following:

  1. Model the process as a sum of independent Gaussian Markov random fields (GMRFs) to take advantage of sparse linear algebraic libraries when making inference.
  2. Associate each GMRF with a spatial scale, and use informative prior distributions to ‘separate’ the scales associated with each GMRF.
  3. Model the first (coarse) scale GMRF as stationary, and subsequent GMRFs as nonstationary, with increasing degrees of nonstationarity (faster fluctuating length scales, etc.) with decreasing scale.
  4. Define each GMRF over a tessellation of the domain of interest, where finer tessellations are associated with GMRFs with finer scales.
  5. Use the conditional independence and distribution properties of GMRFs (exploiting graph colouring and collapsed distributions) to generate a Markov chain associated with the process that mixes well and converges quickly to the target (posterior) distribution.
A colouring of the spatial domain in the global sea-surface temperature application. Each colour contains process variables at the fine scale that are sampled concurrently within a Gibbs sampling framework.

This paper took many years in the making (about 6 years), partially because it was always a project ‘on the side,’ but also because we were not happy with early iterations of the model. In the first model iteration we only assumed one scale, and simply took advantage of the computational benefits of GMRFs via graph colouring to make inference computationally efficient, but the novelty associated with this single-scale model was questionable: both graph colouring and nonstationary GMRFs are not new, although to the best of my knowledge they have not been treated together.

In the second model iteration we introduced multiple scales, but assumed spatial block-independence at these finer scales. This created posterior standard errors and predictions that were very unrealistic, with sharp discontinuities at the boundaries. I still have an early image of the meshing strategy we adopted, which I show below.

Our second model iteration of a two-scale process on the sphere used a coarse mesh (with vertices at the blue dots, just about visible) superimposed on separately-constructed fine meshes (with vertices at the yellow dots). Overlap of the fine meshes was introduced to reduce the effect of boundary conditions.

In the third model iteration we considered global models for the finer scales as well, and used a graph colouring scheme together with a small approximation to the conditional distribution (eq. 11 in the paper) to make inference on the finer scale component of the processes. This worked well, but GMRF variables close to the boundaries of the tiles, created for the purpose of graph colouring, did not mix well. Furthermore, there was no smoothness constraint on the spatial variances and length scales, which is problematic. I found an image of this early attempt, where I show serious issues with the prediction standard error at the boundaries of the tiles used for graph colouring, together with unrealistic heterogeneity..

Prediction standard errors using our third model iteration, which did not have any smoothness constraints on the spatially-varying variances and length scales, and did not allow for good mixing of the Markov chain at the boundaries of the tiles used for graph colouring. Note how the poor mixing manifests itself in poor estimates of the prediction standard error at the tile boundaries.

In the fourth model iteration we used basis functions to model the variances and length scales of the GMRF at the finer scale, and introduced a series of rotating tiling/colouring schemes. The latter modification solved the mixing issues we had at the boundaries. This final model ‘felt right;’ it worked well, both from a representational point of view, and a computational point of view. The posterior prediction and prediction standard error maps using this fourth iteration are shown at the beginning of this blog post, where we effectively model (and not just predict) globally the sea-surface temperature at a 30 x 30 km resolution (we can predict at much higher resolutions if we want to).

Three tilings of our domain used in our Gibbs sampler to alleviate the problem of poor mixing at tile boundaries.

All in all this project was very rewarding, and I believe it has provided a demonstrable advance in spatial modelling and computation. The results compellingly show an improvement over other state-of-the-art methods for spatial prediction. The only real downside of this approach is the effort required to implement it: it requires one to construct informative prior distributions, manually tessellate the domain, and run a Gibbs sampler for several days for a problem with approximately 1,000,000 data points! Options are currently being explored to automate and speed up the modelling and fitting procedures, but, at the time of writing, there does not seem to be an easy way forward.

Reproducible code for this project is available here.