Multi-Scale Process Modelling and Distributed Computation for Spatial Data

I recently published a paper, joint with Jonathan Rougier, on modelling, and computing with, multi-scale spatial processes (click here for the arXiv version). A multi-scale spatial process is, simply put, a superposition of processes with differing spatial scales, where each component process has its own ‘spectral signature’ or ‘frequency bandwidth.’ The first component captures the coarsest scales, the second captures finer scales, and so on.

Global map of sea-surface temperature anomaly produced from approx. 1,000,000 data points, using a spatial process model with two scales. Left panel: Prediction (posterior mean). Right panel: Uncertainty (posterior standard error).

Computing with multi-scale spatial processes is not easy. A straightforward way to proceed would be to introduce some notion of independence at the finer scales, by first tiling up the process and then computing separately with the variables and parameters associated with each tile. While this approach makes it easy to model nonstationarity, it does not yield a realistic, global, model; indeed, the process at the finer scale is deemed to be discontinuous at the tile boundaries. The main aim of this work was to come up with a computational strategy that avoids the independence assumption while still allowing one to model nonstationarity at the finer scales.

The strategy we converged on is the following:

  1. Model the process as a sum of independent Gaussian Markov random fields (GMRFs) to take advantage of sparse linear algebraic libraries when making inference.
  2. Associate each GMRF with a spatial scale, and use informative prior distributions to ‘separate’ the scales associated with each GMRF.
  3. Model the first (coarse) scale GMRF as stationary, and subsequent GMRFs as nonstationary, with the degree of nonstationarity (faster-fluctuating length scales, etc.) increasing as the scale decreases.
  4. Define each GMRF over a tessellation of the domain of interest, where finer tessellations are associated with GMRFs with finer scales.
  5. Use the conditional independence and distribution properties of GMRFs (exploiting graph colouring and collapsed distributions) to generate a Markov chain associated with the process that mixes well and converges quickly to the target (posterior) distribution (a toy example of such a coloured update is sketched below).
A colouring of the spatial domain in the global sea-surface temperature application. Each colour contains process variables at the fine scale that are sampled concurrently within a Gibbs sampling framework.
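
To make points 1 and 5 above concrete, here is a minimal sketch (in R, using the Matrix package) of a single graph-coloured Gibbs update of the fine-scale GMRF coefficients under a hypothetical two-scale model with Gaussian observations. The set-up and variable names are illustrative only; the sampler in the paper works with collapsed conditional distributions and a more elaborate tiling/colouring scheme.

```r
## Hypothetical two-scale model: y = A1 x1 + A2 x2 + e,  e ~ N(0, (1/tau) I),
## with x2 ~ N(0, Q2^{-1}) the fine-scale GMRF.  One sweep of a graph-coloured
## Gibbs update of x2, conditional on the coarse-scale field x1.
library(Matrix)

gibbs_update_fine <- function(x2, x1, y, A1, A2, Q2, tau, colours) {
  ## Canonical parameters of the full conditional [x2 | x1, y]:
  ##   precision Qc = Q2 + tau A2'A2,  linear term b = tau A2'(y - A1 x1)
  Qc <- Q2 + tau * crossprod(A2)
  b  <- as.numeric(tau * crossprod(A2, y - A1 %*% x1))

  for (k in unique(colours)) {
    idx <- which(colours == k)                       # variables of colour k
    ## Conditional of the colour-k block given all the other variables
    bk  <- b[idx] - as.numeric(Qc[idx, -idx, drop = FALSE] %*% x2[-idx])
    Qkk <- as.matrix(Qc[idx, idx, drop = FALSE])     # dense here for clarity only
    R   <- chol(Qkk)                                 # Qkk = R'R
    mu  <- backsolve(R, forwardsolve(t(R), bk))      # conditional mean
    x2[idx] <- mu + backsolve(R, rnorm(length(idx))) # draw from N(mu, Qkk^{-1})
  }
  x2
}
```

Because no two tiles of the same colour are neighbours, the within-colour precision block is (block-)diagonal across tiles, so each colour's update can be carried out tile-by-tile and in parallel; this is where the computational gain over a single joint update comes from.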

This paper was many years in the making (about six), partially because it was always a project ‘on the side,’ but also because we were not happy with early iterations of the model. In the first iteration we assumed only one scale, and simply took advantage of the computational benefits of GMRFs via graph colouring to make inference computationally efficient; however, the novelty of this single-scale model was questionable: neither graph colouring nor nonstationary GMRFs are new, although to the best of my knowledge they had not been treated together.

In the second model iteration we introduced multiple scales, but assumed spatial block-independence at the finer scales. This produced predictions and posterior standard errors that were very unrealistic, with sharp discontinuities at the block boundaries. I still have an early image of the meshing strategy we adopted, which I show below.

Our second model iteration of a two-scale process on the sphere used a coarse mesh (with vertices at the blue dots, just about visible) superimposed on separately-constructed fine meshes (with vertices at the yellow dots). Overlap of the fine meshes was introduced to reduce the effect of boundary conditions.

In the third model iteration we considered global models for the finer scales as well, and used a graph colouring scheme together with a small approximation to the conditional distribution (eq. 11 in the paper) to make inference on the finer-scale components of the process. This worked well, but GMRF variables close to the boundaries of the tiles, created for the purpose of graph colouring, did not mix well. Furthermore, there was no smoothness constraint on the spatial variances and length scales, which is problematic. I found an image of this early attempt, shown below, which exhibits serious issues with the prediction standard error at the boundaries of the tiles used for graph colouring, together with unrealistic heterogeneity.

Prediction standard errors using our third model iteration, which did not have any smoothness constraints on the spatially-varying variances and length scales, and did not allow for good mixing of the Markov chain at the boundaries of the tiles used for graph colouring. Note how the poor mixing manifests itself in poor estimates of the prediction standard error at the tile boundaries.

In the fourth model iteration we used basis functions to model the variances and length scales of the GMRF at the finer scale, and introduced a series of rotating tiling/colouring schemes. The latter modification solved the mixing issues we had at the boundaries. This final model ‘felt right’: it worked well from both a representational and a computational point of view. The posterior prediction and prediction standard error maps from this fourth iteration are shown at the beginning of this blog post, where we effectively model (and not just predict) sea-surface temperature globally at a 30 km x 30 km resolution (we can predict at much higher resolutions if we want to).
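
To illustrate the basis-function idea in a simplified, hypothetical form (not the exact construction used in the paper), the spatially-varying standard deviation and length scale of the fine-scale GMRF can be written as exponentials of low-rank basis expansions, so that only a handful of weights need to be sampled and spatial smoothness of the hyperparameters comes for free:

```r
## Hypothetical low-rank representation of spatially-varying hyperparameters
## using bisquare basis functions; all names and choices are illustrative.
bisquare <- function(d, r) ifelse(d < r, (1 - (d / r)^2)^2, 0)

## s:       n x 2 matrix of prediction locations
## centres: k x 2 matrix of basis-function centres;  r: common aperture
## w_sigma, w_ell: k-vectors of weights (the quantities sampled by MCMC)
spatial_hyperpars <- function(s, centres, r, w_sigma, w_ell) {
  k <- nrow(centres)
  D <- as.matrix(dist(rbind(centres, s)))[-(1:k), 1:k]  # n x k distances
  B <- bisquare(D, r)                                   # n x k basis matrix
  list(sigma = exp(B %*% w_sigma),   # spatially-varying marginal std. deviation
       ell   = exp(B %*% w_ell))     # spatially-varying length scale
}
```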

Three tilings of our domain used in our Gibbs sampler to alleviate the problem of poor mixing at tile boundaries.

All in all this project was very rewarding, and I believe it has provided a demonstrable advance in spatial modelling and computation. The results compellingly show an improvement over other state-of-the-art methods for spatial prediction. The only real downside of this approach is the effort required to implement it: it requires one to construct informative prior distributions, manually tessellate the domain, and run a Gibbs sampler for several days for a problem with approximately 1,000,000 data points! Options are currently being explored to automate and speed up the modelling and fitting procedures, but, at the time of writing, there does not seem to be an easy way forward.

Reproducible code for this project is available here.

Convolution Neural Networks for Statistical Spatio-Temporal Forecasting

An article that I co-authored with Chris Wikle has just been published in a Special Issue of the journal Spatial Statistics edited by Alan Gelfand. An arXiv version is here, while the reproducible code can be found here. The methodology in this paper is one which I find very exciting: it essentially takes a framework that has recently appeared in machine learning and sets it firmly within a hierarchical statistical modelling framework.

The idea behind it all is relatively simple. Spatio-temporal phenomena are generally governed by diffusions and advections that are constantly changing in time and in space. Modelling these spatio-temporally varying dynamics is hard; estimating them is even harder! Recently, de Bézenac et al. published this very insightful paper showing how, using ideas taken from solving the optical flow problem, one can train a convolution neural network (CNN) that takes as input an image sequence exhibiting the spatio-temporal dynamics and outputs a map of the spatially-varying advection parameters. The idea is simple: let the CNN see what happened in the past few time instants, and it will tell you in which direction things are flowing, and how strongly they are flowing!

Their code was written in PyTorch, but since R has an awesome TensorFlow API I decided to write my own code following their architecture and see whether it works. I honestly am not sure how ‘deconvolution’ works in a CNN (it is part of their architecture), so what I did was implement a version of their CNN where I just go into a low-dimensional space and stay there (I then use basis functions to map back to my original domain). Without going into details, the architecture we ended up with looks like this:

Figure 1: CNN which takes a sequence of images as inputs and outputs parameters that can reconstruct spatially-varying diffusions and advections.
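
For the curious, a rough sketch of this kind of encoder, written with the R keras package, is given below. The 64 x 64 grid, the nine-frame history, the layer widths, and the number of basis functions are all placeholders and do not reproduce the exact architecture of Figure 1.

```r
## Illustrative CNN: maps the last 9 image frames (64 x 64 pixels) to weights
## of basis functions encoding spatially-varying advection and diffusion fields.
library(keras)

n_basis <- 50     # basis functions per parameter field (placeholder)

cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), strides = c(2, 2),
                padding = "same", activation = "relu",
                input_shape = c(64, 64, 9)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), strides = c(2, 2),
                padding = "same", activation = "relu") %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), strides = c(2, 2),
                padding = "same", activation = "relu") %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 3 * n_basis)   # two advection fields + one diffusion field

## The outputs stay in this low-dimensional space; a fixed basis matrix PHI
## (n_pixels x n_basis) maps the weights back to the image grid, e.g.
## advect_u <- PHI %*% w_u, instead of using 'deconvolution' layers.
```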

The performance was impressive right from the start… even on a few simple image sequences of spheres moving around a domain we could estimate dynamics really well.

But de Bézenac’s framework has a small problem: it only works for complete images that are perfectly observed. What happens if some of the pixels are missing or the images are noisy? Well, this is where hierarchical statistical models really come in useful. What we did was recast the CNN as a (high-order) state-space model, and run an ensemble Kalman filter using it as our ‘process model.’ The outcome is a framework that allows you to do extremely fast inference on spatio-temporal dynamical systems with little to no parameter estimation (provided the CNN is already trained!).
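
For readers unfamiliar with the ensemble Kalman filter, here is a bare-bones (stochastic, perturbed-observation) version in R, with a generic one-step propagator standing in for the trained CNN dynamics; everything here is a simplified sketch rather than the code used in the paper.

```r
## One ensemble Kalman filter step.  'propagate' is a placeholder for the
## learned process model (in our case it would advect/diffuse each ensemble
## member using the CNN-estimated flow field).
enkf_step <- function(X, y, H, R, propagate) {
  ## X: n x m matrix of m ensemble members; y: observations;
  ## H: observation matrix;  R: observation-error covariance
  Xf  <- apply(X, 2, propagate)              # forecast each member
  A   <- Xf - rowMeans(Xf)                   # ensemble anomalies
  HA  <- H %*% A
  PHt <- A %*% t(HA) / (ncol(X) - 1)         # Pf H'
  S   <- H %*% PHt + R                       # innovation covariance
  K   <- PHt %*% solve(S)                    # Kalman gain
  ## Perturbed-observation update, member by member
  Y   <- y + t(chol(R)) %*% matrix(rnorm(length(y) * ncol(X)), length(y))
  Xf + K %*% (Y - H %*% Xf)
}
```

Missing pixels cause no trouble in this set-up: H simply selects whichever pixels happen to be observed at that time point, and noisy observations are handled through R.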

To show the flexibility of this methodology, we trained the network on about 80,000 sequences of (complete) images of sea-surface temperature from a publicly available data set (the same data set used by de Bézenac et al.; see the image at the top of this page and the video here), and then used it to predict the advection parameters of a Gaussian kernel moving in space (which we simply simulated ourselves). The result is impressive: even though none of the training images resemble a Gaussian kernel moving in space, the CNN can tell us where, and in which direction, ‘stuff’ is moving. The CNN has essentially learnt to understand basic physics from images.

Figure 2: Output flow direction and intensity from the fitted CNN (arrows) in response to a Gaussian radial basis function shifting in time (contours), with decreasing transparency used to denote the function at times t − 2, t − 1 and t, respectively.

All of this gets a bit more crazy: we used the CNN in combination with an ensemble Kalman filter to nowcast rainfall. Remember that we trained this CNN on sea-surface temperatures! The nowcasts were better, in terms of both mean-squared prediction error and continuous ranked probability score (CRPS), than anything else we could throw at the problem (and the uncertainties appear to be very well calibrated with this model). To put the icing on the cake, it gave us the nowcast in a split second. Have a look at the article for more details.

Research Fellow (Postdoc) position at the University of Wollongong, Australia

The Centre for Environmental Informatics (CEI) in the School of Mathematics and Applied Statistics conducts world-leading research in high-level statistical methodology and statistical computing and their application to the environmental sciences.

The postdoctoral Research Fellow will work in the centre with Prof Noel Cressie, Dr Andrew Zammit Mangion, and environmental scientists on the recently awarded Australian Research Council (ARC) grant, “Bayesian inversion and computation applied to atmospheric flux fields.” The project aims to make use of unprecedented sources of measurements, from remote sensing and in situ data, to estimate the sources and sinks of greenhouse gases. This will be solved in the context of Bayesian spatio-temporal inversion and big-data computational strategies.

The requirements for this position are:

  • PhD degree in Statistics/Statistical Science
  • Strong programming skills in R, Python, or MATLAB
  • Demonstrated expertise in Bayesian methods and computation
  • Ability to work effectively in a multi-disciplinary team on the main UOW campus in Wollongong

Closing Date: 01/04/2019, 11:59:00 PM (AEDT)
Salary Details: Level B: $98,019 – $113,353 + 17% superannuation


For more details and to apply please visit here.


Notes from Spatial Statistics 2017 (conference)

I have just returned from an inspiring Spatial Statistics conference held in Lancaster, where I gave a contributed talk on fast computation of prediction variances using sparse linear algebraic methods, described in detail in this manuscript. Following the talk I had fruitful discussions with Doug Nychka on the method’s potential use in LatticeKrig in the future.

I am going to cut to the chase and talk about one of the most inspiring talks I have ever heard, given by Peter Diggle at the end of the conference. There was nothing “new” in this talk, nor anything cutting edge. I would call it a “story” about a journey in data analysis, from data collection to policy implementation, containing little nuggets of wisdom that could only come from someone with decades of experience in the field. Peter talked about the problem of river blindness in Africa, and the side-effects of treating people co-infected with Loa loa. Important recurring messages in Peter’s talk included the following:

  • Never forget the importance of covariates in improving prediction ability in spatial models. The spatial residual field simply captures things we cannot explain.
  • Do not underestimate the difficulty in data collection, and in thinking of ways to improve this (recalling the image of a “data collector” on a motocross bike heading towards a remote town to talk to village elders!).
  • Never give up on trying to convey uncertainty to policy makers, and on using this to motivate improved experimental design.
  • Always try to include experts in the field who will likely tell you where your spatial predictions do not make sense!

The conference contained several interesting talks, too many to mention here. Some which struck me as particularly intriguing, because of their relevance to my own research, included those by

  • Doug Nychka on local modelling of climate. When modelling using precision matrices, gluing together the local models leads to a ‘valid’ global model. Local models are able to capture the variable variance and length scales apparent in climate models and are hence ideal for emulating them.
  • Samir Bhatt, who talked about the application of machine learning techniques to spatial modelling. In particular, Samir made the important point that spatial statisticians often only model the covariance between two or three space-time variables: why not model the covariances between 40 or so variables (including space and time), as is done in machine learning? He advocated the use of random Fourier features (essentially fitting the covariance function in spectral space; see the sketch after this list), dropout for regularisation, and TensorFlow and stochastic gradient descent for optimisation. I note that the latter points are key themes in Goodfellow’s new book on deep learning, and I agree with Samir that we have a lot to learn in this regard. His final comment, on learning to incorporate mechanistic models inside statistical models, seemed to divide the audience, and Peter noted that it was interesting that this comment came after a talk that was exclusively on black-box models.
  • Won Chang who talked about emulating an ice sheet model (PSU3d-ICE) in order to calibrate its parameters against observations. What was interesting is that calibration was based on both (i) time-series observations of grounding line position and (ii) spatial binary data denoting ice presence/absence (initially studied here). I queried Won as to whether it is possible to calibrate the basal friction coefficient, which is probably the (spatially-distributed) parameter that determines ice evolution the most. In his case a fingerprint method was used whereby the (spatial) pattern of the friction coefficient was fixed and one parameter was used to calibrate it. I have often wondered how one could effectively parameterise the highly spatially irregular friction coefficient for emulation and I think this is still an open question.
  • Jorge Mateu, who described an ANOVA method for point patterns, testing whether the K functions are the same or different between groups. The test statistic used was the integrated difference between the K functions, which of course has no known distribution under the null hypothesis.
  • James Faulkner, who talked about locally adaptive smoothing using Markov random fields and shrinkage priors. The key prior distribution here was the horseshoe prior, which I’m not too familiar with. James showed some impressive figures in which the MRF could be used to recover discontinuities in the signal.
  • Denis Allard, who extended the parsimonious multivariate Matérn covariance functions of Gneiting et al. to the spatio-temporal case (based on Gneiting’s class of spatio-temporal covariance functions).
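
Since random Fourier features came up in Samir’s talk, here is a minimal sketch of the idea (my own illustration, not Samir’s code): approximate a squared-exponential covariance function with length scale ell by m random cosine features, then fit the resulting linear-in-features model by ridge regression.

```r
## Random Fourier features for a squared-exponential kernel (illustrative).
rff_fit <- function(X, y, m = 200, ell = 1, lambda = 1e-3) {
  d <- ncol(X)
  W <- matrix(rnorm(m * d, sd = 1 / ell), m, d)   # spectral frequencies
  b <- runif(m, 0, 2 * pi)                        # random phases
  feat <- function(Z) sqrt(2 / m) *
    cos(Z %*% t(W) + matrix(b, nrow(Z), m, byrow = TRUE))
  Phi  <- feat(X)                                 # n x m feature matrix
  beta <- solve(crossprod(Phi) + lambda * diag(m), crossprod(Phi, y))
  list(predict = function(Xnew) as.numeric(feat(Xnew) %*% beta))
}
```

The appeal, echoing Samir’s point, is that the columns of X need not be limited to two or three spatial and temporal coordinates: any number of covariates can be thrown in, and the fitting cost scales with the number of features rather than with the number of data points.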

This is my second Spatial Statistics conference, and I’m glad to have attended. It is becoming increasingly difficult to choose between the several top-notch conferences on offer in the sort of research I do, but Spatial Statistics remains one of my favourites.


Our sea-level rise work featured in Scientific American

A couple of months ago, in a post, I explained how, according to our analysis, the gains in East Antarctica are not enough to offset the losses in West Antarctica, and hence Antarctica is an overall contributor to sea-level rise. The article giving the details contradicted, to a large extent, the results of Zwally et al. published two years earlier, who obtained large positive mass balances for East Antarctica.

This week an article in Scientific American titled ‘What to Believe in Antarctica’s Great Ice Debate’ delves into the causes of the discrepancies between our study and that of Zwally et al. It makes for a very interesting read and, for the statistically inclined, it is a reminder of the importance of considering uncertainties at every stage of the analysis chain!


Image credit: Mario Tama / Getty Images


Gains in mass in East Antarctica are not enough to offset losses in West Antarctica

Alba Martin-Español, Jonathan Bamber, and I have just published a paper investigating the mass balance (i.e., the gain and loss of ice) of the East Antarctic Ice Sheet. The analysis was carried out using spatio-temporal Bayesian hierarchical modelling, described in detail in an Environmetrics paper we published in 2015. The model was constructed so as to combine prior knowledge on physical processes, such as ice melt, with observed data, mostly from satellites.

East Antarctica is highly likely to be gaining mass overall and thus contributing to a sea-level drop. However, gains here are not enough to offset the big losses currently being seen in West Antarctica (image source).

We used three types of data for this analysis, spanning just over a decade from 2003 to 2013. First, satellite altimetry, which is a technique for making very precise measurements of the changing height of the ice surface using either a laser or a radar fitted to a satellite; altimetry data hence indicate regions where the ice is thickening or thinning. Second, satellite gravimetry, which determines changes in ice-sheet mass via repeated and very accurate measurements of the Earth’s gravity field using the two Gravity Recovery and Climate Experiment (GRACE) satellites; changes in the gravitational pull of the Earth can also be a sign of thickening or thinning ice. Third, Global Positioning System (GPS) data from a network of fixed stations, which allow small changes in the vertical position of the bedrock under the ice sheet to be calculated. This is important, as the vertical motion of the bedrock, arising from the Earth’s crust re-adjusting to historical changes in ice and snow, needs to be discounted from the present-day changes in ice thickness observed using altimetry and gravimetry.

Two processes are known to be responsible for changes in the mass of the Antarctic ice sheet: (i) surface mass balance (the difference between snowfall and melt through time), and (ii) ice-sheet dynamics (the flow of ice from the interior to the margins of the continent). We undertook experiments under two different scenarios, which made different assumptions about these two processes and the vertical motion of the bedrock.

The first scenario assumed that changes in ice mass driven by ice movement are less likely, though not impossible, in areas where the ice moves most slowly. If this is the case, we estimate an average increase in the total mass of the East Antarctic Ice Sheet of 57 gigatons per year between 2003 and 2013, with about a third of this change apportioned to ice movement and two thirds to changes in surface mass balance.

The second scenario assumed that a long-term accumulation trend is causing ongoing thickening of the ice sheet in the interior of East Antarctica. In this case, a regional climate model was used to estimate changes in surface mass balance. Based on this assumption, we estimate the change in ice mass due to ice movement to be three to five times larger than in the first experiment. However, estimates of the change in the total mass of the East Antarctic Ice Sheet remained similar (an increase of 49 to 57 gigatons per year between 2003 and 2013).

Importantly for this study, and for our wider understanding of Antarctica’s contribution to global sea-level rise now and in the future, we see that ice mass gains in East Antarctica were exceeded by ice mass losses in West Antarctica in all the experiments carried out. This strongly indicates that the overall ice mass of Antarctica decreased between 2003 and 2013.

For more details read the paper here.

This post is a slightly modified version of that appearing on our project website, www.globalmass.eu.

Global Challenges Travel (to the University of Bristol)

In February 2017 I was fortunate enough to spend a month in Bristol on a travel award from the University of Wollongong Global Challenges Program, aimed at furthering an existing collaboration between the University of Wollongong and the University of Bristol. The collaboration stems from my previous postdoctoral position at Bristol, and is in the field of climate science and the statistics that enables it. In this post I discuss the two projects that were the focus of this visit.

The first project concerns carbon dioxide flux inversion which, in a nutshell, is any technique that allows us to determine where carbon dioxide is coming from, and where it is going, just from readings of carbon dioxide at the surface or from satellites.

The process by which sources and sinks, that is, the flux, generate carbon dioxide (the generative process) is well-understood. Recovering the flux from measurements of carbon dioxide (inversion) is a much harder problem.
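
As a caricature only (the real problem involves correlated retrieval errors, transport-model biases, and fluxes that evolve in time), the inversion step is often reduced to a linear-Gaussian update: column-averaged CO2 observations z are modelled as a linear transport operator H applied to the flux x plus noise, and Bayes’ rule then gives the flux posterior in closed form. A minimal sketch in R, with all names hypothetical:

```r
## Toy linear-Gaussian flux inversion:  z = H x + e,  e ~ N(0, R),
## prior x ~ N(x0, S0).  Returns the posterior mean and covariance of x.
flux_inversion <- function(z, H, R, x0, S0) {
  S0Ht <- S0 %*% t(H)
  G    <- S0Ht %*% solve(H %*% S0Ht + R)   # gain matrix
  xhat <- x0 + G %*% (z - H %*% x0)        # posterior mean of the flux
  Shat <- S0 - G %*% t(S0Ht)               # posterior covariance of the flux
  list(mean = as.numeric(xhat), cov = Shat)
}
```

The work described in the following paragraphs goes well beyond this toy version, propagating uncertainty through the satellite retrievals, the transport model, and the space-time evolution of the flux itself.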


Together with Bristol, the Centre for Environmental Informatics at the University of Wollongong is looking at ways in which sophisticated statistical techniques can be used to take into account the uncertainty at each of the numerous stages of flux inversion. This means accounting for everything from the uncertainty in the satellite retrievals right down to the uncertainty in how sinks and sources of carbon dioxide evolve in time. This work hopes to reconcile several of the more conventional existing approaches, which frequently neglect uncertainties at one or more stages of the flux-inversion process and may easily result in over-confident estimates of flux. The approach should also allow us to objectively assess the extent to which we can reliably estimate flux from, say, satellite data, when all uncertainties are taken into account.

With Bristol we are specifically focusing on the OCO-2 satellite, launched by NASA in July 2014, which takes readings of column-averaged carbon dioxide. Given a set of flux fields, we can reproduce the column-averaged carbon dioxide by passing the flux through a transport model (a model that simulates the movement of carbon dioxide in the atmosphere) driven by meteorological fields. These ‘model outputs’ were provided to us by the team at Bristol. From a statistical perspective, the problem of inversion is complex, as it involves the consideration of transport, as well as all the biases and uncertainties in the OCO-2 retrievals, which can be significant. There is also a lot of data: OCO-2 generates on the order of millions of usable retrievals per month, and how to reliably assimilate all these data within an inversion framework is still an open problem. My visit to Bristol proved invaluable in sorting out some of the teething issues that emerge when attempting to do the inversion from satellite data in this way. Our joint work will be presented at the European Geosciences Union General Assembly in Vienna in April 2017. This project is carried out in collaboration with Noel Cressie (Wollongong), Ann Stavert (Bristol), Matt Rigby (Bristol), Anita Ganesan (Bristol), and Peter Rayner (Melbourne).

Artist’s rendering of NASA’s Orbiting Carbon Observatory-2 (OCO-2). (Source: NASA)


The second project is based on global sea-level rise. Sea-level rise is both a symptom of, and a contributor to, climate change, and one of its most concerning aspects. For example, 85% of the Australian population lives within 50 km of the coast, and it is not so much the sea-level rise in itself that is the problem, but the increased risk and impact of the storm surges that accompany a higher sea level. Considerable infrastructure re-planning will be needed in the next few decades to counter the effects of what seems to be an unstoppable rise in sea level.


Image credit: Bruce C. Douglas (1997), “Global Sea Rise: A Redetermination”, Surveys in Geophysics 18: 279–292, DOI: 10.1023/A:1006544227856. Can be redistributed under the Creative Commons Attribution-Share Alike 3.0 Unported license.


The GlobalMass team based in Bristol is trying to answer a simple, yet unanswered, question: what are the contributors to global sea-level rise? Global sea-level rise can be due to a number of factors: (i) thermal expansion (as the ocean heats, it expands); (ii) changes in salinity (salt increases the water density); (iii) hydrology (dams, rivers, etc.); and (iv) ice sheets and ice caps (as ice melts, the sea level rises). Further, sea-level rise is highly spatially and temporally dependent, making it very difficult to assess what is contributing to it without a deep understanding of the physical processes driving it, and of their regional and temporal impacts. In order to “source separate” the contributors, we incorporate a large variety of data sources, including Argo buoy data, gravimetry, altimetry and tide gauges, all at once. The uncertainties, as well as the footprints of the observations, are all taken into account when solving for the separate sea-level rise contributions. This work, funded by the European Union, is a follow-on from the UK-funded RATES project, which I worked on for two years and which resulted in a statistical framework for identifying the contributors to sea-level rise from Antarctica alone. This visit to Bristol helped to flesh out some of the statistical problems related to the GlobalMass project objective, and to devise methods and a way forward to solve them.

The GlobalMass team consists of 8 researchers from the University of Bristol and project partners from the University of Wollongong, University of Tasmania, NASA, LEGOS, and the University of Colorado, Boulder. More information on the project can be found at globalmass.eu.

GlobalMass meeting, February 2017.