WOMBAT! (for estimating CO₂ sources and sinks)

In 2015, together with Noel Cressie, I started looking at the problem of estimating sources and sinks of greenhouse gases such as methane and carbon dioxide using ideas from spatial and spatio-temporal statistics. We quickly realised that this is an extremely difficult and inter-disciplinary problem, which requires a team with expertise in high-performance computing, atmospheric chemistry, and statistics.

In the early years (2015/2016), Noel and I made some inroads into developing multivariate statistical models for regional flux inversion, drawing on the expertise of the atmospheric chemists at my former host institution, the University of Bristol (Matt Rigby, Anita Ganesan, and Ann Stavert), and published two papers on this work, one in the journal Chemometrics and Intelligent Laboratory Systems and another in the journal Spatial Statistics.

Following this, we started looking into global inversions from OCO-2 (a NASA satellite) data. Progress was slow initially, but in 2018 Noel and I were successful in acquiring Australian Research Council funding, with which we recruited Michael Bertolacci, a research fellow with the perfect skill set and inter-disciplinary mindset, and this is when the project really took off. Two years later, we are participants in the next global OCO-2 model intercomparison project (MIP) and contributors to a COP26 Global Carbon Stocktake product. The resulting flux inversion system is called WOMBAT, short for the WOllongong Methodology for Bayesian Assimilation of Trace Gases, and a manuscript describing it has now been accepted for publication in Geoscientific Model Development. Other essential contributors to WOMBAT's development include Jenny Fisher and Yi Cao, and we have had help from many others too, as we acknowledge in the paper.

Wombat in Cradle Mountain, Australia. Photo by Meg Jerrard.

The Model

At the core of WOMBAT is a Bayesian hierarchical model (BHM) comprising a flux process model (i.e., a model for the sources and sinks), a mole-fraction process model (i.e., a model for the spatio-temporally evolving atmospheric constituent), and a data model, which characterises the relationship between the mole-fraction field and the data that are ultimately used to infer the fluxes. A summary of the BHM is given below:

The underlying Bayesian hierarchical model used in WOMBAT.
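Schematically, and in notation chosen here purely for illustration (the paper's exact notation and error structure differ in the details), the three levels look something like

\[
\text{Flux process:} \quad Y_1(\mathbf{s},t) = \sum_{j=1}^{r} \alpha_j\, \phi_j(\mathbf{s},t), \qquad \boldsymbol{\alpha} \sim p(\boldsymbol{\alpha}),
\]
\[
\text{Mole-fraction process:} \quad Y_2(\cdot) = \mathcal{H}(Y_1)(\cdot) + v_2(\cdot),
\]
\[
\text{Data model:} \quad \mathbf{Z} = \mathbf{A}\,Y_2 + \mathbf{b} + \boldsymbol{\varepsilon},
\]

where the \(\phi_j\)'s are flux basis functions built from published inventories, \(\mathcal{H}\) denotes the action of the chemical-transport model, \(v_2\) is a (possibly correlated) model-error term, \(\mathbf{A}\) maps the mole-fraction field to the quantities actually observed (e.g., column averages), \(\mathbf{b}\) collects biases, and \(\boldsymbol{\varepsilon}\) is measurement error.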

Each of these models is very simple to write down mathematically, but involves quite a lot of work to implement correctly. First, note from the above schematic that the flux process model involves a set of flux basis functions: these need to be extracted/constructed from several published inventories. We use inventories that describe ocean fluxes, fires, biospheric fluxes, and several others, so that the basis functions have spatio-temporal patterns that are reasonably similar to what could be expected from the underlying flux.

Second, the mole-fraction process model involves running a chemical-transport (numerical) model. The chemical-transport model we use is GEOS-Chem, a global 3-D model that simulates the transport and chemical reactions of gases when supplied with meteorological data as input. Running chemical-transport models is nerve-wracking, as they have a lot of settings and input parameters: a small mistake can result in a lot of wasted computing time or, even worse, go undetected. The chemical-transport model is needed to see what effect the CO₂ fluxes have on the atmospheric mole fraction, which is ultimately what the instruments are measuring. Of course, we are relying on the model and the meteorological fields we use being “good enough,” but we also account for some of the errors they introduce through the term v2 in the above schematic. In WOMBAT we acknowledge that these errors may be correlated in space and time, and we employ a simple spatio-temporal covariance model to represent the correlation. This correlation has the effect of considerably increasing our uncertainty on the estimated fluxes (i.e., ignoring the correlation can lead to over-confident estimates).
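For illustration only (the exact form used in WOMBAT is given in the paper), a simple separable model of the kind alluded to above would take the correlation between two such errors at space-time locations \((\mathbf{s},t)\) and \((\mathbf{s}',t')\) to be

\[
\mathrm{corr}\big(v_2(\mathbf{s},t),\, v_2(\mathbf{s}',t')\big) = \exp\!\left(-\frac{\|\mathbf{s}-\mathbf{s}'\|}{\ell_s}\right)\exp\!\left(-\frac{|t-t'|}{\ell_t}\right),
\]

with spatial and temporal length scales \(\ell_s\) and \(\ell_t\) estimated along with everything else.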

Finally, the data themselves are highly heterogeneous. In this paper we focus on OCO-2 data, which are available to us as column retrievals (i.e., the average dry-air mole fraction of carbon dioxide in a sometimes vertical, often slanted, column from the satellite to the surface of Earth). These data are actually the product of a retrieval algorithm (itself a type of inversion), and they come with biases and errors of their own. We account for all of this in the data model, which contains many parameters that need to be estimated. We use Markov chain Monte Carlo (specifically, Metropolis-within-Gibbs) to estimate all the parameters in our model.
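For readers unfamiliar with Metropolis-within-Gibbs, here is a minimal, generic R sketch of the idea (the samplers in WOMBAT are considerably more involved): each parameter is updated in turn with a random-walk Metropolis step, conditional on all the others.

    # A generic Metropolis-within-Gibbs sketch (illustrative only, not the
    # WOMBAT sampler). `theta` is a named vector of parameters and
    # `log_post()` evaluates the unnormalised log posterior.
    mwg <- function(theta, log_post, n_iter = 5000, step = 0.1) {
      draws <- matrix(NA, n_iter, length(theta),
                      dimnames = list(NULL, names(theta)))
      lp <- log_post(theta)
      for (i in seq_len(n_iter)) {
        for (j in seq_along(theta)) {          # update one parameter at a time
          prop    <- theta
          prop[j] <- theta[j] + rnorm(1, sd = step)
          lp_prop <- log_post(prop)
          if (log(runif(1)) < lp_prop - lp) {  # Metropolis accept/reject
            theta <- prop
            lp    <- lp_prop
          }
        }
        draws[i, ] <- theta                    # store current state
      }
      draws
    }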

Findings

So, what do we find out? First, from the methodological perspective, we find that if biases are indeed present in the data, or if correlations in the errors are present but not taken into account, then flux estimates can be badly affected. Several inversion systems currently in use do not account for these, partly because the data are largely de-biased before being used in an inversion, and partly because accounting for error correlations is computationally demanding. We also show that estimating biases while doing the inversion is possible, and that offline de-biasing can be complemented in this way.

Second, from a science point of view, we show that our flux estimates are largely in line with those from other inversion systems and that (unsurprisingly!) carbon dioxide injection into the atmosphere is very high, with humans to blame for this. Below, we show our estimates of total carbon dioxide surface flux, split by land and ocean, excluding fossil-fuel emissions, using data from a specific satellite mode (Land Glint [LG]). We also show the flux estimates from the first OCO-2 model intercomparison project (MIP), which largely agree with ours. Note how Earth absorbs about 4 PgC (i.e., petagrams of carbon) per year on average. Unfortunately, humans are emitting on the order of 10 PgC of CO₂ per year into the atmosphere. This big discrepancy in flux (approximately 6 PgC per year) means that carbon dioxide levels in the atmosphere will only keep increasing, with deleterious effects on the climate we have learnt to live with. With WOMBAT, as with most flux inversion systems, we can also zoom in to see where and when most carbon dioxide is being emitted or absorbed. This will become a main focus of ours moving forward as we prepare our submission to the COP26 Global Carbon Stocktake product.

Annual (left column) and monthly (right column) fluxes for the globe (first row), land (second row), and ocean (third row). Summaries of flux estimates from the model intercomparison project (MIP) and the flux estimate from WOMBAT are shown, split into the prior and LG inversions.

We are currently implementing several exciting upgrades to WOMBAT. For more information about the project, please do not hesitate to get in touch!

Convolutional Neural Networks for Statistical Spatio-Temporal Forecasting

An article that I co-authored with Chris Wikle has just been published in a Special Issue of the journal Spatial Statistics edited by Alan Gelfand. An arXiv version is here, while the reproducible code can be found here. The methodology in this paper is one I find very exciting: it essentially takes an approach that has recently appeared in machine learning and sets it firmly within a hierarchical statistical modelling framework.

The idea behind it all is relatively simple. Spatio-temporal phenomena are generally governed by diffusions and advections that are constantly changing in time and in space. Modelling these spatio-temporally varying dynamics is hard; estimating them is even harder! Recently, de Bézenac et al. published this very insightful paper showing how, using ideas from the optical-flow problem, one can train a convolutional neural network (CNN) that takes as input an image sequence exhibiting the spatio-temporal dynamics and outputs a map of the spatially varying advection parameters. The idea is simple: let the CNN see what happened over the past few time instants, and it will tell you in which direction things are flowing, and how strongly!

Their code was written in PyTorch, but since R has got an awesome TensorFlow API, I decided to write my own code using their architecture and see whether it worked. I honestly am not sure how ‘deconvolution’ works in a CNN (which is part of their architecture), so what I did instead was implement a version of their CNN where I just go into a low-dimensional space and stay there (I then use basis functions to map back to my original domain). Without going into details, the architecture we ended up with looks like this:

Figure 1: CNN which takes a sequence of images as inputs and outputs parameters that can reconstruct spatially-varying diffusions and advections.
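To give a flavour of what this looks like in code, here is a heavily simplified sketch in R using the keras package. This is not the architecture we actually used: the input size, number of layers, and number of basis-function weights are all placeholders. A short sequence of images is stacked along the channel dimension, passed through a few convolutional layers, and mapped to a small set of weights that, together with spatial basis functions, reconstruct the spatially varying flow parameters.

    library(keras)

    n_time  <- 4    # number of input images in the sequence (placeholder)
    n_grid  <- 64   # image width/height in pixels (placeholder)
    n_basis <- 32   # number of spatial basis functions (placeholder)

    inputs <- layer_input(shape = c(n_grid, n_grid, n_time))

    outputs <- inputs %>%
      layer_conv_2d(filters = 32, kernel_size = 3, strides = 2,
                    padding = "same", activation = "relu") %>%
      layer_conv_2d(filters = 64, kernel_size = 3, strides = 2,
                    padding = "same", activation = "relu") %>%
      layer_flatten() %>%
      layer_dense(units = 128, activation = "relu") %>%
      layer_dense(units = 2 * n_basis)   # weights for the two flow components

    cnn <- keras_model(inputs, outputs)
    cnn %>% compile(optimizer = "adam", loss = "mse")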

The performance was impressive right from the start… even on a few simple image sequences of spheres moving around a domain we could estimate dynamics really well.

But de Bézenac's framework has a small problem: it only works with complete, perfectly observed images. What happens if some of the pixels are missing, or the images are noisy? Well, this is where hierarchical statistical models really come in useful. What we did was recast the CNN as a (high-order) state-space model and run an ensemble Kalman filter using it as our ‘process model.’ The outcome is a framework that allows you to do extremely fast inference for spatio-temporal dynamical systems with little to no parameter estimation! (Disclaimer: provided the CNN is already trained!)
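For concreteness, here is a minimal, generic R sketch of the ensemble-Kalman-filter analysis step (with perturbed observations); this is not our exact implementation, and the argument names are chosen here for illustration. The forecast ensemble would be generated by pushing each ensemble member through the CNN-based process model.

    # Generic EnKF analysis step (illustrative sketch only).
    # X: n x N matrix of forecast ensemble members (one column per member)
    # y: m-vector of observations
    # H: m x n observation matrix (selects/averages the observed pixels)
    # R: m x m observation-error covariance matrix
    enkf_update <- function(X, y, H, R) {
      N  <- ncol(X)
      A  <- X - rowMeans(X)                                # ensemble anomalies
      P  <- A %*% t(A) / (N - 1)                           # sample forecast covariance
      K  <- P %*% t(H) %*% solve(H %*% P %*% t(H) + R)     # Kalman gain
      Yp <- y + t(chol(R)) %*% matrix(rnorm(nrow(R) * N), nrow(R), N)  # perturbed obs
      X + K %*% (Yp - H %*% X)                             # analysis ensemble
    }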

To show the flexibility of this methodology we trained the network on about 80,000 sequences of (complete) images of sea-surface temperature from a publicly available data set (the same data set used by de Bézenac et al., see the image at the top of this page and the video here), and then used it to predict the advection parameters of a Gaussian kernel moving in space (which we just simulated ourselves). The result is impressive — even though none of the training images resemble a Gaussian kernel moving in space, the CNN can tell us where and in which direction ‘stuff’ is moving. The CNN has essentially learnt to understand basic physics from images.

Figure 2: Output flow direction and intensity from the fitted CNN (arrows) in response to a Gaussian radial basis function shifting in time (contours), with decreasing transparency used to denote the function at times t − 2, t − 1 and t, respectively.

All of this gets even crazier: we used the CNN in combination with an ensemble Kalman filter to nowcast rainfall. Remember that we trained this CNN on sea-surface temperatures! The nowcasts were better, in terms of both mean-squared prediction error and the continuous ranked probability score (CRPS), than anything else we could throw at the problem (uncertainties also appear to be very well calibrated with this model). As icing on the cake, it gave us the nowcast in a split second… Have a look at the article for more details.
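For readers unfamiliar with the CRPS mentioned above: it compares the entire forecast distribution with the realised value, rewarding forecasts that are both sharp and well calibrated (lower is better). For a forecast with cumulative distribution function \(F\) and realised value \(y\), it is usually defined as

\[
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big(F(z) - \mathbb{1}\{z \ge y\}\big)^2\, \mathrm{d}z.
\]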

Our book is now published and available online!

Our new book Spatio-Temporal Statistics with R has now been published by Chapman & Hall/CRC. It was a great few years writing this book with Chris Wikle and Noel Cressie, and I’m overjoyed to see the finished product. The book can be downloaded for free here, and a hard-cover edition can be purchased from Chapman & Hall/CRC here (discount code: ASA18).

Summary

Book Cover

The world is becoming increasingly complex, with ever larger quantities of data available to be analyzed. It so happens that many of these “big data” sets are spatio-temporal in nature, meaning that they can be indexed by their spatial locations and time stamps.

Spatio-Temporal Statistics with R provides an accessible introduction to statistical analysis of spatio-temporal data, with hands-on applications of the statistical methods using R Labs found at the end of each chapter. The book:

  • Gives a step-by-step approach to analyzing spatio-temporal data, starting with visualization, then statistical modelling, with an emphasis on hierarchical statistical models and basis function expansions, and finishing with model evaluation
  • Provides a gradual entry to the methodological aspects of spatio-temporal statistics
  • Provides broad coverage of using R, as well as “R Tips” throughout
  • Features detailed examples and applications in end-of-chapter Labs
  • Features “Technical Notes” throughout to provide additional technical detail where relevant
  • Supplemented by a website featuring the associated R package, data, reviews, errata, a discussion forum, and more

The book fills a void in the literature and available software, providing a bridge for students and researchers alike who wish to learn the basics of spatio-temporal statistics. It is written in an informal style and functions as a down-to-earth introduction to the subject. Any reader familiar with calculus-based probability and statistics, and who is comfortable with basic matrix-algebra representations of statistical models, would find this book easy to follow. The goal is to give as many people as possible the tools and confidence to analyze spatio-temporal data.

Notes from Spatial Statistics 2017 (conference)

I have just returned from an inspiring Spatial Statistics conference held in Lancaster. At this conference I gave a contributed talk on fast computation of prediction variances using sparse linear-algebraic methods, described in detail in this manuscript. Following the talk I had fruitful discussions with Doug Nychka on the method's potential use in LatticeKrig in the future.

I am going to cut to the chase and talk about one of the most inspiring talks I have ever heard, given by Peter Diggle at the end of the conference. There was nothing “new” in this talk, nor anything cutting edge. I would call it a “story” about a journey in data analysis, from data collection to policy implementation, that contained little nuggets of wisdom that could only come from someone with decades of experience in the field. Peter talked about the problem of river blindness in Africa, and the problem of side effects of treatment in people co-infected with Loa loa. Important recurring messages in Peter’s talk included the following:

  • Never forget the importance of covariates in improving prediction ability in spatial models. The spatial residual field simply captures things we cannot explain.
  • Do not underestimate the difficulty in data collection, and in thinking of ways to improve this (recalling the image of a “data collector” on a motocross bike heading towards a remote town to talk to village elders!).
  • Never give up on trying to convey uncertainty to policy makers, and on using this to motivate improved experimental design.
  • Always try to include experts in the field who will likely tell you where your spatial predictions do not make sense!

The conference contained several interesting talks, too many to mention here. Some that struck me as particularly intriguing, because of their relevance to my own research, were those by

  • Doug Nychka on local modelling of climate. When modelling using precision matrices, gluing together the local models leads to a ‘valid’ global model. Local models are able to capture the variable variance and length scales apparent in climate models and are hence ideal for emulating them.
  • Samir Bhatt, who talked about the application of machine-learning techniques to spatial modelling. In particular, Samir made the important point that spatial statisticians often only model the covariance between 2 or 3 space-time variables: why not model the covariances between 40 or so variables (including space and time), as is done in machine learning? He advocated the use of random Fourier features (essentially fitting the covariance function in spectral space; see the sketch after this list), dropout for regularisation, and the use of TensorFlow and stochastic gradient descent for optimisation. I want to note that the latter points are central to Goodfellow's new book on Deep Learning, and I agree with Samir that we have a lot to learn in this regard. His final comment on learning to incorporate mechanistic models inside statistical models seemed to divide the audience, and Peter noted that it was interesting that this comment came after a talk that was exclusively about black-box models.
  • Won Chang, who talked about emulating an ice-sheet model (PSU3d-ICE) in order to calibrate its parameters against observations. What was interesting was that calibration was based on both (i) time-series observations of grounding-line position and (ii) spatial binary data denoting ice presence/absence (initially studied here). I asked Won whether it is possible to calibrate the basal friction coefficient, which is probably the (spatially distributed) parameter that determines ice evolution the most. In his case a fingerprint method was used, whereby the (spatial) pattern of the friction coefficient was fixed and one parameter was used to calibrate it. I have often wondered how one could effectively parameterise the highly spatially irregular friction coefficient for emulation, and I think this is still an open question.
  • Jorge Mateu, who described an ANOVA method for point patterns, testing whether the K functions are the same or different between groups. The test statistic used was the integrated difference between the K functions, which of course has no known distribution under the null hypothesis.
  • James Faulkner, who talked about locally adaptive smoothing using Markov random fields (MRFs) and shrinkage priors. The key prior distribution here was the horseshoe prior, which I’m not too familiar with. James showed some impressive figures in which the MRF could be used to recover discontinuities in the signal.
  • Denis Allard, who extended the parsimonious multivariate Matérn covariance functions of Gneiting et al. to the spatio-temporal case (based on Gneiting’s class of spatio-temporal covariance functions).
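As promised above, here is a small R sketch of random Fourier features for a squared-exponential (Gaussian) covariance function; it is only meant to convey the idea Samir described, not his implementation. The random frequencies are drawn from the kernel's spectral density, and the resulting features turn Gaussian-process regression into a cheap linear model.

    # Random Fourier features for a squared-exponential kernel (sketch only).
    # X: n x d matrix of inputs (e.g., space-time coordinates plus covariates)
    rff_features <- function(X, n_feat = 200, lengthscale = 1) {
      d <- ncol(X)
      W <- matrix(rnorm(n_feat * d, sd = 1 / lengthscale), n_feat, d)  # spectral frequencies
      b <- runif(n_feat, 0, 2 * pi)                                    # random phases
      sqrt(2 / n_feat) * cos(X %*% t(W) + matrix(b, nrow(X), n_feat, byrow = TRUE))
    }

    # With Z <- rff_features(X), the kernel matrix is approximated by Z %*% t(Z),
    # so the GP can be fit with standard (regularised) linear regression on Z.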

This is my second Spatial Statistics conference, and I’m glad to have attended. It is becoming increasingly difficult to choose between the several top-notch conferences on offer in the sort of research I do, but Spatial Statistics remains one of my favourites.