This week I arXived a paper I wrote with Tin Lok James Ng (University of Wollongong), Quan Vu (University of Wollongong), and Maurizio Filippone (Eurecom), titled Deep Compositional Spatial Models. As the name implies, the paper describes a spatial model that is inspired by deep learning, specifically, deep Gaussian processes.
To understand why these deep spatial models might be interesting objects, we need a brief run through history. Back in 1992, Sampson & Guttorp published a paper in JASA promoting the idea of warping a spatial domain and modelling simple (stationary) processes on the warped domain. On the warped domain, all process realisations appear simple and regular; when plotted on the unwarped domain, though, they can appear rather complex. Spatial warping is thus a long-standing method for modelling spatial processes that exhibit complicated behaviour (i.e., processes that are highly nonstationary).
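To make the warping idea concrete, here is a minimal sketch (not from the paper; both the warping and the covariance below are purely illustrative) of how a stationary covariance evaluated on a warped domain induces a nonstationary covariance on the original domain:

```python
import numpy as np

def stationary_cov(d, range_par=1.0):
    # Simple exponential covariance, a function of distance only
    return np.exp(-d / range_par)

def warped_cov(s1, s2, warp):
    # Nonstationary covariance on the original domain, obtained by
    # evaluating the stationary covariance on the warped domain
    return stationary_cov(np.linalg.norm(warp(s1) - warp(s2)))

# Hypothetical warping that stretches space near the origin and
# compresses it further away
warp = lambda s: np.sign(s) * np.sqrt(np.abs(s))

# The same original separation (0.1) yields different covariances
print(warped_cov(np.array([0.1]), np.array([0.2]), warp))  # lower correlation
print(warped_cov(np.array([2.1]), np.array([2.2]), warp))  # higher correlation
```

Pairs of points with the same separation on the original domain end up with different covariances, which is exactly the nonstationary behaviour we are after.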
Enter deep neural nets, which are designed to model complex warpings such as the one we envisage. The basic idea in our paper is to warp space using neural net architectures, and then proceed with modelling the process on this warped space. This has been done before in various guises, but what is new here is that the elementary bijective warpings, which are composed to generate the global warping, are modelled as random (non-Gaussian) processes. Bijectivity is important: it guarantees that the warping does not fold. As with other deep process models, we also take advantage of technologies developed for deep learning, such as TensorFlow.
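As a rough illustration of the compositional idea (with fixed, hand-picked elementary warpings rather than the random ones used in the paper), composing strictly increasing layers gives a deep warping that cannot fold:

```python
import numpy as np

# Two hypothetical elementary warpings of the unit interval; each is
# strictly increasing, and hence bijective onto its image (no folding)
def layer1(s):
    return s ** 3

def layer2(s):
    return 1.0 - np.exp(-3.0 * s)

def compose(*layers):
    # Compose elementary warpings into a single 'deep' warping
    def deep_warp(s):
        for layer in layers:
            s = layer(s)
        return s
    return deep_warp

warp = compose(layer1, layer2)
s = np.linspace(0.0, 1.0, 5)
print(warp(s))  # a composition of increasing maps is itself increasing
```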
The random bijective warping is achieved by modelling each warping layer as a trans-Gaussian process (Gaussian maps cannot guarantee bijectivity); that is, each layer is itself a nonlinear transformation of a Gaussian process. While this may appear to complicate things, the sample paths of these bijective maps are in fact very constrained and, somewhat paradoxically, I found the deep (non-Gaussian) warping very easy to fit, since the solution space has been drastically pruned to contain only those maps that are bijective.
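To illustrate the principle in one dimension (this is a sketch of the idea only, not the exact construction used in the paper), one can draw Gaussian weights, pass them through an exponential so that they become positive, and use them in a positive-weighted sum of sigmoids; the result is a strictly increasing, and hence bijective, warping:

```python
import numpy as np

rng = np.random.default_rng(0)

# Knot locations and a draw of Gaussian weights (a crude stand-in for a
# Gaussian-process draw at the knots)
knots = np.linspace(0.0, 1.0, 10)
gauss_weights = rng.normal(size=knots.size)

# Nonlinear (exponential) transformation makes the weights positive
pos_weights = np.exp(gauss_weights)

def warp(s, steepness=20.0):
    # Each sigmoid is increasing in s; positive weights keep the sum
    # increasing, so the map cannot fold
    sig = 1.0 / (1.0 + np.exp(-steepness * (s[:, None] - knots[None, :])))
    return sig @ pos_weights

s = np.linspace(0.0, 1.0, 6)
print(warp(s))  # monotone increasing values
```

Randomness enters only through the Gaussian weights, while the transformation and the basis functions keep every sample path monotone; this is what shrinks the solution space so dramatically.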
So what we end up with is a model that has random warpings, that does not result in ‘space folding,’ and that is easy and very quick to fit using GPUs. To give you an idea, our 1D examples required only a couple of seconds for fitting and prediction, while our 2D examples required a couple of minutes. Compare this to the deep GP with random Fourier features (DGPRFF), where we needed a day of careful optimisation to get a good fit, and to MCMC (elliptical slice sampling) on a deep Gaussian process (à la Schmidt & O’Hagan), which needed nearly a week to ‘converge.’ The model is sufficiently complex yet also sufficiently parsimonious: a win-win situation for spatial problems (or other very low-dimensional problems) where pathologies due to over-warping are likely.
From a spatial statistician’s point of view, what is attractive is that very little effort is needed to model the stretches, anisotropies, and so on, that worry us so much. The ability to capture the ‘directions of flow’ is remarkable. See the following pictures, for example: in each one, the top-left panel shows the true warping; the top-right panel shows the recovered warping from simulated data (which is only identifiable up to a rigid transformation); and the bottom three panels show the truth, the prediction, and the prediction standard error, respectively. The prediction standard errors clearly follow the process ‘contours’; this is desirable, but not something you would get using standard kriging or Gaussian process regression.
Each of the following two images compares the deep spatial model predictions (middle row) to those from a standard Gaussian process (bottom row) when fitting MODIS image data (top right; subsampled from the complete data shown top left). The first one is particularly interesting since the image is taken over Antarctica. Since luminescence over Antarctica is practically constant in this image, the warping collapses Antarctica to nearly a single point, so that uncertainty over Antarctica is practically nil. On the other hand, the predictions and prediction errors capture the major directions of variation in the images. The difference between these predictions and those from standard Gaussian process regression is striking.
All in all this was a very exciting project with some excellent colleagues who assisted in the writing and the simulation experiments. Maurizio, in addition, provided some valuable hands-on advice for getting stochastic variational Bayes to work, which was a crucial component of this work.
The idea for this project came from a very inspiring workshop I attended at Imperial College over five years ago, specifically from a talk by James Hensman on variational encoders. I remember walking back to the train station with James and discussing the models with him, with a view to how they could be used in spatial statistics. I’m very glad to see they work so well in practice, so thank you, James, for the inspiration!