ABC (approximate Bayesian computation) is getting a lot of hype in the blogosphere and, increasingly, in the ecological literature (e.g., Hartig et al., 2011; Csillery et al., 2010). The idea is pretty appealing: link your super-complex simulation explicitly to your data with something resembling (or not, as the case may be) statistical rigor.
I like ABC. It has the flexiblity to give me access to complex process models without having to go through the pain (or impossibility) of writing out my likelihood. But I see the warts: an approximate posterior is not ACTUALLY a posterior; we are not ACTUALLY doing Bayesian inference in the classical sense. And there are a lot of ways in which we could mess things up.
The goal of this post is to introduce ABC, go through a few of the ABC caveats, and talk about how to assess whether (or to what extent) assumptions are being violated.
We get ABC courtesy of population genetics (thanks Pritchard!), where it’s been under steady development since the late 1990s. Tina Toni’s Journal of the Royal Society Interface paper in 2009 brought it (and specifically sequential Monte Carlo approaches for ABC) into the biological modeling mainstream. There are a few references to approximate Bayesian computation in the statistical literature beginning in about 2005, but that community realy became interested in ABC between 2008 and 2010, due largely to a few high-impact crossover papers that focused on genetic problems (see for example Cornuet et al., 2010). It’s been on the rise in cross-disciplinary modeling projects since then.
Intro: What’s ABC?
ABC often gets applied in cases where data are generated via some complicated process. The complication could lie in either the biological process, the observational process, or both; doesn’t matter. The data generating model is so complicated that we can’t (or won’t) write down an actual likelihood. It’s assumed that despite this complexity in the process, we have some nice data that we think are produced by the process. The goal is to use those data to estimate the parameters governing the process.
I work with data that are likely generated by some deviant of an SIR (Susceptible – Infected – Recovered) model. SIR is a dynamical system where individuals transition from compartment to compartment. The model can sometimes have a whole bunch of parameters, and it’s fundamentally non-linear (the transition of individuals from S to I depends on how many individuals are in S and I at a given point in time). The data are (often) a batch of time-stamped disease occurrence observations. The challenge is to map the dynamical process to the empirical data and gain some meaningful insights into the process parameters.
The ABC approach to doing that follows these steps:
(1) Specify a summary measure on the data which you want to recapitulate in simulation.
(2) Simulate the process at a whole bunch of locations all over the possible parameter space.
(3) For each simulation, calculate a summary measure that is structurally identical to a summary measure calculated on the actual data. Reject the parameters that go with simulations whose summary measures don’t match the summary statistic from the actual data. The density of parameter values that weren’t rejected form an approximate posterior distribution for the parameters of interest.
There are a lot of flavors for implementing (2). The geneticists started off with a very simple rejection algorithm: simulate lots of parameters at randomly chosen locations, and throw out that ones that do a bad job, AKA ABC-rejection sampling. In the late 2000s, sequential Monte Carlo approaches came into vogue. These operate more or less like particle filters: (1) draw a batch of samples, (2) see what looks good, (3) draw more samples in the neighborhood of the samples that best matched the data in step 1, rinse, and repeat. Sampling at increasingly higher resolutions has the consequence of honing our examination of the posterior space to the regions with the highest approximate likelihood. Recently, methods are beginning to focus on incorporating hierarchical structures into ABC. By and large, these methods are still under development.
Issue number 1: Sufficient statistics are the best! (but what’s a sufficient statistic, again…?)
Several aspects of ABC (and especially that ever-desirable attribute, model selection) require sufficient statistics. In general, a sufficient statistic (for a given parameter) is a measure calculated on the data such that all relevant information in the dataset about that particular parameter is captured in the statistic. There’s an easy statistical trick to establish whether a given statistic is sufficient, called the factorization theorem, that allows a researcher to test sufficiency of any given statistic. However (and lucky for us), sufficient statistics for most commonly-used parameters are well-documented (for example, see the wiki list here). For a normal distribution with known variance, the sample mean is a sufficient statistic for the parameteric mean; for a Poisson distribution, the sum of the observations is a sufficient statistic for the rate parameter. For a sequence of independent and identically distributed (e.g., idd) Bernoulli trials, the sum of successes is a sufficient statistic for the success probability (given a known number of trials). The point is, we generally work with sufficient statitistics anyway, so running an ABC so that it matches sufficient statistics isn’t overwhelmingly difficult.
Issue number 2: How to explore your giant parameter space?
This is an area that has received a lot of attention. Toni et al. review this pretty thoroughly, but here’s the ten-second version. The basic strategy for many ABC approaches is to generate the posterior by running many very short chains, as opposed to a few very long chains. Sometimes (as in ABC rejection sampling), the chains are of length 1: draw a gazillion points from the joint prior, run a simulation for each draw, and build the posterior from simulations that produce statistics within some tolerance range of the empirical data. Sometimes (as in sequential Monte Carlo, AKA SMC), the chains are of length 3-5. Sample a bunch of initial points from the prior, simulate at those points, find the best matches to the empirical data, then resample from the prior in regions close to the best matches. Rinse and repeat a few times. Some new methods (e.g., Turner and Van Zandt, 2014) explore Gibbs-based approaches for sampling parameters in a hierarchical structure (where proposed parameters at different model levels respond to performance of parameters at other levels in the model). It’s worth noting that there are few (if any) examples of classical hierarchical modeling (as is implemented in lme4 in R, for example) meshed with ABC. I expect that will be an area of active work in the next few years.