So far, we have focused on model building, interpretation of results, and criticism of models, relying on the magic of the pm.sample function to compute the posterior distributions for us. Now we will focus on learning some of the details of the inference engines behind this function. The whole point of probabilistic programming tools, such as PyMC3, is that the user should not have to care about how sampling is carried out, but understanding how we get samples from the posterior is important for a full understanding of the inference process; it can also give us an idea of when and how these methods fail and what to do about it. If you are not interested in understanding how the methods for approximating the posterior work...
You're reading from Bayesian Analysis with Python - Second Edition
Inference engines
While conceptually simple, Bayesian methods can be mathematically and numerically challenging. The main reason is that the marginal likelihood, the denominator in Bayes' theorem (see equation 1.4), usually takes the form of an intractable or computationally expensive integral. For this reason, the posterior is usually estimated numerically using algorithms from the Markov Chain Monte Carlo (MCMC) family or, more recently, variational algorithms. These methods are sometimes called inference engines because, at least in principle, they are capable of approximating the posterior distribution for any probabilistic model. Even though, in practice, inference does not always work that well, the existence of such methods has motivated the development of probabilistic programming languages such as PyMC3.
The goal of probabilistic programming languages is to...
Non-Markovian methods
Let's start our discussion of inference engines with the non-Markovian methods. Under some circumstances, these methods can provide a fast and accurate enough approximation to the posterior.
Grid computing
Grid computing is a simple brute-force approach. Even if you are not able to compute the whole posterior, you may be able to compute the prior and the likelihood point-wise; this is a pretty common scenario, if not the most common one. Assuming we want to compute the posterior for a single-parameter model, the grid approximation is as follows:
- Define a reasonable interval for the parameter (the prior should give you a hint).
- Place a grid of points (generally equidistant) on that interval...
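The overall grid recipe (evaluate the prior and the likelihood at each grid point, multiply them, and normalize) can be sketched in a few lines. This is an illustrative example assuming a coin-flipping model with a uniform prior; the function name and the 3-heads-in-13-tosses data are our own choices, not the book's code:

```python
import numpy as np
from math import comb

def grid_posterior(heads, tosses, n_points=100):
    """Grid approximation of the posterior for a coin's bias (uniform prior)."""
    grid = np.linspace(0, 1, n_points)            # grid of points on the interval [0, 1]
    prior = np.repeat(1.0, n_points)              # prior evaluated point-wise at each grid point
    likelihood = (comb(tosses, heads)
                  * grid**heads * (1 - grid)**(tosses - heads))  # binomial likelihood
    unnormalized = prior * likelihood
    posterior = unnormalized / unnormalized.sum()  # normalize so the grid sums to 1
    return grid, posterior

grid, posterior = grid_posterior(heads=3, tosses=13)
print(grid[np.argmax(posterior)])  # the maximum sits near the observed proportion 3/13
```

The same function works with any point-wise prior, which is exactly what the exercises at the end of the chapter ask you to experiment with.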
Markovian methods
There is a family of related methods, collectively known as MCMC methods. These stochastic methods allow us to get samples from the true posterior distribution as long as we are able to compute the likelihood and the prior point-wise. While this is the same condition that we need for the grid approach, MCMC methods outperform the grid approximation. This is because MCMC methods are capable of taking more samples from higher-probability regions than from lower ones. In fact, an MCMC method will visit each region of the parameter space in accordance with its relative probability. If region A is twice as likely as region B, then we are going to get twice as many samples from A as from B. Hence, even if we are not capable of computing the whole posterior analytically, we could use MCMC methods to take samples from it.
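As a rough sketch of how an MCMC method samples regions in proportion to their probability, here is a minimal random-walk Metropolis sampler. This is a toy illustration, not the sampler PyMC3 uses, and note that it only needs the target log-density up to a constant:

```python
import numpy as np

def metropolis(logp, start=0.0, n_samples=20_000, scale=2.0, seed=42):
    """Random-walk Metropolis: propose a symmetric jump, accept it with
    probability min(1, p(new) / p(old)); otherwise keep the current value."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, logp_x = start, logp(start)
    for i in range(n_samples):
        x_new = x + rng.normal(0, scale)               # symmetric proposal
        logp_new = logp(x_new)
        if np.log(rng.uniform()) < logp_new - logp_x:  # accept with ratio p(new)/p(old)
            x, logp_x = x_new, logp_new
        samples[i] = x                                 # rejected proposals repeat x
    return samples

# Target: a standard normal, known only through its unnormalized log-density
samples = metropolis(lambda x: -0.5 * x**2)
```

Because high-probability values are accepted more often, the histogram of `samples` approximates the target density even though we never computed its normalizing constant.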
At the most fundamental level, basically everything...
Diagnosing the samples
This section is focused on diagnosing samples from Metropolis and NUTS. Since we are approximating the posterior with a finite number of samples, it is important to check whether we have a valid sample—otherwise any analysis based on it will be totally flawed. There are several tests we can perform; some are visual and some are quantitative. These tests are designed to spot problems with our samples, but they are unable to prove we have the correct distribution; they can only provide evidence that the sample seems reasonable. If we find problems with the sample, there are many solutions to try:
- Increase the number of samples.
- Remove a number of samples from the beginning of the trace. This is known as burn-in. The PyMC3 tuning phase helps reduce the need for burn-in.
- Modify sampler parameters, such as increasing the length of the tuning phase, or increase...
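As one example of a quantitative test, the R-hat (Gelman-Rubin) statistic compares the variance between chains to the variance within each chain; values far from 1 signal problems. The following is a simplified sketch of the classic formula, for illustration only (ArviZ and PyMC3 use a more robust split, rank-normalized version):

```python
import numpy as np

def r_hat(chains):
    """Classic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws).
    Values near 1 suggest all chains are sampling the same distribution."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)          # variance of the chain means
    within = chains.var(axis=1, ddof=1).mean()     # average within-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))                  # four well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck elsewhere
```

For the well-mixed chains, `r_hat(good)` is close to 1, while the shifted chain inflates the between-chain variance and pushes `r_hat(bad)` well above 1.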
Summary
In this chapter, we have taken a conceptual walk through some of the most common methods used to compute the posterior distribution, including variational methods and Markov Chain Monte Carlo methods. We have put special emphasis on universal inference engines, methods that are designed to work on any given model (or at least a broad range of models). These methods are the core of any probabilistic programming language, as they allow for automatic inference, letting users concentrate on iterative model design and interpretation of the results. We also discussed numerical and visual tests for diagnosing samples. Without good approximations to the posterior distribution, all the advantages and flexibility of the Bayesian framework vanish, so evaluating the quality of the inference process is crucial.
...Exercises
- Use the grid approach with other priors; for example, try with prior = (grid <= 0.5).astype(int) or prior = abs(grid - 0.5), or try defining your own crazy priors. Experiment with other data, such as increasing the total amount of data or making it more or less even in terms of the number of heads you observe.
- In the code we use to estimate π, keep N fixed and re-run the code a couple of times. Notice that the results are different because we are using random numbers, but also check that the errors are more or less of the same order. Try changing the number of points, N, and re-run the code. Can you guesstimate how the number of points and the error are related? For a better estimation, you may want to modify the code to compute the error as a function of N. You can also run the code a few times with the same N and compute the mean error and standard deviation...
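The book's π-estimation code is not reproduced in this excerpt, but the kind of Monte Carlo estimator the exercise refers to can be sketched as follows: sample points uniformly in the unit square and count the fraction that falls inside the inscribed quarter circle. The error shrinks roughly as 1/√N:

```python
import numpy as np

def estimate_pi(N, seed=0):
    """Monte Carlo estimate of pi: 4 times the fraction of uniform points
    in the unit square that land inside the inscribed quarter circle."""
    rng = np.random.default_rng(seed)
    x, y = rng.uniform(0, 1, size=(2, N))
    inside = (x**2 + y**2) <= 1        # points under the arc of radius 1
    return 4 * inside.mean()

for N in (100, 10_000, 1_000_000):
    est = estimate_pi(N)
    print(N, est, abs(est - np.pi))    # error drops roughly as 1/sqrt(N)
```

Re-running with different seeds, as the exercise suggests, shows the spread of the estimates shrinking with larger N.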