Reader small image

You're reading from  Bayesian Analysis with Python. - Second Edition

Product typeBook
Published inDec 2018
Reading LevelIntermediate
PublisherPackt
ISBN-139781789341652
Edition2nd Edition
Languages
Concepts
Right arrow
Author (1)
Osvaldo Martin
Osvaldo Martin
author image
Osvaldo Martin

Osvaldo Martin is a researcher at CONICET, in Argentina. He has experience using Markov Chain Monte Carlo methods to simulate molecules and perform Bayesian inference. He loves to use Python to solve data analysis problems. He is especially motivated by the development and implementation of software tools for Bayesian statistics and probabilistic modeling. He is an open-source developer, and he contributes to Python libraries like PyMC, ArviZ and Bambi among others. He is interested in all aspects of the Bayesian workflow, including numerical methods for inference, diagnosis of sampling, evaluation and criticism of models, comparison of models and presentation of results.
Read more about Osvaldo Martin

Right arrow

Posterior predictive checks

One of the nice elements of the Bayesian toolkit is that once we have a posterior, it is possible to use the posterior, , to generate predictions, , based on the data, , and the estimated parameters, . The posterior predictive distribution is:

Thus, the posterior predictive distribution is an average of conditional predictions over the posterior distribution of . Conceptually (and computationally), we approximate this integral 1.17 as an iterative two-step process:

  1. We sample a value of from the posterior,
  2. We feed that value of to the likelihood (or sampling distribution if you wish), thus obtaining a data point,
Notice how this process combines two sources of uncertainty: the parameters uncertainty; as captured by the posterior; and the sampling uncertainty; as captured by the likelihood.

The generated predictions, , can be used when we need to make, ahem, predictions. But also we can use them to criticize the models by comparing the observed data, , and the predicted data, , to spot differences between these two sets, this is known as posterior predictive checks. The main goal is to check for auto-consistency. The generated data and the observed data should look more or less similar, otherwise there was some problem during the modeling or some problem feeding the data to the model. But even in the absence of mistakes, differences could arise. Trying to understand the mismatch could lead us to improve models or at least to understand their limitations. Knowing which parts of our problem/data the model is capturing well and which it is not is valuable information even if we do not know how to improve the model. Maybe the model captures the mean behavior of our data well but fails to predict rare values. This could be problematic for us, or maybe we only care about the mean, so this model will be OK to us. The general aim is not to declare that a model is false. We just want to know which part of the model we can trust, and try to test whether the model is a good fit for our specific purpose. How confident one can be about a model is certainly not the same across disciplines. Physics can study systems under highly-controlled conditions using high-level theories, so models are often seen as good descriptions of reality. Other disciplines, such as sociology and biology, study complex, difficult-to-isolate systems, and thus models usually have a weaker epistemological status. Nevertheless, independent of which discipline you are working in, models should always be checked, and posterior predictive checks together with ideas from exploratory data analysis are a good way to check our models.

Previous PageNext Page
You have been reading a chapter from
Bayesian Analysis with Python. - Second Edition
Published in: Dec 2018Publisher: PacktISBN-13: 9781789341652
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Osvaldo Martin

Osvaldo Martin is a researcher at CONICET, in Argentina. He has experience using Markov Chain Monte Carlo methods to simulate molecules and perform Bayesian inference. He loves to use Python to solve data analysis problems. He is especially motivated by the development and implementation of software tools for Bayesian statistics and probabilistic modeling. He is an open-source developer, and he contributes to Python libraries like PyMC, ArviZ and Bambi among others. He is interested in all aspects of the Bayesian workflow, including numerical methods for inference, diagnosis of sampling, evaluation and criticism of models, comparison of models and presentation of results.
Read more about Osvaldo Martin