You're reading from Bayesian Analysis with Python - Third Edition

Product type: Book

Published: Jan 2024

Publisher: Packt

ISBN-13: 9781805127161

Pages: 394

Edition: 3rd Edition

Languages: Python

Concepts: Machine Learning

Author: Osvaldo Martin


1.8 How to choose priors

Newcomers to Bayesian analysis (as well as detractors of this paradigm) are generally a little nervous about how to choose priors. Usually, they are afraid that the prior distribution will not let the data speak for itself! That’s OK, but we have to remember that data does not speak; at best, data murmurs. We can only make sense of data in the context of our models, including mathematical and mental models. There are plenty of examples in the history of science where the same data led people to think differently about the same topics, and this can happen even if you base your opinions on formal models.

Some people like the idea of using non-informative priors (also known as flat, vague, or diffuse priors). These priors are meant to have the least possible impact on the analysis. While it is possible to use them for some problems, deriving truly non-informative priors can be hard or even impossible. Additionally, we can generally do better, as we usually have some prior information.
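Part of why truly non-informative priors are elusive is that "flat" depends on the scale you look at. The following is a minimal sketch with NumPy and SciPy (our own illustration, not code from the book). A Uniform(0, 1) prior on a probability θ looks maximally vague, but on the log-odds scale, logit(θ), it becomes a standard logistic distribution, so two log-odds intervals of the same width receive very different prior mass. The helper `logit_mass` is hypothetical, written only for this example.

```python
import numpy as np
from scipy.special import expit


def logit_mass(lower, upper):
    """Prior mass that a Uniform(0, 1) prior on theta assigns to
    lower < logit(theta) < upper (logit(theta) is standard logistic)."""
    return expit(upper) - expit(lower)


near_zero = logit_mass(-1, 1)  # width-2 log-odds interval around 0
far_out = logit_mass(3, 5)     # width-2 log-odds interval further out

# Monte Carlo check: sample theta uniformly, then transform to log-odds
rng = np.random.default_rng(42)
theta = rng.uniform(size=100_000)
log_odds = np.log(theta) - np.log1p(-theta)
mc_near_zero = np.mean((log_odds > -1) & (log_odds < 1))

print(f"mass in (-1, 1): {near_zero:.3f} (Monte Carlo: {mc_near_zero:.3f})")
print(f"mass in (3, 5):  {far_out:.3f}")
```

The interval around 0 gets about 0.46 of the prior mass, while the interval from 3 to 5 gets only about 0.04, so a prior that is flat for θ is clearly informative about the log-odds.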

Throughout this book, we will follow the recommendations of Gelman, McElreath, Kruschke, and many others, and we will prefer weakly informative priors. In many problems, we know something about the values a parameter can take. We may know that a parameter is restricted to being positive, we may know the approximate range it can take, or we may expect its value to be close to zero or below/above some value. In such cases, we can use priors to put some weak information into our models without being afraid of being too pushy. Because these priors work to keep the posterior distribution within certain reasonable bounds, they are also known as regularizing priors.
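A conjugate Beta-Binomial model makes the regularizing effect visible without running any sampler. The sketch below uses a hypothetical tiny dataset of 0 successes in 3 trials (our illustration, not data from the book) and compares a flat Beta(1, 1) prior with a weakly informative Beta(2, 2) prior, which gently favors values away from 0 and 1. The helper `beta_mode` is hypothetical and only valid for the parameter ranges noted in its docstring.

```python
from scipy import stats

# Hypothetical tiny dataset: 0 successes in 3 trials
successes, trials = 0, 3

priors = {"flat Beta(1, 1)": (1, 1), "weak Beta(2, 2)": (2, 2)}


def beta_mode(a, b):
    """Mode of Beta(a, b), valid for a >= 1 and b >= 1 (not both equal to 1)."""
    return (a - 1) / (a + b - 2)


posteriors = {}
for name, (a, b) in priors.items():
    # Conjugate update: a Beta(a, b) prior with a binomial likelihood
    # gives a Beta(a + successes, b + failures) posterior
    a_post, b_post = a + successes, b + trials - successes
    posteriors[name] = (a_post, b_post)
    post = stats.beta(a_post, b_post)
    print(f"{name}: posterior Beta({a_post}, {b_post}), "
          f"mode={beta_mode(a_post, b_post):.3f}, mean={post.mean():.3f}")

print(f"maximum likelihood estimate: {successes / trials:.3f}")
```

Under the flat prior, the posterior mode sits at 0, exactly the maximum likelihood estimate, even though three trials say very little. The Beta(2, 2) prior moves the mode to 0.2, pulling the estimate away from the boundary.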

Informative priors are very strong priors that convey a lot of information. Using them is also a valid option. Depending on your problem, it may or may not be easy to find good-quality information in your domain knowledge and turn it into priors. I used to work in structural bioinformatics. In this field, people have been using all the prior information they could get, in both Bayesian and non-Bayesian ways, to study and predict the structure of proteins. This is reasonable because we have been collecting data from thousands of carefully designed experiments for decades, and hence we have a great amount of trustworthy prior information at our disposal. Not using it would be absurd! There is nothing “objective” or “scientific” about throwing away valuable information. If you have reliable prior information, you should use it. Imagine if every time automotive engineers designed a new car, they had to start from scratch and reinvent the combustion engine, the wheel, and, for that matter, the whole concept of a car.
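One simple way to turn a domain-knowledge statement such as "the proportion is about 30%" into a prior is the mean-concentration parameterization of the Beta distribution: α = μκ and β = (1 - μ)κ, where κ = α + β acts roughly like a prior sample size. The helper `beta_from_mean_concentration` and the numbers below are hypothetical illustrations, not a recipe from the book.

```python
from scipy import stats


def beta_from_mean_concentration(mean, concentration):
    """Return (alpha, beta) for a Beta prior with the given mean and
    concentration kappa = alpha + beta (a rough 'prior sample size')."""
    if not 0 < mean < 1 or concentration <= 0:
        raise ValueError("need 0 < mean < 1 and concentration > 0")
    return mean * concentration, (1 - mean) * concentration


# Hypothetical domain knowledge: "about 30%", held with two different strengths
for kappa in (4, 50):
    a, b = beta_from_mean_concentration(0.3, kappa)
    prior = stats.beta(a, b)
    low, high = prior.ppf([0.03, 0.97])  # 94% equal-tailed interval
    print(f"kappa={kappa:>2}: Beta({a:.1f}, {b:.1f}), sd={prior.std():.3f}, "
          f"94% interval=({low:.3f}, {high:.3f})")
```

Both priors have mean 0.3, but κ = 50 gives a standard deviation of about 0.064, while κ = 4 gives about 0.20. κ is where the strength of the domain knowledge enters the prior.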

PreliZ is a very new Python library for prior elicitation [Mikkola et al., 2023, Icazatti et al., 2023]. Its mission is to help you elicit, represent, and visualize your prior knowledge. For instance, we can ask PreliZ to compute the parameters of a distribution satisfying a set of constraints. Let’s say we want to find the Beta distribution with 90% of its mass between 0.1 and 0.7. We can write:

Code 1.7

import preliz as pz
# Maximum entropy Beta distribution with 90% of its mass in [0.1, 0.7]
dist = pz.Beta()
pz.maxent(dist, 0.1, 0.7, 0.9)

The result is a Beta distribution with parameters α = 2.5 and β = 3.6 (rounded to one decimal place). The pz.maxent function computes the maximum entropy distribution given the constraints we specified. Why the maximum entropy distribution? Because, among all the distributions of the chosen family that satisfy those constraints, it is the least informative one. By default, PreliZ will plot the distribution, as shown in Figure 1.13.

Figure 1.13: Maximum entropy Beta distribution with 90% of the mass between 0.1 and 0.7
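To make the idea behind pz.maxent concrete, here is a minimal from-scratch sketch using SciPy instead of PreliZ: maximize the entropy of a Beta distribution subject to the 90% mass constraint with a generic constrained optimizer (SLSQP). The starting point, bounds, and choice of optimizer are our own assumptions; this illustrates the technique and is not PreliZ's actual implementation.

```python
from scipy import optimize, stats

# Constraints from the text: 90% of the mass between 0.1 and 0.7
lower, upper, mass = 0.1, 0.7, 0.9


def neg_entropy(params):
    """Negative differential entropy of Beta(alpha, beta); minimizing it maximizes entropy."""
    alpha, beta = params
    return -stats.beta(alpha, beta).entropy()


def mass_constraint(params):
    """Zero when Beta(alpha, beta) puts exactly `mass` probability in [lower, upper]."""
    alpha, beta = params
    dist = stats.beta(alpha, beta)
    return dist.cdf(upper) - dist.cdf(lower) - mass


# Assumed settings: starting point, parameter bounds, and the SLSQP method
result = optimize.minimize(
    neg_entropy,
    x0=[2.0, 2.0],
    method="SLSQP",
    bounds=[(0.5, 50.0), (0.5, 50.0)],
    constraints=[{"type": "eq", "fun": mass_constraint}],
)
alpha_hat, beta_hat = result.x
found = stats.beta(alpha_hat, beta_hat)
found_mass = found.cdf(upper) - found.cdf(lower)
print(f"max-entropy Beta: alpha={alpha_hat:.2f}, beta={beta_hat:.2f}, mass={found_mass:.3f}")

# Sanity check of the rounded parameters reported in the text
reported = stats.beta(2.5, 3.6)
reported_mass = reported.cdf(upper) - reported.cdf(lower)
print(f"Beta(2.5, 3.6) mass in [{lower}, {upper}]: {reported_mass:.3f}")
```

We would expect the recovered parameters to be close to the values reported above, but we make no claim that they match PreliZ's output exactly. The final check confirms that the rounded Beta(2.5, 3.6) keeps roughly 90% of its mass between 0.1 and 0.7.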

As prior elicitation has many facets, PreliZ offers many other ways to elicit priors. If you are interested in learning more about PreliZ, you can check the documentation at https://preliz.readthedocs.io.

Building models is an iterative process; sometimes the iteration takes a few minutes, and sometimes it could take years. Reproducibility matters, and transparent assumptions in a model contribute to it. We are free to use more than one prior (or likelihood) for a given analysis if we are not sure which one is most appropriate; exploring the effect of different priors can also bring valuable information to the table. Part of the modeling process is about questioning assumptions, and priors (and likelihoods) are just that: assumptions. Different assumptions will lead to different models and probably different results. By using data and our domain knowledge of the problem, we will be able to compare models and, if necessary, decide on a winner. Chapter 5 will be devoted to this issue. Since priors have a central role in Bayesian statistics, we will keep discussing them as we face new problems. So if you have doubts and feel a little bit confused about this discussion, just keep calm and don’t worry; people have been confused for decades, and the discussion is still going on.
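A minimal prior sensitivity check, again using the conjugate Beta-Binomial model so that no sampler is needed. The three priors and the two datasets (3 successes in 10 trials and 30 successes in 100 trials) are hypothetical, chosen only for illustration. The interval is a 94% equal-tailed interval computed with `scipy.stats.beta.ppf`, not a highest-density interval.

```python
from scipy import stats

# Hypothetical Beta priors for a probability theta: (alpha, beta)
priors = {
    "flat Beta(1, 1)": (1, 1),
    "weak Beta(2, 2)": (2, 2),
    "informative Beta(20, 20)": (20, 20),
}

# Hypothetical datasets with the same observed proportion (30%)
datasets = {"n=10": (3, 10), "n=100": (30, 100)}

posterior_means = {}
for data_name, (successes, trials) in datasets.items():
    for prior_name, (a, b) in priors.items():
        # Conjugate update: Beta(a, b) prior + binomial likelihood -> Beta posterior
        post = stats.beta(a + successes, b + trials - successes)
        posterior_means[(data_name, prior_name)] = post.mean()
        low, high = post.ppf([0.03, 0.97])
        print(f"{data_name:6} {prior_name:26} mean={post.mean():.3f} "
              f"94% interval=({low:.3f}, {high:.3f})")

# How much do the posterior means disagree across priors for each dataset?
spreads = {}
for data_name in datasets:
    means = [posterior_means[(data_name, p)] for p in priors]
    spreads[data_name] = max(means) - min(means)
    print(f"{data_name}: spread of posterior means = {spreads[data_name]:.3f}")
```

With 10 observations, the posterior means range from about 0.33 (flat prior) to 0.46 (Beta(20, 20) prior); with 100 observations at the same proportion, the spread drops from about 0.13 to about 0.05. When reasonable priors lead to materially different conclusions, that sensitivity is itself worth reporting.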
