You're reading from Bayesian Analysis with Python - Third Edition (Packt, January 2024, ISBN-13: 9781805127161).

Author: Osvaldo Martin

Osvaldo Martin is a researcher at CONICET, in Argentina. He has experience using Markov Chain Monte Carlo methods to simulate molecules and perform Bayesian inference. He loves to use Python to solve data analysis problems. He is especially motivated by the development and implementation of software tools for Bayesian statistics and probabilistic modeling. He is an open-source developer, and he contributes to Python libraries like PyMC, ArviZ and Bambi among others. He is interested in all aspects of the Bayesian workflow, including numerical methods for inference, diagnosis of sampling, evaluation and criticism of models, comparison of models and presentation of results.

Chapter 6
Modeling with Bambi

A good tool improves the way you work. A great tool improves the way you think. – Jeff Duntemann

In Chapter 4, we described the basic ingredients of linear regression models and how to generalize them to better fit our needs. In this chapter, we are going to keep learning about linear models, but this time we are going to work with Bambi [Capretto et al., 2022], a high-level Bayesian model-building interface written on top of PyMC. Bambi is designed to make it extremely easy to fit linear models, including hierarchical ones. We will also see that Bambi’s scope extends beyond linear models.

We are going to learn about:

  • Using Bambi to build and fit models

  • Analyzing results with Bambi

  • Polynomial regression and splines

  • Distributional models

  • Categorical predictors

  • Interactions

  • Variable selection with Kulprit

6.1 One syntax to rule them all

PyMC has a very simple and expressive syntax that allows us to build arbitrary models. That’s usually a blessing, but it can be a burden too. Bambi instead focuses on regression models, and this restriction leads to a more focused syntax and features, as we will see.

Bambi uses a Wilkinson-formula syntax similar to the one used by many R packages like nlme, lme4, and brms. Let’s assume data is a pandas DataFrame like the one shown in Table 6.1.

      y          x           z         g
0  -0.633494  -0.196436   -0.355148   Group A
1   2.32684    0.0163941   -1.22847   Group B
2   0.999604   0.107602    -0.391528  Group C
3  -0.119111   0.804268     0.967253  Group A
4   2.07504    0.991417     0.590832  Group B
5  -0.412135   0.691132    -2.13044   Group C

Table 6.1: A dummy pandas DataFrame

Using this data, we want to build a linear model that predicts y from x. Using PyMC, we would do something like the model in the following code block:

Code 6.1

with pm.Model() as lm: 
  ...
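
A minimal sketch of how such a PyMC model could be completed, together with the equivalent one-line Bambi specification, is shown below. The priors and variable names are illustrative assumptions, not necessarily the book's exact listing; data is the DataFrame from Table 6.1:

# Illustrative sketch of a simple linear regression in PyMC (priors are assumptions)
import bambi as bmb
import pymc as pm

with pm.Model() as lm:
    alpha = pm.Normal("alpha", 0, 1)        # intercept
    beta = pm.Normal("beta", 0, 1)          # slope for x
    sigma = pm.HalfNormal("sigma", 1)       # residual standard deviation
    mu = alpha + beta * data["x"]
    pm.Normal("y", mu=mu, sigma=sigma, observed=data["y"])
    idata_lm = pm.sample()

# With Bambi, the same model is a single formula; priors are chosen automatically
model_bmb = bmb.Model("y ~ x", data)
idata_bmb = model_bmb.fit()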

6.2 The bikes model, Bambi’s version

The first model we are going to use to illustrate how to use Bambi is the bikes model from Chapter 4. We can load the data with:

Code 6.8

bikes = pd.read_csv("data/bikes.csv")

Now we can build and fit the model:

Code 6.9

model_t = bmb.Model("rented ~ temperature", bikes, family="negativebinomial")
idata_t = model_t.fit()

Figure 6.2 shows a visual representation of the model. If you want to visually inspect the priors, you can use model.plot_priors().


Figure 6.2: A visual representation of the bikes model, computed with the command model.graph()

Let’s now plot the posterior mean and the posterior predictive distribution (predictions). Omitting some details needed to make the plots look nice, the code to do this is:

Code 6.10

_, axes = plt.subplots(1, 2, sharey=True, figsize=(12, 4)) 
bmb.interpret.plot_predictions(model_t, idata_t, 
   ...

6.3 Polynomial regression

One way to fit curves using a linear regression model is by building a polynomial, like this:

\mu = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \dots + \beta_m x^m

We call m the degree of the polynomial.

There are two important things to notice. First, polynomial regression is still linear regression; the linearity refers to the coefficients (the βs), not the variables (the xs). The second thing to note is that we are creating new variables out of thin air. The only observed variable is x, the rest are just powers of x. Creating new variables from observed ones is a perfectly valid ”trick” when doing regression; sometimes the transformation can be motivated or justified by theory (like taking the square root of the length of babies), but sometimes it is just a way to fit a curve. The intuition with polynomials is that for a given value of x, the higher the degree of the polynomial, the more flexible the curve can be. A polynomial of degree 1 is a line, a polynomial of degree 2 is a curve that can go up or...
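
To make this concrete, here is a hedged sketch of fitting a cubic polynomial with Bambi on the toy data from Table 6.1. The model and variable names are illustrative, not the chapter's model_poly4; the np.power calls are evaluated inside the formula, just as np.sqrt(month) is in Code 6.16:

# Illustrative sketch: a degree-3 polynomial built by hand from powers of x
import numpy as np
import bambi as bmb

model_cubic = bmb.Model("y ~ x + np.power(x, 2) + np.power(x, 3)", data)
idata_cubic = model_cubic.fit()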

6.4 Splines

A general way to write very flexible models is to apply functions B_m to the predictors X_m and then multiply them by coefficients β_m:

\mu = \beta_0 + \beta_1 B_1(X_1) + \beta_2 B_2(X_2) + \dots + \beta_m B_m(X_m)

We are free to pick Bm as we wish; for instance, we can pick polynomials. But we can also pick other functions. A popular choice is to use B-splines; we are not going to discuss their definition, but we can think of them as a way to create smooth curves in such a way that we get flexibility, as with polynomials, but less prone to overfitting. We achieve this by using piecewise polynomials, that is, polynomials that are restricted to affect only a portion of the data. Figure 6.6 shows three examples of piecewise polynomials of increasing degrees. The dotted vertical lines show the ”knots,” which are the points used to restrict the regions, the dashed gray line represents the function we want to approximate, and the black lines are the piecewise polynomials.


Figure 6.6: Piecewise polynomials of increasing degrees
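
As a sketch of what a B-spline term looks like in a Bambi formula, formulae provides a bs() transform that builds the spline basis directly inside the model definition. The variable names and the number of basis functions below are illustrative assumptions, not the chapter's model_spline:

# Illustrative sketch: bs() builds a B-spline basis; df controls its flexibility
import bambi as bmb

model_spl = bmb.Model("y ~ bs(x, df=6)", data)
idata_spl = model_spl.fit()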

Figure 6.7 shows...

6.5 Distributional models

We saw earlier that we can use linear models for parameters other than the mean (or location parameter). For example, we can use a linear model for the mean and a linear model for the standard deviation of a Gaussian distribution. These models are usually called distributional models. The syntax for distributional models is very similar; we just need to add a line for the auxiliary parameters we want to model. For instance, σ for a Gaussian, or α for a NegativeBinomial.

Let’s now reproduce an example from Chapter 4, the babies example:

Code 6.16

formula = bmb.Formula(
    "length ~ np.sqrt(month)",
    "sigma ~ month"
)

model_dis = bmb.Model(formula, babies)
idata_dis = model_dis.fit()

Figure 6.9 shows the posterior distribution of sigma for model_dis (varying sigma) and for a model with constant sigma. We can see...

6.6 Categorical predictors

A categorical variable represents distinct groups or categories and can take on only a limited set of values. These values are typically labels or names that don’t possess numerical significance on their own. Some examples are:

  • Political affiliation: conservative, liberal, or progressive.

  • Sex: female or male.

  • Customer satisfaction level: very unsatisfied, unsatisfied, neutral, satisfied, or very satisfied.

Linear regression models can easily accommodate categorical variables; we just need to encode the categories as numbers. There are a few options to do so. Bambi can easily handle the details for us. The devil is in the interpretation of the results, as we will explore in the next two sections.
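
As a first sketch of the syntax (the actual penguin models are built in the next subsection; the column names below follow the book's usage and are assumptions here), a categorical column can be dropped straight into the formula and Bambi encodes it for us:

# Illustrative sketch: 'species' is a categorical column; Bambi builds the encoding
# automatically, yielding one coefficient per non-reference level
import bambi as bmb

model_cat = bmb.Model("body_mass ~ species", penguins)
idata_cat = model_cat.fit()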

6.6.1 Categorical penguins

For the current example, we are going to use the palmerpenguins dataset, Horst et al. [2020], which contains 344 observations of 8 variables. For the moment, we are interested in modeling the mass of the...

6.7 Interactions

An interaction effect, or statistical interaction, happens when the effect of an independent variable on the response changes depending on the value of another independent variable. An interaction can occur between two or more variables. Some examples are:

  • Education level and income impact: Higher education may have a stronger positive effect on income for one gender compared to the other, resulting in an interaction between education and gender.

  • Medication efficacy and age: A drug that works better for older individuals than younger ones.

  • Exercise and diet effects on weight loss: It could be that the diet’s effect on weight loss is small for people who do little or no exercise and large for people who do moderate exercise.

  • Temperature and humidity for crop growth: Some crops could thrive in hot and humid conditions, while others might perform better in cooler and less humid environments.

We have an interaction when the combined effect of two or more variables...
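
In Wilkinson notation, an interaction between two predictors is written with a colon, and the shorthand x*z expands to the main effects plus their interaction. A hedged sketch follows; the predictors are illustrative, and the chapter's model_int, used in the next section, may differ:

# Illustrative sketch: '*' expands to bill_length + bill_depth + bill_length:bill_depth
import bambi as bmb

model_ix = bmb.Model("body_mass ~ bill_length * bill_depth", penguins)
idata_ix = model_ix.fit()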

6.8 Interpreting models with Bambi

We have been using bmb.interpret.plot_predictions a lot in this chapter. But that’s not the only tool that Bambi offers to help us understand models. Another one is bmb.interpret.plot_comparisons. This tool helps us answer the question, “What is the expected predictive difference when we compare two values of a given variable while keeping all the others constant?”

Let’s use model_int from the previous section, so we don’t need to fit a new model. We use the following code block to generate Figure 6.15:

Code 6.20

bmb.interpret.plot_comparisons(model_int, idata_int, 
                               contrast={"bill_depth":[1.4, 1.8]}, 
                ...

6.9 Variable selection

Variable selection refers to the process of identifying the most relevant variables in a model from a larger set of potential predictors. We perform variable selection under the assumption that only a subset of the variables has a considerable impact on the outcome of interest, while the others contribute little or no additional value.

Arguably the ”most Bayesian thing to do” when building a model is to include all the variables that we may think of in a single model and then use the posterior from that model to make predictions or gain an understanding of the relationships of the variables. This is the ”most Bayesian” approach because we are using as much data as possible and incorporating in the posterior the uncertainty about the importance of the variables. However, being more Bayesian than Bayes is not always the best idea. We already saw in Chapter 5 that Bayes factors can be problematic, even when they are a direct consequence of Bayes...
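
Before reaching for a dedicated tool such as Kulprit, a simple baseline is to fit the full model and a candidate submodel and compare them with LOO, as in Chapter 5. The following is a hedged sketch: the formulas are illustrative, and passing idata_kwargs to store pointwise log-likelihoods is assumed to be needed for the LOO computation:

# Illustrative sketch: compare a full model against a reduced one with LOO
import arviz as az
import bambi as bmb

model_full = bmb.Model("body_mass ~ bill_length + bill_depth + flipper_length + species",
                       penguins)
idata_full = model_full.fit(idata_kwargs={"log_likelihood": True})

model_red = bmb.Model("body_mass ~ flipper_length + species", penguins)
idata_red = model_red.fit(idata_kwargs={"log_likelihood": True})

az.compare({"full": idata_full, "reduced": idata_red})  # ranks models by ELPD (LOO)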

6.10 Summary

In this chapter, we have seen how to use Bambi to fit Bayesian models as an alternative to building them directly in PyMC. We started with the simplest case, a model with a single predictor, and then moved to more complex models, including polynomials, splines, distributional models, models with categorical predictors, and interactions.

The main advantage of Bambi is that it is very easy to use; its formula syntax is very similar to R’s. Internally, Bambi defines weakly informative priors and handles details that can be cumbersome for complex models. The main disadvantage is that it is not as flexible as PyMC: the range of models that Bambi can handle is a small subset of what PyMC can express. Still, this subset contains many of the most commonly used statistical models in both industry and academia. The strength of Bambi is not just easier model building, but also easier model interpretation. Throughout the chapter, we have seen how to use Bambi’s interpret module to gain a better understanding...

6.11 Exercises

  1. Read the Bambi documentation (https://bambinos.github.io/bambi/) and learn how to specify custom priors.

  2. Apply what you learned in the previous point and specify a HalfNormal prior for the slope of model_t.

  3. Define a model like model_poly4, but using raw polynomials. Compare the coefficients and the mean fit of both models.

  4. Explain in your own words what a distributional model is.

  5. Expand model_spline to a distributional model. Use another spline to model the α parameter of the NegativeBinomial family.

  6. Create a model named model_p2 for the body_mass with the predictors bill_length, bill_depth, flipper_length, and species.

  7. Use LOO to compare the model in the previous point and model_p.

  8. Use the functions in the interpret module to interpret model_p2. Use both plots and tables.

Join our community Discord space

Join our Discord community to meet like-minded people and learn alongside more than 5000 members at: https://packt.link/bayesian
