You're reading from Bayesian Analysis with Python - Third Edition (Packt, January 2024, ISBN-13: 9781805127161).

Author: Osvaldo Martin

Osvaldo Martin is a researcher at CONICET, in Argentina. He has experience using Markov Chain Monte Carlo methods to simulate molecules and perform Bayesian inference. He loves to use Python to solve data analysis problems. He is especially motivated by the development and implementation of software tools for Bayesian statistics and probabilistic modeling. He is an open-source developer, and he contributes to Python libraries like PyMC, ArviZ and Bambi among others. He is interested in all aspects of the Bayesian workflow, including numerical methods for inference, diagnosis of sampling, evaluation and criticism of models, comparison of models and presentation of results.

Chapter 6
Modeling with Bambi

A good tool improves the way you work. A great tool improves the way you think. – Jeff Duntemann

In Chapter 4, we described the basic ingredients of linear regression models and how to generalize them to better fit our needs. In this chapter, we are going to keep learning about linear models, but this time we are going to work with Bambi [Capretto et al., 2022], a high-level Bayesian model-building interface written on top of PyMC. Bambi is designed to make it extremely easy to fit linear models, including hierarchical ones. We will also see that Bambi’s scope extends beyond linear models.

We are going to learn about:

  • Using Bambi to build and fit models

  • Analyzing results with Bambi

  • Polynomial regression and splines

  • Distributional models

  • Categorical predictors

  • Interactions

  • Variable selection with Kulprit

6.1 One syntax to rule them all

PyMC has a very simple and expressive syntax that allows us to build arbitrary models. That’s usually a blessing, but it can be a burden too. Bambi instead focuses on regression models, and this restriction leads to a more focused syntax and features, as we will see.

Bambi uses a Wilkinson-formula syntax similar to the one used by many R packages like nlme, lme4, and brms. Let’s assume data is a pandas DataFrame like the one shown in Table 6.1.

      y          x           z         g
0  -0.633494  -0.196436   -0.355148   Group A
1   2.32684    0.0163941   -1.22847   Group B
2   0.999604   0.107602    -0.391528  Group C
3  -0.119111   0.804268     0.967253  Group A
4   2.07504    0.991417     0.590832  Group B
5  -0.412135   0.691132    -2.13044   Group C

Table 6.1: A dummy pandas DataFrame

Using this data, we want to build a linear model that predicts y from x. Using PyMC, we would do something like the model in the following code block:

Code 6.1

with pm.Model() as lm: 
  ...
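
A minimal sketch of how such a PyMC model could be completed, together with the equivalent one-line Bambi specification, is shown below. The priors and variable names are illustrative assumptions, not necessarily the book's exact listing; data is the DataFrame from Table 6.1:

# Illustrative sketch of a simple linear regression in PyMC (priors are assumptions)
import bambi as bmb
import pymc as pm

with pm.Model() as lm:
    alpha = pm.Normal("alpha", 0, 1)        # intercept
    beta = pm.Normal("beta", 0, 1)          # slope for x
    sigma = pm.HalfNormal("sigma", 1)       # residual standard deviation
    mu = alpha + beta * data["x"]
    pm.Normal("y", mu=mu, sigma=sigma, observed=data["y"])
    idata_lm = pm.sample()

# With Bambi, the same model is a single formula; priors are chosen automatically
model_bmb = bmb.Model("y ~ x", data)
idata_bmb = model_bmb.fit()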

6.2 The bikes model, Bambi’s version

The first model we are going to use to illustrate how to use Bambi is the bikes model from Chapter 4. We can load the data with:

Code 6.8

bikes = pd.read_csv("data/bikes.csv")

Now we can build and fit the model:

Code 6.9

model_t = bmb.Model("rented ~ temperature", bikes, family="negativebinomial")
idata_t = model_t.fit()

Figure 6.2 shows a visual representation of the model. If you want to visually inspect the priors, you can use model.plot_priors().


Figure 6.2: A visual representation of the bikes model, computed with the command model.graph()

Let’s now plot the posterior mean and the posterior predictive distribution (predictions). Omitting some details needed to make the plots look nice, the code to do this is:

Code 6.10

_, axes = plt.subplots(1, 2, sharey=True, figsize=(12, 4)) 
bmb.interpret.plot_predictions(model_t, idata_t, 
   ...

6.3 Polynomial regression

One way to fit curves using a linear regression model is by building a polynomial, like this:

\mu = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \dots + \beta_m x^m

We call m the degree of the polynomial.

There are two important things to notice. First, polynomial regression is still linear regression; the linearity refers to the coefficients (the βs), not the variables (the xs). The second thing to note is that we are creating new variables out of thin air. The only observed variable is x, the rest are just powers of x. Creating new variables from observed ones is a perfectly valid ”trick” when doing regression; sometimes the transformation can be motivated or justified by theory (like taking the square root of the length of babies), but sometimes it is just a way to fit a curve. The intuition with polynomials is that for a given value of x, the higher the degree of the polynomial, the more flexible the curve can be. A polynomial of degree 1 is a line, a polynomial of degree 2 is a curve that can go up or...
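
To make this concrete, here is a hedged sketch of fitting a cubic polynomial with Bambi on the toy data from Table 6.1. The model and variable names are illustrative, not the chapter's model_poly4; the np.power calls are evaluated inside the formula, just as np.sqrt(month) is in Code 6.16:

# Illustrative sketch: a degree-3 polynomial built by hand from powers of x
import numpy as np
import bambi as bmb

model_cubic = bmb.Model("y ~ x + np.power(x, 2) + np.power(x, 3)", data)
idata_cubic = model_cubic.fit()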

6.4 Splines

A general way to write very flexible models is to apply functions B_m to the predictors X_m and then multiply them by coefficients β_m:

\mu = \beta_0 + \beta_1 B_1(X_1) + \beta_2 B_2(X_2) + \dots + \beta_m B_m(X_m)

We are free to pick Bm as we wish; for instance, we can pick polynomials. But we can also pick other functions. A popular choice is to use B-splines; we are not going to discuss their definition, but we can think of them as a way to create smooth curves in such a way that we get flexibility, as with polynomials, but less prone to overfitting. We achieve this by using piecewise polynomials, that is, polynomials that are restricted to affect only a portion of the data. Figure 6.6 shows three examples of piecewise polynomials of increasing degrees. The dotted vertical lines show the ”knots,” which are the points used to restrict the regions, the dashed gray line represents the function we want to approximate, and the black lines are the piecewise polynomials.


Figure 6.6: Piecewise polynomials of increasing degrees
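
As a sketch of what a B-spline term looks like in a Bambi formula, formulae provides a bs() transform that builds the spline basis directly inside the model definition. The variable names and the number of basis functions below are illustrative assumptions, not the chapter's model_spline:

# Illustrative sketch: bs() builds a B-spline basis; df controls its flexibility
import bambi as bmb

model_spl = bmb.Model("y ~ bs(x, df=6)", data)
idata_spl = model_spl.fit()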

Figure 6.7 shows...

6.5 Distributional models

We saw earlier that we can use linear models for parameters other than the mean (or location parameter). For example, we can use a linear model for the mean and a linear model for the standard deviation of a Gaussian distribution. These models are usually called distributional models. The syntax for distributional models is very similar; we just need to add a line for the auxiliary parameters we want to model. For instance, σ for a Gaussian, or α for a NegativeBinomial.

Let’s now reproduce an example from Chapter 4, the babies example:

Code 6.16

formula = bmb.Formula(
    "length ~ np.sqrt(month)",
    "sigma ~ month"
)

model_dis = bmb.Model(formula, babies)
idata_dis = model_dis.fit()

Figure 6.9 shows the posterior distribution of sigma for model_dis (varying sigma) and for a model with constant sigma. We can see...

6.6 Categorical predictors

A categorical variable represents distinct groups or categories and can take on only a limited set of values. These values are typically labels or names that don’t possess numerical significance on their own. Some examples are:

  • Political affiliation: conservative, liberal, or progressive.

  • Sex: female or male.

  • Customer satisfaction level: very unsatisfied, unsatisfied, neutral, satisfied, or very satisfied.

Linear regression models can easily accommodate categorical variables; we just need to encode the categories as numbers. There are a few options to do so. Bambi can easily handle the details for us. The devil is in the interpretation of the results, as we will explore in the next two sections.
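
As a first sketch of the syntax (the actual penguin models are built in the next subsection; the column names below follow the book's usage and are assumptions here), a categorical column can be dropped straight into the formula and Bambi encodes it for us:

# Illustrative sketch: 'species' is a categorical column; Bambi builds the encoding
# automatically, yielding one coefficient per non-reference level
import bambi as bmb

model_cat = bmb.Model("body_mass ~ species", penguins)
idata_cat = model_cat.fit()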

6.6.1 Categorical penguins

For the current example, we are going to use the palmerpenguins dataset, Horst et al. [2020], which contains 344 observations of 8 variables. For the moment, we are interested in modeling the mass of the...

6.7 Interactions

An interaction effect, or statistical interaction, happens when the effect of an independent variable on the response changes depending on the value of another independent variable. An interaction can occur between two or more variables. Some examples are:

  • Education level and income impact: Higher education may have a stronger positive effect on income for one gender compared to the other, resulting in an interaction between education and gender.

  • Medication efficacy and age: A drug that works better for older individuals than younger ones.

  • Exercise and diet effects on weight loss: It could be that the diet’s effect on weight loss is small for people who do little or no exercise and large for people who do moderate exercise.

  • Temperature and humidity for crop growth: Some crops could thrive in hot and humid conditions, while others might perform better in cooler and less humid environments.

We have an interaction when the combined effect of two or more variables...
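
In Wilkinson notation, an interaction between two predictors is written with a colon, and the shorthand x*z expands to the main effects plus their interaction. A hedged sketch follows; the predictors are illustrative, and the chapter's model_int, used in the next section, may differ:

# Illustrative sketch: '*' expands to bill_length + bill_depth + bill_length:bill_depth
import bambi as bmb

model_ix = bmb.Model("body_mass ~ bill_length * bill_depth", penguins)
idata_ix = model_ix.fit()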

6.8 Interpreting models with Bambi

We have been using bmb.interpret.plot_predictions a lot in this chapter. But that’s not the only tool that Bambi offers to help us understand models. Another one is bmb.interpret.plot_comparisons. This tool helps us answer the question, “What is the expected predictive difference when we compare two values of a given variable while keeping all the others constant?”

Let’s use model_int from the previous section, so we don’t need to fit a new model. We use the following code block to generate Figure 6.15:

Code 6.20

bmb.interpret.plot_comparisons(model_int, idata_int, 
                               contrast={"bill_depth":[1.4, 1.8]}, 
                ...

6.9 Variable selection

Variable selection refers to the process of identifying the most relevant variables in a model from a larger set of potential predictors. We perform variable selection under the assumption that only a subset of the variables has a considerable impact on the outcome of interest, while the others contribute little or no additional value.

Arguably the ”most Bayesian thing to do” when building a model is to include all the variables that we may think of in a single model and then use the posterior from that model to make predictions or gain an understanding of the relationships of the variables. This is the ”most Bayesian” approach because we are using as much data as possible and incorporating in the posterior the uncertainty about the importance of the variables. However, being more Bayesian than Bayes is not always the best idea. We already saw in Chapter 5 that Bayes factors can be problematic, even when they are a direct consequence of Bayes...
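
Before reaching for a dedicated tool such as Kulprit, a simple baseline is to fit the full model and a candidate submodel and compare them with LOO, as in Chapter 5. The following is a hedged sketch: the formulas are illustrative, and passing idata_kwargs to store pointwise log-likelihoods is assumed to be needed for the LOO computation:

# Illustrative sketch: compare a full model against a reduced one with LOO
import arviz as az
import bambi as bmb

model_full = bmb.Model("body_mass ~ bill_length + bill_depth + flipper_length + species",
                       penguins)
idata_full = model_full.fit(idata_kwargs={"log_likelihood": True})

model_red = bmb.Model("body_mass ~ flipper_length + species", penguins)
idata_red = model_red.fit(idata_kwargs={"log_likelihood": True})

az.compare({"full": idata_full, "reduced": idata_red})  # ranks models by ELPD (LOO)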

6.10 Summary

In this chapter, we have seen how to use Bambi to fit Bayesian models as an alternative to building them directly in PyMC. We started with the simplest case, a model with a single predictor, and then moved to more complex models, including polynomials, splines, distributional models, models with categorical predictors, and interactions.

The main advantage of Bambi is that it is very easy to use; its formula syntax is very similar to R’s. Internally, Bambi defines weakly informative priors and handles details that can be cumbersome for complex models. The main disadvantage is that it is not as flexible as PyMC: the range of models that Bambi can handle is a small subset of what PyMC can express. Still, this subset contains many of the most commonly used statistical models in both industry and academia. The strength of Bambi is not just easier model building, but also easier model interpretation. Throughout the chapter, we have seen how to use Bambi’s interpret module to gain a better understanding...

6.11 Exercises

  1. Read the Bambi documentation (https://bambinos.github.io/bambi/) and learn how to specify custom priors.

  2. Apply what you learned in the previous point and specify a HalfNormal prior for the slope of model_t.

  3. Define a model like model_poly4, but using raw polynomials. Compare the coefficients and the mean fit of both models.

  4. Explain in your own words what a distributional model is.

  5. Expand model_spline to a distributional model. Use another spline to model the α parameter of the NegativeBinomial family.

  6. Create a model named model_p2 for the body_mass with the predictors bill_length, bill_depth, flipper_length, and species.

  7. Use LOO to compare the model in the previous point and model_p.

  8. Use the functions in the interpret module to interpret model_p2. Use both plots and tables.

Join our community Discord space

Join our Discord community to meet like-minded people and learn alongside more than 5000 members at: https://packt.link/bayesian
