Step 1 – modeling the problem

In this section, we’ll discuss and practice step 1 of the four-step causal inference process: modeling the problem.

We’ll split this step into two substeps:

  1. Creating a graph representing our problem
  2. Instantiating DoWhy’s CausalModel object using this graph

Creating the graph

In Chapter 3, we introduced a graph language called GML. We’ll use GML to define our data-generating process in this section.

Figure 7.1 presents the GPS example from the previous chapter, which we’ll model next. Note that we have omitted variable-specific noise for clarity:

Figure 7.1 – The graphical model from Chapter 6

Note that the graph in Figure 7.1 contains an unobserved variable, U. We did not include this variable in our dataset (it’s unobserved!), but we’ll include it in our graph. This will allow DoWhy to recognize that there’s an unobserved confounder...
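To make this concrete, here is a minimal sketch of both substeps. The variable names are hypothetical stand-ins for the ones in Figure 7.1: X is the treatment, Z is the mediator, Y is the outcome, and U is the unobserved confounder. We assume the observed variables live in a data frame called df:

from dowhy import CausalModel

# Define the graph in GML, including the unobserved variable U
gml_graph = """
graph [
    directed 1
    node [id "X" label "X"]
    node [id "Z" label "Z"]
    node [id "Y" label "Y"]
    node [id "U" label "U"]
    edge [source "X" target "Z"]
    edge [source "Z" target "Y"]
    edge [source "U" target "X"]
    edge [source "U" target "Y"]
]
"""

# Instantiate the causal model; U appears in the graph
# but does not need to be a column in df
model = CausalModel(
    data=df,
    treatment='X',
    outcome='Y',
    graph=gml_graph
)

Because U appears in the graph but not in the data, DoWhy can recognize it as an unobserved confounder during identification.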

Step 2 – identifying the estimand(s)

This short section is all about finding estimands with DoWhy. We’ll start with a brief overview of estimands supported by the library and then jump straight into practice!

DoWhy offers three ways to find estimands:

  • Back-door
  • Front-door
  • Instrumental variable

We know all of them from the previous chapter. To see a quick practical introduction to all three methods, check out my blog post Causal Python — 3 Simple Techniques to Jump-Start Your Causal Inference Journey Today (Molak, 2022; https://bit.ly/DoWhySimpleBlog).

Let’s see how to use DoWhy to find a correct estimand for our model.

It turns out to be very easy! Just see for yourself:

estimand = model.identify_effect()

Yes, that’s all!

We just call the .identify_effect() method of our CausalModel object and we’re done!

Let’s print out our estimand to see what we can learn:

print(estimand)
...

Step 3 – obtaining estimates

In this section, we’ll compute causal effect estimates for our model.

Computing estimates using DoWhy is as simple as it can be. To do it, we need to call the .estimate_effect() method of our CausalModel object:

estimate = model.estimate_effect(
    identified_estimand=estimand,
    method_name='frontdoor.two_stage_regression')

We pass two arguments to the method:

  • Our identified estimand
  • The name of the method that will be used to compute the estimate

You might recall from Chapter 6 that we needed to fit two linear regression models, take their coefficients, and multiply them to obtain the final causal effect estimate. DoWhy makes this process much easier for us.
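For intuition, here is a minimal sketch of that manual computation, assuming a linear front-door setting with hypothetical columns X (treatment), Z (mediator), and Y (outcome) in a data frame df:

from sklearn.linear_model import LinearRegression

# Stage 1: effect of the treatment X on the mediator Z
stage_one = LinearRegression().fit(df[['X']], df['Z'])

# Stage 2: effect of the mediator Z on the outcome Y,
# controlling for the treatment X
stage_two = LinearRegression().fit(df[['Z', 'X']], df['Y'])

# The front-door estimate is the product of the two coefficients
manual_estimate = stage_one.coef_[0] * stage_two.coef_[0]

With DoWhy, the single .estimate_effect() call above replaces all of this.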

Let’s print out the result:

print(f'Estimate of causal effect (linear regression): {estimate.value}')

This gives us the following output:

Estimate...

Step 4 – where’s my validation set? Refutation tests

In this section, we’ll discuss how to validate causal models. We’ll introduce the idea behind refutation tests and, finally, implement a couple of them in practice.

How to validate causal models

One of the most popular ways to validate machine learning models is through cross-validation (CV). The basic idea behind CV is relatively simple:

  1. We split the data into k folds (subsets).
  2. We train the model on k-1 folds and validate it on the remaining fold.
  3. We repeat this process k times.
  4. At every step, we train on a different set of k-1 folds and evaluate on the remaining fold (which is also different at each step).
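
As an illustration, here is a minimal sketch of this procedure using scikit-learn’s KFold, with synthetic placeholder data and a plain linear regression model:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Placeholder data and model, for illustration only
X_data = np.random.random((100, 3))
y_data = np.random.random(100)
reg = LinearRegression()

scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X_data):
    # Train on k-1 folds, evaluate on the remaining fold
    reg.fit(X_data[train_idx], y_data[train_idx])
    scores.append(reg.score(X_data[val_idx], y_data[val_idx]))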

Figure 7.3 presents a schematic visualization of a five-fold CV scheme:

Figure 7.3 – Schematic of five-fold CV

In Figure 7.3, the blue folds denote validation sets, while the white ones denote training sets...
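To preview what a refutation test looks like in code, here is a minimal sketch using DoWhy’s .refute_estimate() method with the random common cause refuter, assuming the model, estimand, and estimate objects from the previous steps:

# Add a randomly generated common cause to the data;
# a robust estimate should remain essentially unchanged
refutation = model.refute_estimate(
    estimand,
    estimate,
    method_name='random_common_cause'
)

print(refutation)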

Full example

This section is here to help us solidify our newly acquired knowledge. We’ll run the full causal inference process once again, step by step. We’ll introduce some exciting new elements along the way and, finally, we’ll translate the whole process to the new GCM API. By the end of this section, you will have the confidence and skills to apply the four-step causal inference process to your own problems.

Figure 7.4 presents a graphical model that we’ll use in this section:

Figure 7.4 – A graphical model that we’ll use in this section

We’ll generate 1,000 observations from an SCM following the structure from Figure 7.4 and store them in a data frame:

import numpy as np
import pandas as pd

SAMPLE_SIZE = 1000

S = np.random.random(SAMPLE_SIZE)
Q = 0.2*S + 0.67*np.random.random(SAMPLE_SIZE)
X = 0.14*Q + 0.4*np.random.random(SAMPLE_SIZE)
Y = (0.7*X + 0.11*Q + 0.32*S
     + 0.24*np.random.random(SAMPLE_SIZE))

# Store the generated variables in a data frame
df = pd.DataFrame({'S': S, 'Q': Q, 'X': X, 'Y': Y})
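
Before we walk through the four steps, here is a sketch of how the same data could later be handled with the GCM API, which we’ll cover at the end of this section. The graph edges below follow the data-generating process above; drawing interventional samples is just one of the queries the API supports:

import networkx as nx
from dowhy import gcm

# Graph structure matching the data-generating process above
causal_graph = nx.DiGraph(
    [('S', 'Q'), ('S', 'Y'), ('Q', 'X'), ('Q', 'Y'), ('X', 'Y')]
)
causal_model = gcm.StructuralCausalModel(causal_graph)

# Automatically assign a causal mechanism to each node and fit
gcm.auto.assign_causal_mechanisms(causal_model, df)
gcm.fit(causal_model, df)

# Example query: samples from the interventional distribution do(X=1)
samples = gcm.interventional_samples(
    causal_model,
    {'X': lambda x: 1.0},
    num_samples_to_draw=1000
)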

Wrapping it up

In this chapter, we discussed the Python causal ecosystem. We introduced the DoWhy and EconML libraries and practiced the four-step causal inference process using DoWhy’s CausalModel API. We learned how to automatically obtain estimands and how to use different types of estimators to compute causal effect estimates. We discussed what refutation tests are and how to use them in practice. Finally, we introduced DoWhy’s experimental GCM API and showed its great capabilities when it comes to answering various causal queries. After working through this chapter, you have the basic skills to apply causal inference to your own problems. Congratulations!

In the next chapter, we’ll summarize common assumptions for causal inference and discuss some limitations of the causal inference framework.

References

Bates, S., Hastie, T., & Tibshirani, R. (2021). Cross-validation: what does it estimate and how well does it do it? arXiv preprint. https://doi.org/10.48550/ARXIV.2104.00673

Battocchi, K., Dillon, E., Hei, M., Lewis, G., Oka, P., Oprescu, M., & Syrgkanis, V. (2019). EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation. https://github.com/microsoft/EconML

Blöbaum, P., Götz, P., Budhathoki, K., Mastakouri, A., & Janzing, D. (2022). DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models. arXiv preprint.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2016). Double/Debiased Machine Learning for Treatment and Causal Parameters. arXiv preprint. https://doi.org/10.48550/ARXIV.1608.00060

Molak, A. (2022, September 27). Causal Python: 3 Simple Techniques to Jump-Start Your Causal Inference Journey Today. Towards Data Science. https://towardsdatascience.com...


