Reader small image

You're reading from  Causal Inference and Discovery in Python

Product typeBook
Published inMay 2023
PublisherPackt
ISBN-139781804612989
Edition1st Edition
Concepts
Right arrow
Author (1)
Aleksander Molak
Aleksander Molak
author image
Aleksander Molak

Aleksander Molak is a Machine Learning Researcher and Consultant who gained experience working with Fortune 100, Fortune 500, and Inc. 5000 companies across Europe, the USA, and Israel, designing and building large-scale machine learning systems. On a mission to democratize causality for businesses and machine learning practitioners, Aleksander is a prolific writer, creator, and international speaker. As a co-founder of Lespire, an innovative provider of AI and machine learning training for corporate teams, Aleksander is committed to empowering businesses to harness the full potential of cutting-edge technologies that allow them to stay ahead of the curve.
Read more about Aleksander Molak

Right arrow

Preface

I wrote this book with a purpose in mind.

My journey to practical causality was an exciting but also challenging road.

Going from great theoretical books to implementing models in practice, and from translating assumptions to verifying them in real-world scenarios, demanded significant work.

I could not find unified, comprehensive resources that could be my guide through this journey.

This book is intended to be that guide.

This book provides a map that allows you to break into the world of causality.

We start with basic motivations behind causal thinking and a comprehensive introduction to Pearlian causal concepts: structural causal model, interventions, counterfactuals, and more.

Each concept comes with a theoretical explanation and a set of practical exercises accompanied by Python code.

Next, we dive into the world of causal effect estimation. Starting simple, we consistently progress toward modern machine learning methods. Step by step, we introduce the Python causal ecosystem and harness the power of cutting-edge algorithms.

In the last part of the book, we sneak into the secret world of causal discovery. We explore the mechanics of how causes leave traces and compare the main families of causal discovery algorithms to unravel the potential of end-to-end causal discovery and human-in-the-loop learning.

We close the book with a broad outlook into the future of causal AI. We examine challenges and opportunities and provide you with a comprehensive list of resources to learn more.

Who this book is for

The main audience I wrote this book for consists of machine learning engineers, data scientists, and machine learning researchers with three or more years of experience, who want to extend their data science toolkit and explore the new unchartered territory of causal machine learning.

People familiar with causality who have worked with another technology (e.g., R) and want to switch to Python can also benefit from this book, as well as people who have worked with traditional causality and want to expand their knowledge and tap into the potential of causal machine learning.

Finally, this book can benefit tech-savvy entrepreneurs who want to build a competitive edge for their products and go beyond the limitations of traditional machine learning.

What this book covers

Chapter 1, Causality: Hey, We Have Machine Learning, So Why Even Bother?, briefly discusses the history of causality and a number of motivating examples. This chapter introduces the notion of spuriousness and demonstrates that some classic definitions of causality do not capture important aspects of causal learning (which human babies know about). This chapter provides the basic distinction between statistical and causal learning, which is a cornerstone for the rest of the book.

Chapter 2, Judea Pearl and the Ladder of Causation, provides us with a definition of the Ladder of Causation – a crucial concept introduced by Judea Pearl that emphasizes the differences between observational, interventional, and counterfactual queries and distributions. We build on top of these ideas and translate them into concrete code examples. Finally, we briefly discuss how different families of machine learning (supervised, reinforcement, semi-, and unsupervised) relate to causal modeling.

Chapter 3, Regression, Observations, and Interventions, prepares us to take a look at linear regression from a causal perspective. We analyze important properties of observational data and discuss the significance of these properties for causal reasoning. We re-evaluate the problem of statistical control through the causal lens and introduce structural causal models (SCMs). These topics help us build a strong foundation for the rest of the book.

Chapter 4, Graphical Models, starts with a refresher on graphs and basic graph theory. After refreshing the fundamental concepts, we use them to define directed acyclic graphs (DAGs) – one of the most crucial concepts in Pearlian causality. We briefly introduce the sources of causal graphs in the real world and touch upon causal models that are not easily describable using DAGs. This prepares us for Chapter 5.

Chapter 5, Forks, Chains, and Immoralities, focuses on three basic graphical structures: forks, chains, and immoralities (also known as colliders). We learn about the crucial properties of these structures and demonstrate how these graphical concepts manifest themselves in the statistical properties of the data. The knowledge we gain in this chapter will be one of the fundamental building blocks of the concepts and techniques that we introduced in Part 2 and Part 3 of this book.

Chapter 6, Nodes, Edges, and Statistical (In)Dependence, builds on top of the concepts introduced in Chapter 5 and takes them a step further. We introduce the concept of d-separation, which will allow us to systematically evaluate conditional independence queries in DAGs, and define the notion of estimand. Finally, we discuss three popular estimands and the conditions under which they can be applied.

Chapter 7, The Four-Step Process of Causal Inference, takes us to the practical side of causality. We introduce DoWhy – an open source causal inference library created by researchers from Microsoft – and show how to carry out a full causal inference process using its intuitive APIs. We demonstrate how to define a causal model, find a relevant estimand, estimate causal effects, and perform refutation tests.

Chapter 8, Causal Models – Assumptions and Challenges, brings our attention back to the topic of assumptions. Assumptions are a crucial and indispensable part of any causal project or analysis. In this chapter, we take a broader view and discuss the most important assumptions from the point of view of two causal formalisms: the Pearlian (graph-based) framework and the potential outcomes framework.

Chapter 9, Causal Inference and Machine Learning – from Matching to Meta-learners, opens the door to causal estimation beyond simple linear models. We start by introducing the ideas behind matching and propensity scores and discussing why propensity scores should not be used for matching. We introduce meta-learners – a class of models that can be used for the estimation of conditional average treatment effects (CATEs) and implement them using DoWhy and EconML packages.

Chapter 10, Causal Inference and Machine Learning – Advanced Estimators, Experiments, Evaluations, and More, introduces more advanced estimators: DR-Learner, double machine learning (DML), and causal forest. We show how to use CATE estimators with experimental data and introduce a number of useful evaluation metrics that can be applied in real-world scenarios. We conclude the chapter with a brief discussion of counterfactual explanations.

Chapter 11, Causal Inference and Machine Learning – Deep Learning, NLP, and Beyond, introduces deep learning models for CATE estimation and a PyTorch-based CATENets library. In the second part of the chapter, we take a look at the intersection of causal inference and NLP and introduce CausalBert – a Transformer-based model that can be used to remove spurious relationships present in textual data. We close the chapter with an introduction to the synthetic control estimator, which we use to estimate causal effects in real-world data.

Chapter 12, Can I Have a Causal Graph, Please?, provides us with a deeper look at the real-world sources of causal knowledge and introduces us to the concept of automated causal discovery. We discuss the idea of expert knowledge and its value in the process of causal analysis.

Chapter 13, Causal Discovery and Machine Learning – from Assumptions to Applications, starts with a review of assumptions required by some of the popular causal discovery algorithms. We introduce four main families of causal discovery methods and implement key algorithms using the gCastle library, addressing some of the important challenges on the way. Finally, we demonstrate how to encode expert knowledge when working with selected methods.

Chapter 14, Causal Discovery and Machine Learning – Advanced Deep Learning and Beyond, introduces an advanced causal discovery algorithm – DECI. We implement it using the modules coming from an open source Microsoft library, Causica, and train it using PyTorch. We present methods that allow us to work with datasets with hidden confounding and implement one of them – fast causal inference (FCI) – using the causal-learn library. Finally, we briefly discuss two frameworks that allow us to combine observational and interventional data in order to make causal discovery more efficient and less error-prone.

Chapter 15, Epilogue, closes Part 3 of the book with a summary of what we’ve learned, a discussion of causality in business, a sneak peek into the (potential) future of the field, and pointers to more resources on causal inference and discovery for those who are ready to continue their causal journey.

To get the most out of this book

The code for this book is provided in the form of Jupyter notebooks. To run the notebooks, you’ll need to install the required packages.

The easiest way to install them is using Conda. Conda is a great package manager for Python. If you don’t have Conda installed on your system, the installation instructions can be found here: https://bit.ly/InstallConda.

Note that Conda’s license might have some restrictions for commercial use. After installing Conda, follow the environment installation instructions in the book’s repository README.md file (https://bit.ly/InstallEnvironments).

If you want to recreate some of the plots from the book, you might need to additionally install Graphviz. For GPU acceleration, CUDA drivers might be needed. Instructions and requirements for Graphviz and CUDA are available in the same README.md file in the repository (https://bit.ly/InstallEnvironments).

The code for this book has been only tested on Windows 11 (64-bit).

Software/hardware covered in the book

Operating system requirements

Python 3.9

Windows, macOS, or Linux

DoWhy 0.8

Windows, macOS, or Linux

EconML 0.12.0

Windows, macOS, or Linux

CATENets 0.2.3

Windows, macOS, or Linux

gCastle 1.0.3

Windows, macOS, or Linux

Causica 0.2.0

Windows, macOS, or Linux

Causal-learn 0.1.3.3

Windows, macOS, or Linux

Transformers 4.24.0

Windows, macOS, or Linux

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “We’ll model the adjacency matrix using the ENCOAdjacencyDistributionModule object.”

A block of code is set as follows:

preds = causal_bert.inference(
    texts=df['text'],
    confounds=df['has_photo'],
)[0]

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select System info from the Administration panel.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Causal Inference and Discovery in Python, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Join our book's Discord space

Join our Discord community to meet like-minded people and learn alongside more than 2000 members at: https://packt.link/infer

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below
Download a free PDF copy of this book

https://packt.link/free-ebook/9781804612989

  1. Submit your proof of purchase
  2. That’s it! We’ll send your free PDF and other benefits to your email directly
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Causal Inference and Discovery in Python
Published in: May 2023Publisher: PacktISBN-13: 9781804612989
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Aleksander Molak

Aleksander Molak is a Machine Learning Researcher and Consultant who gained experience working with Fortune 100, Fortune 500, and Inc. 5000 companies across Europe, the USA, and Israel, designing and building large-scale machine learning systems. On a mission to democratize causality for businesses and machine learning practitioners, Aleksander is a prolific writer, creator, and international speaker. As a co-founder of Lespire, an innovative provider of AI and machine learning training for corporate teams, Aleksander is committed to empowering businesses to harness the full potential of cutting-edge technologies that allow them to stay ahead of the curve.
Read more about Aleksander Molak