
You're reading from Causal Inference and Discovery in Python

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781804612989
Edition: 1st
Author (1)

Aleksander Molak
Aleksander Molak is a Machine Learning Researcher and Consultant who gained experience working with Fortune 100, Fortune 500, and Inc. 5000 companies across Europe, the USA, and Israel, designing and building large-scale machine learning systems. On a mission to democratize causality for businesses and machine learning practitioners, Aleksander is a prolific writer, creator, and international speaker. As a co-founder of Lespire, an innovative provider of AI and machine learning training for corporate teams, Aleksander is committed to empowering businesses to harness the full potential of cutting-edge technologies that allow them to stay ahead of the curve.

Starting simple – observational data and linear regression

In previous chapters, we discussed the concept of association. In this section, we’ll quantify associations between variables using a regression model. We’ll see the geometrical interpretation of this model and demonstrate that regression can be performed in an arbitrary direction. For the sake of simplicity, we’ll focus our attention on linear cases. Let’s start!

Linear regression

Linear regression is a basic data-fitting algorithm that can be used to predict the expected value of a dependent (target) variable, $Y$, given the values of some predictor(s), $X$. Formally, this is written as $\hat{Y}_{X=x} = E[Y \mid X = x]$.

In the preceding formula, $\hat{Y}_{X=x}$ is the predicted value of $Y$ given that $X$ takes the value(s) $x$, and $E[\cdot]$ is the expected value operator. Note that $X$ can be multidimensional. In such cases, $X$ is usually represented as a matrix, $\mathbf{X}$, with shape $N \times D$, where $N$ is the number of observations and $D$ is the dimensionality of $X$ (the number of predictors).
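As a minimal sketch of these ideas (the data and coefficients here are simulated for illustration, not taken from the book), we can fit $E[Y \mid X = x]$ by ordinary least squares and confirm that regression can just as well be run in the opposite direction, from $Y$ to $X$:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate observational data: y = 2*x + 1 + noise
x = rng.normal(size=1_000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=1_000)

# Fit E[Y | X = x] by ordinary least squares.
# The column of ones adds the intercept term.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # close to [1.0, 2.0] (intercept, slope)

# Regression is directionless as a predictive tool: we can
# equally well regress x on y and obtain a valid fit.
Y = np.column_stack([np.ones_like(y), y])
gamma, *_ = np.linalg.lstsq(Y, x, rcond=None)
print(gamma)
```

Both fits minimize squared prediction error; neither direction, by itself, tells us anything about which variable causes which.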

Should we always control for all available covariates?

Multiple regression provides scientists and analysts with a tool to perform statistical control – a procedure to remove unwanted influence from certain variables in the model. In this section, we’ll discuss different perspectives on statistical control and build an intuition as to why statistical control can easily lead us astray.

Let’s start with an example. When studying predictors of dyslexia, you might be interested in understanding whether parents smoking influences the risk of dyslexia in their children. In your model, you might want to control for parental education. Parental education might affect how much attention parents devote to their children’s reading and writing, and this in turn can impact children’s skills and other characteristics. At the same time, education level might decrease the probability of smoking, potentially leading to confounding. But how do we actually know whether...
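The mechanics of statistical control can be seen in a small simulation. The structure below is a toy version of the example above (all coefficients and variable names are illustrative assumptions, not estimates from any real study): a confounder influences both the exposure and the outcome, so the naive regression coefficient is biased, while adding the confounder as a covariate recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder: parental education (standardized).
education = rng.normal(size=n)

# Education lowers smoking; smoking's true effect on the
# risk score is 0.5, and education also lowers the risk directly.
smoking = -0.8 * education + rng.normal(size=n)
risk = 0.5 * smoking - 0.7 * education + rng.normal(size=n)

def ols(predictors, y):
    """OLS with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive regression: the smoking coefficient absorbs the
# confounder's influence and is biased upward (~0.84 here).
naive = ols([smoking], risk)

# Controlling for education recovers the true effect (~0.5).
adjusted = ols([smoking, education], risk)
print(naive[1], adjusted[1])
```

Note that adjustment helps here only because education is a genuine confounder in the data-generating process; as the section goes on to argue, controlling for the wrong variable can introduce bias rather than remove it.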

Regression and structural models

Before we conclude this chapter, let’s take a look at the connection between regression and SCMs. You might already have an intuitive understanding that they are somehow related. In this section, we’ll discuss the nature of this relationship.

SCMs

In the previous chapter, we learned that SCMs are a useful tool for encoding causal models. They consist of a set of variables (exogenous and endogenous) and a set of functions defining the relationships between these variables. We saw that SCMs can be represented as graphs, with nodes representing variables and directed edges representing functions. Finally, we learned that SCMs can produce interventional and counterfactual distributions.
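These properties are easy to express in code. The sketch below defines a toy SCM (the graph $Z \rightarrow X \rightarrow Y$, $Z \rightarrow Y$, with coefficients chosen for illustration, not from the book) as a set of structural equations, and shows how an intervention $do(X = x)$ replaces one equation with a constant, producing an interventional distribution:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000

def sample(do_x=None):
    """Sample from a toy SCM: Z -> X -> Y and Z -> Y.

    Each line is a structural equation. Passing do_x simulates
    the intervention do(X = do_x): the equation for X is
    replaced by a constant, while the other equations stay intact.
    """
    z = rng.normal(size=n)
    x = 0.9 * z + rng.normal(size=n) if do_x is None else np.full(n, do_x)
    y = 2.0 * x + 1.5 * z + rng.normal(size=n)
    return x, y

# Observational vs interventional distribution of Y.
_, y_obs = sample()
_, y_do = sample(do_x=1.0)
print(y_obs.mean(), y_do.mean())  # do(X=1) shifts E[Y] to ~2.0
```

Under the intervention, $E[Y \mid do(X=1)] = 2.0 \cdot 1 + 1.5 \cdot E[Z] = 2.0$, which differs from simply conditioning on $X = 1$ in the observational data because conditioning also carries information about $Z$.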

SCM and structural equations

In causal literature, the names structural equation model (SEM) and structural causal model (SCM) are sometimes used interchangeably (e.g., Peters et al., 2017). Others refer to SEMs as a family of specific multivariate...

Wrapping it up

That was a lot of material! Congrats on reaching the end of Chapter 3!

In this chapter, we learned about the links between regression, observational data, and causal models. We started with a review of linear regression. After that, we discussed the concept of statistical control and demonstrated how it can lead us astray. We analyzed selected recommendations regarding statistical control and reviewed them from a causal perspective. Finally, we examined the links between linear regression and SCMs.

A solid understanding of the links between observational data, regression, and statistical control will help us move freely in the world of much more complex models, which we’ll start introducing in Part 2, Causal Inference.

We’re now ready to take a more detailed look at the graphical aspect of causal models. See you in the next chapter!

References

Becker, T. E., Atinc, G., Breaugh, J. A., Carlson, K. D., Edwards, J. R., & Spector, P. E. (2016). Statistical control in correlational studies: 10 essential recommendations for organizational researchers. Journal of Organizational Behavior, 37(2), 157–167.

Bollen, K. A., & Noble, M. D. (2011). Structural equation models and the quantification of behavior. Proceedings of the National Academy of Sciences, 108(Suppl 3), 15639–15646.

Cinelli, C., Forney, A., & Pearl, J. (2022). A Crash Course in Good and Bad Controls. Sociological Methods & Research, 0(0), 1–34.

Kline, R. B. (2015). Principles and Practice of Structural Equation Modeling. Guilford Press.

Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press.

Pearl, J. (2012). The causal foundations of structural equation modeling. In Hoyle, R. H. (Ed.), Handbook of structural equation modeling (pp. 68–91). Guilford Press.


