You're reading from Regression Analysis with Python
Published in Feb 2016, 1st Edition. ISBN-13: 9781785286315

Authors (2):

Luca Massaron

Having joined Kaggle over 10 years ago, Luca Massaron is a Kaggle Grandmaster in discussions and a Kaggle Master in competitions and notebooks. In Kaggle competitions he reached no. 7 in the worldwide rankings. On the professional side, Luca is a data scientist with more than a decade of experience in transforming data into smarter artifacts, solving real-world problems, and generating value for businesses and stakeholders. He is a Google Developer Expert (GDE) in machine learning and the author of best-selling books on AI, machine learning, and algorithms.

Alberto Boschetti

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges ranging from natural language processing (NLP) and behavioral analysis to machine learning and distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.

Chapter 3. Multiple Regression in Action

In the previous chapter, we introduced linear regression as a supervised machine learning method rooted in statistics. The method forecasts numeric values from a combination of predictors, which can be continuous numeric values or binary variables, under the assumption that the data at hand displays a certain relation (a linear one, measurable by a correlation) with the target variable. To introduce the key concepts smoothly and explain how the method works, we limited our example models to just a single predictor variable, leaving it the entire burden of modeling the response.

However, in real-world applications, while there may be a few very important causes determining the events you want to model, it is rare that a single variable alone can take the stage and produce a working predictive model. The world is complex (and indeed interrelated in a mix of causes and effects) and often it cannot be easily explained without considering...

Using multiple features


To recap the tools seen in the previous chapter, we reload all the packages and the Boston dataset:

In: import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import matplotlib as mpl
  from sklearn.datasets import load_boston
  from sklearn import linear_model

If you are working on the code in an IPython Notebook (as we strongly suggest), the following magic command will allow you to visualize plots directly in the notebook interface:

In: %matplotlib inline

We are still using the Boston dataset, which tries to explain variation in house prices in 1970s Boston using a series of statistics aggregated at the census tract level:

In: boston = load_boston()
  dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
  dataset['target'] = boston.target
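Note that the book predates scikit-learn 1.2, which removed load_boston from the library. If you are following along on a modern installation, a synthetic stand-in such as the following (our own sketch, with a hypothetical subset of column names) builds a DataFrame of the same shape so that the later snippets remain runnable:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for load_boston: a synthetic frame with the same
# structure (a feature matrix plus a 'target' column). The column names
# below are a subset of the original Boston features, and the linear
# response is purely for illustration.
rng = np.random.RandomState(42)
n_obs = 506                              # the Boston dataset had 506 observations
feature_names = ['CRIM', 'RM', 'LSTAT']
X = rng.uniform(0, 10, size=(n_obs, len(feature_names)))
y = 22.0 + 4.0 * X[:, 1] - 0.9 * X[:, 2] + rng.normal(0, 2, n_obs)

dataset = pd.DataFrame(X, columns=feature_names)
dataset['target'] = y
print(dataset.shape)   # (506, 4)
```

The fabricated coefficients carry no meaning; the point is only to have a numeric feature matrix and a response vector to exercise the code with.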

We will always keep at hand a series of informative variables: the number of observations and the variable names, the input data matrix, and the response vector:

In: observations...

Revisiting gradient descent


In continuity with the previous chapter, we carry on with our explanation and experimentation with gradient descent. Since we have already defined both the mathematical formulation and its translation into Python code using matrix notation, dealing with more than one variable at a time poses no new difficulty. The matrix notation allows us to easily extend our previous introduction and examples to multiple predictors with just minor changes to the algorithm.

In particular, we have to take note that, by introducing more parameters to be estimated during the optimization procedure, we are actually adding more dimensions to our line of fit (turning it into a hyperplane, a multidimensional surface), and these dimensions have certain commonalities and differences that must be taken into account.
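To make this concrete, here is a minimal sketch (a toy example of our own, not the book's code) of batch gradient descent written in matrix notation; notice that the update formula is identical whether the design matrix holds one predictor column or several:

```python
import numpy as np

# Synthetic, noiseless data: three predictors plus a bias column.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
X = np.column_stack([np.ones(100), X])   # prepend the bias (intercept) column
true_w = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(X.shape[1])   # one coefficient per column, bias included
alpha = 0.05               # learning rate
for _ in range(2000):
    # Same gradient formula regardless of how many predictors X has.
    gradient = X.T @ (X @ w - y) / len(y)
    w -= alpha * gradient

print(np.round(w, 3))   # approaches [1. 2. -1. 0.5]
```

The matrix expression X.T @ (X @ w - y) is the whole reason the one-variable code generalizes: adding predictors only changes the shapes, not the algorithm.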

Feature scaling

Working with different features requires more attention when estimating the coefficients because of their similarities, which can cause a variance...
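As an illustration of the idea, here is a standardization sketch of our own (the toy matrix below is not the book's data): rescaling each feature to zero mean and unit variance puts the coefficients, and the gradient descent steps, on a comparable scale.

```python
import numpy as np

# Two features living on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Standardization: subtract each column's mean, divide by its std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0))   # ~[0. 0.]
print(X_scaled.std(axis=0))    # [1. 1.]
```

scikit-learn offers the same operation as sklearn.preprocessing.StandardScaler, which also remembers the means and standard deviations so new data can be transformed consistently.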

Estimating feature importance


After confirming the values of the coefficients of the linear model we have built, and after exploring the basic statistics needed to understand whether our model is working correctly, we can start auditing our work by first understanding how a prediction is made up. We obtain this by accounting for each variable's role in the constitution of the predicted values. A first check on the coefficients is surely their directionality, which is simply dictated by their sign. Based on our expertise on the subject (so it is advisable to be knowledgeable about the domain we are working on), we can check whether all the coefficients correspond to our expectations in terms of directionality. Some features may decrease the response as we expect, correctly confirming a coefficient with a negative sign, whereas others may increase it, so a positive coefficient should be correct. When coefficients do not correspond...
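The sign check can be sketched as follows, on synthetic data whose directionality we control (the variable names and coefficients are our own, not the book's):

```python
import numpy as np
from sklearn import linear_model

# Build data where the first feature raises the response (+3.0)
# and the second lowers it (-2.0), plus a little noise.
rng = np.random.RandomState(1)
X = rng.randn(200, 2)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 200)

lm = linear_model.LinearRegression()
lm.fit(X, y)
print(np.sign(lm.coef_))   # expect [ 1. -1.]: signs match the known directionality
```

On real data, a coefficient whose sign contradicts domain knowledge is a prompt to investigate, not necessarily an error: correlated predictors can flip signs.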

Interaction models


Having explained how to build a regression model with multiple variables, and having touched on its use and interpretation, we now start exploring how to improve it. As a first step, we will work on its fit to the present data. In the following chapters, devoted to model selection and validation, we will concentrate on how to make it truly generalizable, that is, capable of correctly predicting on new, previously unseen data.

As we previously reasoned, the beta coefficients in a linear regression represent the link between a unit change in a predictor and the variation of the response. At the core of such a model is the assumption of a constant and unidirectional relationship between each predictor and the target. This is the linear relationship assumption: the relationship has the characteristics of a line, whose direction and steepness are determined by the angular coefficient (hence the name linear regression, hinting at the operation of regressing...
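An interaction relaxes this assumption by letting the effect of one predictor depend on the level of another. A minimal sketch (synthetic data and names of our own): the product x1 * x2 is simply added as an extra column before fitting.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Noiseless response with a genuine interaction term (+5.0 * x1 * x2).
rng = np.random.RandomState(2)
df = pd.DataFrame(rng.uniform(0, 1, size=(300, 2)), columns=['x1', 'x2'])
y = 1.0 + 2.0 * df['x1'] + 3.0 * df['x2'] + 5.0 * df['x1'] * df['x2']

df['x1_x2'] = df['x1'] * df['x2']   # the interaction feature
lm = linear_model.LinearRegression().fit(df[['x1', 'x2', 'x1_x2']], y)
print(np.round(lm.coef_, 2))   # close to [2. 3. 5.]
```

Because the data was built with those exact coefficients and no noise, ordinary least squares recovers them; the model stays linear in its parameters even though the fitted surface is no longer flat in x1 and x2.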

Polynomial regression


As an extension of interactions, polynomial expansion provides a systematic, automatic means of creating both interactions and non-linear power transformations of the original variables. Power transformations are the bends the fitted line can take; the higher the degree, the more bends are available to fit the response.

For instance, if you have a simple linear regression of the form:

y = β0 + β1*x

By a second-degree transformation, called quadratic, you will get a new form:

y = β0 + β1*x + β2*x^2

By a third-degree transformation, called cubic, your equation will turn into:

y = β0 + β1*x + β2*x^2 + β3*x^3

If your regression is a multiple one, the expansion will create additional terms (interactions), increasing the number of new features derived from the expansion. For instance, a multiple regression made up of two predictors (x1 and x2), expanded using the quadratic transformation, will become:

y = β0 + β1*x1 + β2*x2 + β3*x1^2 + β4*x2^2 + β5*x1*x2

Before proceeding, we have to note two aspects of the expansion procedure:

  • Polynomial expansion rapidly increases the...
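The expansion can be produced automatically with scikit-learn's PolynomialFeatures; the toy observation below is our own example. Note how two predictors already become six columns at degree two, which hints at how quickly the feature count grows:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])           # one observation: x1 = 2, x2 = 3
poly = PolynomialFeatures(degree=2)  # quadratic expansion, bias included
X_poly = poly.fit_transform(X)
print(X_poly)
# columns: 1, x1, x2, x1^2, x1*x2, x2^2 -> [[1. 2. 3. 4. 6. 9.]]
```

With p original predictors and degree d, the number of generated terms grows combinatorially, which is why the expansion must be used with some care.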

Summary


In this chapter, we carried on introducing linear regression, extending our example from a simple regression to a multiple one. We revisited the previous outputs from the Statsmodels linear functions (the classical statistical approach) and from gradient descent (the data science engine).

We started experimenting with models by removing selected predictors and evaluating the impact of such a move in terms of the R-squared measure. Along the way, we also discovered reciprocal correlations between predictors, and learned how to render the relation between each predictor and the target variable more linear by capturing interactions and by means of polynomial expansion of the features.

In the next chapter, we will progress again and extend the regression model to make it viable for classification tasks, turning it into a probabilistic predictor. The conceptual jump into the world of probability will allow us to complete the range of possible problems where linear models can be successfully...
