Regression Analysis with Python

Product type: Book
Published in: Feb 2016
Publisher: Packt Publishing
ISBN-13: 9781785286315
Pages: 312
Edition: 1st
Authors (2): Luca Massaron, Alberto Boschetti

Table of Contents (16 chapters)

Regression Analysis with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
1. Regression – The Workhorse of Data Science
2. Approaching Simple Linear Regression
3. Multiple Regression in Action
4. Logistic Regression
5. Data Preparation
6. Achieving Generalization
7. Online and Batch Learning
8. Advanced Regression Methods
9. Real-world Applications for Regression Models
Index

Chapter 3. Multiple Regression in Action

In the previous chapter, we introduced linear regression as a supervised machine learning method rooted in statistics. Such a method forecasts numeric values using a combination of predictors, which can be continuous numeric values or binary variables, under the assumption that the data at hand displays a certain relationship (a linear one, measurable by a correlation) with the target variable. To smoothly introduce many concepts and easily explain how the method works, we limited our example models to a single predictor variable, leaving it the entire burden of modeling the response.

However, in real-world applications, there may be a few very important causes determining the events you want to model, but it is rare that a single variable can take the stage alone and produce a working predictive model. The world is complex (and indeed interrelated in a mix of causes and effects), and it often cannot be easily explained without considering...

Using multiple features


To recap the tools seen in the previous chapter, we reload all the packages and the Boston dataset:

In: import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import matplotlib as mpl
  from sklearn.datasets import load_boston
  from sklearn import linear_model

If you are working on the code in an IPython Notebook (as we strongly suggest), the following magic command will allow you to visualize plots directly in the interface:

In: %matplotlib inline

We are still using the Boston dataset, which tries to explain differences in house prices in the Boston of the 1970s, given a series of statistics aggregated at the census zone level:

In: boston = load_boston()
  dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
  dataset['target'] = boston.target

Throughout our work, we will keep a few informative variables at hand: the number of observations, the variable names, the input data matrix, and the response vector:

In: observations...
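The cell above is not shown in full; as a sketch of what such a setup typically looks like (the names observations, variables, X, and y are our assumptions suggested by the description above, not necessarily the book's exact code), consider:

  # number of observations and list of predictor names
  observations = len(dataset)
  variables = dataset.columns[:-1]   # every column except 'target'
  # input data matrix and response vector
  X = dataset[variables].values
  y = dataset['target'].values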

Revisiting gradient descent


In continuity with the previous chapter, we carry on our explanation and experimentation with gradient descent. As we have already defined both the mathematical formulation and its translation into Python code using matrix notation, we don't need to worry now that we have to deal with more than one variable at a time. The matrix notation allows us to easily extend our previous introduction and example to multiple predictors with only minor changes to the algorithm.

In particular, we have to take note that, by introducing more parameters to be estimated during the optimization procedure, we are actually introducing more dimensions to our line of fit (turning it into a hyperplane, a multidimensional surface), and such dimensions have certain commonalities and differences to be taken into account.
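As a minimal sketch of this point (not the book's exact listing; the learning rate and iteration count are arbitrary choices), a gradient descent written with matrix operations serves one predictor or many without modification:

  def gradient_descent(X, y, alpha=0.01, iterations=1000):
      # prepend a column of ones so the intercept is estimated
      # by the same matrix operations as the other coefficients
      Xb = np.column_stack([np.ones(len(X)), X])
      w = np.zeros(Xb.shape[1])   # one weight per column of Xb
      n = float(len(y))
      for _ in range(iterations):
          # gradient of the squared-error cost: the formula is identical
          # whether Xb has two columns or twenty
          gradient = Xb.T.dot(Xb.dot(w) - y) / n
          w -= alpha * gradient
      return w

With unscaled predictors, a very small alpha may be required for convergence, which is precisely what motivates the next topic.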

Feature scaling

Working with different features requires more attention when estimating the coefficients, because their similarities can cause a variance...
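A common remedy, shown here as a sketch rather than the book's own listing (it assumes the X and gradient_descent names defined in the earlier sketches), is statistical standardization: rescale each feature to zero mean and unit variance so that all coefficients are estimated on a comparable scale:

  # standardize each column: subtract its mean, divide by its standard deviation
  means = X.mean(axis=0)
  stds = X.std(axis=0)
  X_scaled = (X - means) / stds
  # gradient descent now tolerates a larger learning rate
  w = gradient_descent(X_scaled, y, alpha=0.1, iterations=5000)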

Estimating feature importance


After having confirmed the values of the coefficients of the linear model we have built, and after having explored the basic statistics necessary to understand whether our model is working correctly, we can start auditing our work by first understanding how a prediction is made up. We can do this by accounting for each variable's role in the constitution of the predicted values. A first check on the coefficients is surely the directionality they express, which is simply dictated by their sign. Based on our expertise on the subject (it is advisable to be knowledgeable about the domain we are working in), we can verify whether each coefficient corresponds to our expectations in terms of directionality. Some features may decrease the response, as we expect, thereby correctly confirming a coefficient with a negative sign, whereas others may increase it, so a positive coefficient should be correct. When coefficients do not correspond...
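One way to perform such an audit, sketched here under the assumption that the X_scaled and variables names from the earlier sketches are defined, is to fit the model on standardized features so that coefficient signs and magnitudes can be read side by side:

  lm = linear_model.LinearRegression()
  lm.fit(X_scaled, y)
  # print each predictor with the sign and size of its coefficient;
  # the sign is the directionality discussed above
  for name, coef in zip(variables, lm.coef_):
      print('%10s: %+.3f' % (name, coef))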

Interaction models


Having explained how to build a regression model with multiple variables, and having touched on its use and interpretation, from this section on we explore how to improve it. As a first step, we will work on its fit to the data at hand. In the following chapters, devoted to model selection and validation, we will concentrate on how to make it truly generalizable, that is, capable of correctly predicting on new, previously unseen data.

As we previously reasoned, the beta coefficients in a linear regression represent the link between a unit change in a predictor and the variation in the response. The assumption at the core of such a model is that of a constant and unidirectional relationship between each predictor and the target. This is the linear relationship assumption: the relationship has the characteristics of a line whose direction and steepness are determined by the slope coefficient (hence the name linear regression, hinting at the operation of regressing...
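To make the idea concrete, here is a minimal sketch of adding a single interaction term by hand (the pair RM and LSTAT is our arbitrary choice of Boston predictors, not the book's):

  # the product of two predictors lets the effect of one
  # depend on the level of the other
  X_int = dataset[['RM', 'LSTAT']].copy()
  X_int['RM_x_LSTAT'] = X_int['RM'] * X_int['LSTAT']
  lm = linear_model.LinearRegression().fit(X_int.values, y)
  print(lm.coef_)   # the third coefficient weighs the interaction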

Polynomial regression


As an extension of interactions, polynomial expansion systematically provides an automatic means of creating both interactions and non-linear power transformations of the original variables. Power transformations are the bends that the fitted line can take in following the response. The higher the degree of the power, the more bends are available to fit the curve.

For instance, if you have a simple linear regression of the form:

$y = \beta_0 + \beta_1 x$

By a second-degree transformation, called quadratic, you will get a new form:

$y = \beta_0 + \beta_1 x + \beta_2 x^2$

By a third-degree transformation, called cubic, your equation will turn into:

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

If your regression is a multiple one, the expansion will create additional terms (interactions), increasing the number of new features derived from the expansion. For instance, a multiple regression made up of two predictors ($x_1$ and $x_2$), expanded using the quadratic transformation, will become:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
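scikit-learn can generate this expansion automatically; as a sketch (the choice of RM and LSTAT as the two predictors is ours):

  from sklearn.preprocessing import PolynomialFeatures

  # degree=2 yields the terms 1, x1, x2, x1^2, x1*x2, x2^2
  poly = PolynomialFeatures(degree=2)
  X_two = dataset[['RM', 'LSTAT']].values
  X_poly = poly.fit_transform(X_two)
  print(X_poly.shape)   # (506, 6): two original columns become six terms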

Before proceeding, we have to note two aspects of the expansion procedure:

  • Polynomial expansion rapidly increases the...

Summary


In this chapter, we continued our introduction of linear regression, extending our example from a simple to a multiple one. We revisited the previous outputs from the Statsmodels linear functions (the classical statistical approach) and from gradient descent (the data science engine).

We started experimenting with models by removing selected predictors and evaluating the impact of such a move in terms of the R-squared measure. Meanwhile, we also discovered reciprocal correlations between predictors and learned how to render the relationship between each predictor and the target variable more linear by capturing interactions and by means of polynomial expansion of the features.

In the next chapter, we will progress again and extend the regression model to make it viable for classification tasks, turning it into a probabilistic predictor. The conceptual jump into the world of probability will allow us to complete the range of possible problems where linear models can be successfully...
