Regression Analysis with Python

Product type: Book
Published in: Feb 2016
Publisher: Packt Publishing
ISBN-13: 9781785286315
Pages: 312
Edition: 1st
Authors (2): Luca Massaron, Alberto Boschetti

Table of Contents (16 chapters)

Regression Analysis with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
1. Regression – The Workhorse of Data Science
2. Approaching Simple Linear Regression
3. Multiple Regression in Action
4. Logistic Regression
5. Data Preparation
6. Achieving Generalization
7. Online and Batch Learning
8. Advanced Regression Methods
9. Real-world Applications for Regression Models
Index

Chapter 3. Multiple Regression in Action

In the previous chapter, we introduced linear regression as a supervised machine learning method rooted in statistics. Such a method forecasts numeric values using a combination of predictors, which can be continuous numeric values or binary variables, under the assumption that the data at hand displays a certain relationship (a linear one, measurable by a correlation) with the target variable. To smoothly introduce many concepts and easily explain how the method works, we limited our example models to a single predictor variable, leaving it the entire burden of modeling the response.

However, in real-world applications, there may be a few very important causes determining the events you want to model, but it is rare that a single variable can take the stage alone and produce a working predictive model. The world is complex (and indeed interrelated in a mix of causes and effects), and it often cannot be easily explained without considering...

Using multiple features


To recap the tools seen in the previous chapter, we reload all the packages and the Boston dataset:

In: import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import matplotlib as mpl
  from sklearn.datasets import load_boston
  from sklearn import linear_model

If you are working on the code in an IPython Notebook (as we strongly suggest), the following magic command will allow you to visualize plots directly in the interface:

In: %matplotlib inline

We are still using the Boston dataset, which tries to explain differences in house prices in the Boston of the 1970s, given a series of statistics aggregated at the census zone level:

In: boston = load_boston()
  dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
  dataset['target'] = boston.target

Throughout our work, we will keep a few informative variables at hand: the number of observations, the variable names, the input data matrix, and the response vector:

In: observations...
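The cell above is not shown in full; as a sketch of what such a setup typically looks like (the names observations, variables, X, and y are our assumptions suggested by the description above, not necessarily the book's exact code), consider:

  # number of observations and list of predictor names
  observations = len(dataset)
  variables = dataset.columns[:-1]   # every column except 'target'
  # input data matrix and response vector
  X = dataset[variables].values
  y = dataset['target'].values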

Revisiting gradient descent


In continuity with the previous chapter, we carry on our explanation and experimentation with gradient descent. As we have already defined both the mathematical formulation and its translation into Python code using matrix notation, we don't need to worry now that we have to deal with more than one variable at a time. The matrix notation allows us to easily extend our previous introduction and example to multiple predictors with only minor changes to the algorithm.

In particular, we have to take note that, by introducing more parameters to be estimated during the optimization procedure, we are actually introducing more dimensions to our line of fit (turning it into a hyperplane, a multidimensional surface), and such dimensions have certain commonalities and differences to be taken into account.
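As a minimal sketch of this point (not the book's exact listing; the learning rate and iteration count are arbitrary choices), a gradient descent written with matrix operations serves one predictor or many without modification:

  def gradient_descent(X, y, alpha=0.01, iterations=1000):
      # prepend a column of ones so the intercept is estimated
      # by the same matrix operations as the other coefficients
      Xb = np.column_stack([np.ones(len(X)), X])
      w = np.zeros(Xb.shape[1])   # one weight per column of Xb
      n = float(len(y))
      for _ in range(iterations):
          # gradient of the squared-error cost: the formula is identical
          # whether Xb has two columns or twenty
          gradient = Xb.T.dot(Xb.dot(w) - y) / n
          w -= alpha * gradient
      return w

With unscaled predictors, a very small alpha may be required for convergence, which is precisely what motivates the next topic.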

Feature scaling

Working with different features requires more attention when estimating the coefficients, because their similarities can cause a variance...
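A common remedy, shown here as a sketch rather than the book's own listing (it assumes the X and gradient_descent names defined in the earlier sketches), is statistical standardization: rescale each feature to zero mean and unit variance so that all coefficients are estimated on a comparable scale:

  # standardize each column: subtract its mean, divide by its standard deviation
  means = X.mean(axis=0)
  stds = X.std(axis=0)
  X_scaled = (X - means) / stds
  # gradient descent now tolerates a larger learning rate
  w = gradient_descent(X_scaled, y, alpha=0.1, iterations=5000)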

Estimating feature importance


After having confirmed the values of the coefficients of the linear model we have built, and after having explored the basic statistics necessary to understand whether our model is working correctly, we can start auditing our work by first understanding how a prediction is made up. We can do this by accounting for each variable's role in the constitution of the predicted values. A first check on the coefficients is surely the directionality they express, which is simply dictated by their sign. Based on our expertise on the subject (it is advisable to be knowledgeable about the domain we are working in), we can verify whether each coefficient corresponds to our expectations in terms of directionality. Some features may decrease the response, as we expect, thereby correctly confirming a coefficient with a negative sign, whereas others may increase it, so a positive coefficient should be correct. When coefficients do not correspond...
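One way to perform such an audit, sketched here under the assumption that the X_scaled and variables names from the earlier sketches are defined, is to fit the model on standardized features so that coefficient signs and magnitudes can be read side by side:

  lm = linear_model.LinearRegression()
  lm.fit(X_scaled, y)
  # print each predictor with the sign and size of its coefficient;
  # the sign is the directionality discussed above
  for name, coef in zip(variables, lm.coef_):
      print('%10s: %+.3f' % (name, coef))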

Interaction models


Having explained how to build a regression model with multiple variables, and having touched on its use and interpretation, from this section on we explore how to improve it. As a first step, we will work on its fit to the data at hand. In the following chapters, devoted to model selection and validation, we will concentrate on how to make it truly generalizable, that is, capable of correctly predicting on new, previously unseen data.

As we previously reasoned, the beta coefficients in a linear regression represent the link between a unit change in a predictor and the variation in the response. The assumption at the core of such a model is that of a constant and unidirectional relationship between each predictor and the target. This is the linear relationship assumption: the relationship has the characteristics of a line whose direction and steepness are determined by the slope coefficient (hence the name linear regression, hinting at the operation of regressing...
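To make the idea concrete, here is a minimal sketch of adding a single interaction term by hand (the pair RM and LSTAT is our arbitrary choice of Boston predictors, not the book's):

  # the product of two predictors lets the effect of one
  # depend on the level of the other
  X_int = dataset[['RM', 'LSTAT']].copy()
  X_int['RM_x_LSTAT'] = X_int['RM'] * X_int['LSTAT']
  lm = linear_model.LinearRegression().fit(X_int.values, y)
  print(lm.coef_)   # the third coefficient weighs the interaction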

Polynomial regression


As an extension of interactions, polynomial expansion systematically provides an automatic means of creating both interactions and non-linear power transformations of the original variables. Power transformations are the bends that the fitted line can take in following the response. The higher the degree of the power, the more bends are available to fit the curve.

For instance, if you have a simple linear regression of the form:

$y = \beta_0 + \beta_1 x$

By a second-degree transformation, called quadratic, you will get a new form:

$y = \beta_0 + \beta_1 x + \beta_2 x^2$

By a third-degree transformation, called cubic, your equation will turn into:

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

If your regression is a multiple one, the expansion will create additional terms (interactions), increasing the number of new features derived from the expansion. For instance, a multiple regression made up of two predictors ($x_1$ and $x_2$), expanded using the quadratic transformation, will become:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
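scikit-learn can generate this expansion automatically; as a sketch (the choice of RM and LSTAT as the two predictors is ours):

  from sklearn.preprocessing import PolynomialFeatures

  # degree=2 yields the terms 1, x1, x2, x1^2, x1*x2, x2^2
  poly = PolynomialFeatures(degree=2)
  X_two = dataset[['RM', 'LSTAT']].values
  X_poly = poly.fit_transform(X_two)
  print(X_poly.shape)   # (506, 6): two original columns become six terms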

Before proceeding, we have to note two aspects of the expansion procedure:

  • Polynomial expansion rapidly increases the...

Summary


In this chapter, we continued our introduction of linear regression, extending our example from a simple to a multiple one. We revisited the previous outputs from the Statsmodels linear functions (the classical statistical approach) and from gradient descent (the data science engine).

We started experimenting with models by removing selected predictors and evaluating the impact of such a move in terms of the R-squared measure. Meanwhile, we also discovered reciprocal correlations between predictors and learned how to render the relationship between each predictor and the target variable more linear by capturing interactions and by means of polynomial expansion of the features.

In the next chapter, we will progress again and extend the regression model to make it viable for classification tasks, turning it into a probabilistic predictor. The conceptual jump into the world of probability will allow us to complete the range of possible problems where linear models can be successfully...
