Chapter 8. Advanced Regression Methods

In this chapter, we will introduce some advanced regression methods. Since many of them are quite complex, we will skip most of the mathematical formulations and instead give readers the ideas underlying each technique, together with practical advice such as when and when not to use it. We will cover:

  • Least Angle Regression (LARS)

  • Bayesian regression

  • SGD classification with hinge loss (note that this is not a regressor, it's a classifier)

  • Regression trees

  • Ensembles of regressors (bagging and boosting)

  • Gradient Boosting Regressor with Least Absolute Deviation

Least Angle Regression


Although very similar to Lasso (seen in Chapter 6, Achieving Generalization), Least Angle Regression, or simply LARS, is a regression algorithm that quickly and cleverly selects the best features to use in the model, even when they are highly correlated with each other. LARS is an evolution of the Forward Selection algorithm (also called Forward Stepwise Regression) and of the Forward Stagewise Regression algorithm.

Here is how the Forward Selection algorithm works, under the assumption that all the variables, including the target one, have been previously normalized:

  1. Of all the possible predictors for a problem, the one with the largest absolute correlation with the target variable y is selected (that is, the one with the most explanatory capability). Let's call it p1.

  2. All the other predictors are now projected onto p1, and the projection is removed, creating a vector of residuals orthogonal to p1.

  3. Step 1 is repeated on the residual...
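Here is a minimal sketch of LARS in scikit-learn; the synthetic dataset and the parameter values are illustrative assumptions, not the book's own example:

In:
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

# Synthetic data: 100 observations, 10 features, only 5 informative
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=5, noise=1.0,
                       random_state=101)

# LARS adds one predictor at a time along the equiangular direction;
# capping n_nonzero_coefs stops the path early, keeping only 5 features
regr = Lars(n_nonzero_coefs=5)
regr.fit(X, y)
print(regr.coef_)  # most coefficients remain exactly zero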

Bayesian regression


Bayesian regression is similar to linear regression, as seen in Chapter 3, Multiple Regression in Action, but, instead of predicting a single value, it predicts a probability distribution over values. Let's start with an example: given X, the training observation matrix, and y, the target vector, linear regression creates a model (that is, a series of coefficients) that fits the line with the minimal error over the training points. Then, when a new observation arrives, the model is applied to that point and a predicted value is output. That is the only output from linear regression, and nothing can be said about how confident the prediction is for that specific point. Let's take a very simple example in code: the observed phenomenon has only one feature, and the number of observations is just 10:

In:
from sklearn.datasets import make_classification
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10, n_features=1, n_informative=1,...
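To see the distributional output in practice, here is a minimal sketch using scikit-learn's BayesianRidge; the completed make_regression call and the parameter values are assumptions for illustration, not the book's exact code:

In:
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

# One feature, 10 noisy observations
X, y = make_regression(n_samples=10, n_features=1, n_informative=1,
                       noise=5.0, random_state=101)

regr = BayesianRidge()
regr.fit(X, y)

# Unlike plain linear regression, we obtain a mean AND a standard
# deviation for each prediction, i.e. a full predictive distribution
y_mean, y_std = regr.predict(X, return_std=True)
print(y_mean[:3], y_std[:3])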

SGD classification with hinge loss


In Chapter 4, Logistic Regression, we explored a classifier based on a regressor: logistic regression. Its goal was to fit the best probabilistic function mapping each point to the probability of it being classified with a label. Now, the core function of that algorithm considers all the training points of the dataset: what if the model were built only on the boundary ones? That is exactly the case with the linear Support Vector Machine (SVM) classifier, where a linear decision plane is drawn by considering only the points close to the separation boundary itself.

Beyond working on the support vectors (the points closest to the boundary), SVM uses a new loss, called the hinge loss. Here is its formulation:

L(x, t) = max(0, 1 - t(w · x))

Where t is the intended label of the point x, and w is the set of weights of the classifier. The hinge loss is effectively a clipped max: it is zero for points classified correctly with a sufficient margin, so only the boundary points (that is, the support vectors) contribute a non-zero loss.

In the first...
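As an illustrative sketch (not the book's truncated example), a linear classifier with hinge loss can be trained via stochastic gradient descent in scikit-learn as follows; the dataset and parameters are assumptions:

In:
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Two-class problem with two informative features
X, y = make_classification(n_samples=1000, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=101)

# loss='hinge' makes SGDClassifier fit a linear SVM, updating the
# weights one sample at a time instead of solving the full problem
clf = SGDClassifier(loss='hinge', random_state=101)
clf.fit(X, y)
print(clf.score(X, y))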

Regression trees (CART)


A very common learner, lately very popular thanks to its speed, is the regression tree. It is a non-linear learner, works with both categorical and numerical features, and can be used for either classification or regression; that is why it is often called Classification and Regression Tree (CART). In this section, we will see how regression trees work.

A tree is composed of a series of nodes, each splitting a branch into two children. Each branch can then lead to another node, or remain a leaf holding the predicted value (or class).

Starting from the root (that is, the whole dataset):

  1. The best feature with which to split the dataset, F1, is identified as well as the best splitting value. If the feature is numerical, the splitting value is a threshold T1: in this case, the left child branch will be the set of observations where F1 is below T1, and the right one is the set of observations where F1 is greater than, or equal to, T1. If the feature is categorical, the...
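As a minimal sketch of a regression tree in scikit-learn (the synthetic dataset and the depth limit are illustrative assumptions):

In:
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4,
                       n_informative=2, random_state=101)

# max_depth limits how many successive splits are allowed,
# the simplest way to control overfitting in a tree
regr = DecisionTreeRegressor(max_depth=3, random_state=101)
regr.fit(X, y)
print(regr.predict(X[:5]))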

Bagging and boosting


Bagging and boosting are two techniques used to combine learners. These techniques fall under the generic name of ensembles (or meta-algorithms) because the ultimate goal is to assemble weak learners into a more sophisticated, and more accurate, model. There is no formal definition of a weak learner, but ideally it is a fast, often linear, model that does not necessarily produce excellent results (it suffices that it performs just better than a random guess). The final ensemble is typically a non-linear learner whose performance increases with the number of weak learners in the model (note that the relationship is strictly non-linear). Let's now see how they work.

Bagging

Bagging stands for Bootstrap Aggregating, and its ultimate goal is to reduce variance by averaging the weak learners' results. Let's look at the code first; we will then explain how it works. As a dataset, we will reuse the Boston dataset (and its validation split) from the previous example:

In:
from sklearn...
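Since the excerpt's code is cut off here, the following is an illustrative sketch of bagging in scikit-learn; it substitutes a synthetic dataset for Boston (which newer scikit-learn versions no longer ship), and its parameters are assumptions:

In:
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

# Stand-in for the Boston data used in the book
X, y = make_regression(n_samples=500, n_features=10,
                       noise=10.0, random_state=101)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=101)

# Each of the 100 trees (the default base learner) is trained on a
# bootstrap sample of the training set; predictions are averaged
regr = BaggingRegressor(n_estimators=100, random_state=101)
regr.fit(X_train, y_train)
print(regr.score(X_valid, y_valid))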

Gradient Boosting Regressor with LAD


More than a new technique, this is an ensemble of techniques already seen in this book, combined with a new loss function, Least Absolute Deviations (LAD). Compared with the least squares function seen in the previous chapter, with LAD the L1 norm of the error, sum(|y_i - ŷ_i|), is minimized instead of the L2 norm.

Regressor learners based on LAD are typically robust but unstable, because the loss function has multiple minima (and therefore multiple equally good solutions). On its own, this loss function seems to bear little value, but paired with gradient boosting it produces a very stable regressor, because boosting overcomes the limitations of LAD regression. In code, this is very simple to achieve:

In:
from sklearn.ensemble import GradientBoostingRegressor

regr = GradientBoostingRegressor(loss='lad',
                                 n_estimators=500, 
                                 learning_rate=0.1, 
                                 random_state=101)
regr.fit(X_train, y_train)
mean_absolute_error...
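For a self-contained version of the above (the synthetic data, the split, and the loss name are assumptions; scikit-learn releases from 1.0 onwards spell the LAD loss 'absolute_error' instead of 'lad'):

In:
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10,
                       noise=10.0, random_state=101)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=101)

# 'absolute_error' is the LAD loss in scikit-learn >= 1.0
# (older releases, as in the book, call it 'lad')
regr = GradientBoostingRegressor(loss='absolute_error',
                                 n_estimators=500,
                                 learning_rate=0.1,
                                 random_state=101)
regr.fit(X_train, y_train)
print(mean_absolute_error(y_valid, regr.predict(X_valid)))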

Summary


This chapter concludes the long journey through regression methods that we have taken throughout this book. We have seen how to deal with different kinds of regression modeling, how to pre-process data, and how to evaluate the results. In this chapter, we glanced at some cutting-edge techniques. In the next, and last, chapter of the book, we will apply regression to real-world problems and invite you to experiment with some concrete examples.
