Chapter 8. Advanced Regression Methods

In this chapter, we will introduce some advanced regression methods. Since many of them are quite complex, we will skip most of the mathematical formulations and instead give readers the ideas underlying each technique, together with practical advice such as when and when not to use it. We will cover:

  • Least Angle Regression (LARS)

  • Bayesian regression

  • SGD classification with hinge loss (note that this is not a regressor, it's a classifier)

  • Regression trees

  • Ensembles of regressors (bagging and boosting)

  • Gradient Boosting Regressor with Least Absolute Deviation

Least Angle Regression


Although very similar to Lasso (seen in Chapter 6, Achieving Generalization), Least Angle Regression, or simply LARS, is a regression algorithm that quickly and cleverly selects the best features to use in the model, even when they are highly correlated with each other. LARS is an evolution of the Forward Selection algorithm (also called Forward Stepwise Regression) and of the Forward Stagewise Regression algorithm.

Here is how the Forward Selection algorithm works, under the assumption that all the variables, including the target one, have been previously normalized:

  1. Of all the possible predictors for a problem, the one with the largest absolute correlation with the target variable y is selected (that is, the one with the most explanatory capability). Let's call it p1.

  2. All the other predictors are now projected onto p1, and the projection is removed, creating a vector of residuals orthogonal to p1.

  3. Step 1 is repeated on the residual...
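Here is a minimal sketch of LARS in scikit-learn; the synthetic dataset and the parameter values are illustrative assumptions, not the book's own example:

In:
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

# Synthetic data: 100 observations, 10 features, only 5 informative
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=5, noise=1.0,
                       random_state=101)

# LARS adds one predictor at a time along the equiangular direction;
# capping n_nonzero_coefs stops the path early, keeping only 5 features
regr = Lars(n_nonzero_coefs=5)
regr.fit(X, y)
print(regr.coef_)  # most coefficients remain exactly zero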

Bayesian regression


Bayesian regression is similar to linear regression, as seen in Chapter 3, Multiple Regression in Action, but, instead of predicting a single value, it predicts a probability distribution over values. Let's start with an example: given X, the training observation matrix, and y, the target vector, linear regression creates a model (that is, a series of coefficients) that fits the line with the minimal error over the training points. Then, when a new observation arrives, the model is applied to that point and a predicted value is output. That is the only output from linear regression, and nothing can be said about how confident the prediction is for that specific point. Let's take a very simple example in code: the observed phenomenon has only one feature, and the number of observations is just 10:

In:
from sklearn.datasets import make_classification
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10, n_features=1, n_informative=1,...
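To see the distributional output in practice, here is a minimal sketch using scikit-learn's BayesianRidge; the completed make_regression call and the parameter values are assumptions for illustration, not the book's exact code:

In:
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

# One feature, 10 noisy observations
X, y = make_regression(n_samples=10, n_features=1, n_informative=1,
                       noise=5.0, random_state=101)

regr = BayesianRidge()
regr.fit(X, y)

# Unlike plain linear regression, we obtain a mean AND a standard
# deviation for each prediction, i.e. a full predictive distribution
y_mean, y_std = regr.predict(X, return_std=True)
print(y_mean[:3], y_std[:3])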

SGD classification with hinge loss


In Chapter 4, Logistic Regression, we explored a classifier based on a regressor: logistic regression. Its goal was to fit the best probabilistic function mapping each point to the probability of it being classified with a label. Now, the core function of that algorithm considers all the training points of the dataset: what if the model were built only on the boundary ones? That is exactly the case with the linear Support Vector Machine (SVM) classifier, where a linear decision plane is drawn by considering only the points close to the separation boundary itself.

Beyond working on the support vectors (the points closest to the boundary), SVM uses a new loss, called the hinge loss. Here is its formulation:

L(x, t) = max(0, 1 - t(w · x))

Where t is the intended label of the point x, and w is the set of weights of the classifier. The hinge loss is effectively a clipped max: it is zero for points classified correctly with a sufficient margin, so only the boundary points (that is, the support vectors) contribute a non-zero loss.

In the first...
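As an illustrative sketch (not the book's truncated example), a linear classifier with hinge loss can be trained via stochastic gradient descent in scikit-learn as follows; the dataset and parameters are assumptions:

In:
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Two-class problem with two informative features
X, y = make_classification(n_samples=1000, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=101)

# loss='hinge' makes SGDClassifier fit a linear SVM, updating the
# weights one sample at a time instead of solving the full problem
clf = SGDClassifier(loss='hinge', random_state=101)
clf.fit(X, y)
print(clf.score(X, y))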

Regression trees (CART)


A very common learner, lately very popular thanks to its speed, is the regression tree. It is a non-linear learner, works with both categorical and numerical features, and can be used for either classification or regression; that is why it is often called Classification and Regression Tree (CART). In this section, we will see how regression trees work.

A tree is composed of a series of nodes, each splitting a branch into two children. Each branch can then lead to another node, or remain a leaf holding the predicted value (or class).

Starting from the root (that is, the whole dataset):

  1. The best feature with which to split the dataset, F1, is identified as well as the best splitting value. If the feature is numerical, the splitting value is a threshold T1: in this case, the left child branch will be the set of observations where F1 is below T1, and the right one is the set of observations where F1 is greater than, or equal to, T1. If the feature is categorical, the...
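As a minimal sketch of a regression tree in scikit-learn (the synthetic dataset and the depth limit are illustrative assumptions):

In:
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4,
                       n_informative=2, random_state=101)

# max_depth limits how many successive splits are allowed,
# the simplest way to control overfitting in a tree
regr = DecisionTreeRegressor(max_depth=3, random_state=101)
regr.fit(X, y)
print(regr.predict(X[:5]))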

Bagging and boosting


Bagging and boosting are two techniques used to combine learners. These techniques fall under the generic name of ensembles (or meta-algorithms) because the ultimate goal is to assemble weak learners into a more sophisticated, and more accurate, model. There is no formal definition of a weak learner, but ideally it is a fast, often linear, model that does not necessarily produce excellent results (it suffices that it performs just better than a random guess). The final ensemble is typically a non-linear learner whose performance increases with the number of weak learners in the model (note that the relationship is strictly non-linear). Let's now see how they work.

Bagging

Bagging stands for Bootstrap Aggregating, and its ultimate goal is to reduce variance by averaging the weak learners' results. Let's look at the code first; we will then explain how it works. As a dataset, we will reuse the Boston dataset (and its validation split) from the previous example:

In:
from sklearn...
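Since the excerpt's code is cut off here, the following is an illustrative sketch of bagging in scikit-learn; it substitutes a synthetic dataset for Boston (which newer scikit-learn versions no longer ship), and its parameters are assumptions:

In:
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

# Stand-in for the Boston data used in the book
X, y = make_regression(n_samples=500, n_features=10,
                       noise=10.0, random_state=101)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=101)

# Each of the 100 trees (the default base learner) is trained on a
# bootstrap sample of the training set; predictions are averaged
regr = BaggingRegressor(n_estimators=100, random_state=101)
regr.fit(X_train, y_train)
print(regr.score(X_valid, y_valid))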

Gradient Boosting Regressor with LAD


More than a new technique, this is an ensemble of techniques already seen in this book, combined with a new loss function, Least Absolute Deviations (LAD). Compared with the least squares function seen in the previous chapter, with LAD the L1 norm of the error, sum(|y_i - ŷ_i|), is minimized instead of the L2 norm.

Regressor learners based on LAD are typically robust but unstable, because the loss function has multiple minima (and therefore multiple equally good solutions). On its own, this loss function seems to bear little value, but paired with gradient boosting it produces a very stable regressor, because boosting overcomes the limitations of LAD regression. In code, this is very simple to achieve:

In:
from sklearn.ensemble import GradientBoostingRegressor

regr = GradientBoostingRegressor(loss='lad',
                                 n_estimators=500, 
                                 learning_rate=0.1, 
                                 random_state=101)
regr.fit(X_train, y_train)
mean_absolute_error...
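For a self-contained version of the above (the synthetic data, the split, and the loss name are assumptions; scikit-learn releases from 1.0 onwards spell the LAD loss 'absolute_error' instead of 'lad'):

In:
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10,
                       noise=10.0, random_state=101)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=101)

# 'absolute_error' is the LAD loss in scikit-learn >= 1.0
# (older releases, as in the book, call it 'lad')
regr = GradientBoostingRegressor(loss='absolute_error',
                                 n_estimators=500,
                                 learning_rate=0.1,
                                 random_state=101)
regr.fit(X_train, y_train)
print(mean_absolute_error(y_valid, regr.predict(X_valid)))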

Summary


This chapter concludes the long journey through regression methods that we have taken throughout this book. We have seen how to deal with different kinds of regression modeling, how to pre-process data, and how to evaluate the results. In this chapter, we glanced at some cutting-edge techniques. In the next, and last, chapter of the book, we will apply regression to real-world problems and invite you to experiment with some concrete examples.
