The Supervised Learning Workshop - Second Edition

Product type: Book
Published in: Feb 2020
Publisher: Packt
ISBN-13: 9781800209046
Pages: 532
Edition: 2nd Edition
Authors (4): Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston, Ishita Mathur

6. Ensemble Modeling

Overview

This chapter examines different ways of performing ensemble modeling, along with its benefits and limitations. By the end of the chapter, you will be able to recognize underfitting and overfitting in machine learning models. You will also be able to devise a bagging classifier using decision trees and implement adaptive boosting and gradient boosting models. Finally, you will be able to build a stacked ensemble using a number of classifiers.

Introduction

In the previous chapters, we discussed the two types of supervised learning problems: regression and classification. We looked at a number of algorithms for each type and delved into how those algorithms worked.

But there are times when these algorithms, no matter how complex they are, just don't seem to perform well on the data that we have. There could be a variety of reasons for this – perhaps the data is not good enough, perhaps there really is no trend where we are trying to find one, or perhaps the model itself is too complex.

Wait. What?! How can a model being too complex be a problem? If a model is too complex and there isn't enough data, the model could fit so well to the data that it learns even the noise and outliers, which is not what we want.

Often, where a single complex algorithm can give us a result that is way off from actual results, aggregating the results from a group of models can give us a result that's...

One-Hot Encoding

So, what is one-hot encoding? Well, in machine learning, we sometimes have categorical input features such as name, gender, and color. Such features contain label values rather than numeric values, such as John and Tom for name, male and female for gender, and red, blue, and green for color. Here, blue is one such label for the categorical feature – color. All machine learning models can work with numeric data, but many machine learning models cannot work with categorical data because of the way their underlying algorithms are designed. For example, decision trees can work with categorical data, but logistic regression cannot.

In order to still make use of categorical features with models such as logistic regression, we transform such features into a usable numeric format. Figure 6.1 shows an example of what this transformation looks like:

Figure 6.1: One-hot encoding

Figure 6.2 shows how one-hot encoding changes the dataset, once...
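
To make this concrete, the following is a minimal sketch of one-hot encoding with pandas, using a small made-up DataFrame rather than the chapter's dataset (the column names and values here are only for illustration):

import pandas as pd

# A small, hypothetical dataset with a categorical 'color' feature
df = pd.DataFrame({
    'name': ['John', 'Tom', 'Jane'],
    'color': ['red', 'blue', 'green']
})

# One-hot encode the 'color' column: each label becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=['color'])
print(encoded)

scikit-learn's OneHotEncoder produces an equivalent result and has the advantage that it can be fitted on the training data and reused inside a pipeline.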

Overfitting and Underfitting

Let's say we fit a supervised learning algorithm to our data and subsequently use the model to perform a prediction on a hold-out validation set. The performance of this model will be considered to be good based on how well it generalizes, that is, how well it makes predictions for data points in an independent validation dataset.

Sometimes, we find that the model is not able to make accurate predictions and gives poor performance on the validation data. This poor performance can be the result of a model that is too simple to model the data appropriately, or a model that is too complex to generalize to the validation dataset. In the former case, the model has a high bias and results in underfitting, while, in the latter case, the model has a high variance and results in overfitting.
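
One quick way to see both failure modes is to compare a very shallow decision tree (high bias) with an unconstrained one (high variance) on a held-out validation set. This is just an illustrative sketch using a synthetic scikit-learn dataset, not the chapter's data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# max_depth=1 tends to underfit (high bias);
# max_depth=None tends to overfit (high variance)
for depth in (1, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train accuracy={model.score(X_train, y_train):.2f}, "
          f"validation accuracy={model.score(X_val, y_val):.2f}")

An underfitting model typically scores poorly on both sets, while an overfitting model scores much higher on the training set than on the validation set.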

Bias

The bias in the prediction of a machine learning model represents the difference between the predicted target value and the true target value of a data point...

Bagging

The term bagging is derived from a technique called bootstrap aggregation. In order to implement a successful predictive model, it's important to know in what situations we could benefit from using bootstrapping methods to build ensemble models. Such models are used extensively in industry as well as in academia.

One such application is the quality assessment of Wikipedia articles. Features such as article_length, number_of_references, number_of_headings, and number_of_images are used to build a classifier that classifies Wikipedia articles as low or high quality. Out of the several models that were tried for this task, the random forest model – a well-known bagging-based ensemble classifier that we will discuss in the next section – outperformed all other models, such as SVM, logistic regression, and even neural networks, with the best precision and recall scores of 87.3% and 87.2%, respectively. This...
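
As a rough illustration of bagging in code (assuming scikit-learn and a synthetic dataset rather than the data used in the chapter's exercises), BaggingClassifier trains each of its base estimators (a decision tree by default) on a different bootstrapped sample of the training data and aggregates their predictions:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each of the 50 base estimators is trained on its own bootstrapped sample;
# the final prediction aggregates the votes of all estimators
bagged = BaggingClassifier(n_estimators=50, random_state=0)
print(cross_val_score(bagged, X, y, cv=5).mean())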

Bootstrapping

The bootstrap method essentially refers to drawing multiple samples (each known as a resample) from the dataset, with each resample consisting of randomly chosen data points. The data points are drawn with replacement, so there can be an overlap in the data points contained in the resamples, and each data point has an equal probability of being selected from the overall dataset:

Figure 6.7: Randomly choosing data points

From the previous diagram, we can see that each of the five bootstrapped samples taken from the primary dataset is different and has different characteristics. As such, training models on each of these resamples would result in different predictions.

The following are the advantages of bootstrapping:

  • Each resample can contain characteristics that differ from those of the entire dataset, giving us a different perspective on how the data behaves.
  • Algorithms that make use of bootstrapping are more robust and handle unseen data better, especially on smaller datasets that have...
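
To make the resampling itself concrete, here is a minimal sketch (using NumPy and a tiny made-up dataset) of how the five bootstrapped samples in the diagram could be drawn:

import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10)  # a small, hypothetical dataset of 10 points

# Each resample is the same size as the original dataset and is drawn
# with replacement, so individual points can appear more than once
for i in range(5):
    resample = rng.choice(data, size=len(data), replace=True)
    print(f"Resample {i + 1}: {sorted(resample)}")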

Boosting

The second ensemble technique we'll be looking at is boosting, which involves incrementally training new models that focus on the data points misclassified by the previous model, and uses weighted averages to turn weak models (underfitting models with a high bias) into stronger ones. Unlike bagging, where each base estimator can be trained independently of the others, the training of each base estimator in a boosted algorithm depends on the previous one.

Although boosting also uses the concept of bootstrapping, it's done differently from bagging, since each data sample is weighted, meaning that some bootstrapped samples are used for training more often than others. When training each model, the algorithm keeps track of which features are most useful and which data samples have the most prediction error; these are given a higher weight and are considered to require more iterations to properly train the model.

When predicting the output...
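
As a rough sketch of both flavors of boosting mentioned in this chapter (assuming scikit-learn and a synthetic dataset rather than the chapter's data):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Both models are built incrementally: AdaBoost re-weights the samples that
# previous estimators misclassified, while gradient boosting fits each new
# tree to the errors (gradients) of the current ensemble
for model in (AdaBoostClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(n_estimators=100, random_state=0)):
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())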

Stacking

Stacking, or stacked generalization, is also called meta ensembling. It is a model ensembling technique that combines the predictions of multiple models and uses them as features to train a new model. The stacked model will most likely outperform each of the individual models due to the smoothing effect it adds, as well as its ability to "choose" the base model that performs best in certain scenarios. Keeping this in mind, stacking is usually most effective when the base models are significantly different from each other.
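
A minimal sketch of a stacked ensemble, assuming scikit-learn's StackingClassifier and a synthetic dataset rather than the data used in the chapter's exercise:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# The base models should be as different from each other as possible;
# their cross-validated predictions become the features that the final
# (meta) estimator is trained on
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(random_state=0))],
    final_estimator=LogisticRegression()
)
print(cross_val_score(stack, X, y, cv=5).mean())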

Stacking is widely used in real-world applications. One popular example comes from the well-known Netflix competition, whose two top performers built solutions based on stacked models. Netflix is a well-known streaming platform, and the competition was about building the best recommendation engine. The winning algorithm was based on feature-weighted linear stacking, which basically had meta features...

Summary

In this chapter, we started off with a discussion of overfitting and underfitting and how they can affect the performance of a model on unseen data. The chapter looked at ensemble modeling as a solution to these problems and went on to discuss the different ensemble methods that could be used and how they can decrease the overall bias or variance encountered when making predictions. We first discussed bagging algorithms and introduced the concept of bootstrapping.

Then, we looked at random forest as a classic example of a bagged ensemble and solved exercises that involved building a bagging classifier and a random forest classifier on the previously seen Titanic dataset. We then moved on to discussing boosting algorithms and how they successfully reduce bias in the system, and gained an understanding of how to implement adaptive boosting and gradient boosting. The last ensemble method we discussed was stacking, which, as we saw from the exercise, gave us the best accuracy score...
