Mastering Predictive Analytics with R


Chapter 7. Ensemble Methods

In this chapter, we take a step back from learning new models and instead think about how several trained models can work together as an ensemble, in order to produce a single model that is more powerful than the individual models involved.

The first type of ensemble that we will study uses different samples of the same data set in order to train multiple versions of the same model. These models then vote on the answer for a new observation: a majority decision is taken for classification problems and an average for regression problems. This process is known as bagging, which is short for bootstrap aggregation. Another approach to combining models is boosting. This involves training a chain of models and assigning higher weights to observations that were incorrectly classified, or that fell far from their predicted value, so that successive models are forced to prioritize them.

As methods, bagging and boosting are fairly general and have been applied with a number of different...

Bagging


The focus of this chapter is on combining the results from different models in order to produce a single model that will outperform individual models on their own. Bagging is essentially an intuitive procedure for combining multiple models trained on the same data set, using majority voting for classification models and the average prediction for regression models. We'll present the procedure for the classification case and later show how it is easily extended to handle regression models.
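To illustrate the two aggregation rules concretely, suppose we already have the predictions made by three models on the same observations; the matrices below are purely hypothetical:

    # Each row is an observation, each column one model's prediction
    class_preds <- rbind(c("a", "a", "b"),
                         c("b", "b", "b"))

    # Classification: the majority class across the models wins
    apply(class_preds, 1, function(row) names(which.max(table(row))))

    reg_preds <- rbind(c(2.1, 1.9, 2.3),
                       c(0.7, 1.1, 0.9))

    # Regression: average the predicted values across the models
    rowMeans(reg_preds)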

Boosting


Boosting offers an alternative take on the problem of how to combine models to achieve greater performance. It is especially suited to weak learners: models whose accuracy is better than random guessing, but not by much. One way to create a weak learner is to use a model whose complexity is configurable.

For example, we can train a multilayer perceptron network with a very small number of hidden layer neurons. Similarly, we can train a decision tree but only allow the tree to comprise a single node, resulting in a single split in the input data. This special type of decision tree is known as a stump.
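As a minimal sketch of both ideas (the choice of the nnet and rpart packages is an assumption here, made because they implement the two model types in question):

    library(nnet)    # feed-forward neural networks
    library(rpart)   # decision trees

    # A weak multilayer perceptron: just two hidden-layer neurons
    weak_mlp <- nnet(Species ~ ., data = iris, size = 2, trace = FALSE)

    # A decision stump: a tree restricted to a single split
    stump <- rpart(Species ~ ., data = iris,
                   control = rpart.control(maxdepth = 1))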

When we looked at bagging, the key idea was to take a set of random bootstrapped samples of the training data and then train multiple versions of the same model using these different samples. In the classical boosting scenario, there is no random component, as all the models use all of the training data.

For classification...
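To make the reweighting idea concrete, here is a minimal AdaBoost.M1-style sketch for a binary label coded as a factor with levels -1 and 1, using rpart stumps as the weak learners; the data frame df and all parameter values are hypothetical:

    library(rpart)

    M <- 50
    n <- nrow(df)
    w <- rep(1 / n, n)            # start with uniform observation weights
    y <- as.numeric(as.character(df$label))
    models <- vector("list", M)
    alpha  <- numeric(M)

    for (m in 1:M) {
      # Fit a stump to the data under the current weights
      models[[m]] <- rpart(label ~ ., data = df, weights = w,
                           control = rpart.control(maxdepth = 1))
      pred <- as.numeric(as.character(predict(models[[m]], df, type = "class")))
      err  <- sum(w * (pred != y)) / sum(w)    # weighted misclassification rate
      alpha[m] <- 0.5 * log((1 - err) / err)   # this model's vote weight
      w <- w * exp(-alpha[m] * y * pred)       # upweight misclassified points
      w <- w / sum(w)
    }

    # The ensemble classifies by the sign of the weighted vote over all M models
    votes <- sapply(1:M, function(m)
      alpha[m] * as.numeric(as.character(predict(models[[m]], df, type = "class"))))
    boosted_pred <- sign(rowSums(votes))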

Predicting atmospheric gamma ray radiation


In order to study boosting in action, in this section we'll introduce a new prediction problem from the field of atmospheric physics. More specifically, we will analyze the patterns made by radiation on a telescope camera in order to predict whether a particular pattern came from gamma rays leaking into the atmosphere, or from regular background radiation.

Gamma rays leave distinctive elliptical patterns and so we can create a set of features to describe these. The data set we will use is the MAGIC Gamma Telescope data set, hosted by the UCI Machine Learning repository at http://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope. Our data consists of 19,020 observations of the following attributes:

Column name    Type        Definition
FLENGTH        Numerical   The major axis of the ellipse (mm)
FWIDTH         Numerical   The minor axis of the ellipse (mm)
FSIZE          Numerical   Logarithm to the base ten of the sum of the content of all pixels in the camera photo
FCONC...
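A loading sketch for this data set; the file name magic04.data and the full attribute names follow the UCI documentation and should be verified against the repository page:

    # Attribute names as documented by UCI (the table above is abbreviated)
    cols <- c("FLENGTH", "FWIDTH", "FSIZE", "FCONC", "FCONC1",
              "FASYM", "FM3LONG", "FM3TRANS", "FALPHA", "FDIST", "CLASS")

    magic <- read.csv(paste0("http://archive.ics.uci.edu/ml/",
                             "machine-learning-databases/magic/magic04.data"),
                      header = FALSE, col.names = cols)

    magic$CLASS <- factor(magic$CLASS)   # g = gamma ray, h = hadron (background)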

Bagging procedure for binary classification

Inputs:

  • data: The input data frame containing the input features and a column with the binary output label

  • M: An integer, representing the number of models that we want to train

Output:

  • models: A set of M trained binary classifier models

Method:

1. Create a random sample of size n, where n is the number of observations in the original data set, with replacement. This means that some of the observations from the original training set will be repeated and some...
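A minimal R rendering of this procedure, using rpart trees as the base classifier (the base model choice and the data frame df with binary factor column label are assumptions for illustration):

    library(rpart)

    M <- 25
    n <- nrow(df)

    models <- lapply(1:M, function(m) {
      # Step 1: a bootstrap sample of size n, drawn with replacement
      boot <- df[sample(n, n, replace = TRUE), ]
      # Train one version of the same model on this sample
      rpart(label ~ ., data = boot)
    })
    # New observations are then classified by the majority vote shown earlier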

Predicting complex skill learning with boosting


We will revisit our Skillcraft data set in this section—this time in the context of another boosting technique known as stochastic gradient boosting. The main characteristic of this method is that in every iteration of boosting, we compute a gradient in the direction of the errors that are made by the model trained in the current iteration.

This gradient is then used in order to guide the construction of the model that will be added in the next iteration. Stochastic gradient boosting is commonly used with decision trees, and a good implementation in R can be found in the gbm package, which provides us with the gbm() function. For regression problems, we need to specify the distribution parameter to be gaussian. In addition, we can specify the number of trees we want to build (which is equivalent to the number of iterations of boosting) via the n.trees parameter, as well as a shrinkage parameter that is used to control the algorithm's learning...
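Putting these parameters together, a hedged sketch on a hypothetical skillcraft data frame (the LeagueIndex response column and all parameter values are assumptions):

    library(gbm)

    boosted_fit <- gbm(LeagueIndex ~ ., data = skillcraft,
                       distribution = "gaussian",  # squared-error loss for regression
                       n.trees = 10000,            # boosting iterations
                       shrinkage = 0.01)           # the learning rate

    # Pick the best number of trees by the out-of-bag estimate and predict
    best_iter <- gbm.perf(boosted_fit, method = "OOB", plot.it = FALSE)
    preds <- predict(boosted_fit, skillcraft, n.trees = best_iter)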

Random forests


The final ensemble model that we will discuss in this chapter is unique to tree-based models and is known as the random forest. In a nutshell, the idea behind random forests stems from an observation about bagged trees. Let's suppose that the actual relationship between the features and the target variable can be adequately described with a tree structure. It is quite likely that, during bagging with moderately sized bootstrapped samples, we will keep picking the same features to split on high up in the tree.

For example, in our Skillcraft data set, we expect to see APM as the feature that will be chosen at the top of most of the bagged trees. This is a form of tree correlation that essentially impedes our ability to derive the variance reduction benefits from bagging. Put differently, the different tree models that we build are not truly independent of each other because they will have many features and split points in common. Consequently, the averaging process at the end will...
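Random forests counteract this correlation by allowing each split to consider only a random subset of the features. A minimal sketch with the randomForest package, reusing the hypothetical skillcraft data frame from the previous section:

    library(randomForest)

    rf_fit <- randomForest(LeagueIndex ~ ., data = skillcraft,
                           ntree = 500,        # number of bagged trees
                           mtry = floor((ncol(skillcraft) - 1) / 3),  # features per split
                           importance = TRUE)  # record variable importance

    # If APM truly dominates the top splits, it should rank highly here
    importance(rf_fit)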

Summary


In this chapter, we deviated from our usual pattern of learning a new type of model and instead focused on techniques to build ensembles of models that we have seen before. We discovered that there are numerous ways to combine models in a meaningful way, each with its own advantages and limitations. Our first technique for building ensemble models was bagging. The central idea behind bagging is that we build multiple versions of the same model using bootstrap samples of the training data. We then average the predictions made by these models in order to construct our overall prediction. By building many different versions of the model we can smooth out errors made due to overfitting and end up with a model that has reduced variance.

A different approach to building model ensembles uses all of the training data and is known as boosting. Here, the defining characteristic is that we train a sequence of models, each time weighting every observation differently depending on whether...

