In this chapter, we are going to present one of the most intuitive ways to create a predictive model—using the concept of a tree. Tree-based models, often also known as decision tree models, are successfully used to handle both regression and classification type problems. We'll explore both scenarios in this chapter, and we'll be looking at a range of different algorithms that are effective in training these models. We will also learn about a number of useful properties that these models possess, such as their ability to handle missing data and the fact that they are highly interpretable.
A decision tree is a model with a very straightforward structure that allows us to make a prediction on an output variable, based on a series of rules arranged in a tree-like structure. The output variable that we can model can be categorical, allowing us to use a decision tree to handle classification problems. Equally, we can use decision trees to predict a numerical output, and in this way we'll also be able to tackle problems where the predictive task is a regression task.
Decision trees consist of a series of split points, often referred to as nodes. In order to make a prediction using a decision tree, we start at the top of the tree at a single node known as the root node. The root node is a decision or split point because it places a condition on the value of one of the input features, and based on the outcome of this test we know whether to continue down the left branch or the right branch of the tree. We repeat this process of choosing a branch at every node we encounter until we reach a leaf node, which holds the value that the tree predicts.
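To make this traversal concrete, here is a minimal sketch of how a prediction could be read off a hypothetical two-feature tree by hand; the split thresholds and class labels are invented purely for illustration and do not come from any data set in this chapter.

```r
# Hypothetical hand-coded decision tree for two numerical features (x1, x2).
# Each internal node tests one feature against a threshold; leaves return a class.
predict_toy_tree <- function(x1, x2) {
  if (x1 < 15) {                     # root node: split on x1
    if (x2 < 11) "a" else "b"        # left subtree: split on x2
  } else {
    "c"                              # right subtree is a single leaf
  }
}

predict_toy_tree(12, 10.5)   # root -> left branch -> left leaf, returns "a"
```

Every prediction is just a walk from the root to one leaf, evaluating one simple condition per node along the way.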
Now that we have understood how a decision tree works, we'll want to address the issue of how we can train one using some data. There are several algorithms that have been proposed to build decision trees, and in this section we will present a few of the most well-known. One thing we should bear in mind is that whatever tree-building algorithm we choose, we will have to answer four fundamental questions:
For every node (including the root node), how should we choose the input feature to split on and, given this feature, what should the value of the split point be? (One common criterion for making this choice is sketched after this list.)
How do we decide whether a node should become a leaf node or if we should make another split point?
How deep should our tree be allowed to become?
Once we arrive at a leaf node, what value should we predict?
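As a partial answer to the first of these questions, many classification tree algorithms score candidate splits with an impurity measure, such as the Gini index, and pick the feature and threshold that yield the purest child nodes. The sketch below illustrates that idea for a single feature; the function names are our own and this is a simplification, not the full logic of any particular tree-building algorithm.

```r
# Gini impurity of a vector of class labels
gini <- function(labels) {
  if (length(labels) == 0) return(0)
  p <- table(labels) / length(labels)
  1 - sum(p ^ 2)
}

# Weighted impurity after splitting on the rule `feature < threshold`
split_impurity <- function(feature, labels, threshold) {
  left  <- labels[feature <  threshold]
  right <- labels[feature >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

# Choose the threshold for one feature that minimizes the weighted impurity
best_split <- function(feature, labels) {
  thresholds <- sort(unique(feature))
  impurities <- sapply(thresholds, function(t) split_impurity(feature, labels, t))
  thresholds[which.min(impurities)]
}
```

A tree-building algorithm would apply a computation like `best_split()` to every input feature at a node and keep the feature and threshold pair with the lowest resulting impurity; the remaining three questions govern when to stop splitting and what to output at the leaves.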
Our first example showcasing tree-based methods in R will operate on a synthetic data set that we have created. The data set can be generated using the commands in the companion R file for this chapter, available from the publisher. The data consists of 287 observations of two input features, `x1` and `x2`. The output variable is a categorical variable with three possible classes: `a`, `b`, and `c`. If we follow the commands in the code file, we will end up with a data frame in R, `mcdf`:
```
> head(mcdf, n = 5)
        x1       x2 class
1 18.58213 12.03106     a
2 22.09922 12.36358     a
3 11.78412 12.75122     a
4 23.41888 13.89088     a
5 16.37667 10.32308     a
```
This problem is actually very simple, for two reasons: the data set is very small, with only two features, and the classes happen to be quite well separated in the feature space, something that is rarely the case in practice. Nonetheless, our objective in this section is to demonstrate how a decision tree is built, and a small, easily visualized data set makes this process much easier to follow.
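As a preview of what training looks like in practice, the following sketch fits a classification tree to this data frame using the `rpart` package; other tree packages in R would work equally well, and the call assumes that `mcdf` has been generated as described above.

```r
library(rpart)

# Fit a classification tree predicting class from the two input features
mc_tree <- rpart(class ~ x1 + x2, data = mcdf, method = "class")

# Inspect the sequence of split points the algorithm has chosen
print(mc_tree)

# Predict the class of a new observation (the feature values here are invented)
predict(mc_tree, newdata = data.frame(x1 = 20, x2 = 12), type = "class")
```

Printing the fitted model shows exactly the structure described above: a root node, a series of split conditions on `x1` and `x2`, and leaf nodes labeled with the predicted class.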
In this section, we will study the problem of predicting whether a particular banknote is genuine or has been forged. The banknote authentication data set is hosted at https://archive.ics.uci.edu/ml/datasets/banknote+authentication. The creators of the data set took specimens of both genuine and forged banknotes and photographed them with an industrial camera. The resulting grayscale images were processed using a type of time-frequency transformation known as a wavelet transform. Three features are computed from this transform and, together with the entropy of the image, make up the four input features for this binary classification task.
| Column name | Type | Definition |
|---|---|---|
| | Numerical | Variance of the wavelet-transformed image |
| | Numerical | Skewness of the wavelet-transformed image |
| | Numerical | Kurtosis of the wavelet-transformed image |
| | Numerical | Entropy of the image |
| | Binary | Authenticity of the banknote (genuine or forged) |
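Before building a model we need the data in R. A minimal sketch of loading it directly from the UCI repository follows; the exact file URL, the absence of a header row, and the column names we assign are our own assumptions and should be checked against the companion code for this chapter.

```r
# Sketch: read the banknote authentication data from the UCI repository.
# The file location and the lack of a header row are assumptions; adjust the
# path if you have downloaded the file locally instead.
url <- paste0("https://archive.ics.uci.edu/ml/machine-learning-databases/",
              "00267/data_banknote_authentication.txt")
bnote <- read.csv(url, header = FALSE)

# Column names chosen by us for readability; they are not part of the raw file
names(bnote) <- c("waveletVar", "waveletSkew", "waveletCurt", "entropy", "class")

# Treat the output variable as a factor so that tree packages fit a classification tree
bnote$class <- factor(bnote$class)
str(bnote)
```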