Mastering Predictive Analytics with R



Chapter 5. Support Vector Machines

In this chapter, we are going to take a fresh look at nonlinear predictive models by introducing support vector machines. Support vector machines, often abbreviated as SVMs, are very commonly used for classification problems, although there are certainly ways to perform function approximation and regression tasks with them. In this chapter, we will focus on the more typical case of their role in classification. To do this, we'll first present the notion of maximal margin classification, which offers an alternative way of choosing among the many possible classification boundaries and differs from approaches, such as maximum likelihood, that we have seen thus far. We'll introduce the related idea of support vectors and show how, together with maximal margin classification, we can obtain a linear model in the form of a support vector classifier. Finally, we'll present how we can generalize these ideas in order to introduce nonlinearity through the...

Maximal margin classification


We'll begin this chapter by returning to a situation that should be very familiar by now: the binary classification task. Once again, we'll be thinking about how to design a model that will correctly predict whether an observation belongs to one of two possible classes. We've already seen that this task is simplest when the two classes are linearly separable, that is, when we can find a separating hyperplane (a flat surface whose dimension is one less than that of our feature space) such that all the observations on one side of the hyperplane belong to one class and all the observations on the other side belong to the second class. Depending on the structure, assumptions, and optimization criterion that our particular model uses, we could end up with any one of infinitely many such hyperplanes.

Let's visualize this scenario using some data in a two-dimensional feature space, where the separating hyperplane is just a separating line:

In the preceding diagram...
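A minimal sketch of this kind of plot in R, using simulated data and e1071's svm() with a linear kernel (the simulated data and the very large cost value are illustrative assumptions; svm() actually fits the soft margin classifier introduced in the next section, but with a very large cost it closely approximates the maximal margin solution):

> library(e1071)
> set.seed(7)
> # Two well-separated classes in a two-dimensional feature space
> x <- matrix(rnorm(40, sd = 0.5), ncol = 2)
> y <- factor(rep(c("A", "B"), each = 10))
> x[y == "B", ] <- x[y == "B", ] + 3
> sep_df <- data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
> # A very large cost leaves essentially no budget for margin violations
> model_mm <- svm(y ~ ., data = sep_df, kernel = "linear", cost = 1e5, scale = FALSE)
> # Plot the fitted decision boundary and mark the support vectors
> plot(model_mm, sep_df)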

Support vector classification


We need our data to be linearly separable in order to classify with a maximal margin classifier. When our data is not linearly separable, we can still use the notion of support vectors that define a margin, but this time, we will allow some examples to be misclassified. Thus, we essentially define a soft margin, in that some of the observations in our data set are allowed to violate the constraint that they must lie at least a margin's distance away from the separating hyperplane. It is also important to note that sometimes, we may want to use a soft margin even for linearly separable data. The reason for this is to limit the degree to which we overfit the data. Note that the larger the margin, the more confident we are about our ability to correctly classify new observations, because the classes are further apart from each other in our training data. If we achieve separation using a very small margin, we are less confident about our ability to correctly classify our...
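In e1071, the severity of the penalty for margin violations is controlled by the cost parameter of svm(): a small cost produces a wide, soft margin that tolerates violations, whereas a large cost produces a narrower, stricter margin. The following sketch reuses the sep_df data frame simulated in the previous section; the specific cost values are illustrative assumptions:

> # Small cost: a wide, soft margin, typically involving many support vectors
> model_soft <- svm(y ~ ., data = sep_df, kernel = "linear", cost = 0.01, scale = FALSE)
> model_soft$tot.nSV
> # Large cost: a narrow, strict margin, typically involving few support vectors
> model_strict <- svm(y ~ ., data = sep_df, kernel = "linear", cost = 100, scale = FALSE)
> model_strict$tot.nSV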

Kernels and support vector machines


So far, we've introduced the notion of maximal margin classification under linearly separable conditions and its extension to the support vector classifier, which still uses a hyperplane as the separating boundary but handles data sets that are not linearly separable by specifying a budget for tolerating errors. The observations that lie on or within the margin, or that are misclassified by the support vector classifier, are the support vectors. The critical role that these play in positioning the decision boundary was also seen in an alternative representation of the support vector classifier that uses inner products.
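For reference, in standard notation (which may differ from the symbols used elsewhere in the book), this inner product representation can be written as:

f(x) = \beta_0 + \sum_{i \in S} \alpha_i \langle x, x_i \rangle

Here, S indexes the support vectors, the x_i are the support vectors themselves, and the coefficients \alpha_i are nonzero only for observations in S, so the prediction depends on the data solely through inner products with the support vectors.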

What is common to the situations that we have seen so far in this chapter is that our model is always linear in terms of the input features. We've seen that models which implement nonlinear boundaries between the classes to be separated are far more flexible in terms of the different kinds of underlying target functions...
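As a preview of where this is heading in practice, a nonlinear decision boundary can be obtained simply by choosing a different kernel when calling e1071's svm(). The simulated circular data below and the parameter values are illustrative assumptions:

> library(e1071)
> set.seed(1)
> # Simulated data: one class inside a circle, the other outside (not linearly separable)
> x <- matrix(rnorm(400), ncol = 2)
> y <- factor(ifelse(x[, 1] ^ 2 + x[, 2] ^ 2 > 1.5, "outer", "inner"))
> circle_df <- data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
> # The radial (RBF) kernel computes K(u, v) = exp(-gamma * ||u - v||^2)
> model_radial <- svm(y ~ ., data = circle_df, kernel = "radial", gamma = 1, cost = 1)
> plot(model_radial, circle_df)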

Predicting chemical biodegradation


In this section, we are going to use R's e1071 package to try out the models we've discussed on a real-world data set. As our first example, we have chosen the QSAR biodegradation data set, which can be found at https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation#. This is a data set containing 41 numerical variables that describe the molecular composition and properties of 1055 chemicals. The modeling task is to predict whether a particular chemical will be biodegradable based on these properties. Example properties are the percentages of carbon, nitrogen, and oxygen atoms, as well as the number of heavy atoms in the molecule. These features are highly specialized and sufficiently numerous that a full listing won't be given here. The complete list and further details of the quantities involved can be found on the website. For now, we've downloaded the data into a bdf data frame:

> bdf <- read.table("biodeg.csv", sep = ";", quote = "\"")
> head(bdf)
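From here, a minimal sketch of how a first model might be fit with e1071 is shown below. The use of V42 as the name of the class column, the 80/20 train/test split, and the cost and gamma values are assumptions for illustration, not necessarily the exact workflow used in the book:

> library(e1071)
> # The last column of the file holds the class label (biodegradable or not)
> bdf$V42 <- factor(bdf$V42)
> set.seed(23)
> train_idx <- sample(nrow(bdf), round(0.8 * nrow(bdf)))
> bdf_train <- bdf[train_idx, ]
> bdf_test <- bdf[-train_idx, ]
> model_svm <- svm(V42 ~ ., data = bdf_train, kernel = "radial", cost = 1, gamma = 0.02)
> test_predictions <- predict(model_svm, bdf_test)
> mean(test_predictions == bdf_test$V42)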

Cross-validation


Many times in the real world, we come across situations where we don't have a separate test data set available to measure the performance of our model on unseen data. The most typical reason is that we have very little data overall and want to use all of it to train our model. Another situation is that we want to keep a sample of the data as a validation set to tune model metaparameters, such as cost and gamma for SVMs with radial kernels, and as a result, we've already reduced our starting data and don't want to reduce it further.

Whatever the reason for the lack of a test data set, we already know that we should never use our training data as a measure of model performance and generalization because of the problem of overfitting. This is especially relevant for powerful and expressive models, such as neural networks and SVMs with radial kernels, which are often capable of approximating the training data very closely...
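In e1071, k-fold cross-validation over candidate values of cost and gamma can be carried out with the tune() function. A minimal sketch is shown below, reusing the bdf_train data frame from the previous section's sketch; the grid of candidate values is an illustrative assumption:

> # Grid search over cost and gamma using 10-fold cross-validation
> tuned <- tune(svm, V42 ~ ., data = bdf_train, kernel = "radial",
+               ranges = list(cost = c(0.01, 0.1, 1, 10, 100),
+                             gamma = c(0.01, 0.05, 0.1)),
+               tunecontrol = tune.control(cross = 10))
> tuned$best.parameters
> tuned$best.performance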

Predicting credit scores


In this section, we will explore another data set, this time from the field of banking and finance. The particular data set in question is known as the German Credit Dataset and is also hosted by the UCI Machine Learning Repository. The link to the data is https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29.

The observations in the data set are loan applications made by individuals at a bank. The modeling goal is to determine whether an application constitutes a high credit risk.
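A minimal sketch of how the raw file might be loaded is given below. The file name german.data, the coding of the class column (1 for a good credit risk, 2 for a bad one), and the column names assigned to the first few attributes follow the UCI documentation and should be treated as assumptions rather than the book's exact code:

> german <- read.table("german.data", header = FALSE)
> dim(german)
> # The twenty-first column is the class: 1 = good credit risk, 2 = bad credit risk
> german$V21 <- factor(german$V21, levels = c(1, 2), labels = c("Good", "Bad"))
> # Assign descriptive names to the first few columns
> names(german)[1:8] <- c("checking", "duration", "creditHistory", "purpose",
+                         "credit", "savings", "employment", "installmentRate")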

Multiclass classification with support vector machines


Just as with logistic regression, we've seen that the basic premise behind the support vector machine is that it is designed to handle two classes. Of course, we often have situations where we would like to be able to handle a greater number of classes, such as when classifying different plant species based on a variety of physical characteristics. One way to do this is the one versus all approach. Here, if we have K classes, we create K SVM classifiers, and with each classifier, we attempt to distinguish one particular class from all the rest. To determine the best class to pick, we assign the class whose classifier places the observation farthest on its own side of the separating hyperplane. More formally, we pick the class for which our linear feature combination has the maximum value across all the different classifiers.
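To make the idea concrete, here is a minimal one versus all sketch using e1071's svm() on the built-in iris data; the choice of data set, the linear kernel, and the evaluation on the training data itself are assumptions made purely for illustration:

> library(e1071)
> classes <- levels(iris$Species)
> # Train one binary SVM per class: the class of interest versus everything else
> ova_models <- lapply(classes, function(cl) {
+   y <- factor(ifelse(iris$Species == cl, cl, "other"))
+   svm(x = iris[, 1:4], y = y, kernel = "linear")
+ })
> # Signed distance of each observation from every classifier's hyperplane,
> # oriented so that larger values favor the class of interest
> ova_scores <- sapply(seq_along(classes), function(i) {
+   dv <- attr(predict(ova_models[[i]], iris[, 1:4], decision.values = TRUE),
+              "decision.values")
+   if (startsWith(colnames(dv)[1], classes[i])) dv[, 1] else -dv[, 1]
+ })
> # Pick the class with the largest score for each observation
> predicted <- classes[apply(ova_scores, 1, which.max)]
> mean(predicted == iris$Species)

Note that e1071's svm() already handles multiclass problems internally using the one versus one scheme described next, so this manual construction serves purely to illustrate the one versus all idea.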

An alternative approach is known as the (balanced) one versus one...

Summary


In this chapter, we presented the maximal margin hyperplane as a decision boundary designed to separate two classes while lying as far as possible from each of them. When the two classes are linearly separable, this creates a situation where the space between the two classes is evenly split.

We've seen that there are circumstances where this is not desirable, such as when the classes are close to each other because of a few observations. An improvement to this approach is the support vector classifier, which allows us to tolerate a few margin violations, or even misclassifications, in order to obtain a more stable result. This also allows us to handle classes that aren't linearly separable. The form of the support vector classifier can be written in terms of inner products between the observation being classified and the support vectors. This transforms our feature space from p features into as many features as we have support vectors. Using kernel functions...


Columns of the German Credit Dataset:

Column name       Type          Definition
checking          Categorical   The status of the existing checking account
duration          Numerical     The duration in months
creditHistory     Categorical   The applicant's credit history
purpose           Categorical   The purpose of the loan
credit            Numerical     The credit amount
savings           Categorical   Savings account/bonds
employment        Categorical   Present employment since
installmentRate   Numerical     The installment rate (as a percentage of disposable income...