You're reading from Machine Learning with Scala Quick Start Guide

Product type Book

Published in Apr 2019

Publisher Packt

ISBN-13 9781789345070

Pages 220 pages

Edition 1st Edition

Languages

Scala

Concepts

Machine Learning

Authors (2):

Md. Rezaul Karim

Ajay Kumar N

Scala for Learning Classification

In the previous chapter, we saw how to develop a predictive model for analyzing insurance severity claims as a regression analysis problem. We applied very simple linear regression, as well as generalized linear regression (GLR).

In this chapter, we'll learn about another supervised learning task, called classification. We'll use widely used algorithms such as logistic regression, Naive Bayes (NB), and Support Vector Machines (SVMs) to analyze and predict whether a customer is likely to cancel the subscription of their telecommunication contract or not.

In particular, we will cover the following topics:

Introduction to classification
Learning classification with a real-life example
Logistic regression for churn prediction
SVM for churn prediction
NB for churn prediction

Technical requirements

Make sure Scala 2.11.x and Java 1.8.x are installed and configured on your machine.

The code files for this chapter can be found on GitHub:

https://github.com/PacktPublishing/Machine-Learning-with-Scala-Quick-Start-Guide/tree/master/Chapter03

Check out the following video to see the Code in Action:
http://bit.ly/2ZKVrxH

Overview of classification

As a supervised learning task, classification is the problem of identifying which category (class) an observation (sample) belongs to, based on one or more independent variables. The learning process relies on a training set of observations (or instances) whose class, or label, is already known. Typically, a classification problem is one in which we train a model to predict a categorical (discrete) target, as in spam detection, churn prediction, sentiment analysis, cancer type prediction, and so on.

Suppose we want to develop a predictive model that predicts whether a student is competent enough to be admitted to a computer science program, based on their TOEFL and GRE scores. Also, suppose we have some historical data in the following range/format:

TOEFL: Between 0 and 100
GRE: Between 0 and 100
Admission: 1 for admitted, 0 if not admitted...
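
The data format above can be sketched in Scala as follows. This is a minimal illustration, not the chapter's code: the Applicant type, the AdmissionData object, and every row of data are hypothetical and made up purely to show the shape of the input (two features plus a binary label).

```scala
// Hypothetical record type for the admission data described above.
final case class Applicant(toefl: Double, gre: Double, admitted: Int)

object AdmissionData {
  // Made-up rows for illustration only; they do not come from a real dataset.
  val history: Seq[Applicant] = Seq(
    Applicant(toefl = 92.0, gre = 88.0, admitted = 1),
    Applicant(toefl = 55.0, gre = 61.0, admitted = 0),
    Applicant(toefl = 78.0, gre = 95.0, admitted = 1),
    Applicant(toefl = 40.0, gre = 35.0, admitted = 0)
  )

  // Check that a record respects the ranges given in the text:
  // both scores in [0, 100] and a label of 0 or 1.
  def isValid(a: Applicant): Boolean =
    a.toefl >= 0 && a.toefl <= 100 &&
      a.gre >= 0 && a.gre <= 100 &&
      (a.admitted == 0 || a.admitted == 1)

  // Split each record into a feature vector (TOEFL, GRE) and a label,
  // which is the shape a binary classifier is usually trained on.
  def toFeaturesAndLabels(rows: Seq[Applicant]): Seq[(Vector[Double], Int)] =
    rows.map(a => (Vector(a.toefl, a.gre), a.admitted))
}
```

Keeping the label as a separate field makes it explicit that Admission is the target to predict, while TOEFL and GRE are the independent variables.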

Developing predictive models for churn

Accurately identifying the likelihood of churn can minimize customer defection: you first identify which customers are likely to cancel a subscription to an existing service, and then offer a special deal or plan to those customers. When it comes to customer churn prediction and developing a predictive model, where the process is heavily data-driven, machine learning can be used to understand a customer's behavior. This is done by analyzing the following:

Demographic data, such as age, marital status, and job status
Sentiment analysis based on their social media data
Behavior analysis using their browsing clickstream logs
Calling-circle data and support call center statistics

An automated churn analytics pipeline can be developed by following three steps:

First, identify typical tasks to analyze the churn, which will depend on company...

LR for churn prediction

LR is an algorithm for classification, which predicts a binary response. It is similar to linear regression, which we described in Chapter 2, Scala for Regression Analysis, except that it does not predict continuous values; it predicts discrete classes. The model is built on the sigmoid function (or logistic function), which maps any real-valued input z into the interval (0, 1):

f(z) = 1 / (1 + e^(-z))

Similar to linear regression, the intuition behind the cost function is to penalize models that have large errors between the real response and the predicted response:
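
One common cost with this behavior is the log loss (cross-entropy). The following is a minimal Scala sketch of it, assuming labels encoded as 0 or 1 and predictions given as probabilities of the positive class. The object name LogLoss and the clipping constant eps are illustrative choices, not taken from the chapter's code, and the chapter's own cost function may be written in a different but related form.

```scala
object LogLoss {
  // Average log loss over a set of predictions.
  // labels: true classes, each 0 or 1.
  // probs:  predicted probabilities of class 1.
  // eps:    clips probabilities away from 0 and 1 so log(0) is never evaluated.
  def apply(labels: Seq[Int], probs: Seq[Double], eps: Double = 1e-15): Double = {
    require(labels.nonEmpty && labels.length == probs.length,
      "labels and probs must be non-empty and of equal length")
    val losses = labels.zip(probs).map { case (y, p) =>
      val q = math.min(math.max(p, eps), 1.0 - eps)
      // A confident wrong prediction makes the log term large, so it is penalized heavily.
      if (y == 1) -math.log(q) else -math.log(1.0 - q)
    }
    losses.sum / losses.length
  }
}
```

For instance, LogLoss(Seq(1, 0), Seq(0.9, 0.1)) is smaller than LogLoss(Seq(1, 0), Seq(0.1, 0.9)), because the second set of predictions is confidently wrong on both points.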

For a given new data point, x, the LR model makes predictions using the following equation:

f(w^T x) = 1 / (1 + e^(-w^T x))

In the preceding equation, the logistic function is applied to the regression score z = w^T x to obtain the probability that x belongs to the positive class. If f(w^T x) > 0.5, the outcome is positive; otherwise, it is negative. This means that the threshold for the classification...
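
The prediction rule described above can be sketched in plain Scala as follows. This is a minimal illustration that assumes the weight vector w has already been learned; the object and method names are hypothetical and independent of any particular library.

```scala
object LogisticPrediction {
  // Logistic (sigmoid) function: maps any real z into (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // z = w^T x, the linear regression score.
  def dot(w: Vector[Double], x: Vector[Double]): Double = {
    require(w.length == x.length, "w and x must have the same length")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum
  }

  // Probability that x belongs to the positive class.
  def probability(w: Vector[Double], x: Vector[Double]): Double =
    sigmoid(dot(w, x))

  // Positive (1) if f(w^T x) > threshold, otherwise negative (0).
  def predict(w: Vector[Double], x: Vector[Double], threshold: Double = 0.5): Int =
    if (probability(w, x) > threshold) 1 else 0
}
```

Exposing the threshold as a parameter shows why 0.5 is only a default: raising it makes the model more conservative about predicting the positive class, which can matter when false positives are costly.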

NB for churn prediction

The NB classifier is based on Bayes' theorem, with the following assumptions:

Independence between every pair of features
Feature values are non-negative, such as counts

For example, if cancer is related to age, Bayes' theorem can use a patient's age to assess the probability that the patient has cancer. Bayes' theorem is stated mathematically as follows:

P(A | B) = P(B | A) P(A) / P(B)

In the preceding equation, A and B are events with P(B) ≠ 0. The other terms can be described as follows:

P(A | B) is called the posterior, or the conditional probability of observing event A given that B is true
P(B | A) is the likelihood of event B given that A is true
P(A) is the prior probability of A, and P(B) is the probability of observing B, also called the marginal likelihood or marginal probability
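
As a worked illustration of these terms, the following minimal Scala sketch computes a posterior from a prior and two likelihoods. The object name BayesRule and the numbers used in the usage note are hypothetical; P(B) is obtained from the law of total probability over A and not-A.

```scala
object BayesRule {
  // P(B) = P(B | A) P(A) + P(B | not A) (1 - P(A)), the marginal probability of B.
  def evidence(likelihood: Double, prior: Double, likelihoodGivenNotA: Double): Double =
    likelihood * prior + likelihoodGivenNotA * (1.0 - prior)

  // Posterior P(A | B) = P(B | A) P(A) / P(B), defined only when P(B) is non-zero.
  def posterior(likelihood: Double, prior: Double, evidence: Double): Double = {
    require(evidence > 0.0, "P(B) must be non-zero")
    likelihood * prior / evidence
  }
}
```

With made-up values P(A) = 0.01, P(B | A) = 0.9, and P(B | not A) = 0.05, evidence gives P(B) = 0.0585 and posterior gives roughly 0.154. Observing B raises the probability of A well above its prior, yet A remains unlikely because the prior was small.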

Gaussian NB is a variant of NB used for classification with continuous features, which assumes that the feature values within each class follow a normal (Gaussian) distribution...

SVM for churn prediction

SVM is also a popular algorithm for classification. SVM is based on the concept of decision planes, which define the decision boundaries we discussed at the beginning of this chapter. The following diagram shows how the SVM algorithm works:

SVM finds the linear hyperplane that separates the classes with the maximum margin, optionally using a kernel function. The following diagram shows how data points belonging to two different classes (red versus blue) are separated by a decision boundary based on the maximum margin; the points closest to the boundary are the support vectors:

The preceding support vector classifier can be represented as a dot product mathematically, as follows:
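
One common way to write such a classifier, taken here as an assumption because several equivalent notations exist, is f(x) = b + Σ_i α_i y_i ⟨x_i, x⟩, where the sum runs over the support vectors x_i with labels y_i in {+1, -1} and learned coefficients α_i. The following minimal Scala sketch evaluates that form; all names are hypothetical and nothing here performs training.

```scala
object SupportVectorClassifier {
  // One support vector with its label (+1 or -1) and its coefficient alpha.
  final case class SupportVector(x: Vector[Double], label: Int, alpha: Double)

  // Inner product <a, b>.
  def dot(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (ai, bi) => ai * bi }.sum
  }

  // f(x) = bias + sum over support vectors of alpha_i * y_i * <x_i, x>.
  def decision(svs: Seq[SupportVector], bias: Double, x: Vector[Double]): Double =
    bias + svs.map(sv => sv.alpha * sv.label * dot(sv.x, x)).sum

  // The sign of f(x) gives the predicted class.
  def classify(svs: Seq[SupportVector], bias: Double, x: Vector[Double]): Int =
    if (decision(svs, bias, x) >= 0.0) 1 else -1
}
```

Only the support vectors appear in the sum, so points far from the boundary have no influence on the prediction. Because the data enters solely through inner products, the same structure carries over when the inner product is swapped for a kernel function.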

If the data is not linearly separable in its original space, the kernel trick uses the kernel function to implicitly map the data into a higher-dimensional feature space, where it can become linearly separable for classification...
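
To make the kernel trick concrete, the following minimal Scala sketch compares a degree-2 polynomial kernel with an explicit feature map for two-dimensional inputs. The particular kernels and all names are illustrative assumptions, not the chapter's code. The point is that the kernel value equals an inner product in the higher-dimensional space without that space ever being constructed.

```scala
object KernelTrick {
  // Inner product <a, b>.
  def dot(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (ai, bi) => ai * bi }.sum
  }

  // Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2) * x1 * x2, x2^2).
  def phi(x: Vector[Double]): Vector[Double] = {
    require(x.length == 2, "phi is defined for 2-D inputs only")
    Vector(x(0) * x(0), math.sqrt(2.0) * x(0) * x(1), x(1) * x(1))
  }

  // Polynomial kernel K(x, y) = (<x, y>)^2, computed in the original 2-D space.
  def polyKernel(x: Vector[Double], y: Vector[Double]): Double = {
    val d = dot(x, y)
    d * d
  }

  // Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2).
  def rbfKernel(x: Vector[Double], y: Vector[Double], gamma: Double): Double = {
    require(x.length == y.length, "vectors must have the same length")
    val sqDist = x.zip(y).map { case (xi, yi) => (xi - yi) * (xi - yi) }.sum
    math.exp(-gamma * sqDist)
  }
}
```

For x = (1, 2) and y = (3, 4), polyKernel(x, y) and dot(phi(x), phi(y)) both evaluate to 121 up to floating-point rounding. The RBF kernel is included as a second example whose corresponding feature space is infinite-dimensional, so it can only be used through the kernel itself.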

Summary

In this chapter, we have learned about different classical classification algorithms, such as LR, SVM, and NB. Using these algorithms, we predicted whether a customer is likely to cancel their telecommunications subscription or not. We've also discussed what types of data are required to build a successful churn predictive model.

Tree-based and tree ensemble classifiers are useful and robust, and are widely used for solving both classification and regression tasks. In the next chapter, we will look into developing such classifiers and regressors using tree-based and ensemble techniques such as decision trees (DT), random forests (RF), and gradient-boosted trees (GBT).