Machine Learning with Scala Quick Start Guide

Product type Book
Published in Apr 2019
Publisher Packt
ISBN-13 9781789345070
Pages 220 pages
Edition 1st Edition
Authors (2): Md. Rezaul Karim, Ajay Kumar N

Table of Contents (9) Chapters

  • Preface
  • Introduction to Machine Learning with Scala
  • Scala for Regression Analysis
  • Scala for Learning Classification
  • Scala for Tree-Based Ensemble Techniques
  • Scala for Dimensionality Reduction and Clustering
  • Scala for Recommender System
  • Introduction to Deep Learning with Scala
  • Other Books You May Enjoy

Scala for Learning Classification

In the previous chapter, we saw how to develop a predictive model for analyzing insurance severity claims as a regression analysis problem. We applied simple linear regression, as well as generalized linear regression (GLR).

In this chapter, we'll learn about another supervised learning task, called classification. We'll use widely used algorithms such as logistic regression (LR), Naive Bayes (NB), and Support Vector Machines (SVMs) to analyze and predict whether a customer is likely to cancel their telecommunications subscription.

In particular, we will cover the following topics:

  • Introduction to classification
  • Learning classification with a real-life example
  • Logistic regression for churn prediction
  • SVM for churn prediction
  • NB for churn prediction

Technical requirements

Overview of classification

As a supervised learning task, classification is the problem of identifying which class (category) a new observation belongs to, based on one or more independent variables. This learning process is based on a training set containing observations (or instances) whose class membership, or label, is already known. Typically, a classification problem is one where we train a model to predict categorical (discrete) targets, as in spam detection, churn prediction, sentiment analysis, cancer type prediction, and so on.

Suppose we want to develop a predictive model that will predict whether a student is competent enough to get admission into computer science, based on their TOEFL and GRE scores. Also, suppose we have some historical data in the following range/format:

  • TOEFL: Between 0 and 100
  • GRE: Between 0 and 100
  • Admission: 1 for admitted, 0 if not admitted...
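As a minimal sketch, the historical data described above could be held in Scala as labeled records. The `Applicant` case class, its field names, and the sample rows are hypothetical and only illustrate the layout of features and label; they are not taken from the book.

```scala
object AdmissionData {
  // Hypothetical record: two features (TOEFL, GRE) and a binary label.
  final case class Applicant(toefl: Double, gre: Double, admitted: Int)

  // Made-up sample rows, for illustration only.
  val history: Seq[Applicant] = Seq(
    Applicant(92.0, 88.0, 1),
    Applicant(55.0, 61.0, 0),
    Applicant(78.0, 83.0, 1),
    Applicant(40.0, 52.0, 0)
  )

  // Split a record into its feature vector and its label.
  def toLabeledPoint(a: Applicant): (Array[Double], Int) =
    (Array(a.toefl, a.gre), a.admitted)

  // Count how many observations fall into each class.
  def classCounts(data: Seq[Applicant]): Map[Int, Int] =
    data.groupBy(_.admitted).map { case (label, rows) => label -> rows.size }
}
```

A classifier is then trained on the (features, label) pairs and asked to predict the label for applicants whose admission outcome is not yet known.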

Developing predictive models for churn

Accurately identifying which customers are likely to churn can minimize customer defection: you first identify the customers who are likely to cancel a subscription to an existing service, and then offer a special deal or plan to those customers. Customer churn prediction is a heavily data-driven process, so machine learning can be used to understand a customer's behavior. This is done by analyzing the following:

  • Demographic data, such as age, marital status, and job status
  • Sentiment analysis based on their social media data
  • Behavior analysis using their browsing clickstream logs
  • Calling-circle data and support call center statistics

An automated churn analytics pipeline can be developed by following three steps:

  1. First, identify typical tasks to analyze the churn, which will depend on company...

LR for churn prediction

Logistic regression (LR) is an algorithm for classification that predicts a binary response. It is similar to linear regression, which we described in Chapter 2, Scala for Regression Analysis, except that it does not predict continuous values; it predicts discrete classes. The loss function is the logistic loss, which is built on the sigmoid function (or logistic function):

Similar to linear regression, the intuition behind the cost function is to penalize models that have large errors between the real response and the predicted response:

For a given new data point, x, the LR model makes predictions using the following equation:

$$f(z) = \frac{1}{1 + e^{-z}}$$

In the preceding equation, the logistic function is applied to the regression output to get the probability of the data point belonging to the positive class, where $z = w^T x$. If $f(w^T x) > 0.5$, the outcome is positive; otherwise, it is negative. This means that the threshold for the classification...
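The prediction rule described above can be sketched in plain Scala as follows. This is an illustration, not the book's implementation: the object and function names are hypothetical, the intercept is omitted for brevity, and the loss assumes labels encoded as -1 and +1.

```scala
object LogisticSketch {
  // Sigmoid (logistic) function: maps any real z into the interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Linear score z = w^T x (no intercept term, for brevity).
  def score(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "weights and features must have the same length")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum
  }

  // Predicted probability of the positive class.
  def probability(w: Array[Double], x: Array[Double]): Double =
    sigmoid(score(w, x))

  // Classify as positive (1) when the probability exceeds the threshold.
  def predict(w: Array[Double], x: Array[Double], threshold: Double = 0.5): Int =
    if (probability(w, x) > threshold) 1 else 0

  // Logistic loss for a label y in {-1, +1} (assumed encoding):
  // log(1 + exp(-y * w^T x)).
  def logisticLoss(w: Array[Double], x: Array[Double], y: Int): Double =
    math.log1p(math.exp(-y * score(w, x)))
}
```

Raising the `threshold` argument above 0.5 makes the model more conservative about predicting the positive class, which is one way to trade false positives against false negatives.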

NB for churn prediction

The NB classifier is based on Bayes' theorem, with the following assumptions:

  • Independence between every pair of features
  • Feature values are non-negative, such as counts

For example, if cancer is related to age, then Bayes' theorem lets us use a person's age to assess more accurately the probability that they have cancer. Bayes' theorem is stated mathematically as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

In the preceding equation, A and B are events with P(B) ≠ 0. The other terms can be described as follows:

  • P(A | B) is called the posterior, that is, the conditional probability of observing event A given that B is true
  • P(B | A) is the likelihood of event B given that A is true
  • P(A) is the prior probability of A, and P(B) is the marginal probability of B (also called the marginal likelihood)
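To make the terms concrete, here is a small worked sketch of Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), in Scala. The probabilities are made-up numbers chosen only for illustration (A is "has cancer", B is "is over 65"), and the names are hypothetical.

```scala
object BayesSketch {
  // Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B), defined for P(B) != 0.
  def posterior(likelihood: Double, prior: Double, evidence: Double): Double = {
    require(evidence > 0.0, "P(B) must be non-zero")
    likelihood * prior / evidence
  }

  // Made-up numbers, for illustration only.
  val pCancer: Double = 0.01            // P(A): prior
  val pOver65GivenCancer: Double = 0.5  // P(B | A): likelihood
  val pOver65: Double = 0.2             // P(B): marginal probability

  // P(A | B): probability of cancer given that the person is over 65.
  val pCancerGivenOver65: Double =
    posterior(pOver65GivenCancer, pCancer, pOver65)
}
```

With these assumed inputs, knowing the person's age raises the estimated probability from the 1% prior to about 2.5%.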

Gaussian NB is a variant of NB that's used for classification with continuous features, which is based on the Gaussian (normal) distribution...
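The idea behind Gaussian NB can be sketched in plain Scala, assuming each continuous feature is modeled by a separate normal distribution per class and that features are independent given the class. All names here are hypothetical, and this is a simplified illustration rather than the book's or any library's implementation.

```scala
object GaussianNBSketch {
  // Per-class summary: prior probability plus mean and variance of each feature.
  final case class ClassStats(prior: Double, means: Array[Double], variances: Array[Double])

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    // A small constant keeps the variance strictly positive.
    xs.map(x => (x - m) * (x - m)).sum / xs.size + 1e-9
  }

  // Estimate the statistics of every class from labeled rows (features, label).
  def fit(data: Seq[(Array[Double], Int)]): Map[Int, ClassStats] =
    data.groupBy(_._2).map { case (label, rows) =>
      val columns = rows.map(_._1.toSeq).transpose
      label -> ClassStats(
        prior = rows.size.toDouble / data.size,
        means = columns.map(mean).toArray,
        variances = columns.map(variance).toArray
      )
    }

  // Log of the normal density N(x; mu, sigma2).
  def logGaussian(x: Double, mu: Double, sigma2: Double): Double =
    -0.5 * math.log(2 * math.Pi * sigma2) - (x - mu) * (x - mu) / (2 * sigma2)

  // Pick the class with the largest log prior plus summed per-feature log likelihoods.
  def predict(model: Map[Int, ClassStats], x: Array[Double]): Int =
    model.maxBy { case (_, s) =>
      math.log(s.prior) +
        x.indices.map(i => logGaussian(x(i), s.means(i), s.variances(i))).sum
    }._1
}
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.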

SVM for churn prediction

SVM is also a popular algorithm for classification. SVM is based on the concept of decision planes, which define the decision boundaries we discussed at the beginning of this chapter. The following diagram shows how the SVM algorithm works:

SVM finds the linear hyperplane that separates the classes with the maximum margin, using a kernel function where needed. The following diagram shows how the data points belonging to two different classes (red versus blue) are separated using the decision boundary based on the maximum margin, which is determined by the support vectors (the data points closest to the boundary):

The preceding support vector classifier can be represented as a dot product mathematically, as follows:

If the data is not linearly separable in its original space, the kernel trick uses a kernel function to implicitly map the data into a higher-dimensional feature space, where it can become linearly separable for classification...
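The kernel trick can be illustrated with a small Scala sketch. It assumes 2-D inputs and a degree-2 homogeneous polynomial kernel, for which the explicit feature map is known; the object and function names are hypothetical. The point is that the kernel value equals a dot product in the higher-dimensional space, without ever computing the mapping.

```scala
object KernelSketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (ai, bi) => ai * bi }.sum

  // Explicit feature map from 2-D inputs into a 3-D space:
  // phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).
  def phi(x: Array[Double]): Array[Double] =
    Array(x(0) * x(0), math.sqrt(2.0) * x(0) * x(1), x(1) * x(1))

  // Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2.
  // It equals dot(phi(x), phi(y)) but never builds phi explicitly.
  def polyKernel(x: Array[Double], y: Array[Double]): Double = {
    val d = dot(x, y)
    d * d
  }

  // Gaussian (RBF) kernel, another common choice: exp(-gamma * ||x - y||^2).
  def rbfKernel(x: Array[Double], y: Array[Double], gamma: Double): Double = {
    val diff = x.zip(y).map { case (xi, yi) => xi - yi }
    math.exp(-gamma * dot(diff, diff))
  }
}
```

For example, comparing `polyKernel(x, y)` with `dot(phi(x), phi(y))` for any pair of 2-D points gives the same value up to floating-point rounding, which is why an SVM can work in the mapped space using only kernel evaluations.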

Summary

In this chapter, we have learned about different classical classification algorithms, such as LR, SVM, and NB. Using these algorithms, we predicted whether a customer is likely to cancel their telecommunications subscription. We've also discussed what types of data are required to build a successful churn prediction model.

Tree-based and tree ensemble classifiers are useful and robust, and are widely used for solving both classification and regression tasks. In the next chapter, we will look into developing such classifiers and regressors using tree-based and ensemble techniques such as decision trees (DT), random forest (RF), and gradient-boosted trees (GBT).
