Data Science for Marketing Analytics - Second Edition


Product type: Book
Published: Sep 2021
Publisher: Packt
ISBN-13: 9781800560475
Pages: 636
Edition: 2nd Edition
Authors (3): Mirza Rahim Baig, Gururajan Govindan, Vishwesh Ravi Shrimali

Table of Contents (11 chapters)

Preface
1. Data Preparation and Cleaning
2. Data Exploration and Visualization
3. Unsupervised Learning and Customer Segmentation
4. Evaluating and Choosing the Best Segmentation Approach
5. Predicting Customer Revenue Using Linear Regression
6. More Tools and Techniques for Evaluating Regression Models
7. Supervised Learning: Predicting Customer Churn
8. Fine-Tuning Classification Algorithms
9. Multiclass Classification Algorithms
Appendix

9. Multiclass Classification Algorithms

Overview

In this chapter, you will learn how to identify and implement the algorithms that help you solve multiclass classification problems in marketing analytics. You will go through different types of classifiers and implement them using the scikit-learn library in Python. Next, you will learn to interpret the micro- and macro-averaged performance metrics used to evaluate a classifier on multiclass problems. You will also learn about different sampling techniques for handling imbalanced data. By the end of this chapter, you will be able to apply different kinds of algorithms and evaluation metrics to solve multiclass classification problems.

Introduction

The online shopping company you worked with in the previous chapter is busy planning a new feature. Currently, whenever customers search for a product on the company's website or app, they are shown the desired product along with options to buy the same product from different sellers. For example, if a customer is looking for a washing machine, they'll get options to buy it from seller A, seller B, seller C, and so on. Now, the company wants to predict which seller a specific user would be most inclined to buy the product from. The company will then make the most preferred seller part of its "Verified Seller" program and show that seller as the first option to users. This, in turn, will increase the chances that the user buys the product, consequently increasing the company's profits. The company is first targeting washing machines for this task, and they have shortlisted four suppliers to solve...

Understanding Multiclass Classification

The classification algorithms that you have seen so far were mostly binary classifiers, where the target variable can have only two categorical values or classes. However, there can be scenarios where you have more than two classes to classify samples into. For instance, given data on customer transactions, the marketing team may be tasked with identifying the credit card most suitable for a customer, such as cashback, air miles, gas station, or shopping. In scenarios such as these, where you have more than two classes, a slightly different approach is required compared to binary classification.

Multiclass classification problems can broadly be divided into the following three categories:

  • Multiclass classification: Multiclass classification problems involve classifying instances or samples into one class out of multiple classes (more than two). Each sample is assigned only one label and cannot be assigned more than one label...
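One quick way to check which kind of target you are dealing with is scikit-learn's `type_of_target` utility from `sklearn.utils.multiclass`, which reports whether a label array is binary or multiclass (among other types). The following is a minimal sketch; the label values are hypothetical, loosely modeled on the churn and credit card examples above, and are not taken from the book's datasets.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels: a churn-style target with two classes
churn_labels = ["yes", "no", "no", "yes"]

# Hypothetical labels: a credit-card-type target with four classes
card_labels = ["cashback", "air miles", "gas station", "shopping", "cashback"]

print(type_of_target(churn_labels))  # binary
print(type_of_target(card_labels))   # multiclass
```

Each sample in both arrays carries exactly one label; only the number of distinct classes differs.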

Classifiers in Multiclass Classification

Let's consider two problem statements:

  • An online trading company wants to provide additional benefits to its customers. The marketing analytics team has divided the customers into five categories based on when they last logged in to the platform.
  • The same trading company wants to build a recommendation system for mutual funds. The system will recommend a mutual fund to each user based on the risk they are willing to take, the amount they plan to invest, and some other features. The number of candidate mutual funds is well over 100.

Before you jump into more detail about the differences between these two problem statements, let's first understand the two common ways of approaching multiclass classification.

In scikit-learn, multiclass classification can be implemented in the following two ways:

One-versus-all (one-versus-rest) classifier: Here, one binary classifier is fit per class, separating that class from all the other classes. For each of the classifiers...
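scikit-learn exposes both strategies mentioned in this chapter, one-versus-rest and one-versus-one, as wrappers in `sklearn.multiclass`: `OneVsRestClassifier` and `OneVsOneClassifier`. The sketch below is illustrative only; it assumes a synthetic four-class dataset generated with `make_classification` (standing in for, say, four sellers) and a `LogisticRegression` base model, neither of which comes from the book's exercises. It fits both wrappers and counts how many binary classifiers each one trains.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic four-class data (an assumption for illustration, not the book's dataset)
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=4, random_state=42
)

# One-versus-rest: one binary classifier per class (that class vs. all others)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-versus-one: one binary classifier per pair of classes
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 4
print(len(ovo.estimators_))  # 6
```

With k classes, one-versus-rest trains k classifiers and one-versus-one trains k(k-1)/2. That difference grows quickly: for a problem like the mutual fund recommender with 100 classes, it would be 100 classifiers versus 4,950.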

Performance Metrics

The performance metrics for multiclass classification are the same as those you used for binary classification in the previous chapter: precision, recall, and F1 score, obtained from a confusion matrix.

For a multiclass classification problem, you average the metrics to obtain the micro-average or macro-average of precision, recall, and F1 score in a k-class system, where k is the number of classes. Averaging is needed because these metrics are computed per class, so a k-class problem yields k separate precision, recall, and F1 values. Averaging, either over the per-class values (macro) or over the pooled per-class counts (micro), condenses them into a single figure that summarizes the classifier's overall performance.
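For reference, the two averages can be written out for precision using their standard definitions (stated here as general background, not reproduced from the book's figures). Let TP_i and FP_i be the true positives and false positives for class i in a k-class problem:

```latex
\mathrm{PRE}_{\text{macro}} = \frac{1}{k}\sum_{i=1}^{k} \frac{TP_i}{TP_i + FP_i},
\qquad
\mathrm{PRE}_{\text{micro}} = \frac{\sum_{i=1}^{k} TP_i}{\sum_{i=1}^{k} \left( TP_i + FP_i \right)}
```

Recall is averaged in the same way, with the false negatives FN_i in place of FP_i.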

The macro-average computes metrics such as precision (PRE), recall (Recall), or F1 score (F1) for each class independently and then takes their unweighted average, so all the classes are treated equally:

Figure 9.4: The macro...
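The difference between the two averages is easiest to see on a tiny example. The following sketch uses made-up labels for a three-class problem (purely illustrative). It computes macro- and micro-averaged precision by hand from a confusion matrix and checks them against scikit-learn's `precision_score` with `average="macro"` and `average="micro"`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Made-up labels for a three-class problem (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
tp = np.diag(cm)                       # correctly predicted samples per class
fp = cm.sum(axis=0) - tp               # predicted as class i but belonging to another class

# Macro: compute precision per class, then take the unweighted mean
per_class_precision = tp / (tp + fp)
macro_manual = float(per_class_precision.mean())

# Micro: pool TP and FP across all classes, then compute a single precision
micro_manual = float(tp.sum() / (tp.sum() + fp.sum()))

macro_sk = precision_score(y_true, y_pred, average="macro")
micro_sk = precision_score(y_true, y_pred, average="micro")

print(per_class_precision)  # per-class precision for classes 0, 1, and 2
print(round(macro_manual, 4), round(float(macro_sk), 4))  # 0.7222 0.7222
print(micro_manual, float(micro_sk))                      # 0.75 0.75
```

Here the per-class precisions are 1.0, 0.5, and about 0.667, so the macro-average is about 0.722, while the micro-average pools 6 correct predictions out of 8 to give 0.75. In a single-label multiclass problem, micro-averaged precision equals accuracy.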

Class-Imbalanced Data

Consider the scenario we discussed at the beginning of the chapter about the online shopping company. Imagine that, of the four shortlisted sellers, one is a very well-known company. In such a situation, this company is likely to receive most of the orders compared with the other three sellers. If the online shopping company simply directed every customer to this seller, it would still match the actual preference of a large share of customers. This is a classic case of class imbalance, since one class dominates the rest in terms of the number of data points. Class imbalance is also seen in fraud detection, anti-money laundering, spam detection, cancer detection, and many other situations.
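To see why this can be misleading, here is a minimal sketch with hypothetical order labels in which seller A accounts for 85% of orders (the 85/5/5/5 split is invented for illustration). A `DummyClassifier` with `strategy="most_frequent"` mimics the policy of sending every customer to seller A. Its accuracy looks high, while the macro-averaged recall reveals that the other three sellers are never predicted.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical order labels: seller "A" dominates; "B", "C", and "D" are rare
y = np.array(["A"] * 85 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

# Always predict the most frequent class, i.e., "send everyone to seller A"
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

print(accuracy_score(y, y_pred))                  # 0.85
print(recall_score(y, y_pred, average="macro"))   # 0.25
```

The macro recall of 0.25 is the mean of a recall of 1.0 for seller A and 0.0 for each of sellers B, C, and D, which is exactly the weakness that the accuracy figure hides.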

Before you go into the details about how to deal with class imbalance, let's first see how it can pose a big problem in a marketing analyst's work in the following exercise.

Exercise 9.03: Performing Classification...

Summary

In this chapter, you started off by understanding the importance of multiclass classification problems and the different categories of these problems. You learned about one-versus-one and one-versus-all classifiers and how to implement them using the scikit-learn module in Python. Next, you went through various micro- and macro-averages of performance metrics and used them to understand the impact of class imbalance on the model performance. You also learned about various sampling techniques, especially SMOTE, and implemented them using the imblearn library in Python. At the end of the chapter, you used an imbalanced marketing campaign dataset to perform dataset exploration, data transformation, model training, performance evaluation, and dataset balancing using SMOTE.

This book started with the basics of data science and slowly covered the entire end-to-end data science pipeline for a marketing analyst. While working on a problem statement, depending on the need, you will...
