You're reading from  Data Science for Marketing Analytics - Second Edition

Product type: Book
Published in: Sep 2021
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800560475
Edition: 2nd Edition
Authors (3):
Mirza Rahim Baig

Mirza Rahim Baig is a Data Science and Artificial Intelligence leader with over 13 years of experience across e-commerce, healthcare, and marketing. He currently leads Product Analytics at Marketing Services for Zalando, Europe's largest online fashion platform. In addition, he serves as a Subject Matter Expert and faculty member for MS level programs at prominent Ed-Tech platforms and institutes in India. He is also the lead author of two books, 'Data Science for Marketing Analytics' and 'The Deep Learning Workshop,' both published by Packt. He is recognized as a thought leader in his field and frequently participates as a guest speaker at various forums.

Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

Vishwesh Ravi Shrimali

Vishwesh Ravi Shrimali graduated from BITS Pilani in 2018, where he studied mechanical engineering. He completed his master's in Machine Learning and AI at LJMU in 2021. He has authored Machine Learning for OpenCV (2nd edition), Computer Vision Workshop, and Data Science for Marketing Analytics (2nd edition), all published by Packt. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.


9. Multiclass Classification Algorithms

Overview

In this chapter, you will learn how to identify and implement algorithms that solve multiclass classification problems in marketing analytics. You will go through the different types of classifiers and implement them using the scikit-learn library in Python. Next, you will learn to interpret the micro- and macro-averaged metrics that are used to evaluate a classifier in multiclass problems. You will also learn about different sampling techniques that address the problem of imbalanced data. By the end of this chapter, you will be able to apply different kinds of algorithms and evaluation metrics to solve multiclass classification problems.

Introduction

The online shopping company you worked with in the previous chapter is busy planning a new feature. Currently, whenever customers search for a product on their website or their app, they are shown the desired product along with options to buy the same product from different sellers. For example, if a customer is looking for a washing machine, they'll get options to buy the product from seller A, seller B, seller C, and so on. Now, the company wants to predict which seller a specific user would be more inclined to buy the product from. They'll then make the most preferred seller a part of their "Verified Seller" program, thus showing that seller as the first option to users. This, in turn, will help the company increase the chances that the product will be bought by the user, consequently leading to an increase in the company's profits. The company is first targeting washing machines for this task, and they have shortlisted four suppliers to solve...

Understanding Multiclass Classification

The classification algorithms that you have seen so far were mostly binary classifiers, where the target variable can have only two categorical values or classes. However, there can be scenarios where you have more than two classes to classify samples into. For instance, given data on customer transactions, the marketing team may be tasked with identifying the credit card most suitable for a customer, such as cashback, air miles, gas station, or shopping. In scenarios such as these, where you have more than two classes, a slightly different approach is required compared to binary classification.

Multiclass classification problems can broadly be divided into the following three categories:

  • Multiclass classification: Multiclass classification problems involve classifying instances or samples into one class out of multiple classes (more than two). Each sample is assigned only one label and cannot be assigned more than one label...

Classifiers in Multiclass Classification

Let's consider two problem statements:

  • An online trading company wants to provide additional benefits to its customers. The marketing analytics team has divided the customers into five categories based on when they last logged in to the platform.
  • The same trading company wants to build a recommendation system for mutual funds. It will recommend a mutual fund to each user based on the risk they are willing to take, the amount they are planning to invest, and some other features. The number of mutual funds is well above 100.

Before you jump into more detail about the differences between these two problem statements, let's first understand the two common ways of approaching multiclass classification.

Multiclass classification can be implemented in scikit-learn in the following two ways:

One-versus-all (one-versus-rest) classifier: Here, one classifier is fit against one class. For each of the classifiers...
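To make the difference between the two strategies concrete, the following minimal sketch fits both scikit-learn wrappers on a synthetic five-class dataset and counts how many binary classifiers each one trains. The dataset, the choice of LogisticRegression as the base estimator, and the helper ovo_count are assumptions made for illustration; they are not taken from the chapter's exercises.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic stand-in for the five customer categories (hypothetical data).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=42,
)

# Any binary-capable estimator can serve as the base model (assumed choice).
base = LogisticRegression(max_iter=1000)

# One-versus-rest: one binary classifier per class.
ovr = OneVsRestClassifier(base).fit(X, y)
# One-versus-one: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(base).fit(X, y)

n_ovr = len(ovr.estimators_)
n_ovo = len(ovo.estimators_)
print("one-versus-rest classifiers:", n_ovr)  # 5
print("one-versus-one classifiers:", n_ovo)   # 10


def ovo_count(k):
    """Number of pairwise classifiers needed for k classes (hypothetical helper)."""
    return k * (k - 1) // 2


# With 100 classes, one-versus-rest needs 100 models, one-versus-one needs 4950.
print("pairwise classifiers for 100 classes:", ovo_count(100))
```

The counts suggest why the number of classes matters when choosing a strategy: with five customer categories either approach is cheap, whereas with more than 100 mutual funds the pairwise approach would train thousands of models.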

Performance Metrics

The performance metrics in the case of multiclass classification would be the same as what you used for binary classification in the previous chapter, that is, precision, recall, and F1 score, obtained using a confusion matrix.

In a multiclass classification problem with k classes, you aggregate the metrics across classes to obtain the micro-average or macro-average of precision, recall, and F1 score. Averaging is useful here because you have multiple class labels. Each class yields its own set of metric values, but in the end you want one figure for the whole model, and an aggregation such as averaging provides it.

The macro-average computes the metrics such as precision (PRE), recall (Recall), or F1 score (F1) of each class independently and takes the average (all the classes are treated equally):

Figure 9.4: The macro...
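As a rough illustration of how the two averages can diverge, the sketch below computes macro- and micro-averaged precision by hand for a small set of made-up labels. It assumes the usual definitions: the macro-average is the unweighted mean of the per-class precisions, and the micro-average pools true positives and false positives across all classes before dividing. The labels and function names are hypothetical.

```python
def per_class_counts(y_true, y_pred):
    """Count true positives and false positives for every label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1  # p was predicted, but the true class was t
    return labels, tp, fp


def macro_precision(y_true, y_pred):
    """Average the per-class precisions, treating every class equally."""
    labels, tp, fp = per_class_counts(y_true, y_pred)
    per_class = []
    for c in labels:
        predicted = tp[c] + fp[c]
        # A class that is never predicted gets precision 0.0 (a convention assumed here).
        per_class.append(tp[c] / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)


def micro_precision(y_true, y_pred):
    """Pool the counts of all classes, then compute a single precision."""
    _, tp, fp = per_class_counts(y_true, y_pred)
    total_tp = sum(tp.values())
    return total_tp / (total_tp + sum(fp.values()))


# Made-up labels: class A dominates and class C is never predicted.
y_true = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "C"]
y_pred = ["A", "A", "A", "A", "A", "B", "B", "B", "A", "A"]

macro = macro_precision(y_true, y_pred)
micro = micro_precision(y_true, y_pred)
print(f"macro precision: {macro:.3f}")  # 0.460
print(f"micro precision: {micro:.3f}")  # 0.700
```

The macro-average drops to about 0.46 because the never-predicted class C counts as much as the dominant class A, while the micro-average stays at 0.70. In scikit-learn, precision_score accepts average='macro' or average='micro' to compute the same quantities.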

Class-Imbalanced Data

Consider the scenario we discussed at the beginning of the chapter about the online shopping company. Imagine that out of the four shortlisted sellers, one is a very well-known company. In such a situation, there is a high chance of this company getting most of the orders compared to the other three sellers. If the online shopping company decided to divert all the customers to this seller, it would actually end up matching the preference of a large number of customers. This is a classic scenario of class imbalance, since one class dominates the rest of the classes in terms of data points. Class imbalance is also seen in fraud detection, anti-money laundering, spam detection, cancer detection, and many other situations.
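The effect described above can be sketched with a few lines of Python. The order counts below are invented for illustration (85 of 100 orders going to one seller), and the "model" simply sends every customer to the most popular seller. Accuracy looks strong, yet the macro-averaged recall shows that three of the four sellers are never identified.

```python
from collections import Counter

# Hypothetical order history: seller_A is the well-known company.
orders = ["seller_A"] * 85 + ["seller_B"] * 7 + ["seller_C"] * 5 + ["seller_D"] * 3

# Baseline "model": always predict the most frequent seller.
majority = Counter(orders).most_common(1)[0][0]
predictions = [majority] * len(orders)

# Accuracy: share of customers whose preferred seller was predicted.
accuracy = sum(t == p for t, p in zip(orders, predictions)) / len(orders)

# Recall per seller: share of that seller's customers who were predicted correctly.
recalls = {}
for seller, count in Counter(orders).items():
    hits = sum(t == p == seller for t, p in zip(orders, predictions))
    recalls[seller] = hits / count

# Macro-averaged recall treats all four sellers equally.
macro_recall = sum(recalls.values()) / len(recalls)

print(f"accuracy:     {accuracy:.2f}")      # 0.85
print(f"macro recall: {macro_recall:.2f}")  # 0.25
```

An accuracy of 0.85 comes entirely from the dominant class, which is why accuracy alone can be misleading on imbalanced data.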

Before you go into the details about how to deal with class imbalance, let's first see how it can pose a big problem in a marketing analyst's work in the following exercise.

Exercise 9.03: Performing Classification...

Summary

In this chapter, you started off by understanding the importance of multiclass classification problems and the different categories of these problems. You learned about one-versus-one and one-versus-all classifiers and how to implement them using the scikit-learn module in Python. Next, you went through various micro- and macro-averages of performance metrics and used them to understand the impact of class imbalance on the model performance. You also learned about various sampling techniques, especially SMOTE, and implemented them using the imblearn library in Python. At the end of the chapter, you used an imbalanced marketing campaign dataset to perform dataset exploration, data transformation, model training, performance evaluation, and dataset balancing using SMOTE.

This book started with the basics of data science and slowly covered the entire end-to-end data science pipeline for a marketing analyst. While working on a problem statement, depending on the need, you will...

