You're reading from Data Science for Marketing Analytics - Second Edition

Product type: Book
Published in: Sep 2021
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800560475
Edition: 2nd Edition
Authors (3):
Mirza Rahim Baig

Mirza Rahim Baig is a Data Science and Artificial Intelligence leader with over 13 years of experience across e-commerce, healthcare, and marketing. He currently leads Product Analytics at Marketing Services for Zalando, Europe's largest online fashion platform. In addition, he serves as a Subject Matter Expert and faculty member for MS-level programs at prominent Ed-Tech platforms and institutes in India. He is also the lead author of two books, 'Data Science for Marketing Analytics' and 'The Deep Learning Workshop', both published by Packt. He is recognized as a thought leader in his field and frequently participates as a guest speaker at various forums.

Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

Vishwesh Ravi Shrimali

Vishwesh Ravi Shrimali graduated from BITS Pilani in 2018, where he studied mechanical engineering. He also completed his Master's in Machine Learning and AI from LJMU in 2021. He has authored Machine Learning for OpenCV (2nd edition), Computer Vision Workshop, and Data Science for Marketing Analytics (2nd edition), published by Packt. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.

4. Evaluating and Choosing the Best Segmentation Approach

Overview

In this chapter, you will continue your journey with customer segmentation. You will improve your approach to customer segmentation by learning and implementing newer techniques for clustering and cluster evaluation. You will learn a principled way of choosing the optimal number of clusters so that you can keep the customer segments statistically robust and actionable for businesses. You will apply evaluation approaches to multiple business problems. You will also learn to apply some other popular approaches to clustering such as mean-shift, k-modes, and k-prototypes. Adding these to your arsenal of segmentation techniques will further sharpen your skills as a data scientist in marketing and help you come up with solutions that will create a big business impact.

Introduction

A large e-commerce company is gearing up for its biggest event of the year: its annual sale. The company is ambitious in its goals and aims to achieve its best sales figures so far, hoping for significant growth over last year's event. The marketing budget is the highest it has ever been. Naturally, marketing campaigns will be a critical factor in deciding the success of the event. From what we have learned so far, we know that for those campaigns to be most effective, understanding the customers and choosing the right messaging for them are critical.

In such a situation, well-performed customer segmentation can make all the difference and help maximize the ROI (Return on Investment) of marketing spend. By analyzing customer segments, the marketing team can carefully define strategies for each segment. But before investing precious resources into a customer segmentation project, data science teams, as well as business teams, need to answer a few key...

Choosing the Number of Clusters

While performing segmentation in the previous chapter, we specified the number of clusters to the k-means algorithm. In practice, though, we don't typically know the number of clusters to expect in the data. An analyst or business team may have some intuition, but that intuition may be very different from the 'natural' clusters present in the data. For instance, a business may have an intuition that there are generally three types of customers, while an analysis of the data may point to five distinct groups of customers. Recall that the features that we choose and the scale of those features also play an important role in defining 'similarity' between customers.

There is, hence, a need to understand the different ways we can choose the 'right' number of clusters. In this chapter, we will discuss three approaches. First, we will learn about simple visual inspection, which has the advantages of being easy and intuitive...
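
As a minimal sketch of what comparing candidate cluster counts can look like, the snippet below fits k-means for several values of k and records the inertia (the within-cluster sum of squared errors) for each, which is the raw material for an elbow plot. It assumes scikit-learn is installed; the synthetic data from make_blobs, the chosen centers, and the helper name inertia_by_k are illustrative stand-ins, not the book's dataset or code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for customer data: four well-separated groups.
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8,
                  random_state=42)

# Scale features first, since feature scale affects 'similarity'.
X_scaled = StandardScaler().fit_transform(X)


def inertia_by_k(data, k_values, random_state=42):
    """Fit k-means once per k and return the inertia for each fit."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(data)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 9))
inertias = inertia_by_k(X_scaled, k_values)

# Look for the k after which the drop in inertia flattens out (the 'elbow').
for k, sse in zip(k_values, inertias):
    print(f"k={k}: inertia={sse:.1f}")
```

Inertia can only tell you how tight the clusters are, and it keeps falling as k grows, so reading the elbow remains a judgment call rather than an automatic answer.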

More Clustering Techniques

If you completed the preceding activity, you will have realized that you needed a more robust approach to determine the number of clusters. You dealt with high-dimensional data for clustering, and therefore the visual analysis of the clusters necessitated the use of PCA. The visual assessment approach and the elbow method from the inertia plot, however, did not agree very well. This difference can be explained by understanding that visualization using PCA loses a lot of information and therefore provides an incomplete picture. Realizing that, you used what you learned from the elbow method, together with your business perspective, to arrive at an optimal number of clusters.
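
One way to gauge how much a two-dimensional PCA view leaves out is to check the share of variance the two components retain. The following is a rough sketch under assumed conditions: it uses scikit-learn, and the 10-feature synthetic dataset is a hypothetical stand-in for the activity's data, so the printed figure says nothing about the book's results.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for high-dimensional customer data: 10 features.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components for a 2-D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance that survives the projection.
retained = pca.explained_variance_ratio_.sum()
print(f"Variance retained by 2 components: {retained:.1%}")
```

When the retained fraction is low, clusters that are separated in the full feature space may appear to overlap in the 2-D plot, which is one reason a visual assessment can disagree with the inertia plot.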

Such a comprehensive approach that incorporates business constraints helps the data scientist create actionable and therefore valuable customer segments. With these techniques learned and this understanding created, let us look at more techniques for clustering that will make the data scientist...
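
Mean-shift, one of the techniques named in this chapter's overview, does not take the number of clusters as an input; it is driven by a bandwidth parameter instead. The sketch below is an illustration only, assuming scikit-learn and a hypothetical synthetic dataset. The quantile value passed to estimate_bandwidth is an arbitrary choice for this example, not a recommendation from the book.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for customer data: four well-separated groups.
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8,
                  random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Estimate a bandwidth from the data; quantile=0.2 is an assumed setting.
bandwidth = estimate_bandwidth(X_scaled, quantile=0.2, random_state=42)

# Mean-shift discovers the number of clusters from the density of the data.
model = MeanShift(bandwidth=bandwidth)
labels = model.fit_predict(X_scaled)

n_found = len(np.unique(labels))
print(f"bandwidth={bandwidth:.3f}, clusters found={n_found}")
```

The number of clusters found is sensitive to the bandwidth, so in practice the choice of k is replaced by the choice of bandwidth rather than removed altogether.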

Evaluating Clustering

We have seen various ways of performing clustering so far, each approach having its merits. For the same task, we saw that the approaches provided varying results. Which of them is better? Before we answer that, we need to be able to evaluate how good the results from clustering are. Only then can we compare across segmentation approaches. We need to have, therefore, ways to evaluate the quality of clustering.

Another motivation for cluster evaluation methods is that clustering is a key part of a bigger segmentation exercise, but far from the whole of it. Recall from the discussion in the previous chapter that in segmentation exercises, business is often the end consumer of the segments and acts on them. The segments, therefore, need to make sense to the business as well and be actionable. That is why we need to be able to evaluate clusters from a business perspective as well. We have discussed this aspect in the previous...
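
For the statistical side of evaluation, the silhouette score named in this chapter's summary gives one number per clustering result that can be compared across settings. The sketch below assumes scikit-learn and uses a hypothetical synthetic dataset; it scores k-means results for several values of k and picks the k with the highest score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for customer data: four well-separated groups.
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8,
                  random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# The silhouette score needs at least two clusters, so start at k=2.
scores = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# A higher score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Highest silhouette score at k={best_k}")
```

The same loop can score labels from any clustering technique, which makes it a convenient common yardstick. A high score still does not guarantee that the segments are actionable for the business.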

Summary

Machine learning-based clustering techniques are great in that they help speed up the segmentation process and can find patterns in data that can escape highly proficient analysts. Multiple techniques for clustering have been developed over the decades, each having its merits and drawbacks. As a data science practitioner in marketing, understanding different techniques will make you far more effective in your practice. However, faced with multiple options in techniques and hyper-parameters, it's important to be able to compare the results from the techniques objectively. This, in turn, requires you to quantify the quality of clusters resulting from a clustering process.

In this chapter, you learned various methods for choosing the number of clusters, including judgment-based methods such as visual inspection of cluster overlap and elbow determination using the sum of squared errors (inertia), and objective methods such as evaluating the silhouette score. Each of these...
