
You're reading from Data Science for Marketing Analytics - Second Edition

Product type: Book
Published in: Sep 2021
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800560475
Edition: 2nd Edition
Authors (3):
Mirza Rahim Baig

Mirza Rahim Baig is a Data Science and Artificial Intelligence leader with over 13 years of experience across e-commerce, healthcare, and marketing. He currently leads Product Analytics within Marketing Services at Zalando, Europe's largest online fashion platform. In addition, he serves as a Subject Matter Expert and faculty member for MS-level programs at prominent ed-tech platforms and institutes in India. He is also the lead author of two books, 'Data Science for Marketing Analytics' and 'The Deep Learning Workshop,' both published by Packt. He is recognized as a thought leader in his field and frequently participates as a guest speaker at various forums.

Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

Vishwesh Ravi Shrimali

Vishwesh Ravi Shrimali graduated from BITS Pilani, where he studied mechanical engineering, in 2018. He completed his Master's in Machine Learning and AI at LJMU in 2021. He has authored Machine Learning for OpenCV (2nd edition), Computer Vision Workshop, and Data Science for Marketing Analytics (2nd edition), all published by Packt. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.

6. More Tools and Techniques for Evaluating Regression Models

Overview

This chapter explains how to evaluate regression models using common measures of accuracy. You will learn how to calculate the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), two common measures of the accuracy of a regression model. Later, you will use Recursive Feature Elimination (RFE) to perform feature selection for linear models. You will use these models together to predict how customers' spending habits change with age and find out which model outperforms the rest. By the end of this chapter, you will be able to compare the accuracy of different tree-based regression models, such as regression trees and random forest regression, and select the regression model that best suits your use case.

Introduction

You are working at a marketing company that takes on projects from various clients. Your team has been given a project in which you have to predict the conversion percentage for a Black Friday sale that the team is planning. According to the client, the conversion percentage refers to the number of people who actually buy products relative to the number of people who initially signed up for updates regarding the sale by visiting the website. Your first instinct is to go for a regression model to predict the conversion percentage. However, you have millions of rows of data with hundreds of columns. In scenarios like these, it's very common to encounter multicollinearity, where two or more features effectively convey the same information. This can end up affecting the robustness of the model. This is where techniques such as Recursive Feature Elimination (RFE) can help.
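
As a quick illustration of this problem, here is a minimal sketch (not taken from the chapter's dataset; the DataFrame and column names are made up purely for illustration) showing how a correlation matrix can flag two features that effectively convey the same information:

import numpy as np
import pandas as pd

# Made-up marketing data; column names are illustrative only
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'site_visits': rng.integers(100, 10_000, size=500),
    'ad_spend_usd': rng.uniform(100, 5_000, size=500),
})
# A nearly redundant column: the same spend recorded in another currency
df['ad_spend_eur'] = df['ad_spend_usd'] * 0.92

# Absolute pairwise correlations between the candidate features
corr = df.corr().abs()

# Report feature pairs that are almost perfectly correlated
threshold = 0.95
cols = corr.columns
for i, col_a in enumerate(cols):
    for col_b in cols[i + 1:]:
        if corr.loc[col_a, col_b] > threshold:
            print(f"{col_a} and {col_b} look redundant "
                  f"(|corr| = {corr.loc[col_a, col_b]:.2f})")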

In the previous chapter, you learned how to prepare data for regression...

Evaluating the Accuracy of a Regression Model

To evaluate regression models, you first need to define some metrics. The common metrics used to evaluate regression models rely on the concepts of residuals and errors, which quantify how far a model's prediction for a particular data point is from the true value. In the following sections, you will first learn about residuals and errors. You will then learn about two evaluation metrics, the MAE and RMSE, and how they are used to evaluate regression models.
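
As a preview, here is a minimal sketch of how both metrics can be computed with scikit-learn; the y_true and y_pred arrays are made-up values used purely for illustration:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up true values and model predictions
y_true = np.array([10.0, 12.5, 8.0, 15.0, 11.0])
y_pred = np.array([11.0, 12.0, 9.5, 13.0, 11.5])

# MAE: the average absolute difference between predictions and true values
mae = mean_absolute_error(y_true, y_pred)

# RMSE: the square root of the average squared difference
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")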

Residuals and Errors

An important concept in understanding how to evaluate regression models is the residual. The residual refers to the difference between the value predicted by the model and the true value for a data point. It can be thought of as the amount by which your model missed a particular value. In the following diagram, we can see a best-fit (or regression) line with data points scattered above and below it. The distance between a data point and the line signifies how far away the...
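
In code, residuals are simply the element-wise differences between the true values and the predictions. The short sketch below, using made-up numbers, shows how the MAE and RMSE are just two different ways of aggregating these residuals:

import numpy as np

# Made-up true values and predictions from some fitted regression model
y_true = np.array([10.0, 12.5, 8.0, 15.0, 11.0])
y_pred = np.array([11.0, 12.0, 9.5, 13.0, 11.5])

# Residuals: how far each prediction is from the true value
residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))         # mean absolute error
rmse = np.sqrt(np.mean(residuals ** 2))  # root mean squared error

print("Residuals:", residuals)
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")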

Using Recursive Feature Elimination for Feature Selection

So far, we have discussed two important evaluation metrics – the MAE and RMSE. We also saw how these metrics can be computed with the help of the scikit-learn library and how a change in their values can serve as an indicator of a feature's importance. However, if you have a large number of features, removing one feature at a time becomes a very tedious job, and this is where RFE comes into the picture. When a dataset contains features (all columns except the column we want to predict) that are either unrelated to the target column or strongly related to other features, the performance of the model can be adversely affected if all the features are used for model training. Let's understand the basic reasoning behind this.

For example, consider that you want to predict the number of sales of a product given the cost price of the product, the discount available, the selling price of the...
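
To make the mechanics concrete, here is a minimal sketch of RFE with scikit-learn on synthetic data rather than the chapter's dataset; the choice of keeping three features, and every variable name used here, are assumptions for illustration only:

from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression data: 8 candidate features, only 3 truly informative
X, y = make_regression(n_samples=500, n_features=8,
                       n_informative=3, noise=10.0, random_state=42)

# RFE repeatedly fits the estimator and drops the weakest feature
# until only the requested number of features remains
selector = RFE(estimator=LinearRegression(), n_features_to_select=3)
selector.fit(X, y)

print("Selected feature mask:", selector.support_)
print("Feature ranking (1 = kept):", selector.ranking_)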

Tree-Based Regression Models

In the preceding activity, you were able to identify the three most important features that could be used to predict customer spend. Now, imagine doing the same by removing each feature one at a time and recalculating the RMSE. RFE spares you this repetitive task by performing the elimination internally, without requiring you to do it manually.

So far, we have covered linear regression models. Now it's time to take it up a notch by discussing some tree-based regression models.

Linear models are not the only type of regression model. Another powerful technique is the use of regression trees. Regression trees are based on the idea of a decision tree. A decision tree is a bit like a flowchart where, at each step, you ask whether a variable is greater than or less than some value. After flowing through several of these steps, you reach the end of the tree and receive an answer for what value the prediction should be...
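
As a brief sketch (again on synthetic data rather than the chapter's case study, with parameter values chosen purely for illustration), the snippet below fits a single regression tree and a random forest with a limited max_depth and compares their RMSE on a held-out test set:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data split into training and test sets
X, y = make_regression(n_samples=1000, n_features=5, noise=15.0,
                       random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# A single regression tree and a random forest, both limited in depth
tree = DecisionTreeRegressor(max_depth=5, random_state=42)
forest = RandomForestRegressor(n_estimators=100, max_depth=5,
                               random_state=42)

for name, model in [("Regression tree", tree), ("Random forest", forest)]:
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name} RMSE: {rmse:.2f}")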

Summary

In this chapter, we learned how to evaluate regression models. We used residuals to calculate the MAE and RMSE, and then used those metrics to compare models. We also learned about RFE and how it can be used for feature selection. We were able to see the effect of feature elimination on the MAE and RMSE metrics and relate it to the robustness of the model. We used these concepts to verify that the intuitions about the importance of the "number of competitors" feature were wrong in our case study. Finally, we learned about tree-based regression models and looked at how they can fit some of the non-linear relationships that linear regression is unable to handle. We saw how random forest models were able to perform better than regression tree models and the effect of increasing the maximum tree depth on model performance. We used these concepts to model the spending behavior of people with respect to their age.

In the next chapter, we will learn about classification...
