You're reading from Data Science for Marketing Analytics - Second Edition

Product type Book

Published in Sep 2021

Publisher Packt

ISBN-13 9781800560475

Pages 636 pages

Edition 2nd Edition

Languages

Python

Concepts

Data Science

Authors (3):

Mirza Rahim Baig

Gururajan Govindan

Vishwesh Ravi Shrimali

Table of Contents (11) Chapters

Preface

1. Data Preparation and Cleaning

2. Data Exploration and Visualization

3. Unsupervised Learning and Customer Segmentation

4. Evaluating and Choosing the Best Segmentation Approach

5. Predicting Customer Revenue Using Linear Regression

6. More Tools and Techniques for Evaluating Regression Models

7. Supervised Learning: Predicting Customer Churn

8. Fine-Tuning Classification Algorithms

9. Multiclass Classification Algorithms

Appendix

1. Data Preparation and Cleaning

Activity 1.01: Addressing Data Spilling

Solution:

Import the pandas and copy libraries using the following commands:
import pandas as pd
import copy
Create a new DataFrame, sales, and use the read_csv function to read the sales.csv file into it:
sales = pd.read_csv("sales.csv")
Note
Make sure you change the path (emboldened) to the CSV file based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.
Now, examine whether your data is properly loaded by checking the first five rows in the DataFrame. Do this using the head() command:
sales.head()
You should get the following output:
Figure 1.60: First five rows of the DataFrame
Look at the data types of sales using the following command:
sales.dtypes
You should get the following output:
Figure 1.61: Looking at the data type of columns of sales.csv
You can...
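The dtype check in the preceding steps can be tried on a small in-memory table. The following is a minimal sketch, not the book's solution: the CSV text, its column names (Year, Product, Revenue), and its values are assumptions made up for illustration. It shows how a single non-numeric entry stops pandas from reading a column as numeric, which is the kind of problem sales.dtypes helps you spot.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales.csv; these column names and values are
# illustrative and are not taken from the book's dataset.
csv_text = """Year,Product,Revenue
2004,Tents,1200.5
2005,Stoves,not available
2006,Lanterns,830.0
"""

sales = pd.read_csv(io.StringIO(csv_text))

print(sales.head())    # first rows (up to five)
print(sales.dtypes)    # one dtype per column

# A single non-numeric entry keeps the whole Revenue column from being numeric.
revenue_is_numeric = pd.api.types.is_numeric_dtype(sales["Revenue"])
print(revenue_is_numeric)

# One possible repair (an assumption, not the book's method):
# coerce entries that cannot be parsed as numbers to NaN.
sales["Revenue"] = pd.to_numeric(sales["Revenue"], errors="coerce")
print(sales.dtypes)
```

After the coercion, the unparseable entry becomes a missing value that you can handle explicitly.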

2. Data Exploration and Visualization

Activity 2.01: Analyzing Advertisements

Solution:

Perform the following steps to complete this activity:

Import pandas, seaborn, and the pyplot module from matplotlib using the following code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
Load the Advertising.csv file into a DataFrame called ads, using the Date column as the index. Then check that the data has loaded properly by viewing the first few rows with the head() command:
ads = pd.read_csv("Advertising.csv", index_col = 'Date')
ads.head()
The output should be as follows:
Figure 2.65: First five rows of the DataFrame ads
Look at the memory usage and other internal information about the DataFrame using the following command:
ads.info()
This gives the following output:
Figure 2.66: The result of ads.info()
From the preceding figure, you can see that you have five columns with 200 data points in each and no missing values.
Use the describe() function to view basic statistical details...
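The loading and inspection steps above can be sketched on a tiny in-memory table. This is an illustration under stated assumptions: the CSV text and the TV and Sales column names are hypothetical and do not reproduce Advertising.csv. Note that info is a method, so it needs parentheses to print its summary.

```python
import io

import pandas as pd

# Hypothetical stand-in for Advertising.csv; column names are illustrative.
csv_text = """Date,TV,Sales
2018-01-01,100.0,10.0
2018-01-02,200.0,15.0
2018-01-03,300.0,20.0
"""

# index_col="Date" makes the Date column the row index, as in the activity.
ads = pd.read_csv(io.StringIO(csv_text), index_col="Date")

ads.info()                 # prints dtypes, non-null counts, and memory usage
summary = ads.describe()   # count, mean, std, min, quartiles, and max
print(summary)

# Total number of missing cells; 0 means there are no missing values.
missing_total = int(ads.isna().sum().sum())
print(missing_total)
```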

3. Unsupervised Learning and Customer Segmentation

Activity 3.01: Bank Customer Segmentation for Loan Campaign

Solution:

Import the necessary libraries for data processing, visualization, and clustering using the following code:
import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
Load the data into a pandas DataFrame and display the top five rows:
bank0 = pd.read_csv("Bank_Personal_Loan_Modelling-1.csv")
bank0.head()
Note
Make sure you change the path (highlighted) to the CSV file based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.
The first five rows get displayed as follows:
Figure 3.31: First five rows of the dataset
You can see that you have data about customer demographics such as Age, Experience, Family, and Education...
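The StandardScaler and KMeans imports above suggest a scale-then-cluster workflow. The following is a minimal sketch of that idea, assuming a made-up table with hypothetical Income and Spend columns and two obvious customer groups. It is not the book's solution and does not use the bank dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers forming two clearly separated groups.
bank = pd.DataFrame({
    "Income": [20, 22, 25, 150, 160, 155],
    "Spend": [0.5, 0.7, 0.6, 6.0, 6.5, 5.8],
})

# Scale first so that no single feature dominates the distance calculation.
scaled = StandardScaler().fit_transform(bank)

# n_clusters=2 is chosen by eye for this toy data.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
bank["Cluster"] = model.fit_predict(scaled)

# Average feature values per cluster help you interpret the segments.
print(bank.groupby("Cluster").mean())
```

The cluster labels themselves (0 or 1) are arbitrary; what matters is which customers share a label.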

4. Evaluating and Choosing the Best Segmentation Approach

Activity 4.01: Optimizing a Luxury Clothing Brand's Marketing Campaign Using Clustering

Solution:

Import the libraries required for DataFrame handling and plotting (pandas, numpy, matplotlib, and seaborn). Read the data from the Clothing_Customers.csv file into a DataFrame and print the top five rows to understand it better:
import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns
data0 = pd.read_csv('Clothing_Customers.csv')
data0.head()
Note
Make sure you place the CSV file in the directory from which you are running the Jupyter Notebook. If not, change the path (emboldened) to match the location where you have stored the file.
The result should be the table below:
Figure 4.24: Top 5 records of the data
The data contains the customers' income, age, days since their last purchase, and their annual spending. All these will be used to perform segmentation.
Standardize...
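To make the idea of standardization concrete, here is a small sketch that computes z-scores by hand with pandas. The income and age values are assumptions for illustration only. Using ddof=0 (the population standard deviation) is the convention scikit-learn's StandardScaler follows, so the result should match what that class produces.

```python
import pandas as pd

# Hypothetical values; not taken from Clothing_Customers.csv.
data = pd.DataFrame({
    "income": [30000.0, 50000.0, 70000.0],
    "age": [25.0, 35.0, 45.0],
})

# z-score: subtract each column's mean and divide by its standard deviation.
standardized = (data - data.mean()) / data.std(ddof=0)
print(standardized.round(3))

# After standardization, each column has mean 0 and standard deviation 1.
print(standardized.mean().round(6))
print(standardized.std(ddof=0).round(6))
```

Putting the features on a common scale matters for distance-based clustering, where a column measured in tens of thousands would otherwise outweigh one measured in tens.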

5. Predicting Customer Revenue Using Linear Regression

Activity 5.01: Examining the Relationship between Store Location and Revenue

Solution:

Import pandas, the pyplot module from matplotlib, and seaborn. Read the data into a DataFrame called df and print the top five records using the following code:
import pandas as pd
import matplotlib.pyplot as plt, seaborn as sns
df = pd.read_csv('location_rev.csv')
df.head()
Note
Make sure you change the path (highlighted) to the CSV file based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.
The data should appear as follows:
Figure 5.35: The first five rows of the location revenue data
As described earlier, the data contains the store's revenue and age, along with various fields describing the store's location. From the top five records, you get a sense of the order of the values...
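One quick way to examine how location fields relate to revenue is a correlation table. The sketch below is an illustration only: the column names (revenue, num_competitors, median_income) and the values are hypothetical and are not read from location_rev.csv.

```python
import pandas as pd

# Hypothetical location data; names and values are assumptions.
df = pd.DataFrame({
    "revenue": [10.0, 20.0, 30.0, 40.0],
    "num_competitors": [4, 3, 2, 1],
    "median_income": [30.0, 35.0, 45.0, 50.0],
})

# Pearson correlation of every other column with revenue.
corr_with_revenue = df.corr()["revenue"].drop("revenue")
print(corr_with_revenue.sort_values())
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one. Correlation alone does not show that a location attribute causes the change in revenue.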

6. More Tools and Techniques for Evaluating Regression Models

Activity 6.01: Finding Important Variables for Predicting Responses to a Marketing Offer

Solution:

Perform the following steps to achieve the aim of this activity:

Import pandas, read in the data from offer_responses.csv, and use the head function to view the first five rows of the data:
import pandas as pd
df = pd.read_csv('offer_responses.csv')
df.head()
Note
Make sure you change the path (emboldened) to the CSV file based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modifications.
You should get the following output:
Figure 6.22: The first five rows of the offer_responses data
Extract the target variable (y) and the predictor variables (X) from the data:
X = df[['offer_quality',\
'offer_discount',\
&...
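The general pattern for separating predictors from the target can be sketched as follows. This is an illustration, not a completion of the truncated listing: the DataFrame is made up, and the responses column name is a hypothetical stand-in for the target. Only offer_quality and offer_discount appear in the original code.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "offer_quality": [0.2, 0.8, 0.5],
    "offer_discount": [10, 30, 20],
    "responses": [100, 400, 250],   # hypothetical target column name
})

# A list inside brackets can span several lines without backslashes.
feature_cols = [
    "offer_quality",
    "offer_discount",
]

X = df[feature_cols]    # two-dimensional DataFrame of predictors
y = df["responses"]     # one-dimensional Series holding the target

print(X.shape, y.shape)
```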

7. Supervised Learning: Predicting Customer Churn

Activity 7.01: Performing the OSE technique from OSEMN

Solution:

Import the necessary libraries:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Import the necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Download the dataset from https://packt.link/80blQ and save it as Telco_Churn_Data.csv. Make sure to run the notebook from the same folder as the dataset.
Create a DataFrame called data and read the dataset using the pandas read_csv function. Look at the first few rows of the DataFrame:
data = pd.read_csv(r'Telco_Churn_Data.csv')
data.head(5)
Note
Make sure you change the path (emboldened in the preceding code snippet) to the CSV file based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.
The...

8. Fine-Tuning Classification Algorithms

Activity 8.01: Implementing Different Classification Algorithms

Solution:

Import the LogisticRegression class:
from sklearn.linear_model import LogisticRegression
Fit the model:
clf_logistic = LogisticRegression(random_state=0,solver='lbfgs')\
.fit(X_train[top7_features], y_train)
clf_logistic
The preceding code will give the following output:
LogisticRegression(random_state=0)
Score the model:
clf_logistic.score(X_test[top7_features], y_test)
You will get the following output: 0.7454031117397454.
This shows that the logistic regression model achieves an accuracy of about 74.5% on the test set. This is mediocre, but it serves as a useful baseline for the minimum accuracy you can expect from the other models.
Import the svm module:
from sklearn import svm
Scale the training and testing data as follows:
from sklearn.preprocessing import MinMaxScaler
scaling = MinMaxScaler...
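For a classifier, the score method used in the preceding steps returns the mean accuracy on the data you pass in. The following sketch checks this on synthetic data generated with make_classification. The data, the split sizes, and the variable names are assumptions for illustration; the sketch does not use the churn features or reproduce the 0.745 result.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (not the book's dataset).
X, y = make_classification(n_samples=200, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression(random_state=0, solver="lbfgs").fit(X_train, y_train)

# score() predicts on X_test and compares the predictions with y_test.
score = clf.score(X_test, y_test)
accuracy = accuracy_score(y_test, clf.predict(X_test))

print(score, accuracy)   # the two values are identical
```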

9. Multiclass Classification Algorithms

Activity 9.01: Performing Multiclass Classification and Evaluating Performance

Solution:

Import the required libraries:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,\
confusion_matrix,\
accuracy_score
from sklearn import metrics
from sklearn.metrics import precision_recall_fscore_support
import matplotlib.pyplot as plt
import seaborn as sns
Load the marketing data into a DataFrame named data and look at the first five rows of the DataFrame using the following code:
data...
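The evaluation functions imported above can be tried on a small set of hand-written labels before you apply them to a real model. In this sketch, y_true and y_pred are made-up three-class labels, not predictions from the marketing data.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical true and predicted labels for three classes (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Rows are true classes and columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

accuracy = accuracy_score(y_true, y_pred)

# Macro averaging gives every class equal weight, whatever its size.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
print(accuracy, precision, recall, f1)

# Per-class precision, recall, and F1 score in one text table.
print(classification_report(y_true, y_pred))
```

Four of the six predictions are correct, so the accuracy is 4/6. The confusion matrix shows where the two errors fall: one class 0 sample predicted as class 1, and one class 2 sample predicted as class 0.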