
You're reading from Data Science for Marketing Analytics - Second Edition

Product type: Book
Published in: Sep 2021
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800560475
Edition: 2nd Edition
Authors (3):
Mirza Rahim Baig

Mirza Rahim Baig is a Data Science and Artificial Intelligence leader with over 13 years of experience across e-commerce, healthcare, and marketing. He currently leads Product Analytics at Marketing Services for Zalando, Europe's largest online fashion platform. He also serves as a Subject Matter Expert and faculty member for MS-level programs at prominent Ed-Tech platforms and institutes in India. He is the lead author of two books, 'Data Science for Marketing Analytics' and 'The Deep Learning Workshop,' both published by Packt. He is recognized as a thought leader in his field and is a frequent guest speaker at various forums.

Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

Vishwesh Ravi Shrimali

Vishwesh Ravi Shrimali graduated from BITS Pilani in 2018, where he studied mechanical engineering. He also completed a master's in Machine Learning and AI at LJMU in 2021. He has authored Machine Learning for OpenCV (2nd edition), Computer Vision Workshop, and Data Science for Marketing Analytics (2nd edition), all published by Packt. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.

1. Data Preparation and Cleaning

Activity 1.01: Addressing Data Spilling

Solution:

  1. Import pandas and the copy module from the standard library using the following commands:

    import pandas as pd

    import copy

  2. Create a new DataFrame, sales, and use the read_csv function to read the sales.csv file into it:

    sales = pd.read_csv("sales.csv")

    Note

    Make sure you change the path to the CSV file ("sales.csv" in the preceding code) based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.

  3. Now, check that your data loaded properly by viewing the first five rows of the DataFrame with the head() method:

    sales.head()

    You should get the following output:

    Figure 1.60: First five rows of the DataFrame

  4. Look at the data types of the columns in sales using the dtypes attribute:

    sales.dtypes

    You should get the following output:

    Figure 1.61: Looking at the data type of columns of sales.csv

    You can...
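The dtype check is usually where spilled data first shows up. A column that should be numeric comes back with a text dtype because a few cells contain stray text. The following minimal sketch illustrates the idea on a small made-up DataFrame. The column names and values are hypothetical and are not taken from sales.csv. It also shows one plausible use of the copy module: keeping an untouched backup before any cleaning.

```python
import copy

import pandas as pd

# Hypothetical toy data: the product name "Camp Stove" was split, so
# "Stove" spilled into the numeric 'Revenue' column in row 1.
raw = pd.DataFrame({
    "Product": ["Tent", "Camp", "Lantern"],
    "Revenue": ["120.5", "Stove", "80.0"],
})

# Keep an untouched deep copy before cleaning so the raw data can be restored.
backup = copy.deepcopy(raw)

print(raw.dtypes)  # Revenue has a text dtype instead of float64

# Convert to numbers; values that cannot be parsed become NaN.
revenue_numeric = pd.to_numeric(raw["Revenue"], errors="coerce")

# Cells that held a value but failed to parse are likely spill-over.
suspects = raw[revenue_numeric.isna() & raw["Revenue"].notna()]
print(suspects)
```

After you identify the suspect rows, move the misplaced values back to their correct columns before converting the column to a numeric dtype.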

2. Data Exploration and Visualization

Activity 2.01: Analyzing Advertisements

Solution:

Perform the following steps to complete this activity:

  1. Import pandas, seaborn, and matplotlib's pyplot module, then apply seaborn's default plot style with sns.set(), using the following code:

    import pandas as pd

    import seaborn as sns

    import matplotlib.pyplot as plt

    sns.set()

  2. Load the Advertising.csv file into a DataFrame called ads, using the Date column as the index. Check that the data loaded properly by viewing the first five rows with the head() method:

    ads = pd.read_csv("Advertising.csv", index_col='Date')

    ads.head()

    The output should be as follows:

    Figure 2.65: First five rows of the DataFrame ads

  3. Look at the memory usage and other internal information about the DataFrame by calling the info() method:

    ads.info()

    This gives the following output:

    Figure 2.66: The result of ads.info()

    From the preceding figure, you can see that you have five columns with 200 data points in each and no missing values.

  4. Use the describe() method to view basic statistical details...
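Writing ads.info without parentheses only refers to the method. In a notebook, it displays the bound method's representation instead of the summary, and the parentheses are what actually run it. The sketch below makes that concrete and checks the "no missing values" observation in code. It uses a tiny hypothetical stand-in for ads with made-up values, so the numbers are not those of Advertising.csv.

```python
import pandas as pd

# Hypothetical stand-in for ads; the values are made up for illustration.
ads = pd.DataFrame(
    {
        "TV": [100.0, 50.0, 20.0],
        "Radio": [30.0, 40.0, 10.0],
        "Newspaper": [60.0, 45.0, 70.0],
        "Sales": [20.0, 11.0, 9.0],
    },
    index=pd.Index(["2021-01-01", "2021-01-02", "2021-01-03"], name="Date"),
)

# Without parentheses, ads.info is only a reference to the bound method.
print(callable(ads.info))  # True

# Calling it prints the index range, column dtypes, non-null counts, and memory usage.
ads.info()

# Programmatic check for missing values: one count per column.
missing_per_column = ads.isna().sum()
print(missing_per_column)

# (rows, columns); the Date index is not counted as a column.
print(ads.shape)  # (3, 4)
```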

3. Unsupervised Learning and Customer Segmentation

Activity 3.01: Bank Customer Segmentation for Loan Campaign

Solution:

  1. Import the necessary libraries for data processing, visualization, and clustering using the following code:

    import numpy as np, pandas as pd

    import matplotlib.pyplot as plt, seaborn as sns

    from sklearn.preprocessing import StandardScaler

    from sklearn.cluster import KMeans

  2. Load the data into a pandas DataFrame and display the top five rows:

    bank0 = pd.read_csv("Bank_Personal_Loan_Modelling-1.csv")

    bank0.head()

    Note

    Make sure you change the path to the CSV file ("Bank_Personal_Loan_Modelling-1.csv" in the preceding code) based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.

    The first five rows get displayed as follows:

    Figure 3.31: First five rows of the dataset

    You can see that you have data about customer demographics such as Age, Experience, Family, and Education...
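The activity imports StandardScaler and KMeans for the clustering steps that follow. The block below is a rough sketch of how the two fit together, not the book's exact steps, which are elided here. It standardizes two features of a tiny hypothetical customer table and then clusters it with k-means. The Age and Income columns and their values are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy customers (not the bank file): two obvious groups.
toy = pd.DataFrame({
    "Age": [25, 27, 26, 55, 57, 56],
    "Income": [30, 32, 31, 150, 155, 152],
})

# Standardize each feature to mean 0 and unit variance. Without this,
# Income, which has the larger spread, would dominate Euclidean distances.
scaler = StandardScaler()
scaled = scaler.fit_transform(toy[["Age", "Income"]])
print(np.round(scaled.mean(axis=0), 6))  # both means are (numerically) zero

# k-means with k=2; n_init and random_state are illustrative choices.
km = KMeans(n_clusters=2, n_init=10, random_state=42)
toy["Cluster"] = km.fit_predict(scaled)
print(toy)
```

The cluster labels themselves (0 or 1) are arbitrary. Only the grouping of rows is meaningful.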

4. Evaluating and Choosing the Best Segmentation Approach

Activity 4.01: Optimizing a Luxury Clothing Brand's Marketing Campaign Using Clustering

Solution:

  1. Import the libraries required for DataFrame handling and plotting (pandas, NumPy, Matplotlib, and seaborn). Read the data from the file 'Clothing_Customers.csv' into a DataFrame and display the first five rows to understand it better.

    import numpy as np, pandas as pd

    import matplotlib.pyplot as plt, seaborn as sns

    data0 = pd.read_csv('Clothing_Customers.csv')

    data0.head()

    Note

    Make sure you place the CSV file in the same directory from which you are running the Jupyter notebook. If not, change the path ('Clothing_Customers.csv' in the preceding code) to match the location where you have stored the file.

    The result should be the table below:

    Figure 4.24: Top 5 records of the data

    The data contains the customers' income, age, days since their last purchase, and annual spending. All four features will be used to perform the segmentation.

  2. Standardize...
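This chapter is about choosing between segmentation approaches. After standardizing, one common way to compare candidate cluster counts is the silhouette score. The sketch below applies it to synthetic data with three built-in groups. The column names, the generated values, and the choice of silhouette score are assumptions for illustration. They are not the activity's elided steps.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic customers: three groups built around fixed centers
# (columns are assumed names, not necessarily those in Clothing_Customers.csv).
rng = np.random.default_rng(0)
centers = np.array([[40, 30, 10, 500], [90, 55, 60, 2000], [60, 45, 120, 900]])
data = pd.DataFrame(
    np.vstack([c + rng.normal(0, [3, 2, 5, 50], size=(20, 4)) for c in centers]),
    columns=["income", "age", "days_since_purchase", "annual_spend"],
)

# Standardize every feature so that large-valued columns (e.g. spend)
# do not dominate the distance computations.
X = StandardScaler().fit_transform(data)

# Fit k-means for several k and score each result (higher is better).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On real customer data, the silhouette curve is rarely this clear-cut, so business interpretability of the segments usually weighs in as well.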

5. Predicting Customer Revenue Using Linear Regression

Activity 5.01: Examining the Relationship between Store Location and Revenue

Solution:

  1. Import pandas, matplotlib's pyplot module, and seaborn. Read the data into a DataFrame called df and display the first five records using the following code:

    import pandas as pd

    import matplotlib.pyplot as plt, seaborn as sns

    df = pd.read_csv('location_rev.csv')

    df.head()

    Note

    Make sure you change the path to the CSV file ('location_rev.csv' in the preceding code) based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.

    The data should appear as follows:

    Figure 5.35: The first five rows of the location revenue data

    As described earlier, the data contains each store's revenue and age, along with various fields describing the store's location. From the top five records, you get a sense of the order of the values...
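A quick first look at how location features relate to revenue is each feature's correlation with revenue. The sketch below uses a tiny hypothetical table. The column names store_age and num_competitors, and all of the values, are assumptions rather than the contents of location_rev.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for location_rev.csv; the column
# names and values are assumptions for illustration.
df = pd.DataFrame({
    "revenue": [40, 55, 58, 72, 80],
    "store_age": [1, 2, 3, 4, 5],
    "num_competitors": [5, 6, 3, 4, 2],
})

# Pearson correlation of every other column with revenue,
# ordered by strength regardless of sign.
corr_with_revenue = (
    df.corr()["revenue"]
    .drop("revenue")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_revenue)
```

A pairwise correlation only captures linear association between two variables at a time. A regression model that uses the features jointly can tell a different story.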

6. More Tools and Techniques for Evaluating Regression Models

Activity 6.01: Finding Important Variables for Predicting Responses to a Marketing Offer

Solution:

Perform the following steps to achieve the aim of this activity:

  1. Import pandas, read in the data from offer_responses.csv, and use the head function to view the first five rows of the data:

    import pandas as pd

    df = pd.read_csv('offer_responses.csv')

    df.head()

    Note

    Make sure you change the path to the CSV file ('offer_responses.csv' in the preceding code) based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modifications.

    You should get the following output:

    Figure 6.22: The first five rows of the offer_responses data

  2. Extract the target variable (y) and the predictor variables (X) from the data:

    X = df[['offer_quality',\

            'offer_discount',\

      &...
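One way to rank predictors by importance is recursive feature elimination (RFE), which repeatedly fits a model and discards the weakest feature. The sketch below applies scikit-learn's RFE with a linear regression to synthetic data in which only offer_quality and offer_discount drive the response. The two noise columns, the coefficients, and the use of RFE are illustrative assumptions. This is not necessarily the activity's method.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: only the first two predictors affect y.
# All predictors share the same scale, so coefficient size reflects importance.
rng = np.random.default_rng(1)
X = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["offer_quality", "offer_discount", "noise_a", "noise_b"],
)
y = 3.0 * X["offer_quality"] + 2.0 * X["offer_discount"] + rng.normal(0, 0.1, 200)

# Recursive feature elimination: fit, drop the predictor with the smallest
# absolute coefficient, and repeat until n_features_to_select remain.
selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
selected = list(X.columns[selector.support_])
print(selected)           # the predictors RFE keeps
print(selector.ranking_)  # 1 = kept; larger numbers were eliminated earlier
```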

7. Supervised Learning: Predicting Customer Churn

Activity 7.01: Performing the OSE technique from OSEMN

Solution:

  1. Import the necessary libraries:

    # Suppress warnings

    import warnings

    warnings.filterwarnings('ignore')

    # Import the necessary packages

    import pandas as pd

    import numpy as np

    import matplotlib.pyplot as plt

    import seaborn as sns

  2. Download the dataset from https://packt.link/80blQ and save it as Telco_Churn_Data.csv. Make sure to run the notebook from the same folder as the dataset.
  3. Create a DataFrame called data and read the dataset into it using pandas' read_csv function. Look at the first few rows of the DataFrame:

    data = pd.read_csv('Telco_Churn_Data.csv')

    data.head(5)

    Note

    Make sure you change the path to the CSV file ('Telco_Churn_Data.csv' in the preceding code snippet) based on its location on your system. If you're running the Jupyter notebook from the same directory where the CSV file is stored, you can run the preceding code without any modification.

    The...
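The Scrub and Explore parts of OSE typically start with three tasks: counting missing values, filling or dropping them, and summarizing the target. Below is a minimal sketch on a hypothetical four-row churn table. The column names, the values, and the median-imputation choice are assumptions rather than the book's steps.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature churn table; the real columns in
# Telco_Churn_Data.csv may be named differently.
data = pd.DataFrame({
    "Monthly Charges": [70.0, np.nan, 30.0, 90.0],
    "Tenure": [12, 3, np.nan, 48],
    "Churn": ["Yes", "No", "No", "Yes"],
})

# Scrub: count missing values per column, then fill numeric gaps
# with each column's median.
print(data.isnull().sum())
numeric_cols = data.select_dtypes(include="number").columns
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].median())

# Explore: overall churn rate and the average of a numeric feature per class.
churn_rate = (data["Churn"] == "Yes").mean()
print(f"Churn rate: {churn_rate:.0%}")
print(data.groupby("Churn")["Monthly Charges"].mean())
```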

8. Fine-Tuning Classification Algorithms

Activity 8.01: Implementing Different Classification Algorithms

Solution:

  1. Import the LogisticRegression class from scikit-learn:

    from sklearn.linear_model import LogisticRegression

  2. Fit the model:

    clf_logistic = LogisticRegression(random_state=0, solver='lbfgs')\

                   .fit(X_train[top7_features], y_train)

    clf_logistic

    The preceding code will give the following output:

    LogisticRegression(random_state=0)

  3. Score the model:

    clf_logistic.score(X_test[top7_features], y_test)

    You will get the following output: 0.7454031117397454.

    The logistic regression model reaches an accuracy of about 74.5% on the test set. This is mediocre, but it provides a useful baseline: the minimum accuracy you should expect from the other models.

  4. Import the svm module from scikit-learn:

    from sklearn import svm

  5. Scale the training and testing data as follows:

    from sklearn.preprocessing import MinMaxScaler

    scaling = MinMaxScaler...
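Scaling matters here because SVMs are sensitive to the ranges of the input features. The sketch below fits MinMaxScaler on the training split only and then reuses it on the test split. It then compares an SVM against a logistic regression baseline. It runs on synthetic data from make_classification, and the kernel and C values are illustrative choices, not the book's settings.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the churn features (not the book's data).
X, y = make_classification(n_samples=500, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Baseline: logistic regression accuracy on the test set.
baseline = LogisticRegression(random_state=0).fit(X_train, y_train)
print("logistic:", baseline.score(X_test, y_test))

# Learn min/max from the training data only, then apply the same
# transformation to the test data so no test statistics leak into training.
scaling = MinMaxScaler(feature_range=(0, 1)).fit(X_train)
X_train_scaled = scaling.transform(X_train)
X_test_scaled = scaling.transform(X_test)

# Kernel and C are illustrative values.
clf_svm = svm.SVC(kernel="rbf", C=1.0).fit(X_train_scaled, y_train)
print("SVM:", clf_svm.score(X_test_scaled, y_test))
```

Test values can fall slightly outside [0, 1] after scaling, because the scaler only saw the training range. That is expected and harmless.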

9. Multiclass Classification Algorithms

Activity 9.01: Performing Multiclass Classification and Evaluating Performance

Solution:

  1. Import the required libraries:

    import pandas as pd

    import numpy as np

    from sklearn.ensemble import RandomForestClassifier

    from sklearn.model_selection import train_test_split

    from sklearn.metrics import classification_report,\

                                confusion_matrix,\

                                accuracy_score

    from sklearn import metrics

    from sklearn.metrics import precision_recall_fscore_support

    import matplotlib.pyplot as plt

    import seaborn as sns

  2. Load the marketing data into a DataFrame named data and look at the first five rows of the DataFrame using the following code:

    data...
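A key point in evaluating a multiclass model is how the per-class metrics are averaged. The sketch below trains a random forest on scikit-learn's built-in Iris dataset, used here only as a stand-in for the marketing data. It then prints macro- and micro-averaged scores alongside the confusion matrix.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split

# Iris (3 classes) stands in for the marketing data, which is not shown here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))

# Macro averaging weights every class equally; micro averaging pools all
# predictions, which for single-label multiclass data equals accuracy.
for avg in ("macro", "micro"):
    p, r, f, _ = precision_recall_fscore_support(y_test, y_pred, average=avg)
    print(f"{avg}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Macro averaging is the stricter view when classes are imbalanced, because a poorly predicted minority class pulls the average down as much as any other class.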
