You're reading from The Data Analysis Workshop

Product type: Book

Published in: Jul 2020

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781839211386

Edition: 1st Edition

Languages: Python

Tools: Jupyter

Concepts: Data Science

Authors (3): Gururajan Govindan, Shubhangi Hora, Konstantin Palagachev

2. Absenteeism at Work

Activity 2.01: Analyzing the Service Time and Son Columns

First, let's import the data and the necessary libraries:

# import the necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# import data from the github page of the book
data = pd.read_csv('https://raw.githubusercontent.com/'\
                   'PacktWorkshops/'\
                   'The-Data-Analysis-Workshop/master/'\
                   'Chapter02/data/Absenteeism_at_work.csv', \
                  &...

3. Analyzing Bank Marketing Campaign Data

Activity 3.01: Creating a Leaner Logistic Regression Model

Start by importing the necessary Python packages:

# import necessary libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm

Load the data from GitHub:

# pull data from github
bank_data = pd.read_csv("https://raw.githubusercontent.com/"\
                        "PacktWorkshops/"\
                        "The-Data-Analysis-Workshop/master/"\
                        "Chapter03/data/bank-additional/"\
          ...

4. Tackling Company Bankruptcy

Activity 4.01: Feature Selection with Lasso

Import Lasso from the sklearn.linear_model package and SelectFromModel from the sklearn.feature_selection package:

from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

Fit the independent and dependent variables with lasso regularization for the mean_imputed_df4 DataFrame:
```
features_names = X6.columns.tolist()
lasso = Lasso(alpha=0.01, positive=True)
lasso.fit(X6, y6)
```

Print the coefficients of lasso regularization:

coef_list=sorted(zip(map(lambda x: round(x,4), \
                         lasso.coef_.reshape(-1)),\
                         features_names), reverse=True)
coef_list[0:5]

The output will be as follows:

[(0.0009, 'X21'), (0.0002, 'X2'), (0.0001, ...
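
To make the ranking step concrete, here is a minimal sketch of how rounded coefficient/name pairs can be sorted and then filtered so that only features with a nonzero lasso weight remain. The feature names and coefficient values are hypothetical and are not the values fitted in this activity.

```python
# Hypothetical coefficients keyed by feature name (illustration only;
# these are not the values produced by lasso.fit(X6, y6)).
coefs = {"feat_a": 0.0, "feat_b": 0.00021, "feat_c": 0.00094,
         "feat_d": 0.0, "feat_e": 0.00008}

# Pair each rounded coefficient with its name and sort in descending order,
# mirroring the sorted(zip(...), reverse=True) pattern used above.
ranked = sorted(((round(c, 4), name) for name, c in coefs.items()),
                reverse=True)

# Lasso can shrink the weights of uninformative features to exactly zero,
# so keeping the nonzero entries acts as a simple feature selector.
selected = [name for coef, name in ranked if coef != 0]

print(ranked[:3])
print(selected)
```

Note that filtering on the rounded values can discard very small nonzero coefficients; filter on the raw coefficients if that distinction matters.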

5. Analyzing the Online Shopper's Purchasing Intention

Activity 5.01: Performing K-means Clustering for Administrative Duration versus Bounce Rate and Administrative Duration versus Exit Rate

Select the Administrative Duration and Bounce Rate columns. Assign these columns to a variable called x:
```
x = df.iloc[:, [1, 6]].values
x.shape
```

Initialize the k-means algorithm:

from sklearn.cluster import KMeans

wcss = []
for i in range(1, 11):
    km = KMeans(n_clusters = i, init = 'k-means++', \
                max_iter = 300, n_init = 10, random_state = 0, \
                algorithm = 'elkan', tol = 0.001)

For each value of K, fit the model, compute the k-means inertia (the within-cluster sum of squares), and store it in a list called wcss:
```
    km.fit(x)
    labels = km.labels_
    wcss.append...
```
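
As a rough illustration of what the inertia measures, the sketch below computes the within-cluster sum of squares by hand for a tiny, hypothetical set of 2-D points. The cluster labels are assigned manually here rather than found by k-means; the helper names `centroid` and `wcss_for` are made up for this example.

```python
def centroid(points):
    """Return the coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def wcss_for(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        cx, cy = centroid(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


# Hypothetical toy data: two visually separate groups of points.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]

one_cluster = wcss_for(points, [0, 0, 0, 0])   # everything in one cluster
two_clusters = wcss_for(points, [0, 0, 1, 1])  # one cluster per group

print(one_cluster, two_clusters)
```

The value drops sharply when the second cluster is added. Plotting this quantity against K and looking for the point where the drop levels off is the idea behind the elbow method; in scikit-learn the same quantity is exposed as the fitted model's `inertia_` attribute.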

6. Analysis of Credit Card Defaulters

Activity 6.01: Evaluating the Correlation between Columns Using a Heatmap

Plot the heatmap for all the columns in the DataFrame (other than the ID column) using sns.heatmap, and set the figure size to (30, 10) for better visibility:
```
sns.set(rc={'figure.figsize':(30,10)})
sns.set_context("talk", font_scale=0.7)
```
Pass 'spearman' as the method parameter of .corr() to compute Spearman's rank correlation coefficient:
```
sns.heatmap(df.iloc[:,1:].corr(method='spearman'), \
            cmap='rainbow_r', annot=True)
```
The output of the heatmap is as follows:
Figure 6.28: Heatmap for Spearman's rank correlation
To get the exact correlation coefficient of each column with the DEFAULT column, apply the .corr() function to each column with respect to the DEFAULT column:
```
df.drop("DEFAULT", axis=1)\
.apply(lambda x: x.corr(df.DEFAULT...
```
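
For readers who want to see what a rank correlation does under the hood, here is a dependency-free sketch: Spearman's coefficient is the Pearson correlation computed on the ranks of the values, with tied values sharing their average rank. The function names and the sample data are assumptions made for illustration, not part of the activity.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically, but not linearly, with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(pearson(x, y), spearman(x, y))
```

Because the relationship is monotonic, the rank correlation is essentially 1 even though the linear (Pearson) correlation is lower, which is why a rank-based method can be a reasonable choice for ordinal or skewed columns.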

7. Analyzing the Heart Disease Dataset

Activity 7.01: Checking for Outliers

Plot a box plot using sns.boxplot for the st_depr column:
```
sd = sns.boxplot(x=df['st_depr'])
plt.show()
```
The output will be as follows:
Figure 7.22: Box plot for st_depr
Plot a box plot using sns.boxplot for the colored_vessels column:
```
cv = sns.boxplot(x=df['colored_vessels'])
plt.show()
```
The output will be as follows:
Figure 7.23: Boxplot for colored_vessels
Plot a box plot using sns.boxplot for the thalassemia column:
```
t = sns.boxplot(x=df['thalassemia'])
plt.show()
```
The output will be as follows:

Figure 7.24: Boxplot for thalassemia

Note

To access the source code for this specific section, please refer to https://packt.live/2N4I0DF.

You can also run this example online at https://packt.live/2BiGv2c. You must execute the entire Notebook in order to get the desired result.
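
The points drawn beyond the whiskers of a box plot are conventionally those lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The sketch below applies that rule numerically using only the standard library; the sample values are hypothetical and do not come from the heart disease dataset.

```python
import statistics

# Hypothetical measurements with one suspiciously large value.
values = [0.0, 0.2, 0.5, 0.8, 1.0, 1.2, 1.6, 2.0, 6.2]

# Quartiles with linear interpolation over the observed range.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles, the usual box plot convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(q1, q3, outliers)
```

Listing the flagged values this way complements the plots: the figure shows that outliers exist, while the list shows exactly which observations they are.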

Activity 7.02: Plotting Distributions and Relationships between Columns with Respect to the...

8. Analyzing Online Retail II Dataset

Activity 8.01: Performing Data Analysis on the Online Retail II Dataset

Import the required packages:
```
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
```
In a Jupyter notebook, install plotly using the following command:
```
!pip install plotly
```
Import plotly.express from the installed package:
```
import plotly.express as px
```

Store each of the CSV files in two different DataFrames:

r09 = pd.read_csv('https://raw.githubusercontent.com/'\
                  'PacktWorkshops/'\
                  'The-Data-Analysis-Workshop/master/'\
                  'Chapter08/Datasets/online_retail_II.csv')
r09.head()

The output...

9. Analysis of the Energy Consumed by Appliances

Activity 9.01: Analyzing the Appliances Energy Consumption

Using seaborn, plot a boxplot for the a_energy column:
```
app_box = sns.boxplot(x=new_data.a_energy)
```
The output will be as follows:
Figure 9.28: Box plot of a_energy
Use .sum() to count the instances in which the energy consumed by appliances is above 200 Wh:
```
out = (new_data['a_energy'] > 200).sum()
out
```
The output will be as follows:
```
1916
```
Calculate the percentage of instances in which the energy consumed by appliances is above 200 Wh:
```
(out/19735) * 100
```
The output will be as follows:
```
9.708639473017481
```
Use .sum() to count the instances in which the energy consumed by appliances is above 950 Wh:
```
out_e = (new_data['a_energy'] > 950).sum()
out_e
```
The output will be as follows:
```
2
```
Calculate the percentage of the number of instances wherein the value of the energy consumed...
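
The count-then-percentage pattern used in these steps can be wrapped in a small helper that derives the denominator from the data instead of hard-coding the row count. This is only a sketch: the function name `share_above` and the readings are hypothetical.

```python
def share_above(values, threshold):
    """Return (count, percentage) of values strictly above threshold."""
    count = sum(1 for v in values if v > threshold)
    return count, 100 * count / len(values)


# Hypothetical energy readings in Wh (not the activity's data).
readings = [40, 60, 50, 210, 90, 1000, 70, 300, 50, 80]

count_200, pct_200 = share_above(readings, 200)
count_950, pct_950 = share_above(readings, 950)

print(count_200, pct_200)
print(count_950, pct_950)
```

With a pandas Series, `(series > threshold).mean() * 100` gives the same percentage in a single expression, since the mean of a Boolean mask is the fraction of True values.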

10. Analyzing Air Quality

Activity 10.01: Checking for Outliers

Plot a boxplot for the PM25 feature using seaborn:
```
pm_25 = sns.boxplot(x=air['PM25'])
```
The output will be as follows:
Figure 10.50: Boxplot for PM25
Check how many instances contain PM25 values of 250 or higher:
```
(air['PM25'] >= 250).sum()
```
The output will be as follows:
```
18668
```
Store all the instances from Step 2 in a DataFrame called pm25 and print the first five rows:
```
pm25 = air.loc[air['PM25'] >= 250]
pm25.head()
```
The output will be as follows:
Figure 10.51: First five rows of pm25
Print the station names of the instances in pm25 to confirm that they come from multiple stations rather than just one. This reduces the chance that they are incorrectly stored values:
```
pm25.station.unique()
```
The output will be as follows:
```
array(['Aotizhongxin', 'Changping', 'Dingling', 'Dongsi', 
       '...
```
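
Beyond listing the unique station names, it can help to count how many high readings each station contributes, since a single station dominating the count would hint at a sensor or storage problem. The sketch below does this with the standard library; the station names are taken from the output above, but the PM25 values are hypothetical.

```python
from collections import Counter

# Hypothetical (station, PM25) records; the values are made up.
records = [
    ("Aotizhongxin", 310.0),
    ("Changping", 120.0),
    ("Dingling", 275.0),
    ("Aotizhongxin", 255.0),
    ("Dongsi", 90.0),
    ("Dongsi", 400.0),
]

threshold = 250

# Count readings at or above the threshold for each station.
high_by_station = Counter(
    station for station, pm in records if pm >= threshold
)

# Stations with at least one high reading, in alphabetical order.
stations = sorted(high_by_station)

print(stations)
print(high_by_station.most_common())
```

On a pandas DataFrame such as pm25, `pm25.station.value_counts()` would give the equivalent per-station counts.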


Authors (3)

Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

Shubhangi Hora

Shubhangi Hora is a data scientist, Python developer, and published writer. With a background in computer science and psychology, she is particularly passionate about healthcare-related AI, including mental health. Shubhangi is also a trained musician.

Konstantin Palagachev

Konstantin Palagachev holds a Ph.D. in applied mathematics and optimization, with an interest in operations research and data analysis. He is recognized for his passion for delivering data-driven solutions and expertise in the area of urban mobility, autonomous driving, insurance, and finance. He is also a devoted coach and mentor, dedicated to sharing his knowledge and passion for data science.
