You're reading from  The Data Analysis Workshop

Product type: Book
Published in: Jul 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781839211386
Edition: 1st Edition
Authors (3):
Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

Shubhangi Hora

Shubhangi Hora is a data scientist, Python developer, and published writer. With a background in computer science and psychology, she is particularly passionate about healthcare-related AI, including mental health. Shubhangi is also a trained musician.

Konstantin Palagachev

Konstantin Palagachev holds a Ph.D. in applied mathematics and optimization, with an interest in operations research and data analysis. He is recognized for his passion for delivering data-driven solutions and expertise in the area of urban mobility, autonomous driving, insurance, and finance. He is also a devoted coach and mentor, dedicated to sharing his knowledge and passion for data science.

10. Analyzing Air Quality

Activity 10.01: Checking for Outliers

  1. Plot a boxplot for the PM25 feature using seaborn:
    pm_25 = sns.boxplot(x=air['PM25'])

    The output will be as follows:

    Figure 10.50: Boxplot for PM25

  2. Count the instances where PM25 is greater than or equal to 250:
    (air['PM25'] >= 250).sum()

    The output will be as follows:

    18668
  3. Store all the instances from Step 2 in a DataFrame called pm25 and print the first five rows:
    pm25 = air.loc[air['PM25'] >= 250]
    pm25.head()

    The output will be as follows:

    Figure 10.51: First five rows of pm25

  4. Print the station names of the instances in pm25 to confirm that they come from multiple stations rather than just one. If the high readings are spread across stations, they are less likely to be incorrectly recorded values:
    pm25.station.unique()

    The output will be as follows:

    array(['Aotizhongxin', 'Changping', 'Dingling', 'Dongsi', 
           '...
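
The threshold check in Steps 2 to 4 can be tried end to end on a small stand-in dataset. The sketch below is only an illustration: the `air` DataFrame here is a hypothetical six-row substitute for the chapter's data (the readings are made up), and the `threshold`, `high_mask`, and `per_station` names are assumptions introduced for this example. It also adds a per-station count, which is one possible way to see how the high readings are spread across stations.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's `air` DataFrame; values are made up.
air = pd.DataFrame({
    "station": ["Aotizhongxin", "Changping", "Dingling",
                "Dongsi", "Changping", "Dongsi"],
    "PM25": [300.0, 120.0, 250.0, 90.0, 410.0, None],
})

# Step 2: count rows at or above the threshold.
# A missing reading (NaN) compares as False, so it is not counted.
threshold = 250
high_mask = air["PM25"] >= threshold
n_high = int(high_mask.sum())

# Step 3: keep only the rows that meet the threshold.
pm25 = air.loc[high_mask]

# Step 4: list the stations involved, plus how many rows each contributes.
stations = pm25["station"].unique()
per_station = pm25["station"].value_counts()

print(n_high)
print(sorted(stations))
print(per_station)
```

Reusing one boolean mask for both the count and the filter keeps Steps 2 and 3 consistent: the number reported and the rows stored always follow the same comparison.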