You're reading from The Pandas Workshop

Product type: Book

Published in: Jun 2022

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781800208933

Edition: 1st Edition

Languages: Python

Tools: NumPy, Pandas

Concepts: Data Science

Authors (4):

Blaine Bateman

Saikat Basak

Thomas V. Joseph

William So

Chapter 6: Data Selection – Series

In this chapter, you'll use most of the methods you've learned about for DataFrames to select data from a pandas Series.

By the end of this chapter, you will have a complete understanding of the Series Index, know how to apply the dot, bracket, and extended indexing methods, and how to use .loc[] and .iloc[] to select data from a Series.

In this chapter, we will cover the following topics:

Introduction to pandas Series
The Series index
Data selection in pandas Series
Preparing Series from DataFrames and vice versa
Activity 6.01 – Series data selection
Understanding the differences between base Python and pandas data selection
Activity 6.02 – DataFrame data selection

Introduction to pandas Series

In Chapter 5, Data Selection – DataFrames, we introduced several ways to select data from pandas DataFrames. Although a pandas Series can be thought of as a single column of a DataFrame, it is a separate data structure. In this chapter, we will look in detail at how to select data from a Series. The key methods, such as .loc[] and .iloc[], still apply to a one-dimensional Series, as do some of the more advanced methods, such as Boolean indexing and extended indexing. Because you have already worked through these methods for DataFrames, applying them to a Series should feel familiar. Toward the end of this chapter, we will spend some time on the differences between pandas and base Python when selecting data, which will reinforce the ideas and methods you have learned about. Conceptually, the same ideas we used to select elements from a DataFrame can be used to select elements from a Series....
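
As a minimal sketch of these ideas, the following uses a small made-up Series (the labels, values, and variable names are illustrative assumptions, not data from the book) to show label-based, position-based, and Boolean selection side by side:

```python
import pandas as pd

# Hypothetical data for illustration only
scores = pd.Series([88, 92, 75, 60], index=['a', 'b', 'c', 'd'], name='score')

by_label = scores.loc['b']          # label-based selection
by_position = scores.iloc[2]        # position-based selection
label_slice = scores.loc['a':'c']   # label slices include the end label
pos_slice = scores.iloc[0:2]        # position slices exclude the end position
high = scores[scores > 80]          # Boolean indexing keeps items where the mask is True

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist(), high.tolist())
```

Note that the label slice returns three items while the position slice returns two; this difference between .loc[] and .iloc[] carries over unchanged from DataFrames.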

The Series index

Let's say we have some monthly income data from a YouTube channel. We create a Series with some values (monthly earnings in USD) in a list, and an index of month abbreviations, also in a list, using a constructor similar to what we've used for DataFrames. Note that we can add a name for the Series using the name argument:

import pandas as pd
income = pd.Series([100, 125, 105, 111, 275, 137, 
                     99, 10, 250, 100, 175, 200],
                   index = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                            'Jul', '...
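
Since the listing above is cut off, here is a separate, self-contained sketch of the same constructor pattern. It deliberately uses hypothetical quarterly figures and a hypothetical name (income_usd) rather than guessing at the rest of the book's monthly data:

```python
import pandas as pd

# Hypothetical quarterly figures; not the book's monthly data
quarterly = pd.Series([330, 523, 359, 475],
                      index=['Q1', 'Q2', 'Q3', 'Q4'],
                      name='income_usd')

print(quarterly.index)             # the labels supplied through the index argument
print(quarterly.name)              # the name supplied through the name argument

q2_bracket = quarterly['Q2']       # bracket notation with an index label
q2_dot = quarterly.Q2              # dot notation works when the label is a valid identifier
q2_to_q3 = quarterly['Q2':'Q3']    # slicing by label includes both endpoints
print(q2_bracket, q2_dot, q2_to_q3.tolist())
```

Dot notation is convenient interactively, but it fails for labels that contain spaces or clash with existing Series attributes, so bracket notation is usually the safer choice.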

Preparing Series from DataFrames and vice versa

In Chapter 5, Data Selection – DataFrames, we saw examples of getting a Series by slicing the column of a DataFrame. Let's review this. You have been provided with a dataset (adapted from https://archive.ics.uci.edu/ml/datasets/Water+Treatment+Plant) regarding a water treatment facility and you've been asked to analyze its performance. The data contains various chemical measurements for the input, two settling stages, and the output, plus some performance indicators. We will begin by reading the water-treatment.csv file. After reading the data, we will use the .fillna() method, which replaces any missing values (these are converted into NaN values during the file read) with the value that's passed to .fillna(). We will use a value of -9999 here:

water_data = pd.read_csv('Datasets\\water-treatment.csv')
water_data.fillna(-9999, inplace = True)
water_data

Note

Please change the path of the dataset...
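
To illustrate moving between the two structures without needing the CSV file, here is a minimal sketch that uses a tiny in-memory stand-in. The column names (input_flow, input_pH), row labels, and values are invented for illustration and are not taken from the water treatment dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the water treatment data; names and values are invented
df = pd.DataFrame({'input_flow': [41230.0, np.nan, 35023.0],
                   'input_pH': [7.8, 7.6, np.nan]},
                  index=['day_1', 'day_2', 'day_3'])
df = df.fillna(-9999)               # returns a new DataFrame rather than using inplace=True

flow = df['input_flow']             # single brackets give a Series
flow_frame = df[['input_flow']]     # double brackets give a one-column DataFrame
back_to_frame = flow.to_frame()     # Series to DataFrame; the column takes the Series name

print(type(flow).__name__, type(flow_frame).__name__)
print(back_to_frame.columns.tolist())
```

The Series keeps the DataFrame's row index, which is why converting it back with .to_frame() lines up with the original rows.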

Activity 6.01 – Series data selection

In this activity, you will read some US population data for large cities for the years 2010 and 2019 and analyze it. The goal is to determine the population growth for the top three cities compared to all the top 20 from 2010 to 2019. To do this, you must compute the population of the three largest cities for 2010 and 2019, as well as the population of the 20 largest cities for both years. Using these values, you can compute the growth rates and compare them.

Follow these steps to complete this activity:

For this activity, all you will need is the pandas library. Load it into the first cell of the notebook.
Read in a pandas Series from the US_Census_SUB-IP-EST2019-ANNRNK_top_20_2010.csv file. This data is from the US Census Bureau (source: https://www2.census.gov/programs-surveys/popest/datasets/2010/2010-eval-estimates/). The city names are in the first column, so read them so that they are used as the indexes. List the resulting...
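
The overall calculation can be sketched on synthetic data. Everything below is an assumption made for illustration: the five made-up cities and their populations stand in for the 20 real ones, the CSV text is built in memory instead of read from the Census files, and .squeeze('columns') is just one possible way to turn a one-column DataFrame into a Series:

```python
import io
import pandas as pd

# Synthetic CSV text standing in for the two Census files (values are made up)
csv_2010 = "City,2010\nCity A,800\nCity B,400\nCity C,300\nCity D,200\nCity E,100\n"
csv_2019 = "City,2019\nCity A,840\nCity B,410\nCity C,330\nCity D,230\nCity E,120\n"

# index_col=0 uses the city names as the index; squeeze turns the single column into a Series
pop_2010 = pd.read_csv(io.StringIO(csv_2010), index_col=0).squeeze('columns')
pop_2019 = pd.read_csv(io.StringIO(csv_2019), index_col=0).squeeze('columns')

# Sort so that position 0 is the largest city, then take the first three by position
top3_2010 = pop_2010.sort_values(ascending=False).iloc[:3].sum()
top3_2019 = pop_2019.sort_values(ascending=False).iloc[:3].sum()
all_2010 = pop_2010.sum()
all_2019 = pop_2019.sum()

# Growth rate = (later - earlier) / earlier
growth_top3 = (top3_2019 - top3_2010) / top3_2010
growth_all = (all_2019 - all_2010) / all_2010
print(f"top 3: {growth_top3:.1%}, all cities: {growth_all:.1%}")
```

This sketch takes the three largest cities in each year separately; if the ranking changes between years, selecting the 2019 values with the 2010 top-three labels via .loc[] would answer a slightly different question.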

Understanding the differences between base Python and pandas data selection

For the most part, once you have learned a bit of pandas notation for slicing and indexing, pandas objects work nearly transparently with core Python. Since the indexing of some different object types looks similar, here, we'll touch on some of the differences so that you can avoid surprises in the future.

Lists versus Series access

Python lists look superficially like Series. When you're using bracket notation to index a Series, it works much the same way as indexing a list. Here, we make a simple list using the range() function, then print out 11 values within the list:

my_list = list(range(100))
print(my_list[12:23])

This will produce the following output:

[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]

Now, let's attempt the same thing, but using .iloc[]:

print(my_list...
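
To make the contrast concrete, here is a minimal, self-contained sketch (the variable names other than my_list are assumptions introduced for illustration). It shows that .iloc[] is a pandas feature that a plain list does not have, and that label-based slicing on a Series includes the end label:

```python
import pandas as pd

my_list = list(range(100))
my_series = pd.Series(my_list)

# Positional slicing behaves the same on both: the end position is excluded
list_slice = my_list[12:23]
series_slice = my_series.iloc[12:23]

# .iloc[] exists only on pandas objects; a list has no such attribute
error_message = None
try:
    my_list.iloc[12:23]
except AttributeError as err:
    error_message = str(err)
print(error_message)

# .loc[] slices by label and includes the end label, so it returns one more item
loc_slice = my_series.loc[12:23]
print(len(list_slice), len(series_slice), len(loc_slice))
```

Here the Series has the default integer index, so labels and positions happen to coincide; the inclusive end of .loc[] is the one visible difference.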

Activity 6.02 – DataFrame data selection

In this activity, you need to analyze data from this year's survey of Abalone oysters for the National Marine Fisheries Service (the source data can be found in the UCI repository: https://archive.ics.uci.edu/ml/datasets/abalone). In particular, you want to get some summary values for the dimensions of male and female samples in the data, depending on the number of rings in the oysters' shells. The ring count is a measure of age, and reviewing this data provides comparisons to previous years to help you understand the health of the population. The data contains several observations, including sex, length, diameter, weight, shell weight, and the number of rings.

To complete this activity, follow these steps:

For this activity, all you will need is the pandas library. Load it into the first cell of the notebook.
Read the abalone.csv file into a DataFrame called abalone and view the first five rows.
Create a...
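
The kind of selection this activity calls for can be sketched on a tiny synthetic table. The column names (Sex, Length, Diameter, Rings), the values, and the ring threshold of 10 are all assumptions chosen for illustration; they are not taken from abalone.csv or from the activity's remaining steps:

```python
import pandas as pd

# Tiny synthetic table; column names and values are assumptions for illustration
abalone = pd.DataFrame({
    'Sex': ['M', 'F', 'M', 'F', 'I', 'M'],
    'Length': [0.45, 0.53, 0.35, 0.60, 0.33, 0.50],
    'Diameter': [0.36, 0.42, 0.26, 0.47, 0.25, 0.40],
    'Rings': [15, 9, 7, 16, 7, 12],
})

# Combine Boolean masks with &; each condition needs its own parentheses
older_males = abalone.loc[(abalone['Sex'] == 'M') & (abalone['Rings'] > 10),
                          ['Length', 'Diameter']]
older_females = abalone.loc[(abalone['Sex'] == 'F') & (abalone['Rings'] > 10),
                            ['Length', 'Diameter']]

# .mean() on the selected columns returns a Series indexed by column name
male_means = older_males.mean()
female_means = older_females.mean()
print(male_means)
print(female_means)
```

Passing the row mask and the column list to .loc[] in one step avoids chained indexing and makes it clear which rows and columns feed each summary.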

Summary

In this chapter, we learned about the pandas methods for data indexing and selection using a Series. We compared the Series.loc[] and Series.iloc[] indexers for accessing items in a Series by label and by integer location, respectively. We also used pandas shortcut methods, including bracket notation and extended indexing. We saw that most methods for DataFrames work similarly and intuitively for a pandas Series, and we highlighted a few key differences. After understanding indexes and how to access them, we illustrated the differences between pandas objects and core Python data structures such as lists and dictionaries, as well as some things to keep in mind regarding pandas and core Python.

At this point, you should be comfortable working with pandas data access as well as understand the common pitfalls and workarounds. With these tools in hand, you are ready to tackle data projects of any complexity. In the next chapter, Chapter 7, Data Transformation, you will apply some of these methods...

Authors (4)

Blaine Bateman

Blaine Bateman has more than 35 years of experience working with various industries from government R&D to startups to $1B public companies. His experience focuses on analytics including machine learning and forecasting. His hands-on abilities include Python and R coding, Keras/Tensorflow, and AWS & Azure machine learning services. As a machine learning consultant, he has developed and deployed actual ML models in industry.

Saikat Basak

Saikat Basak is a data scientist and a passionate programmer. Having worked with multiple industry leaders, he has a good understanding of problem areas that can potentially be solved using data. Apart from being a data guy, he is also a science geek and loves to explore new ideas in the frontiers of science and technology.

Thomas V. Joseph

Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.

William So

William So is a Data Scientist with both a strong academic background and extensive professional experience. He is currently the Head of Data Science at Douugh and also a Lecturer for Master of Data Science and Innovation at the University of Technology Sydney. During his career, he has successfully covered the end-to-end spectrum of data analytics, from ML to Business Intelligence, helping stakeholders derive valuable insights and achieve results that benefit the business. William is a co-author of "The Applied Artificial Intelligence Workshop", published by Packt.