Pandas Cookbook

Over 95 hands-on recipes to leverage the power of pandas for efficient scientific computation and data analysis

Theodore Petrou

Book Details

ISBN-13: 9781784393878
Paperback: 538 pages

Book Description

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way.

The pandas library is massive, and even frequent users are often unaware of many of its more impressive features. The official pandas documentation, while thorough, contains few examples of how to piece together multiple commands as one would during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter.

Many advanced recipes combine several different features across the pandas library to generate results.
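
As a rough illustration of what piecing together multiple commands can look like, here is a minimal sketch. It is not a recipe from the book: the `flights` data, its column names, and the 15-minute lateness threshold are all made up for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
flights = pd.DataFrame({
    "airline": ["AA", "AA", "UA", "UA", "DL", "DL"],
    "delay_min": [5.0, None, 30.0, 12.0, 0.0, 45.0],
})

# Chain several steps: drop rows with a missing delay, flag late flights
# (assumed threshold: more than 15 minutes), then compute the share of
# late flights per airline, highest share first.
late_share = (
    flights
    .dropna(subset=["delay_min"])
    .assign(is_late=lambda df: df["delay_min"] > 15)
    .groupby("airline")["is_late"]
    .mean()
    .sort_values(ascending=False)
)
print(late_share)
```

Each step returns a new object, so the whole analysis reads top to bottom without intermediate variables.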

Table of Contents

Chapter 1: Pandas Foundations
Introduction
Dissecting the anatomy of a DataFrame
Accessing the main DataFrame components
Understanding data types
Selecting a single column of data as a Series
Calling Series methods
Working with operators on a Series
Chaining Series methods together
Making the index meaningful
Renaming row and column names
Creating and deleting columns
Chapter 2: Essential DataFrame Operations
Introduction
Selecting multiple DataFrame columns
Selecting columns with methods
Ordering column names sensibly
Operating on the entire DataFrame
Chaining DataFrame methods together
Working with operators on a DataFrame
Comparing missing values
Transposing the direction of a DataFrame operation
Determining college campus diversity
Chapter 3: Beginning Data Analysis
Introduction
Developing a data analysis routine
Reducing memory by changing data types
Selecting the smallest of the largest
Selecting the largest of each group by sorting
Replicating nlargest with sort_values
Calculating a trailing stop order price
Chapter 4: Selecting Subsets of Data
Introduction
Selecting Series data
Selecting DataFrame rows
Selecting DataFrame rows and columns simultaneously
Selecting data with both integers and labels
Speeding up scalar selection
Slicing rows lazily
Slicing lexicographically
Chapter 5: Boolean Indexing
Introduction
Calculating boolean statistics
Constructing multiple boolean conditions
Filtering with boolean indexing
Replicating boolean indexing with index selection
Selecting with unique and sorted indexes
Gaining perspective on stock prices
Translating SQL WHERE clauses
Determining the normality of stock market returns
Improving readability of boolean indexing with the query method
Preserving Series with the where method
Masking DataFrame rows
Selecting with booleans, integer location, and labels
Chapter 6: Index Alignment
Introduction
Examining the Index object
Producing Cartesian products
Exploding indexes
Filling values with unequal indexes
Appending columns from different DataFrames
Highlighting the maximum value from each column
Replicating idxmax with method chaining
Finding the most common maximum
Chapter 7: Grouping for Aggregation, Filtration, and Transformation
Introduction
Defining an aggregation
Grouping and aggregating with multiple columns and functions
Removing the MultiIndex after grouping
Customizing an aggregation function
Customizing aggregating functions with *args and **kwargs
Examining the groupby object
Filtering for states with a minority majority
Transforming through a weight loss bet
Calculating weighted mean SAT scores per state with apply
Grouping by continuous variables
Counting the total number of flights between cities
Finding the longest streak of on-time flights
Chapter 8: Restructuring Data into a Tidy Form
Introduction
Tidying variable values as column names with stack
Tidying variable values as column names with melt
Stacking multiple groups of variables simultaneously
Inverting stacked data
Unstacking after a groupby aggregation
Replicating pivot_table with a groupby aggregation
Renaming axis levels for easy reshaping
Tidying when multiple variables are stored as column names
Tidying when multiple variables are stored as column values
Tidying when two or more values are stored in the same cell
Tidying when variables are stored in column names and values
Tidying when multiple observational units are stored in the same table
Chapter 9: Combining Pandas Objects
Introduction
Appending new rows to DataFrames
Concatenating multiple DataFrames together
Comparing President Trump's and Obama's approval ratings
Understanding the differences between concat, join, and merge
Connecting to SQL databases
Chapter 10: Time Series Analysis
Introduction
Understanding the difference between Python and pandas date tools
Slicing time series intelligently
Using methods that only work with a DatetimeIndex
Counting the number of weekly crimes
Aggregating weekly crime and traffic accidents separately
Measuring crime by weekday and year
Grouping with anonymous functions with a DatetimeIndex
Grouping by a Timestamp and another column
Finding the last time crime was 20% lower with merge_asof
Chapter 11: Visualization with Matplotlib, Pandas, and Seaborn
Introduction
Getting started with matplotlib
Visualizing data with matplotlib
Plotting basics with pandas
Visualizing the flights dataset
Stacking area charts to discover emerging trends
Understanding the differences between seaborn and pandas
Doing multivariate analysis with seaborn Grids
Uncovering Simpson's paradox in the diamonds dataset with seaborn

What You Will Learn

  • Master the fundamentals of pandas to quickly begin exploring any dataset
  • Isolate any subset of data by properly selecting and querying the data
  • Split data into independent groups before applying aggregations and transformations to each group
  • Restructure data into tidy form to make data analysis and visualization easier
  • Prepare real-world messy datasets for machine learning
  • Combine and merge data from different sources through pandas' SQL-like operations
  • Utilize pandas' unparalleled time series functionality
  • Create beautiful and insightful visualizations through pandas' direct hooks to matplotlib and seaborn
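
Two of the points above, restructuring data into tidy form and splitting data into groups before aggregating or transforming, can be sketched as follows. This is a minimal example under stated assumptions, not code from the book: the `wide` DataFrame and the column names `state`, `city`, `year`, and `visits` are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one column per year (illustrative only).
wide = pd.DataFrame({
    "state": ["TX", "TX", "CA", "CA"],
    "city": ["Houston", "Austin", "Fresno", "Oakland"],
    "2016": [10, 20, 30, 40],
    "2017": [12, 18, 33, 44],
})

# Restructure into tidy form: one row per (state, city, year) observation.
tidy = wide.melt(id_vars=["state", "city"], var_name="year", value_name="visits")

# Split into groups by state and year, then aggregate each group.
totals = tidy.groupby(["state", "year"])["visits"].sum()

# Transform: each city's share of its state's total for that year.
# transform("sum") returns a result aligned with the original rows.
state_year_total = tidy.groupby(["state", "year"])["visits"].transform("sum")
tidy["share"] = tidy["visits"] / state_year_total

print(totals)
print(tidy)
```

An aggregation returns one value per group, while a transformation returns one value per original row, which is why the share can be assigned straight back to `tidy`.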
