You're reading from Pandas 1.x Cookbook Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python

Product type Paperback

Published in Feb 2020

Publisher Packt

ISBN-13 9781839213106

Length 626 pages

Edition 2nd Edition

Languages

Python

Tools

Pandas

Concepts

Data Analysis

Authors (2):

Matthew Harrison

Theodore Petrou

View More author details

Table of Contents (17) Chapters

Preface

1. Pandas Foundations

2. Essential DataFrame Operations FREE CHAPTER

3. Creating and Persisting DataFrames

4. Beginning Data Analysis

5. Exploratory Data Analysis

6. Selecting Subsets of Data

7. Filtering Rows

8. Index Alignment

9. Grouping for Aggregation, Filtration, and Transformation

10. Restructuring Data into a Tidy Form

11. Combining Pandas Objects

12. Time Series Analysis

13. Visualization with Matplotlib, Pandas, and Seaborn

14. Debugging and Testing Pandas

15. Other Books You May Enjoy

16. Index

Apply performance

The .apply method on a Series and DataFrame is one of the slowest operations in pandas. In this recipe, we will explore the speed of it and see if we can debug what is going on.

How to do it…

Let's time how long one use of the .apply method takes using the %%timeit cell magic in Jupiter. This is the code from the tweak_kag function that limits the cardinality of the country column (Q3):

>>> %%timeit
>>> def limit_countries(val):
...      if val in  {'United States of America', 'India', 'China'}:
...          return val
...      return 'Another'
>>> q3 = df.Q3.apply(limit_countries).rename('Country')
6.42 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)