Apply performance
The .apply method on a Series or DataFrame is one of the slowest operations in pandas. In this recipe, we will measure how slow it is and see if we can debug what is going on.
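To see the cost in isolation, here is a minimal sketch that times .apply with the standard library's timeit module, which also works outside Jupyter. It assumes a small synthetic Series (countries) as a stand-in for the survey's Q3 column. The KEEP set, the helper names with_apply and with_where, and the use of .where with .isin as a vectorized comparison are choices made for this illustration, not necessarily what the recipe settles on.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for the survey's Q3 column (not the real data).
countries = pd.Series(
    ["United States of America", "India", "China", "Brazil", "Germany"] * 2_000
)

# Values to keep; everything else is collapsed into one label.
KEEP = {"United States of America", "India", "China"}


def limit_countries(val):
    """Return val if it is one of the kept countries, else 'Another'."""
    if val in KEEP:
        return val
    return "Another"


def with_apply():
    # Calls limit_countries once per element, in Python.
    return countries.apply(limit_countries).rename("Country")


def with_where():
    # Assumed vectorized alternative: keep values in KEEP, replace the rest.
    return countries.where(countries.isin(KEEP), "Another").rename("Country")


apply_result = with_apply()
where_result = with_where()

# Time each approach; number=20 is an arbitrary choice for this sketch.
apply_seconds = timeit.timeit(with_apply, number=20)
where_seconds = timeit.timeit(with_where, number=20)
print(f"apply: {apply_seconds:.4f}s  where: {where_seconds:.4f}s")
```

The absolute numbers depend on the machine and the pandas version, so treat them as relative measurements only. Both functions should produce the same values, which makes the timings a like-for-like comparison.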
How to do it…
- Let's time how long one use of the .apply method takes using the %%timeit cell magic in Jupyter. This is the code from the tweak_kag function that limits the cardinality of the country column (Q3):
    >>> %%timeit
    >>> def limit_countries(val):
    ...     if val in {'United States of America', 'India', 'China'}:
    ...         return val
    ...     return 'Another'
    >>> q3 = df.Q3.apply(limit_countries).rename('Country')
    6.42 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
- Let's look at using the .replace method instead of .apply and see if that improves performance:
    >>> %%timeit
    >>> other_values = df...