Reducing the memory usage of pandas DataFrames
When you are dealing with lots of information – for example, when analyzing whole genome sequencing data – memory usage may become a limitation for your analysis. It turns out that naïve pandas is not very efficient from a memory perspective, and we can substantially reduce its consumption. One major reason is that pandas tends to assign data types that are larger than are really needed. For more background on pandas memory usage, see https://medium.com/@gautamrajotya/how-to-reduce-memory-usage-in-python-pandas-158427a99001.In this recipe, we are going to revisit our VAERS data and look at several ways to reduce pandas' memory usage. The impact of these changes can be massive: in many cases, reducing memory consumption may mean the difference between being able to use pandas or requiring a more alternative and complex approach, such as Dask or Spark.
Getting ready
We will be using the data from the first recipe. If...