INTERPOLATING MISSING DATES IN PANDAS
Listing B.15 shows the contents of missing_dates.csv, and Listing B.16 shows the contents of pandas_interpolate.py, which replaces NaN values with interpolated values that are calculated in several ways.
Listing B.15: missing_dates.csv
"dates","values"
2021-01-31,40
2021-02-28,45
2021-03-31,56
2021-04-30,NaN
2021-05-31,NaN
2021-06-30,140
2021-07-31,95
2021-08-31,40
2021-09-30,55
2021-10-31,NaN
2021-11-15,65
Notice the value 140 (for 2021-06-30) in Listing B.15: this value is an outlier, which affects the calculation of the interpolated values next to it and can potentially generate additional outliers.
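To make the effect of the outlier concrete, the following minimal sketch applies the default (linear) interpolation to the values column only. It is an illustration rather than part of the book's listings: the names csv_text and filled are assumptions, and the CSV rows are inlined so that the sketch runs without the file.

```python
import io

import pandas as pd

# Same rows as missing_dates.csv, inlined so the sketch is self-contained.
csv_text = '''"dates","values"
2021-01-31,40
2021-02-28,45
2021-03-31,56
2021-04-30,NaN
2021-05-31,NaN
2021-06-30,140
2021-07-31,95
2021-08-31,40
2021-09-30,55
2021-10-31,NaN
2021-11-15,65
'''

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["dates"])

# Interpolate only the numeric column. The default method is linear and
# treats the rows as equally spaced, regardless of the actual dates.
filled = df["values"].interpolate()

# The two gaps before the outlier are filled in equal steps from 56 to 140:
# 56 + 28 = 84.0 and 56 + 56 = 112.0
print(filled.iloc[3], filled.iloc[4])

# The gap between 55 and 65 is filled with their midpoint: 60.0
print(filled.iloc[9])
```

The filled values 84 and 112 are well above every other non-outlier value in the column, which shows how a single outlier at the edge of a gap can propagate into the interpolated results.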
Listing B.16: pandas_interpolate.py
import pandas as pd
df = pd.read_csv("missing_dates.csv")
# fill NaN values with linear interpolation:
df1 = df.interpolate()
# fill NaN values with quadratic polynomial interpolation
# (method='polynomial' requires SciPy to be installed):
df2 = df.interpolate(method='polynomial', order=2)
# fill NaN values with cubic polynomial...