Missing values are unavoidable in any data science project, and handling them well takes experience and skill. Typical strategies include deleting the affected record where applicable, or dropping a column entirely if it contains a large number of missing values.
Handling missing values in time series data can be even more challenging, as dropping data points may itself introduce further problems.
Following is an example time series. Take the index (0-20) as any time interval.
Note the missing values (NaN) in the above observations. In fact, 9 observations out of 20 are missing. If we plot them on a graph, the gaps look like the following.
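The original post does not list the exact values, so the series below is an assumed stand-in with the same shape: 20 observations, 9 of them NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the actual values from the post are not given,
# so these numbers are assumed for demonstration. 9 of 20 entries are NaN.
df = pd.DataFrame({'col_name': [1.0, 2.0, np.nan, 4.0, np.nan, np.nan,
                                7.0, 8.0, np.nan, 10.0, 11.0, np.nan,
                                13.0, np.nan, 15.0, np.nan, 17.0,
                                np.nan, 19.0, np.nan]})

print(df['col_name'].isna().sum())  # count of missing observations
```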
By Statistical Aggregates
Mean
df['new_col_name']=df['col_name'].fillna(df['col_name'].mean())
Median
df['new_col_name']=df['col_name'].fillna(df['col_name'].median())
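A minimal sketch of both aggregate fills on a toy frame (the column names are assumed, and the values are chosen so that mean and median differ):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_name': [1.0, np.nan, 2.0, np.nan, 9.0]})

# Mean imputation: every NaN becomes the column mean, here (1+2+9)/3 = 4.0.
df['filled_mean'] = df['col_name'].fillna(df['col_name'].mean())

# Median imputation: every NaN becomes the column median, here 2.0.
df['filled_median'] = df['col_name'].fillna(df['col_name'].median())

print(df)
```

Note that the skewed value 9.0 pulls the mean up to 4.0 while the median stays at 2.0, which is why median imputation is often preferred when outliers are present.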
By Existing Observations
Last Observation Carried Forward (ffill)
df['new_col_name']=df['col_name'].ffill()
Next Observation Carried Backward (bfill)
df['new_col_name']=df['col_name'].bfill()
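A short sketch showing both directions side by side on an assumed toy series (note the leading NaN, which ffill cannot fill, and the trailing NaN, which bfill cannot fill):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_name': [np.nan, 2.0, np.nan, np.nan, 5.0, np.nan]})

# LOCF: propagate the last known observation forward into the gaps.
df['locf'] = df['col_name'].ffill()

# NOCB: pull the next known observation backward into the gaps.
df['nocb'] = df['col_name'].bfill()

print(df)
```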
By Interpolation
Pad
df['new_col_name']=df['col_name'].interpolate(method='pad')
Linear
df['new_col_name']=df['col_name'].interpolate(method='linear')
Polynomial
df['new_col_name']=df['col_name'].interpolate(method='polynomial', order=2)
Spline
df['new_col_name']=df['col_name'].interpolate(method='spline', order=2)
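A runnable sketch of the interpolation variants on assumed data (values deliberately follow a quadratic so the methods visibly differ). The polynomial and spline methods delegate to SciPy, so they are guarded in case SciPy is not installed:

```python
import numpy as np
import pandas as pd

# Assumed toy series: values lie on y = x**2, with the point at index 2 missing.
df = pd.DataFrame({'col_name': [0.0, 1.0, np.nan, 9.0, 16.0, 25.0]})

# Linear: the gap falls on the straight line between its neighbours (1+9)/2 = 5.
df['linear'] = df['col_name'].interpolate(method='linear')

# Polynomial and spline require SciPy under the hood; skip gracefully if absent.
try:
    df['poly2'] = df['col_name'].interpolate(method='polynomial', order=2)
    df['spline2'] = df['col_name'].interpolate(method='spline', order=2)
except ImportError:
    pass

print(df)
```

The linear fill gives 5.0 at index 2, while the order-2 methods follow the curvature of the surrounding points and land closer to the true value of 4.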
Polynomial interpolation seems to produce the smoothest curve.
If we look at all observations together, we can compare the methods side by side.