
Imputing missing values in time series

Updated: Sep 23, 2023

Missing values are unavoidable in any data science project, and handling them well takes experience and skill. Typical missing-value handling consists of decisions such as deleting the entire record, where applicable. If a column has a large number of missing values, the entire column can even be dropped. In certain cases,


Handling missing values in time series data can be even more challenging, because dropping data points breaks the regular spacing of the series and can lead to further data problems.


The following is an example time series. Treat the index (0-20) as any regular time interval.
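A stand-in series of the same shape can be sketched as follows. The values are hypothetical, invented purely for illustration (they are not the data used in this post), and the column name `col_name` simply matches the snippets further down.

```python
import numpy as np
import pandas as pd

# Hypothetical values, invented for illustration only.
values = [10.0, np.nan, 12.0, 13.0, np.nan, np.nan, 16.0, 18.0, np.nan, 21.0,
          np.nan, 24.0, np.nan, np.nan, 27.0, 29.0, np.nan, 32.0, np.nan, 35.0]
df = pd.DataFrame({"col_name": values})

# Count the missing observations and show where they sit in the index.
n_missing = df["col_name"].isna().sum()
missing_positions = df.index[df["col_name"].isna()].tolist()
print(n_missing, "missing out of", len(df))
print(missing_positions)
```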



Note the missing values (NaN) in the above observations. In fact, 9 out of 20 observations are missing. Plotted on a graph, the series looks like the following.



By Statistical Aggregates


Mean

df['new_col_name']=df['col_name'].fillna(df['col_name'].mean())


Median

df['new_col_name']=df['col_name'].fillna(df['col_name'].median())
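As a small worked example of the two aggregate fills, the sketch below uses a made-up five-point series (the values are an assumption chosen so the arithmetic is easy to follow). Both `mean()` and `median()` skip NaN by default, so the fill value is computed from the observed points only.

```python
import numpy as np
import pandas as pd

# Made-up toy series; observed values are 2, 4 and 12.
toy = pd.DataFrame({"col_name": [2.0, np.nan, 4.0, np.nan, 12.0]})

# Mean of the observed values: (2 + 4 + 12) / 3 = 6.0
toy["mean_filled"] = toy["col_name"].fillna(toy["col_name"].mean())

# Median of the observed values: 4.0
toy["median_filled"] = toy["col_name"].fillna(toy["col_name"].median())

print(toy)
```

Every gap receives the same constant, so this approach ignores any trend in the series.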



By Existing Observations


Last Observation Carried Forward (ffill)

df['new_col_name']=df['col_name'].ffill()


Next Observation Carried Backward (bfill)

df['new_col_name']=df['col_name'].bfill()
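The direction of each fill is easiest to see on a series with gaps at both ends. The sketch below uses a made-up six-point series (an assumption for illustration): a forward fill copies the last valid observation forward, and a backward fill copies the next valid observation backward.

```python
import numpy as np
import pandas as pd

# Made-up toy series with a gap at the start, in the middle, and at the end.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()   # last observation carried forward
backward = s.bfill()  # next observation carried backward

print(pd.DataFrame({"original": s, "ffill": forward, "bfill": backward}))
```

A forward fill cannot fill a leading gap and a backward fill cannot fill a trailing gap, since there is no observation to copy from; chaining the two (`s.ffill().bfill()`) is one way to cover both ends.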



By Interpolation


Pad

df['new_col_name']=df['col_name'].interpolate(method='pad')


Linear

df['new_col_name']=df['col_name'].interpolate(method='linear')


Polynomial

df['new_col_name']=df['col_name'].interpolate(method='polynomial', order=2)


Spline

df['new_col_name']=df['col_name'].interpolate(method='spline', order=2)
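To see how the interpolation methods differ, here is a minimal sketch on made-up data (the values are an assumption, chosen to follow y = x² so the curved fit is easy to recognise). Linear interpolation draws a straight line between the neighbouring observations, while the polynomial and spline methods fit a curve. In pandas, `method='polynomial'` and `method='spline'` rely on SciPy, so that part is guarded in case SciPy is not installed.

```python
import numpy as np
import pandas as pd

# Made-up series following y = x**2, with the value at index 2 missing.
squares = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

# Linear: midpoint of the straight line between 1 and 9.
linear = squares.interpolate(method="linear")

# Polynomial and spline need SciPy; skip them if it is unavailable.
try:
    poly = squares.interpolate(method="polynomial", order=2)
    spline = squares.interpolate(method="spline", order=2)
except ImportError:
    poly = spline = None

print("linear:", linear[2])
if poly is not None:
    print("polynomial:", poly[2])
    print("spline:", spline[2])
```

On curved data like this, the linear estimate overshoots the underlying curve, while the order-2 fit can follow it.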




Of these methods, polynomial interpolation seems to produce the smoothest curve for this series.



Looking at all the imputed series together:
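A side-by-side comparison can be assembled by putting each imputed series into its own column. The sketch below is illustrative only: the seven-point series is made up, and the column names are assumptions rather than anything taken from the original figures.

```python
import numpy as np
import pandas as pd

# Made-up toy series; observed values are 1, 3, 5 and 11 (mean 5.0).
original = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0, np.nan, 11.0])

compare = pd.DataFrame({
    "original": original,
    "mean": original.fillna(original.mean()),        # constant fill
    "ffill": original.ffill(),                       # repeat previous value
    "bfill": original.bfill(),                       # repeat next value
    "linear": original.interpolate(method="linear"), # straight line between neighbours
})

print(compare)
```

Reading across a row shows how much the choice of method changes the filled value for the same gap, for example index 5, where the neighbours are 5 and 11.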


