
Imputing missing values in time series

Updated: Dec 2

Missing values are unavoidable in almost any data science project, and handling them well takes experience and skill. Typical approaches include deleting the entire record, where applicable, or, if a column has a large number of missing values, dropping the entire column. In certain cases, the missing values can instead be imputed, that is, filled in with estimated values.


Handling missing values in time series data can be even more challenging, because dropping data points breaks the regular spacing between observations that many time series methods rely on.
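A minimal sketch of that problem, using made-up daily readings: after `dropna()`, the remaining timestamps are no longer evenly spaced, so consecutive rows end up one or two days apart.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two missing days (values are made up)
idx = pd.date_range("2023-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0, 6.0], index=idx)

dropped = s.dropna()
# Spacing between consecutive remaining observations; the first diff is NaT, so drop it
gaps = dropped.index.to_series().diff().dropna()
print(gaps.dt.days.tolist())  # [2, 2, 1]
```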


The following is an example time series. Treat the index (0-20) as any unit of time.



Note the missing values (NaN) in the observations above: 9 of the 20 observations are missing. Plotted on a graph, the series looks like the following.
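The number of missing observations can be checked with `isna()`. The values below are hypothetical stand-ins for the example series, which is not reproduced here; `df` and `col_name` simply follow the naming used in the snippets in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical series (made-up values) with four gaps
df = pd.DataFrame({"col_name": [2.0, np.nan, 4.0, np.nan, np.nan, 7.0, 8.0, np.nan]})

n_missing = df["col_name"].isna().sum()   # count of NaN entries
share = df["col_name"].isna().mean()      # fraction of NaN entries
print(f"{n_missing} of {len(df)} observations are missing ({share:.0%})")
# 4 of 8 observations are missing (50%)
```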



By Statistical Aggregates


Mean

df['new_col_name']=df['col_name'].fillna(df['col_name'].mean())
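A self-contained sketch of mean imputation on a small hypothetical column. Every gap receives the same constant, and since `mean()` is computed over the whole column (skipping NaN), it also draws on observations that come after each gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data; mean of the observed values is (2 + 4 + 9) / 3 = 5.0
df = pd.DataFrame({"col_name": [2.0, np.nan, 4.0, np.nan, 9.0]})
df["new_col_name"] = df["col_name"].fillna(df["col_name"].mean())
print(df["new_col_name"].tolist())  # [2.0, 5.0, 4.0, 5.0, 9.0]
```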


Median

df['new_col_name']=df['col_name'].fillna(df['col_name'].median())
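The same sketch with the median, on hypothetical data containing one outlier. The median is unaffected by how extreme the outlier is, whereas the mean here would be about 35.3 and would pull every imputed value towards it.

```python
import numpy as np
import pandas as pd

# Hypothetical data with an outlier (100.0); median of 2, 4, 100 is 4.0
df = pd.DataFrame({"col_name": [2.0, np.nan, 4.0, np.nan, 100.0]})
df["new_col_name"] = df["col_name"].fillna(df["col_name"].median())
print(df["new_col_name"].tolist())  # [2.0, 4.0, 4.0, 4.0, 100.0]
```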



By Existing Observations


Last Observation Carried Forward (ffill)

df['new_col_name']=df['col_name'].ffill()
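A small sketch of forward filling (last observation carried forward) on hypothetical data. A leading NaN stays NaN because there is no earlier value to carry; the optional `limit` argument caps how many consecutive NaNs are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical data: leading NaN, a run of two NaNs, and a trailing NaN
df = pd.DataFrame({"col_name": [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]})

df["new_col_name"] = df["col_name"].ffill()          # [nan, 1.0, 1.0, 1.0, 4.0, 4.0]
df["ffill_limit_1"] = df["col_name"].ffill(limit=1)  # [nan, 1.0, 1.0, nan, 4.0, 4.0]
print(df)
```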


Next Observation Carried Backward (bfill)

df['new_col_name']=df['col_name'].bfill()
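The mirror-image sketch for backward filling, on the same hypothetical data. Here a trailing NaN stays NaN. Because each gap is filled from a later observation, this uses future information, which matters if the imputed series later feeds a forecasting model.

```python
import numpy as np
import pandas as pd

# Same hypothetical data as the forward-fill sketch
df = pd.DataFrame({"col_name": [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]})

df["new_col_name"] = df["col_name"].bfill()
print(df["new_col_name"].tolist())  # [1.0, 1.0, 4.0, 4.0, 4.0, nan]
```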



By Interpolation


Pad

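df['new_col_name']=df['col_name'].interpolate(method='pad')
# 'pad' repeats the last valid value, like ffill(); pandas 2.1+ deprecates this method in favor of df['col_name'].ffill()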


Linear

df['new_col_name']=df['col_name'].interpolate(method='linear')
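A minimal sketch of linear interpolation on hypothetical data: each gap is filled on the straight line between its nearest observed neighbours.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one two-step gap (between 1 and 4) and one single gap (between 4 and 8)
df = pd.DataFrame({"col_name": [1.0, np.nan, np.nan, 4.0, np.nan, 8.0]})
df["new_col_name"] = df["col_name"].interpolate(method="linear")
print(df["new_col_name"].tolist())  # [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
```

Note that `method="linear"` ignores the index and treats rows as equally spaced. If the timestamps are irregular, pandas also offers `method="time"` (for a DatetimeIndex) and `method="index"`, which weight by the actual index values.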


Polynomial

df['new_col_name']=df['col_name'].interpolate(method='polynomial', order=2)
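A sketch of polynomial interpolation, assuming SciPy is installed (pandas hands `method="polynomial"` to `scipy.interpolate.interp1d` with `kind=order`, so the fit is piecewise rather than one global polynomial). The data is hypothetical: it is generated from y = x², so an order-2 fit should recover the hidden values.

```python
import numpy as np
import pandas as pd  # method="polynomial" additionally requires SciPy

# Hypothetical data drawn from y = x**2, with three values hidden
x = np.arange(8, dtype=float)
true_values = x ** 2
observed = true_values.copy()
observed[[2, 3, 5]] = np.nan
df = pd.DataFrame({"col_name": observed})

# Uses the numeric index values (0..7) as the x-coordinates
df["new_col_name"] = df["col_name"].interpolate(method="polynomial", order=2)
print(df)
```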


Spline

df['new_col_name']=df['col_name'].interpolate(method='spline', order=2)
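A sketch of spline interpolation, again assuming SciPy is installed. pandas passes `method="spline"` to `scipy.interpolate.UnivariateSpline` with `k=order` and forwards extra keyword arguments; by default that class applies a nonzero smoothing factor, so this sketch passes `s=0` (an assumption about the desired behaviour, not part of the snippet above) to make the spline go through the observed points. The hypothetical data comes from y = x³.

```python
import numpy as np
import pandas as pd  # method="spline" additionally requires SciPy

# Hypothetical data drawn from y = x**3, with three values hidden
x = np.arange(10, dtype=float)
true_values = x ** 3
observed = true_values.copy()
observed[[3, 4, 6]] = np.nan
df = pd.DataFrame({"col_name": observed})

# s=0 is forwarded to UnivariateSpline, turning off smoothing so the cubic
# spline interpolates the observed points exactly
df["new_col_name"] = df["col_name"].interpolate(method="spline", order=3, s=0)
print(df)
```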




Of these methods, polynomial interpolation appears to produce the smoothest curve for this series.



If we look at all the imputed series together:
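One way to line the methods up side by side is a single DataFrame with one column per method. This sketch uses hypothetical data and leaves out the polynomial and spline variants so that it needs only pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical series with four gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0, np.nan, 10.0])

comparison = pd.DataFrame({
    "original": s,
    "mean": s.fillna(s.mean()),        # constant 5.75 in every gap
    "median": s.fillna(s.median()),    # constant 6.0 in every gap
    "ffill": s.ffill(),                # last observation carried forward
    "bfill": s.bfill(),                # next observation carried backward
    "linear": s.interpolate(method="linear"),
})
print(comparison)
```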


