
Python Linear Interpolation Of Values In Dataframe

I have a pandas DataFrame with hourly values for Jan 2015, except that some hours are missing both the index entry and the value. Ideally the dataframe with columns named 'dates' and 'values' sho

Solution 1:

Use df.asfreq to expand the DataFrame so as to have an hourly frequency. NaN is inserted for missing values:

df = df.asfreq('H')

then use df.interpolate to replace the NaNs with (linearly) interpolated values based on the DatetimeIndex and the nearest non-NaN values:

df = df.interpolate(method='time')

For example,

import numpy as np
import pandas as pd

N, M = 744, 734
index = pd.date_range('2015-01-01', periods=N, freq='H')
idx = np.random.choice(np.arange(N), M, replace=False)
idx.sort()
index = index[idx]

# This creates a toy DataFrame with 734 non-null rows:
df = pd.DataFrame({'values': np.random.randint(10, size=(M,))}, index=index)

# This expands the DataFrame to 744 rows (10 null rows):
df = df.asfreq('H')

# This makes `df` have 744 non-null rows:
df = df.interpolate(method='time')

Solution 2:

What you want requires a combination of the technique described in "Add missing dates to pandas dataframe" and the pandas function pandas.Series.interpolate. From what you've said, the option 'linear' is what you want.

EDIT: Interpolate will not work in the case where you have data points missing at the very start of the time series, because there is no earlier value to interpolate from. One idea is to use pandas.Series.fillna with 'backfill' after the interpolation. Also, do not set fill_value to 0 when you call reindex: the missing rows must stay NaN for interpolate to fill them.
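
The steps in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the six-hour toy series and the names full_index, present, expanded, interpolated and filled are made up for the example. It uses Series.bfill(), which does the same job as fillna with 'backfill', and passes a pd.Timedelta as the frequency only to sidestep differences in frequency-alias spelling between pandas versions.

```python
import pandas as pd

# Hypothetical toy data: six hourly slots, of which hours 0, 3 and 4 are absent.
full_index = pd.date_range("2015-01-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
present = full_index[[1, 2, 5]]
s = pd.Series([10.0, 20.0, 50.0], index=present, name="values")

# Reindex WITHOUT fill_value, so the missing hours become NaN rather than 0.
expanded = s.reindex(full_index)

# Linear interpolation fills the interior gaps (hours 3 and 4),
# but leaves the leading NaN at hour 0 untouched.
interpolated = expanded.interpolate(method="linear")

# Backfill copies the first valid value into the leading gap.
filled = interpolated.bfill()

print(filled)
```

Here the interior gaps come out as 30.0 and 40.0, and the missing first hour takes the value 10.0 from the hour after it. Whether copying the first known value backwards is acceptable depends on the data; it is a fallback, not an interpolation.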

Solution 3:

A general interpolation is the following:

If the key exists:

  • Return the value

else:

  • Find the first key before and the first key after the required key, find the distance (which you can define using a desired metric) to both keys, and take a weighted average of the two values, weighted by the distances of the keys (a closer key gets a higher weight).
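
A small sketch of that procedure in plain Python, under a few assumptions that are not in the answer: the keys are numeric and already sorted, the distance metric is the absolute difference between keys, and a key outside the known range raises an error. The function name interpolate_at is hypothetical.

```python
from bisect import bisect_left


def interpolate_at(keys, values, x):
    """Return the value at x, interpolating linearly between neighbouring keys.

    Assumes `keys` is sorted ascending and `values` is aligned with it.
    """
    i = bisect_left(keys, x)

    # The key exists: return its value directly.
    if i < len(keys) and keys[i] == x:
        return values[i]

    # No neighbour on one side: this sketch refuses to extrapolate.
    if i == 0 or i == len(keys):
        raise ValueError("x lies outside the range of known keys")

    # First key before and first key after x.
    x0, x1 = keys[i - 1], keys[i]
    y0, y1 = values[i - 1], values[i]

    # Each neighbour is weighted by the distance to the OTHER neighbour,
    # so the closer key contributes more.
    w1 = (x - x0) / (x1 - x0)
    w0 = 1 - w1
    return w0 * y0 + w1 * y1
```

For example, interpolate_at([0, 10], [0.0, 100.0], 2.5) returns 25.0, because 2.5 is three times closer to key 0 than to key 10.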
