Skip to content Skip to sidebar Skip to footer

Add Missing Timestamp Row To A Dataframe

I have a dataframe which contains data that were measured at two hours interval each day, some time intervals are however missing. My dataset looks like below: 2020-12-01 08:00:00

Solution 1:

Create DatetimeIndex and use DataFrame.asfreq:

print (df)

                  date    val
0  2020-12-01 08:00:00  145.9
1  2020-12-01 10:00:00  100.0
2  2020-12-01 16:00:00   99.3
3  2020-12-01 18:00:00   91.0

df['date'] = pd.to_datetime(df['date'])

df = df.set_index('date').asfreq('2H')

print (df)
                       val
date                      
2020-12-01 08:00:00  145.9
2020-12-01 10:00:00  100.0
2020-12-01 12:00:00    NaN
2020-12-01 14:00:00    NaN
2020-12-01 16:00:00   99.3
2020-12-01 18:00:00   91.0

Solution 2:

assuming your df looks like

              datetime  value
0  2020-12-01T08:00:00  145.9
1  2020-12-01T10:00:00  100.0
2  2020-12-01T16:00:00   99.3
3  2020-12-01T18:00:00   91.0

make sure datetime column is dtype datetime;

df['datetime'] = pd.to_datetime(df['datetime'])

so that you can now resample to 2-hourly frequency:

df.resample('2H', on='datetime').mean()

                     value
datetime                  
2020-12-01 08:00:00  145.9
2020-12-01 10:00:00  100.0
2020-12-01 12:00:00    NaN
2020-12-01 14:00:00    NaN
2020-12-01 16:00:00   99.3
2020-12-01 18:00:00   91.0

Note that you don't need to set the on= keyword if your df already has a datetime index. The df resulting from resampling will have a datetime index.

Also note that I'm using .mean() as aggfunc, meaning that if you have multiple values within the two hour intervals, you'll get the mean of that.


Solution 3:

You can try the following:

I have used datetime and timedelta for this,

from datetime import datetime, timedelta

# Asuming that the data is given like below.
data = ['2020-12-01 08:00:00 145.9',
'2020-12-01 10:00:00 100.0',
'2020-12-01 16:00:00 99.3',
'2020-12-01 18:00:00 91.0']

# initialize the start time using data[0]
date = data[0].split()[0].split('-')
time = data[0].split()[1].split(':')
start = datetime(int(date[0]), int(date[1]), int(date[2]), int(time[0]), int(time[1]), int(time[2]))

newdata = []
newdata.append(data[0])

i = 1
while i < len(data):
    cur = start
    nxt = start + timedelta(hours=2)
    
    if (str(nxt) != (data[i].split()[0] + ' ' + data[i].split()[1])):
        newdata.append(str(nxt) + ' NaN')
    else:
        newdata.append(data[i])
        i+=1
        
    start = nxt
    
newdata

NOTE : temedelta(hours=2) will add 2 hours to the existing time.


Post a Comment for "Add Missing Timestamp Row To A Dataframe"