
Pandas: Add Data For Missing Months

I have a dataframe of sales information per customer per monthly period. There are multiple customers, with varying month periods and spend, and it looks something like this:

customer_id

Solution 1:

Something like the following. Note that this does not handle filling in customer_id (you probably have this inside a groupby or something similar).

You may also want to call reset_index at the end, if desired.

In [130]: df2 = df.set_index('month_year')

In [131]: df2 = df2.sort_index()

In [132]: df2
Out[132]:
            customer_id   sales
month_year
2011-07              12   33.14
2011-11              12  182.06
2012-01              12   71.24
2012-03              12  155.32
2012-05              12    2.58

In [133]: df2 = df2.reindex(pd.period_range(df2.index[0], df2.index[-1], freq='M'))

In [134]: df2
Out[134]:
         customer_id   sales
2011-07           12   33.14
2011-08          NaN     NaN
2011-09          NaN     NaN
2011-10          NaN     NaN
2011-11           12  182.06
2011-12          NaN     NaN
2012-01           12   71.24
2012-02          NaN     NaN
2012-03           12  155.32
2012-04          NaN     NaN
2012-05           12    2.58

In [135]: df2['customer_id'] = 12

In [136]: df2.fillna(0.0)
Out[136]:
         customer_id   sales
2011-07           12   33.14
2011-08           12    0.00
2011-09           12    0.00
2011-10           12    0.00
2011-11           12  182.06
2011-12           12    0.00
2012-01           12   71.24
2012-02           12    0.00
2012-03           12  155.32
2012-04           12    0.00
2012-05           12    2.58
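The session above handles a single customer and leaves customer_id to be filled by hand. A minimal sketch of the groupby variant hinted at in the note, assuming the columns are customer_id, month_year (a monthly Period) and sales, and at most one row per customer per month. The sample data and the helper fill_missing_months are hypothetical: customer 12 reuses the values from the session, customer 7 is made up.

```python
import pandas as pd

# Hypothetical sample data (customer 12 mirrors the session above; customer 7 is invented).
df = pd.DataFrame({
    "customer_id": [12, 12, 12, 12, 12, 7, 7],
    "month_year": pd.PeriodIndex(
        ["2011-07", "2011-11", "2012-01", "2012-03", "2012-05", "2011-08", "2011-10"],
        freq="M",
    ),
    "sales": [33.14, 182.06, 71.24, 155.32, 2.58, 10.0, 20.0],
})


def fill_missing_months(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reindex one customer's rows onto every month
    between that customer's first and last recorded month."""
    group = group.set_index("month_year").sort_index()
    full_range = pd.period_range(group.index[0], group.index[-1], freq="M")
    out = group[["sales"]].reindex(full_range, fill_value=0.0)  # 0.0 for the new months
    out.index.name = "month_year"
    return out


filled = (
    df.groupby("customer_id", group_keys=True)[["month_year", "sales"]]
    .apply(fill_missing_months)  # customer_id becomes the outer index level
    .reset_index()               # turn both index levels back into columns
)
print(filled)
```

Because each customer is reindexed separately, every customer gets its own range from its first to its last month, and customer_id is carried along by the groupby instead of being assigned by hand.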

Solution 2:

I found a different way to fill in missing months (they will be filled with NaN) that also accounts for multiple customers.

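# pivot customers into columns (NaN where a customer has no row for a month), then stack back to long form
df = df.set_index(['month_year', 'customer_id'])['sales'].unstack().unstack().reset_index()
# the stacked Series has no name, so reset_index puts the values in a column called 0
df = df.rename(columns={0: 'sales'})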
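To show what the double unstack does, here is a small runnable sketch on hypothetical two-customer data (month_year kept as plain strings for simplicity). The first unstack pivots customer_id into columns, which inserts NaN wherever a customer has no row for a month; the second unstack turns that wide table back into a long Series indexed by (customer_id, month_year).

```python
import pandas as pd

# Hypothetical data: customer 1 has no row for 2012-02, customer 2 has no row for 2012-01.
df = pd.DataFrame({
    "month_year": ["2012-01", "2012-03", "2012-02", "2012-03"],
    "customer_id": [1, 1, 2, 2],
    "sales": [10.0, 30.0, 5.0, 6.0],
})

# Wide table: one row per month, one column per customer, NaN for the gaps.
wide = df.set_index(["month_year", "customer_id"])["sales"].unstack()
print(wide)

# Same two lines as above: back to long form, one row per (customer, month).
result = df.set_index(["month_year", "customer_id"])["sales"].unstack().unstack().reset_index()
result = result.rename(columns={0: "sales"})
print(result)
```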

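While this is admittedly inelegant, it gets the job done. One caveat: it only fills in months that appear for at least one customer, so a month in which no customer has any sales will still be missing.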
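The unstack trick can only produce months that already occur somewhere in the data. A minimal sketch of an alternative that builds every (customer, month) pair explicitly with pd.MultiIndex.from_product and reindexes onto it. It assumes month_year is a monthly Period column and that each customer has at most one row per month (reindex raises on duplicate labels, so aggregate with a groupby sum first if needed). The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data: no customer has a sale in 2012-02.
df = pd.DataFrame({
    "month_year": pd.PeriodIndex(["2012-01", "2012-03", "2012-01", "2012-03"], freq="M"),
    "customer_id": [1, 1, 2, 2],
    "sales": [10.0, 30.0, 5.0, 6.0],
})

# Every month from the earliest to the latest in the data, gaps included.
months = pd.period_range(df["month_year"].min(), df["month_year"].max(), freq="M")
customers = df["customer_id"].unique()

# One index entry for every (customer, month) combination.
full_index = pd.MultiIndex.from_product([customers, months], names=["customer_id", "month_year"])

result = (
    df.set_index(["customer_id", "month_year"])["sales"]
    .reindex(full_index, fill_value=0.0)  # 0.0 for pairs with no recorded sale
    .reset_index()
)
print(result)
```

This uses one global month range for all customers; if each customer should only span its own first to last month, the per-customer groupby approach is a better fit.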
