Pandas: Add Data For Missing Months
I have a dataframe of sales information by customer and month period, with multiple customers, varying month periods, and varying spend. Its columns are customer_id, month_year, and sales.
Solution 1:
Something like this; note that how to fill in the customer_id is not defined here (you probably have this inside a groupby or something similar). You may need a reset_index at the end (if desired).
In [130]: df2 = df.set_index('month_year')

In [131]: df2 = df2.sort_index()

In [132]: df2
Out[132]:
            customer_id   sales
month_year
2011-07              12   33.14
2011-11              12  182.06
2012-01              12   71.24
2012-03              12  155.32
2012-05              12    2.58

In [133]: df2.reindex(pd.period_range(df2.index[0], df2.index[-1], freq='M'))
Out[133]:
         customer_id   sales
2011-07           12   33.14
2011-08          NaN     NaN
2011-09          NaN     NaN
2011-10          NaN     NaN
2011-11           12  182.06
2011-12          NaN     NaN
2012-01           12   71.24
2012-02          NaN     NaN
2012-03           12  155.32
2012-04          NaN     NaN
2012-05           12    2.58

In [135]: df2['customer_id'] = 12

In [136]: df2.fillna(0.0)
Out[136]:
         customer_id   sales
2011-07           12   33.14
2011-08           12    0.00
2011-09           12    0.00
2011-10           12    0.00
2011-11           12  182.06
2011-12           12    0.00
2012-01           12   71.24
2012-02           12    0.00
2012-03           12  155.32
2012-04           12    0.00
2012-05           12    2.58
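The session above jumps from In [133] to In [135], so it never shows the reindexed frame being assigned back to df2. As a rough, self-contained sketch of the same idea extended to several customers, the version below keeps the result of reindex and repeats the steps per customer. The function name fill_missing_months, the second customer (15), and its sales values are made up for illustration, and month_year is assumed to hold monthly pandas Period values.

```python
import pandas as pd

# Hypothetical sample data: customer 12 mirrors the session above,
# customer 15 is invented to show the multi-customer case.
df = pd.DataFrame({
    "customer_id": [12, 12, 12, 12, 12, 15, 15],
    "month_year": pd.PeriodIndex(
        ["2011-07", "2011-11", "2012-01", "2012-03", "2012-05",
         "2012-01", "2012-03"],
        freq="M",
    ),
    "sales": [33.14, 182.06, 71.24, 155.32, 2.58, 10.0, 20.0],
})


def fill_missing_months(frame):
    """Insert zero-sales rows for months missing within each customer's range."""
    pieces = []
    for customer_id, group in frame.groupby("customer_id"):
        g = group.set_index("month_year").sort_index()
        # Continuous monthly range from this customer's first to last month.
        full_range = pd.period_range(g.index[0], g.index[-1], freq="M")
        # Keep the reindexed result; new months get NaN in every column.
        g = g.reindex(full_range)
        g["customer_id"] = customer_id
        g["sales"] = g["sales"].fillna(0.0)
        # reindex drops the index name, so restore it before reset_index.
        pieces.append(g.rename_axis("month_year").reset_index())
    return pd.concat(pieces, ignore_index=True)


result = fill_missing_months(df)
```

Each customer is filled only between its own first and last month. If every customer should span the same overall range, the period_range could instead be built once from the whole frame.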
Solution 2:
I found a different way to fill in missing months (they will be filled with NaN), while also accounting for multiple possible customers.
df = df.set_index(['month_year', 'customer_id'])['sales'].unstack().unstack().reset_index()
df = df.rename(columns={0:'sales'})
While this is admittedly inelegant, it gets the job done.
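To make the double unstack easier to follow, here is a minimal sketch on made-up data (the customer ids and sales values are hypothetical, and month_year is assumed to hold monthly Period values). The first unstack pivots customers into columns, which creates NaN for every month and customer pair that is absent; the second unstack turns the wide frame back into long form. One thing to be aware of: a month that is missing for every customer never appears in the wide frame, so the last three lines show an optional extra step, not part of the answer above, that reindexes to a continuous monthly range first.

```python
import pandas as pd

# Hypothetical sample data: 2011-09 is missing for every customer.
df = pd.DataFrame({
    "customer_id": [12, 12, 15, 15],
    "month_year": pd.PeriodIndex(
        ["2011-07", "2011-10", "2011-08", "2011-10"], freq="M"
    ),
    "sales": [33.14, 182.06, 10.0, 20.0],
})

# Step 1: pivot to one column per customer; absent pairs become NaN.
wide = df.set_index(["month_year", "customer_id"])["sales"].unstack()

# Step 2: unstack again to return to long format, as in the answer above.
filled = wide.unstack().reset_index().rename(columns={0: "sales"})

# Optional extra step (an addition, not in the answer above): reindex the
# wide frame to a continuous monthly range so fully absent months appear too.
full_range = pd.period_range(wide.index.min(), wide.index.max(), freq="M")
wide_full = wide.reindex(full_range).rename_axis("month_year")
filled_full = wide_full.unstack().reset_index().rename(columns={0: "sales"})
```

With this data, filled covers only the three months that occur somewhere in the input, while filled_full also contains 2011-09 for both customers. Calling fillna(0.0) on the sales column afterwards would turn the NaN values into zeros if that is preferred.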