Pandas Groupby With Bin Sum Aggregation
I have a similar question to this one. I have a dataframe in pandas that looks like this, showing the ages at which different users won awards. Interested in computing total awards fo
Solution 1:
You can define the bins and cuts as follows:
bins = [9 * i for i in range(0, df['age'].max() // 9 + 2)]
cuts = pd.cut(df['age'], bins, right=False)
print(cuts)
0 [18, 27)
1 [18, 27)
2 [54, 63)
3 [27, 36)
4 [45, 54)
Name: age, dtype: category
Categories (7, interval[int64, left]): [[0, 9) < [9, 18) < [18, 27) < [27, 36) < [36, 45) < [45, 54) < [54, 63)]
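To see where the bin edges come from, here is a minimal sketch. The ages are hypothetical, picked only so that they land in the same intervals as the output above. The bin width of 9 and the `// 9 + 2` arithmetic are taken from the answer.

```python
import pandas as pd

# Hypothetical ages (assumption): chosen only to match the intervals printed above.
ages = pd.Series([24, 26, 54, 33, 50], name="age")

width = 9
# max() // width + 2 adds one edge past the bin that holds the maximum,
# so the maximum still falls inside a left-closed interval [a, b).
edges = [width * i for i in range(0, ages.max() // width + 2)]

# right=False makes each interval closed on the left and open on the right.
cuts = pd.cut(ages, edges, right=False)
print(edges)
print(cuts)
```

With a maximum age of 54 this yields eight edges (0 through 63) and therefore seven intervals, which agrees with the "Categories (7, ...)" line.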
Then, group by id and the cuts, and sum awards for the cuts to get total_awards. Create age_interval by GroupBy.cumcount():
df_out = (df.groupby(['id', cuts])
            .agg(total_awards=('awards', 'sum'))
            .reset_index(level=0)
            .reset_index(drop=True)
         )
df_out['age_interval'] = df_out.groupby('id').cumcount()
Result:
print(df_out)
    id  total_awards  age_interval
0    1             0             0
1    1             0             1
2    1           250             2
3    1             0             3
4    1             0             4
5    1             0             5
6    1            50             6
7    2             0             0
8    2             0             1
9    2             0             2
10   2           193             3
11   2             0             4
12   2           209             5
13   2             0             6
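For anyone who wants to run Solution 1 end to end, here is a self-contained sketch. The dataframe is hypothetical: the question's table is not reproduced on this page, so the rows below are made up to be consistent with the totals shown in the result. Passing observed=False explicitly is also my assumption. It asks groupby to keep the empty bins, which is what the result above shows, and it avoids depending on the default for categorical groupers.

```python
import pandas as pd

# Hypothetical sample data (assumption): invented rows that are consistent
# with the per-interval totals printed in the answer.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "age": [24, 26, 54, 33, 50],
    "awards": [160, 90, 50, 193, 209],
})

bins = [9 * i for i in range(0, df["age"].max() // 9 + 2)]
cuts = pd.cut(df["age"], bins, right=False)

# observed=False keeps every (id, bin) combination, including bins
# in which a user won no awards.
totals = (
    df.groupby(["id", cuts], observed=False)
      .agg(total_awards=("awards", "sum"))
      .reset_index(level=0)      # move id back to a column
      .reset_index(drop=True)    # drop the interval index
)

# Number the bins 0, 1, 2, ... within each id.
totals["age_interval"] = totals.groupby("id").cumcount()
print(totals)
```

Because the intervals are dropped from the index, age_interval only identifies a bin by position, so it relies on the groupby output being sorted by bin within each id.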
Solution 2:
Pretty sure this covers what you are looking for:
df = pd.read_clipboard()
bins = list(range(0, 100, 9))
results = df.groupby(['id', pd.cut(df.age, bins)])['awards'].sum().reset_index()
print(results)
    id       age  awards
0    1    (0, 9]     NaN
1    1   (9, 18]     NaN
2    1  (18, 27]   250.0
3    1  (27, 36]     NaN
4    1  (36, 45]     NaN
5    1  (45, 54]    50.0
6    1  (54, 63]     NaN
7    1  (63, 72]     NaN
8    1  (72, 81]     NaN
9    1  (81, 90]     NaN
10   1  (90, 99]     NaN
11   2    (0, 9]     NaN
12   2   (9, 18]     NaN
13   2  (18, 27]     NaN
14   2  (27, 36]   193.0
15   2  (36, 45]     NaN
16   2  (45, 54]   209.0
17   2  (54, 63]     NaN
18   2  (63, 72]     NaN
19   2  (72, 81]     NaN
20   2  (81, 90]     NaN
21   2  (90, 99]     NaN
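Note that this solution uses the default right=True, so the intervals are closed on the right, while Solution 1 passes right=False. That is likely why the 50 awards for id 1 appear under (45, 54] here but under the [54, 63) bin in Solution 1: a value that sits exactly on an edge moves to the neighbouring bin. A small sketch of the difference, using two illustrative boundary ages (my choice, not taken from the question):

```python
import pandas as pd

# Illustrative boundary values (assumption): 0 and 54 both sit on a bin edge.
ages = pd.Series([0, 54])
edges = list(range(0, 100, 9))

right_closed = pd.cut(ages, edges)              # default right=True  -> (a, b]
left_closed = pd.cut(ages, edges, right=False)  # right=False         -> [a, b)

print(right_closed.tolist())
print(left_closed.tolist())
```

With right=True, an age of 0 falls outside the first interval (0, 9] and comes back as NaN unless include_lowest=True is passed, so it would be silently dropped from the groupby.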