Sum The Duplicate Rows Of Particular Columns In Dataframe
I want to add the particular columns (C, D, E, F, G) based on the duplicate rows of column B. Whereas the remaining non-duplicate rows unchanged. The output of column A must be the
Solution 1:
Create dictionary of columns names with aggregate functions and pass to agg
, also here is necessary min_count=1
to sum
for avoid 0
for sum NaN
s values:
L = ['C','D','E','F','G']
d = {**dict.fromkeys(L, lambda x: x.sum(min_count=1)), **{'A':'first'}}
df = df.groupby('B', as_index=False, sort=False).agg(d).reindex(columns=df.columns)
print (df)
A B C D E F G
0 box1 0487 1.0 1.0 1.0 1.0 1.0
1 blue 0478 2.0 2.0 2.0 2.0 2.0
2 flat 8704 1.0 1.0 1.0 NaN 2.0
3 dark 8740 1.0 1.0 1.0 1.0 1.0
4 late 4087 1.0 NaN 1.0 NaN 1.0
d = {**dict.fromkeys(L, 'sum'), **{'A':'first'}}
df = df.groupby('B', as_index=False, sort=False).agg(d).reindex(columns=df.columns)
print (df)
A B C D E F G
0 box1 0487 1.0 1.0 1.0 1.0 1.0
1 blue 0478 2.0 2.0 2.0 2.0 2.0
2 flat 8704 1.0 1.0 1.0 0.0 2.0
3 dark 8740 1.0 1.0 1.0 1.0 1.0
4 late 4087 1.0 0.0 1.0 0.0 1.0
Post a Comment for "Sum The Duplicate Rows Of Particular Columns In Dataframe"