Is Grouping In Dataframe Based On Specific Parameters Possible Using Python?
Solution 1:
I would do it this way:
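import pandas as pd

# Stack both frames, then keep only names that occur with more than one distinct 'No.'
df_out = pd.concat([df1, df2])
df_out = (df_out[df_out.groupby(['Name'])['No.'].transform(lambda x: x.nunique() > 1)]
          .reset_index(drop=True)
          .set_index(['Name', 'No.'], append=True)['Comment']
          .unstack([0, 2]))
# Drop the row-number level from the column header
df_out.columns = df_out.columns.droplevel(0)
df_out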
Output:
No.      2139300     2234903  2139300  2234903
Name
John  Irrelevant  Regardless  Awesome  Perfect
Use reset_index to get a unique index per row, then append 'Name' and 'No.' to that index, and unstack the new row-number level together with 'No.' to create a MultiIndex column header. Finally, drop the top level of the column header.
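The steps above can be sketched end to end as follows. The sample frames are hypothetical, reconstructed from the outputs shown in this thread, and the filter is written as transform("nunique") > 1, which should select the same rows as the lambda version; treat it as an illustration rather than the exact code from the answer.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs in this thread.
df1 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Joe", "John", "John"],
    "No.": [2123320, 2123320, 2832883, 2139300, 2234903],
    "Comment": ["Doesnt Matter", "Something", "Whatever", "Irrelevant", "Regardless"],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Joe", "John", "John"],
    "No.": [2123320, 2123320, 2832883, 2139300, 2234903],
    "Comment": ["Great", "Good", "Solid", "Awesome", "Perfect"],
})

stacked = pd.concat([df1, df2], ignore_index=True)

# Keep only names associated with more than one distinct 'No.'.
multi = stacked.groupby("Name")["No."].transform("nunique") > 1
kept = stacked[multi].reset_index(drop=True)

# The index becomes (row number, Name, No.).
indexed = kept.set_index(["Name", "No."], append=True)["Comment"]

# Move the row number (level 0) and 'No.' (level 2) into the columns,
# then drop the row-number level from the column header.
wide = indexed.unstack([0, 2])
wide.columns = wide.columns.droplevel(0)
print(wide)
```

Because every kept row has its own row number, each comment lands in its own column, which is why the same 'No.' can appear more than once in the header.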
To get rid of the index names and produce a cleaner-looking table, you can use:
df_out.rename_axis(None, axis=1).rename_axis(None)
which gives:
         2139300     2234903  2139300  2234903
John  Irrelevant  Regardless  Awesome  Perfect
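As a small illustration of what rename_axis(None) does here, the sketch below builds a hypothetical stand-in for df_out by hand (the values are copied from the output above) and checks that both axis names are removed while the original frame is left unchanged.

```python
import pandas as pd

# Hypothetical stand-in for df_out: index named 'Name', columns named 'No.'.
df_out = pd.DataFrame(
    [["Irrelevant", "Regardless", "Awesome", "Perfect"]],
    index=pd.Index(["John"], name="Name"),
    columns=pd.Index([2139300, 2234903, 2139300, 2234903], name="No."),
)

# rename_axis returns a new frame; the first call clears the column-axis
# name, the second clears the index name.
clean = df_out.rename_axis(None, axis=1).rename_axis(None)
print(clean)
```

Since the call is not in place, assign the result back (df_out = ...) if the cleaned frame should replace the original.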
Solution 2:
How about this?
1) Group & unstack dataframe1 and dataframe2 to get the general shape you're going for:
dataframe1_transformed = \
    dataframe1.groupby(["**Name**", '**No.**'])['**Comment**'].\
    sum().unstack("**No.**")
dataframe2_transformed = \
    dataframe2.groupby(["**Name**", '**No.**'])['**Comment**'].\
    sum().unstack("**No.**")
dataframe1_transformed
**No.**  **Name**  2123320                 2139300     2234903     2832883
0        Bob       Doesnt MatterSomething  None        None        None
1        Joe       None                    None        None        Whatever
2        John      None                    Irrelevant  Regardless  None
dataframe2_transformed
**No.**  **Name**  2123320    2139300  2234903  2832883
0        Bob       GreatGood  None     None     None
1        Joe       None       None     None     Solid
2        John      None       Awesome  Perfect  None
2) Combine them:
dataframe_all_transformed = \
    dataframe1_transformed.merge(dataframe2_transformed,
                                 how='inner', left_index=True,
                                 right_index=True)
dataframe_all_transformed
**No.**  **Name**  2123320_x              2139300_x   2234903_x   2832883_x  2123320_y  2139300_y  2234903_y  2832883_y
0        Bob       DoesntMatterSomething  None        None        None       GreatGood  None       None       None
1        Joe       None                   None        None        Whatever   None       None       None       Solid
2        John      None                   Irrelevant  Regardless  None       None       Awesome    Perfect    None
3) Separately count the number of unique appearances:
num_appearances = dataframe1.drop_duplicates(subset=['**Name**', '**No.**']).\
    groupby(['**Name**']).size()
multiple_appearing_names = num_appearances[num_appearances > 1].index
4) Filter the combined transformed data just for those names:
dataframe_multiple_transformed = dataframe_all_transformed.loc[
    multiple_appearing_names].T.dropna().T
5) Technically it's a bad idea to have identical column names in a dataframe, but since you want it:
dataframe_multiple_transformed.columns = \
    [x.split("_")[0] for x in dataframe_multiple_transformed.columns]
dataframe_multiple_transformed
   **Name**  2139300     2234903     2139300  2234903
0  John      Irrelevant  Regardless  Awesome  Perfect
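For reference, steps 1 to 5 can be condensed into one runnable sketch. Several things here are assumptions rather than the answer's own code: the sample data is reconstructed from the tables above, the columns are named plainly ('Name', 'No.', 'Comment') instead of with the asterisks, the helper widen is hypothetical, comments are combined with agg("".join) in place of sum(), the unique count uses nunique(), and dropna(axis=1) stands in for the .T.dropna().T round trip.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the tables in this thread.
dataframe1 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Joe", "John", "John"],
    "No.": [2123320, 2123320, 2832883, 2139300, 2234903],
    "Comment": ["Doesnt Matter", "Something", "Whatever", "Irrelevant", "Regardless"],
})
dataframe2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Joe", "John", "John"],
    "No.": [2123320, 2123320, 2832883, 2139300, 2234903],
    "Comment": ["Great", "Good", "Solid", "Awesome", "Perfect"],
})


def widen(df):
    """One row per name, one column per 'No.'; concatenate comments per cell."""
    return df.groupby(["Name", "No."])["Comment"].agg("".join).unstack("No.")


# Overlapping column labels receive the default '_x' / '_y' suffixes.
combined = widen(dataframe1).merge(
    widen(dataframe2), how="inner", left_index=True, right_index=True
)

# Names linked to more than one distinct 'No.'.
counts = dataframe1.groupby("Name")["No."].nunique()
multiple_appearing_names = counts[counts > 1].index

# Keep those names, then drop the columns that are empty for them.
result = combined.loc[multiple_appearing_names].dropna(axis=1)

# Strip the merge suffixes; this deliberately leaves duplicate column names.
result.columns = [str(c).split("_")[0] for c in result.columns]
print(result)
```

Note that dropna(axis=1) removes any column with a missing value in the selected rows, so with several qualifying names that use different 'No.' values it may drop more than intended.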