Selecting Top Rows In Pandas Dataframe Based On Grouping
Related to the question here: Reordering pandas dataframe based on multiple column and sum of one column.
How can I select the top 2 countries in this dataframe, when using sort col
Solution 1:
UPDATE:
In [166]: df.loc[df.Country_FAO.isin(df.groupby('Country_FAO').sum().nlargest(2, 'mean_area').index)]
Out[166]:
   Country_FAO type   mean_area        sort
5    Australia  car  12141000.0  18910501.0
4    Australia  car   6475695.0  18910501.0
6    Australia  bus    293806.0  18910501.0
0  Afghanistan  car   2029000.0   2141000.0
1  Afghanistan  car    112000.0   2141000.0
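The one-liner above can be tried as a standalone script. This is only a minimal sketch: the rows are reconstructed from the output shown here, and the single Algeria row (and its type) is an assumption, since only Algeria's total appears in the session. It also selects mean_area explicitly before summing, which avoids aggregating the non-numeric type column (how groupby sums treat such columns has changed between pandas versions).

```python
import pandas as pd

# Sample rows reconstructed from the output above. The Algeria row is a
# hypothetical placeholder: only Algeria's total (829351.0) is shown.
df = pd.DataFrame({
    "Country_FAO": ["Afghanistan", "Afghanistan", "Algeria",
                    "Australia", "Australia", "Australia"],
    "type": ["car", "car", "car", "car", "car", "bus"],
    "mean_area": [2029000.0, 112000.0, 829351.0,
                  6475695.0, 12141000.0, 293806.0],
})

# Total mean_area per country, restricted to the numeric column.
totals = df.groupby("Country_FAO")["mean_area"].sum()

# Names of the two countries with the largest totals.
top_countries = totals.nlargest(2).index

# Keep only the rows that belong to those countries (original row order is kept).
top_rows = df[df["Country_FAO"].isin(top_countries)]
print(top_rows)
```

Because isin only filters, the result keeps whatever row order df already had; sort df first if the rows should come out ordered by group total.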
I would do it this way:
In [153]: df.groupby('Country_FAO').sum()
Out[153]:
mean_area
Country_FAO
Afghanistan 2141000.0
Algeria 829351.0
Australia 18910501.0
In [154]: df.groupby('Country_FAO').sum().nlargest(2, 'mean_area')
Out[154]:
mean_area
Country_FAO
Australia 18910501.0
Afghanistan 2141000.0
In [155]: df.groupby('Country_FAO').sum().nlargest(2, 'mean_area').index
Out[155]: Index(['Australia', 'Afghanistan'], dtype='object', name='Country_FAO')
Also, you may want to reset the index:
In [156]: df.groupby('Country_FAO').sum().nlargest(2, 'mean_area').reset_index()
Out[156]:
   Country_FAO   mean_area
0    Australia  18910501.0
1  Afghanistan   2141000.0
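The same flat table can also be reached from a Series of totals rather than a DataFrame. A small sketch, assuming the per-country totals printed in the session above are already available as a Series named mean_area (the variable names totals and top2 are only illustrative):

```python
import pandas as pd

# Per-country totals, copied from the groupby output shown in the session.
totals = pd.Series(
    [2141000.0, 829351.0, 18910501.0],
    index=pd.Index(["Afghanistan", "Algeria", "Australia"], name="Country_FAO"),
    name="mean_area",
)

# nlargest on a Series needs no column argument; reset_index turns the
# Country_FAO index back into an ordinary column with a fresh 0..n-1 index.
top2 = totals.nlargest(2).reset_index()
print(top2)
```

Having Country_FAO as a regular column is convenient if the top-2 table is going to be merged or exported later.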