Pandas Group By Remove Outliers
I want to remove outliers group-wise, based on each group's 99th-percentile value.
import pandas as pd
df = pd.DataFrame({'Group': ['A','A','A','B','B','B','B'], 'count': [1.1,11.2,1.1,3.3,
Solution 1:
Here is my solution:
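def is_outlier(s):
    # Flag values more than 3 standard deviations from the group mean
    lower_limit = s.mean() - (s.std() * 3)
    upper_limit = s.mean() + (s.std() * 3)
    return ~s.between(lower_limit, upper_limit)

# transform (rather than apply) returns a mask aligned with df's original index
df = df[~df.groupby('Group')['count'].transform(is_outlier)]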
You can write your own is_outlier function to apply a different rule.
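Here is a self-contained sketch of Solution 1 on the sample data, using groupby(...).transform so the boolean mask stays aligned with df's index. The k parameter is a hypothetical addition for illustration (the original hard-codes 3). On this data, k=3 keeps all seven rows: with only three or four rows per group, no value can sit three sample standard deviations from its group mean, so the 3-sigma rule needs larger groups (or a smaller k) to flag anything.

```python
import pandas as pd

# Sample data from the question, as completed in Solution 2
df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})


def is_outlier(s, k=3):
    """Flag values more than k sample standard deviations from the group mean.

    k is a hypothetical parameter added for illustration.
    """
    lower_limit = s.mean() - s.std() * k
    upper_limit = s.mean() + s.std() * k
    return ~s.between(lower_limit, upper_limit)


# transform returns one boolean per row, aligned with df's original index
mask = df.groupby('Group')['count'].transform(is_outlier)
filtered = df[~mask]
print(filtered)
```

Passing a smaller threshold, e.g. `transform(lambda s: is_outlier(s, k=1))`, removes 11.2 from group A and 100.0 from group B.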
Solution 2:
I don't think you want to use quantiles here, as you'll exclude your lower values:
import pandas as pd
df = pd.DataFrame({'Group': ['A','A','A','B','B','B','B'], 'count': [1.1,11.2,1.1,3.3,3.40,3.3,100.0]})
print(pd.DataFrame(df.groupby('Group').quantile(.01)['count']))
output:
       count
Group
A        1.1
B        3.3
Those aren't outliers, right? So you wouldn't want to exclude them.
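For comparison, the question itself asks for a per-group 99th-percentile cut. A minimal sketch of that cut applied on the upper side only (no lower quantile, so the low values above are kept) could use groupby(...).transform to give every row its own group's threshold; the names upper and trimmed are illustrative. Note that with pandas' default linear interpolation, the 0.99 quantile lies below a group's maximum whenever that maximum is unique, so this cut always drops the largest value of such a group, outlier or not.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

# Each row gets its own group's 99th percentile (linear interpolation, the pandas default)
upper = df.groupby('Group')['count'].transform(lambda s: s.quantile(0.99))

# Keep only rows at or below their group's upper cut; no lower cut is applied
trimmed = df[df['count'] <= upper]
print(trimmed)
```

On the sample data the thresholds come out at about 10.998 for A and 97.102 for B, so the cut removes 11.2 and 100.0.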
You could instead set left and right limits at one standard deviation either side of each group's median. This is a bit verbose, but it gives the right answer for this data:
# Per-group limits: median minus/plus one (sample) standard deviation
left = df.groupby('Group').median() - df.groupby('Group').std()
right = df.groupby('Group').median() + df.groupby('Group').std()
left.columns = ['left']
right.columns = ['right']
# Attach each row's group limits as helper columns
df = df.merge(left, left_on='Group', right_index=True)
df = df.merge(right, left_on='Group', right_index=True)
# Keep rows strictly inside their group's limits, then drop the helpers
df = df[(df['count'] > df['left']) & (df['count'] < df['right'])]
df = df.drop(['left', 'right'], axis=1)
print(df)
output:
  Group  count
0     A    1.1
2     A    1.1
3     B    3.3
4     B    3.4
5     B    3.3
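The same median ± one-standard-deviation rule can also be sketched without the two merges and helper columns. groupby(...).transform with the string aliases 'median' and 'std' returns the group statistic repeated on every row, aligned with df's index, so the filter becomes a plain vectorised comparison. The names grouped, median, std and inside are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

grouped = df.groupby('Group')['count']
median = grouped.transform('median')  # group median, repeated on every row
std = grouped.transform('std')        # group sample std (ddof=1), as in the merge version

# Strict inequalities, matching the merge-based filter
inside = (df['count'] > median - std) & (df['count'] < median + std)
print(df[inside])
```

On the sample data it keeps rows 0, 2, 3, 4 and 5, the same rows as the merge-based version, while leaving the original row order and columns untouched.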