
Flag Outliers In The Dataframe For Each Group

I would like to identify outliers for each group of values within a dataframe and return a dataframe with a column containing True/False for each row of the dataframe. data = {'Gro

Solution 1:

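You can use groupby().transform to get each group's mean and std aligned to the original rows, then negate between to flag the values that fall outside mean ± 3 std:

groups = df.groupby('Group')
means = groups.Age.transform('mean')
stds = groups.Age.transform('std')

# between() is True inside the band, so invert it to mark the outliers
df['Flag'] = ~df.Age.between(means-stds*3, means+stds*3)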
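Since the data in the question is cut off, here is a minimal, self-contained sketch of this approach on made-up data. The values in the frame are assumptions for illustration only; just the column names Group and Age come from the answer. The between mask is inverted so that True marks an outlier.

```python
import pandas as pd

# Hypothetical sample data (not from the question): group "A" holds one
# extreme age (200), group "B" holds none.
df = pd.DataFrame({
    "Group": ["A"] * 21 + ["B"] * 20,
    "Age": list(range(20, 40)) + [200] + list(range(50, 70)),
})

groups = df.groupby("Group")
means = groups["Age"].transform("mean")  # group mean, repeated on every row
stds = groups["Age"].transform("std")    # sample std (ddof=1), same alignment

# between() is True inside mean ± 3*std, so invert it to flag the outliers.
df["Flag"] = ~df["Age"].between(means - 3 * stds, means + 3 * stds)

print(df[df["Flag"]])
```

Note that a 3-sigma rule can only fire in reasonably large groups: with ten or fewer rows in a group, no single value can sit more than three standard deviations from that group's mean.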

Solution 2:

Change your function to the following:

import numpy as np

def flag_outlier(x):
    lower_limit = np.mean(x) - np.std(x) * 3
    upper_limit = np.mean(x) + np.std(x) * 3
    return (x > upper_limit) | (x < lower_limit)

The way you are going about it, your function returns just one value per group, whereas this version returns one True/False value for every row in the group.
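How the function gets applied is not visible in the truncated question, so the following is only a sketch that assumes it is passed to groupby().transform on the Age column. The sample data is made up for illustration. One difference from Solution 1 is worth knowing: np.std defaults to ddof=0 (population standard deviation), while pandas' 'std' uses ddof=1, so the two solutions can disagree on borderline values.

```python
import numpy as np
import pandas as pd

def flag_outlier(x):
    # x is the Series of values for one group; the result has one
    # boolean per row, which is what transform needs.
    lower_limit = np.mean(x) - np.std(x) * 3
    upper_limit = np.mean(x) + np.std(x) * 3
    return (x > upper_limit) | (x < lower_limit)

# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "Group": ["A"] * 21 + ["B"] * 20,
    "Age": list(range(20, 40)) + [200] + list(range(50, 70)),
})

# Assumed usage: apply the function group by group and keep row alignment.
df["Flag"] = df.groupby("Group")["Age"].transform(flag_outlier)

print(df[df["Flag"].astype(bool)])
```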
