
Drop Columns With Low Standard Deviation In Pandas Dataframe

Is there any way of doing this without writing a for loop? Suppose we have the following data: d = {'A': {-1: 0.19052041339798062, 0: -0.0052531481871952871, 1: -0.0022

Solution 1:

You can use the loc method of a DataFrame to select columns with a Boolean indexer. df.std() returns one standard deviation per column, and comparing that Series with a scalar broadcasts the condition across every column, so no loop is needed. Create the indexer like this:

df.std() > 0.3

Out[84]: 
A    False
B    False
C    False
D    False
E     True
F    False
G    False
dtype: bool

Then call loc with : in the first position to indicate that you want to return all rows:

df.loc[:, df.std() > .3]
Out[85]: 
           E
-1  0.302735
 0 -0.306402
 1 -0.326983
 2  0.602575
 3  0.368600
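For a version you can paste and run, here is a minimal sketch of the same Boolean-mask approach. The data in the question is truncated, so the DataFrame below and its column names (low, high, const) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: the dict in the question is cut off, so these
# values and column names are invented for this example.
df = pd.DataFrame(
    {
        "low": [1.0, 1.1, 0.9, 1.0],    # small spread
        "high": [0.0, 2.0, -2.0, 4.0],  # large spread
        "const": [5.0, 5.0, 5.0, 5.0],  # zero spread
    }
)

threshold = 0.3

# df.std() gives one value per column (sample std, ddof=1 by default);
# comparing with a scalar yields a Boolean Series indexed by column name.
keep = df.std() > threshold

# ":" selects all rows, the Boolean Series selects the columns to keep.
result = df.loc[:, keep]
print(result.columns.tolist())
```

Note that this assumes every column is numeric. With mixed dtypes you would probably need to restrict the frame to numeric columns first, so that the mask lines up with the columns.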

Solution 2:

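To drop columns, you need their names. Take the index labels of the columns whose standard deviation is below the threshold and pass them to drop: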

threshold = 0.2

df.drop(df.std()[df.std() < threshold].index.values, axis=1)

         D       E       F       G
-1  0.1767  0.3027  0.2533  0.2876
 0 -0.0888 -0.3064 -0.0639 -0.1102
 1 -0.0934 -0.3270 -0.1001 -0.1264
 2  0.0956  0.6026  0.0815  0.1703
 3  0.5103  0.3686  0.3661  0.3010
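The drop-based variant can be sketched the same way, computing the standard deviations only once instead of calling df.std() twice. As before, the DataFrame and its column names are hypothetical stand-ins for the truncated data in the question.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {
        "low": [1.0, 1.1, 0.9, 1.0],
        "high": [0.0, 2.0, -2.0, 4.0],
        "const": [5.0, 5.0, 5.0, 5.0],
    }
)

threshold = 0.2

# Compute the per-column standard deviation once and reuse it.
stds = df.std()

# Labels of the columns whose standard deviation is below the threshold.
low_std_cols = stds[stds < threshold].index

# drop(columns=...) is equivalent to drop(labels, axis=1).
dropped = df.drop(columns=low_std_cols)
print(dropped.columns.tolist())
```

Dropping columns with std < threshold gives the same frame as keeping columns with std >= threshold via loc, so which one to use is mostly a matter of which reads better in your code.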
