Count Maximum Consecutive Occurrences Of A String In A Dataframe Column
I have a pandas dataframe in which I would like to count the maximum number of consecutive occurrences of a specific string in one column. Let's say I have a dataframe with a single column, col1, holding strings such as string1, string2 and string3.
Solution 1:
You can use the usual trick of grouping consecutive values:
df1 = df.groupby((df.col1 != df.col1.shift()).cumsum().rename(None)).col1.agg(['size', 'first'])
#    size    first
# 1     3  string1
# 2     1  string2
# 3     2  string3
# 4     1  string1
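To see why the grouping key works, here is a small sketch of its intermediate steps. The sample data is an assumption, reconstructed from the outputs shown in this answer, and the names starts and run_id are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({'col1': ['string1', 'string1', 'string1', 'string2',
                            'string3', 'string3', 'string1']})

# True wherever a value differs from the one directly above it,
# i.e. at the first row of every run of identical values.
starts = df.col1 != df.col1.shift()

# The cumulative sum of those flags gives every run its own integer id.
run_id = starts.cumsum()

print(starts.tolist())  # [True, False, False, True, True, False, True]
print(run_id.tolist())  # [1, 1, 1, 2, 3, 3, 4]
```

Grouping by run_id therefore keeps the two separate runs of string1 apart, which a plain groupby on col1 would not do.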
Then use sort_values + drop_duplicates to find the largest run for each string:
df1 = df1.sort_values('size').drop_duplicates('first', keep='last').set_index('first').rename_axis(None)
#          size
# string2     1
# string3     2
# string1     3
So now you can look them up easily:
df1.loc['string1']
# size    3
# Name: string1, dtype: int64
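For reference, here is a self-contained sketch of the same idea from start to finish. It assumes sample data reconstructed from the outputs above, and it swaps the sort_values + drop_duplicates step for a plain groupby/max, which should give the same per-string maximum. The names runs and longest are illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({'col1': ['string1', 'string1', 'string1', 'string2',
                            'string3', 'string3', 'string1']})

# One row per run of identical consecutive values: its length and its value.
run_id = (df.col1 != df.col1.shift()).cumsum().rename('run')
runs = df.groupby(run_id).col1.agg(['size', 'first'])

# Longest run for each distinct string (groupby/max instead of sort + dedupe).
longest = runs.groupby('first')['size'].max()

print(longest['string1'])         # 3
print(longest.get('string9', 0))  # 0, because 'string9' never occurs
```

Using Series.get with a default avoids a KeyError when the string you look up is not in the column at all.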
Solution 2:
Just use itertools.groupby; the order here stays the same as in the original df:
import itertools
import pandas as pd
pd.DataFrame([x, len(list(y))] for x, y in itertools.groupby(df['col1']))
Out[92]:
         0  1
0  string1  3
1  string2  1
2  string3  2
3  string1  1
pd.DataFrame([x, len(list(y))] for x, y in itertools.groupby(df['col1'])).groupby(0)[1].max()
Out[94]:
0
string1 3
string2 1
string3 2
Name: 1, dtype: int64
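If you only need the count for one specific string, as the question asks, the same itertools approach can be wrapped in a small helper. This is a minimal sketch: max_consecutive is a hypothetical name, and the sample data is assumed from the outputs above.

```python
import itertools

import pandas as pd


def max_consecutive(values, target):
    """Return the length of the longest unbroken run of target (0 if absent)."""
    return max(
        (sum(1 for _ in group)
         for key, group in itertools.groupby(values)
         if key == target),
        default=0,
    )


# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({'col1': ['string1', 'string1', 'string1', 'string2',
                            'string3', 'string3', 'string1']})

print(max_consecutive(df['col1'], 'string1'))  # 3
print(max_consecutive(df['col1'], 'string9'))  # 0
```

This avoids building an intermediate dataframe, and default=0 covers the case where the string never appears.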