
Count Maximum Consecutive Occurrences Of A String In A Dataframe Column

I have a pandas dataframe in which I would like to count the number of consecutive occurrences of a specific string in one column. Let's say I have the following dataframe. col1 0

Solution 1:

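You can use the usual trick for grouping consecutive values: compare each value with the previous one and take the cumulative sum, so that every run of equal values gets its own label.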

df1 = df.groupby((df.col1 != df.col1.shift()).cumsum().rename(None)).col1.agg(['size', 'first'])
#    size    first
# 1     3  string1
# 2     1  string2
# 3     2  string3
# 4     1  string1
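To see why the grouping key works, it can help to build it step by step. This is only a sketch: the sample frame is an assumption reconstructed from the outputs shown in the answers, and the names starts_new_run and run_id are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs shown in the answers.
df = pd.DataFrame(
    {"col1": ["string1"] * 3 + ["string2"] + ["string3"] * 2 + ["string1"]}
)

# True wherever a value differs from the one before it. The first row is
# always True because shift() leaves a missing value there.
starts_new_run = df["col1"] != df["col1"].shift()

# The running sum of those flags gives each run of equal values its own label.
run_id = starts_new_run.cumsum()

print(run_id.tolist())
```

Grouping by run_id then treats each unbroken run as its own group, which is what the 'size' and 'first' aggregation relies on.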

Then use sort_values + drop_duplicates to keep only the largest run for each string:

df1 = df1.sort_values('size').drop_duplicates('first', keep='last').set_index('first').rename_axis(None)
#          size
# string2     1
# string3     2
# string1     3

So now you can look them up easily:

df1.loc['string1']
# size    3
# Name: string1, dtype: int64
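If you only need the longest run per string, a possible variation on the same idea is to group by both the value and the run label, then take the maximum per value. This is a sketch rather than the answer's own code; the sample data and the names run_id, run_lengths and longest are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs shown in the answers.
df = pd.DataFrame(
    {"col1": ["string1"] * 3 + ["string2"] + ["string3"] * 2 + ["string1"]}
)

# Label each run of equal consecutive values; renamed so the two group keys
# do not share the name "col1".
run_id = (df["col1"] != df["col1"].shift()).cumsum().rename("run")

# Length of every run, indexed by (value, run label).
run_lengths = df.groupby(["col1", run_id]).size()

# Longest run for each distinct value.
longest = run_lengths.groupby(level="col1").max()

print(longest.get("string1", 0))
```

Using longest.get(key, 0) instead of .loc avoids a KeyError when the string never occurs in the column.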

Solution 2:

Just use itertools.groupby; the runs come out in the same order as in the original df.

import itertools
import pandas as pd

# One row per run: the value and the length of that run
pd.DataFrame([x, len(list(y))] for x, y in itertools.groupby(df['col1']))
Out[92]: 
         0  1
0  string1  3
1  string2  1
2  string3  2
3  string1  1

pd.DataFrame([x,len(list(y))] for x , y in itertools.groupby(df['col1'])).groupby(0)[1].max()
Out[94]: 
0
string1    3
string2    1
string3    2
Name: 1, dtype: int64
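Since the question asks about one specific string, the itertools.groupby approach can also be wrapped in a small helper that skips the intermediate dataframe. This is a minimal sketch: max_consecutive is a hypothetical name, and the sample data is an assumption reconstructed from the outputs above.

```python
from itertools import groupby

import pandas as pd


def max_consecutive(values, target):
    """Return the length of the longest unbroken run of `target` in `values`."""
    # groupby yields one (key, iterator) pair per run of equal adjacent items;
    # only runs whose key matches `target` are measured.
    run_lengths = (
        sum(1 for _ in run) for key, run in groupby(values) if key == target
    )
    # default=0 covers the case where `target` never appears.
    return max(run_lengths, default=0)


# Hypothetical sample data, reconstructed from the outputs shown in the answers.
df = pd.DataFrame(
    {"col1": ["string1"] * 3 + ["string2"] + ["string3"] * 2 + ["string1"]}
)

print(max_consecutive(df["col1"], "string1"))
```

The helper works on any iterable, so a plain list can be passed as well as a dataframe column.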
