Split List Elements Into Sub-elements In Pandas Dataframe
I have a dataframe as:- Filtered_data ['defence possessed russia china','factors driving china modernise'] ['force bolster pentagon','strike capabilities pentagon congress detaili
Solution 1:
Use a list comprehension with split to break each string into words and flatten the result into a single list:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: [z for y in x for z in y.split()])
print (df)
Filtered_data
0 [defence, possessed, russia, china, factors, d...
1 [force, bolster, pentagon, strike, capabilitie...
2 [missiles, warheads, deterrent, face, continue...
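For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample rows are the ones shown in this thread; the helper name flatten_words is only illustrative and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the rows shown in this thread
df = pd.DataFrame({'Filtered_data': [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]})

def flatten_words(phrases):
    """Split each phrase on whitespace and flatten everything into one list."""
    return [word for phrase in phrases for word in phrase.split()]

# Each cell goes from a list of phrases to a flat list of words
df['Filtered_data'] = df['Filtered_data'].apply(flatten_words)
print(df['Filtered_data'].iloc[0])
# ['defence', 'possessed', 'russia', 'china', 'factors', 'driving', 'china', 'modernise']
```

Note that duplicates such as 'china' are kept at this stage.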
EDIT:
For unique values, the standard way is to use sets (note that a set does not preserve the original order of the words):
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: list(set([z for y in x for z in y.split()])))
print (df)
Filtered_data
0 [russia, factors, defence, driving, china, mod...
1 [capabilities, detailing, china, force, pentag...
2 [deterrent, advances, face, warheads, missiles...
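Because the iteration order of a set of strings is arbitrary, the output above can differ between runs. If a reproducible result is wanted and alphabetical order is acceptable, one option (an addition here, not part of the original answer) is to sort the set. A small pure-Python sketch using the first row of the sample data:

```python
# First row of the sample data from this thread
phrases = ['defence possessed russia china', 'factors driving china modernise']

# Flatten the phrases into words, as in the list comprehension above
words = [w for p in phrases for w in p.split()]

# set() removes duplicates; sorted() makes the result deterministic
unique_sorted = sorted(set(words))
print(unique_sorted)
# ['china', 'defence', 'driving', 'factors', 'modernise', 'possessed', 'russia']
```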
But if the original ordering of values is important, use pandas.unique, which returns the unique values in order of first appearance:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: pd.unique([z for y in x for z in y.split()]).tolist())
print (df)
Filtered_data
0 [defence, possessed, russia, china, factors, d...
1 [force, bolster, pentagon, strike, capabilitie...
2 [missiles, warheads, deterrent, face, continue...
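As an alternative sketch that needs nothing beyond the standard library, dict.fromkeys also removes duplicates while keeping first-appearance order, since dicts preserve insertion order in Python 3.7+. The helper name unique_words is hypothetical.

```python
def unique_words(phrases):
    """Flatten phrases into words and drop duplicates, keeping first-seen order."""
    words = [w for p in phrases for w in p.split()]
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(words))

# First row of the sample data from this thread
row = ['defence possessed russia china', 'factors driving china modernise']
print(unique_words(row))
# ['defence', 'possessed', 'russia', 'china', 'factors', 'driving', 'modernise']
```

It can be applied to the column in the same way as the lambdas above, e.g. df['Filtered_data'].apply(unique_words).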
Solution 2:
You can use itertools.chain + toolz.unique. The benefit of toolz.unique versus set is that it preserves ordering.
import pandas as pd
from itertools import chain
from toolz import unique
# each cell holds a list of multi-word strings
df = pd.DataFrame({'strings': [['defence possessed russia china','factors driving china modernise'],
['force bolster pentagon','strike capabilities pentagon congress detailing china'],
['missiles warheads', 'deterrent face continued advances']]})
# split every string, chain the pieces together, and keep the first occurrence of each word
df['words'] = df['strings'].apply(lambda x: list(unique(chain.from_iterable(i.split() for i in x))))
print(df.iloc[0]['words'])
['defence', 'possessed', 'russia', 'china', 'factors', 'driving', 'modernise']