
Form Groups Of Individuals Python (pandas)

I have a data set of the following form: import pandas as pd d1 = {'Subject': ['Subject1','Subject1','Subject1','Subject2','Subject2','Subject2','Subject3','Subject3','Subject3','S

Solution 1:

Use:

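import pandas as pd
from itertools import combinations

# d1 is assumed to be the question's data as a DataFrame (e.g. d1 = pd.DataFrame(d1))
# Empty-string categories become NaN, so groupby below skips those rows
d1['Category'] = d1['Category'].mask(d1['Category'] == '')

# All unordered pairs of subjects that share the same Event and Category
L = [(i[0], i[1], y[0], y[1]) for i, x in d1.groupby(['Event','Category'])['Subject']
                              for y in combinations(x, 2)]
df = pd.DataFrame(L, columns=['Event','Category','Match1','Match2'])

# Variables of the first subject in each pair (everything after the first 4 columns), suffixed .1
df1 = (df.rename(columns={'Match1':'Subject'})
         .merge(d1, on=['Event','Category','Subject'], how='left')
         .iloc[:, 4:]
         .add_suffix('.1'))
# Variables of the second subject in each pair, suffixed .2
df2 = (df.rename(columns={'Match2':'Subject'})
         .merge(d1, on=['Event','Category','Subject'], how='left')
         .iloc[:, 4:]
         .add_suffix('.2'))

# Pairs plus both subjects' variables, side by side
fin = pd.concat([df, df1, df2], axis=1)

print(fin)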
  Event Category    Match1    Match2 Variable1.1 Variable2.1 Variable3.1 Variable1.2 Variable2.2 Variable3.2
0     1        1  Subject1  Subject4           1          12          -6          10           3           1
1     1        2  Subject2  Subject3           4           9          -3           7           6          -2
2     2        1  Subject1  Subject2           2          11          -5           5           8          -4
3     2        1  Subject1  Subject4           2          11          -5          11           2           2
4     2        1  Subject2  Subject4           5           8          -4          11           2           2
5     3        2  Subject1  Subject2           3          10          -4           6           7          -3
6     3        2  Subject1  Subject3           3          10          -4           9           4           0
7     3        2  Subject2  Subject3           6           7          -3           9           4           0

Explanation:

  1. Replace empty strings with NaN using mask; groupby silently drops rows whose group key is NaN, so those subjects are never paired.
  2. Build a DataFrame from a list comprehension that flattens all length-2 combinations of the Subject column within each Event/Category group.
  3. Join the variable columns twice with a left merge (once for Match1, once for Match2), drop the first 4 columns by position with iloc so only the variable columns remain, and use add_suffix (or add_prefix) to avoid duplicate column names.
  4. Finally, concat all three DataFrames side by side (axis=1).
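Because the question's data set is cut off above, here is a minimal, self-contained sketch of the same four steps on a small hypothetical data frame (the subjects S1, S2, S3, the categories A and B, and the single Variable1 column are made up for illustration). It assumes pandas 1.1 or later, where groupby's default dropna=True drops NaN keys. The two merges are folded into a small hypothetical helper, attach, which is not part of the original answer.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data (NOT the question's data set, which is truncated above).
# S3 has an empty Category in event 1, so it should end up unpaired there.
d1 = pd.DataFrame({
    'Subject':   ['S1', 'S2', 'S3', 'S1', 'S2', 'S3'],
    'Event':     [1, 1, 1, 2, 2, 2],
    'Category':  ['A', 'A', '', 'B', 'B', 'B'],
    'Variable1': [10, 20, 30, 11, 21, 31],
})

# Step 1: empty strings -> NaN; groupby drops NaN keys by default (dropna=True).
d1['Category'] = d1['Category'].mask(d1['Category'] == '')

# Step 2: every unordered pair of subjects within each (Event, Category) group.
pairs = [(event, cat, a, b)
         for (event, cat), subjects in d1.groupby(['Event', 'Category'])['Subject']
         for a, b in combinations(subjects, 2)]
df = pd.DataFrame(pairs, columns=['Event', 'Category', 'Match1', 'Match2'])

# Step 3 (hypothetical helper): attach one matched subject's variables.
# After the left merge the first 4 columns are Event, Category and the two
# subject columns, so iloc[:, 4:] keeps only the variable columns.
def attach(match_col, suffix):
    return (df.rename(columns={match_col: 'Subject'})
              .merge(d1, on=['Event', 'Category', 'Subject'], how='left')
              .iloc[:, 4:]
              .add_suffix(suffix))

# Step 4: glue the pairs and both sets of variables side by side.
fin = pd.concat([df, attach('Match1', '.1'), attach('Match2', '.2')], axis=1)
print(fin)
```

With this toy data, S3 never appears in an event-1 pair because its Category was empty, while event 2 yields all three pairs of S1, S2 and S3.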
