
Pandas - Find Rows With Matching Values In Two Columns And Multiply Value In Another Column

First suppose we have a dataframe below: import pandas as pd data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 'A':['foo', 'bar', 'foo', 'bar','foo

Solution 1:

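One way is to group by A and C, take the product and count of D within each group, drop the groups that contain only a single row, then inner-merge the result back onto the original frame on A and C. Here df is the original frame, and D is assumed to have been converted to numeric already: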

df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)

Gives you:

     A   C  D  id  prod  count
0  foo  10  9   1    63      2
2  foo  10  7   3    63      2
4  foo  50  5   5    15      2
6  foo  50  3   7    15      2

Then drop/rename columns as appropriate.
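
As a rough sketch of that last step, the snippet below runs the same groupby and merge end to end and then tidies the columns. It assumes the sample frame shown in Solution 3 and a numeric conversion first; the names stats, merged and tidy are only illustrative.

```python
import pandas as pd

# Sample frame as given in Solution 3 (all values start out as strings)
data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                     'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
                     'D': ['9', '8', '7', '6', '5', '4', '3', '2']})
data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)

# Product and row count of D for each (A, C) group
stats = data.groupby(['A', 'C'])['D'].agg(['prod', 'count'])
# Keep only the groups that have more than one row
stats = stats[stats['count'] > 1]

# Inner merge back onto the original rows
merged = data.merge(stats, left_on=['A', 'C'], right_index=True)

# Drop the helper columns and rename the product column
tidy = merged.drop(columns=['C', 'D', 'count']).rename(columns={'prod': 'result'})
print(tidy)
```

Only the rows whose (A, C) pair occurs more than once survive the merge, which is why the ids 2, 4, 6 and 8 are absent.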

Solution 2:

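You can use a self-join: merge the frame with itself on A and C, discard the pairs where a row is matched with itself, and multiply the two D values: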

data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)
# Pair every row with every row that shares the same A and C
joint = pd.merge(data, data, on=['A', 'C'])
# Drop the pairs where a row was matched with itself
joint = joint.loc[joint['id_x'] != joint['id_y']].copy()
joint['result'] = joint['D_x'] * joint['D_y']
result = joint[['id_x', 'A', 'result']]
result.columns = ['id', 'A', 'result']

Result:

   id    A  result
1   1  foo      63
2   3  foo      63
7   5  foo      15
8   7  foo      15
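
One thing worth checking with the self-join: it multiplies rows pairwise, so it matches the group product only when a group has exactly two rows, as in the sample data. A minimal sketch with a hypothetical three-row group (the frame toy is made up for illustration and is not part of the question):

```python
import pandas as pd

# Hypothetical data: a single (A, C) group containing three rows
toy = pd.DataFrame({'id': [1, 2, 3],
                    'A': ['foo', 'foo', 'foo'],
                    'C': [10, 10, 10],
                    'D': [2, 3, 5]})

# Same self-join as above
pairs = pd.merge(toy, toy, on=['A', 'C'])
pairs = pairs.loc[pairs['id_x'] != pairs['id_y']].copy()
pairs['result'] = pairs['D_x'] * pairs['D_y']

# Each id now appears once per partner row, with a pairwise product
pairwise = pairs.groupby('id_x')['result'].apply(list)

# Product over the whole group, for comparison
group_product = toy['D'].prod()

print(pairwise)
print(group_product)
```

If groups can contain more than two rows and the product of the whole group is wanted, a groupby-based approach is the safer choice.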

Solution 3:

import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 
                     'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  
                     'C':['10','10','10','30','50','60','50','8'], 
                     'D':['9','8','7','6','5','4','3','2']})

First convert relevant columns to numeric

data[['C', 'D', 'id']] = data[['C', 'D', 'id']].apply(pd.to_numeric)

Create an empty list to collect the rows of each group

frames = []

Group by the two columns, compute the product of column D within each group, and write it back to every row of that group.

group = data.groupby(['A', 'C'])
for _, y in group:
    # Work on a copy so the original frame is left untouched
    y = y.copy()
    # Product of D over all rows in this (A, C) group
    y['D'] = y['D'].prod()
    frames.append(y)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
finalDataFrame = pd.concat(frames, ignore_index=True)

output = finalDataFrame[['id', 'A', 'D']]
output = output.rename(columns = {'D': 'result'})
print(output)

Gives you:

   id    A  result
0   2  bar       8
1   4  bar       6
2   6  bar       4
3   8  foo       2
4   1  foo      63
5   3  foo      63
6   5  foo      15
7   7  foo      15
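
The loop can probably be avoided altogether. As a hedged alternative (not one of the original answers), groupby(...).transform('prod') broadcasts each group's product back to its rows in a single step. The sketch below assumes the same sample frame and keeps the rows in their original order rather than sorted by group:

```python
import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                     'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
                     'D': ['9', '8', '7', '6', '5', '4', '3', '2']})
data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)

output = data[['id', 'A']].copy()
# transform returns one value per original row, aligned on the index
output['result'] = data.groupby(['A', 'C'])['D'].transform('prod')
print(output)
```

Rows whose (A, C) pair is unique simply keep their own D value, as in the loop version.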
