Remove Specific Columns In Dataframe With Same Id On Date Condition
I have two datasets. One contains house energy certificates issued over the last 10 years, with an ID for the house and the date each was issued. One house could have more certificates is…
Solution 1:
I extended your DataFrame with one more transaction_id and address_id for better testing, and read the data from an Excel file; you can modify that part as needed.
input_df
transaction_id address_id official_date certificate issued_date
83866285 1157600091 5/25/2016 A2012-278940 17.12.2012 17:44:17
83866285 1157600091 5/25/2016 A2012-278941 17.12.2012 17:48:35
83866285 1157600091 5/25/2016 A2016-638538 22.02.2016 10:16:12
83866285 1157600091 5/25/2016 A2016-638577 22.02.2016 10:22:45
83866285 1157600091 5/25/2016 A2019-1065662 21.10.2019 15:39:30
83866286 1157600093 5/25/2019 A2012-278940 17.12.2012 17:44:17
83866286 1157600093 5/25/2019 A2012-278941 17.12.2012 17:48:35
83866286 1157600093 5/25/2019 A2016-638538 22.02.2016 10:16:12
83866286 1157600093 5/25/2019 A2016-638577 22.02.2016 10:22:45
83866286 1157600093 5/25/2019 A2019-1065662 21.11.2019 15:39:30
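To try this without an Excel file, the sample rows above can be built directly in code. This is only a sketch: the explicit date formats are an assumption based on how the sample values look (month first for official_date, day first for issued_date), and the resulting input_df can stand in for the read_excel call.

```python
import pandas as pd

# Sample rows copied from the table above.
rows = [
    (83866285, 1157600091, "5/25/2016", "A2012-278940", "17.12.2012 17:44:17"),
    (83866285, 1157600091, "5/25/2016", "A2012-278941", "17.12.2012 17:48:35"),
    (83866285, 1157600091, "5/25/2016", "A2016-638538", "22.02.2016 10:16:12"),
    (83866285, 1157600091, "5/25/2016", "A2016-638577", "22.02.2016 10:22:45"),
    (83866285, 1157600091, "5/25/2016", "A2019-1065662", "21.10.2019 15:39:30"),
    (83866286, 1157600093, "5/25/2019", "A2012-278940", "17.12.2012 17:44:17"),
    (83866286, 1157600093, "5/25/2019", "A2012-278941", "17.12.2012 17:48:35"),
    (83866286, 1157600093, "5/25/2019", "A2016-638538", "22.02.2016 10:16:12"),
    (83866286, 1157600093, "5/25/2019", "A2016-638577", "22.02.2016 10:22:45"),
    (83866286, 1157600093, "5/25/2019", "A2019-1065662", "21.11.2019 15:39:30"),
]
columns = ["transaction_id", "address_id", "official_date", "certificate", "issued_date"]
input_df = pd.DataFrame(rows, columns=columns)

# Assumed formats: official_date is month/day/year, issued_date is day.month.year with a time.
input_df["official_date"] = pd.to_datetime(input_df["official_date"], format="%m/%d/%Y")
input_df["issued_date"] = pd.to_datetime(input_df["issued_date"], format="%d.%m.%Y %H:%M:%S")

print(input_df.dtypes)
```

Spelling out the formats avoids relying on pandas guessing whether a value such as 05.06.2016 is day first or month first.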
import pandas as pd
input_df = pd.read_excel('input.xlsx', sheet_name='Sheet1')
# Convert both date columns to datetime
# issued_date values look like 17.12.2012 17:44:17 (day first)
input_df['issued_date'] = pd.to_datetime(input_df['issued_date'], dayfirst=True)
# official_date values look like 5/25/2016 (month first)
input_df['official_date'] = pd.to_datetime(input_df['official_date'])
# Temporary column: absolute time gap between issue date and official date
input_df['diff_days'] = (input_df['issued_date'] - input_df['official_date']).abs()
print(input_df)
# For each transaction_id, keep only the row with the smallest gap
input_df = input_df.loc[input_df.groupby('transaction_id')['diff_days'].idxmin()]
# Drop the temporary column
input_df = input_df.drop(columns=['diff_days'])
print(input_df)
Output -
3 83866285 1157600091 2016-05-25 A2016-638577 2016-02-22 10:22:45
9 83866286 1157600093 2019-05-25 A2019-1065662 2019-11-21 15:39:30
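Note that this picks the certificate closest in time to official_date, whether it was issued before or after it; for 83866286 the selected certificate was issued about six months after the official date. If only certificates issued on or before the official date should count (an assumption, since the question text is cut off), a variant could look like the sketch below. The function name latest_certificate_before is hypothetical, and the demo uses three of the sample rows.

```python
import pandas as pd

def latest_certificate_before(df: pd.DataFrame) -> pd.DataFrame:
    """Per transaction_id, keep the latest certificate issued on or before official_date."""
    # Compare calendar dates only, so a certificate issued later on the same day still counts.
    eligible = df[df["issued_date"].dt.normalize() <= df["official_date"]]
    # Among the eligible rows, take the most recent issue date in each group.
    idx = eligible.groupby("transaction_id")["issued_date"].idxmax()
    return eligible.loc[idx]

# Demo with three certificates from the second sample transaction.
demo = pd.DataFrame({
    "transaction_id": [83866286, 83866286, 83866286],
    "official_date": pd.to_datetime(["2019-05-25"] * 3),
    "certificate": ["A2016-638538", "A2016-638577", "A2019-1065662"],
    "issued_date": pd.to_datetime([
        "2016-02-22 10:16:12",
        "2016-02-22 10:22:45",
        "2019-11-21 15:39:30",
    ]),
})

result = latest_certificate_before(demo)
print(result)
```

With this variant, a transaction that has no certificate issued on or before its official date drops out of the result entirely.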