
How To Find Repeated Patients And Add A New Column

I am dealing with a large medical dataset. I want to add a column that represents readmission, that is, if a patient has had surgery at most 6 months ago, then that column '

Solution 1:

Here is an edited answer to the question

import pandas as pd
import datetime as dt
import numpy as np

# Your data plus a new patient that comes often                                                                                                                                                                    
data = {'Patient_ID':[12,1352,55,1352,12,6,1352,100,100,100,100] ,
        'Surgery_Date': ['25/01/2009', '28/01/2009','29/01/2009','12/12/2008','23/02/2008','2/02/2009','12/01/2009','01/01/2009','01/02/2009','01/01/2010','01/02/2010']}

df = pd.DataFrame(data,columns = ['Patient_ID','Surgery_Date'])
readmissions = pd.Series(np.zeros(len(df), dtype=int), index=df.index)

# Loop through all unique ids
all_id = df['Patient_ID'].unique()
for pid in all_id:
    # These are all the times a patient with a given ID has had surgery                                                                                                                                            
    patient = df.loc[df['Patient_ID']==pid]
    admissions_sorted = pd.to_datetime(patient['Surgery_Date'], format='%d/%m/%Y').sort_values()

    # This checks if the previous surgery was less than 180 days ago
    # (the first surgery has no previous one, so its NaT diff compares as False)
    frequency = admissions_sorted.diff()<dt.timedelta(days=180)

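    # Compute the readmission: a running count that grows by one for each
    # surgery within 180 days of the previous one and resets to 0 otherwise
    n_admissions = [0]
    for v in frequency.values[1:]:
        n_admissions.append((n_admissions[-1] + 1) * v)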

    # Add these value to the time series                                                                                                                                                                           
    readmissions.loc[admissions_sorted.index] = n_admissions


df['Readmission'] = readmissions

This returns

    Patient_ID Surgery_Date  Readmission
0           12   25/01/2009            0
1         1352   28/01/2009            2
2           55   29/01/2009            0
3         1352   12/12/2008            0
4           12   23/02/2008            0
5            6    2/02/2009            0
6         1352   12/01/2009            1
7          100   01/01/2009            0
8          100   01/02/2009            1
9          100   01/01/2010            0
10         100   01/02/2010            1

Hope this helps! This is probably not very Pythonic or pandas-esque, but it should work as intended. I am convinced this could be made much more efficient and readable.
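For what it's worth, here is one possible way to get the same running count without the explicit Python loop, using groupby. Treat it as a sketch rather than a tested replacement: the function name add_readmission and the window_days parameter are made up for this example, and it assumes the same 'Patient_ID' and 'Surgery_Date' columns and the same day/month/year date format as above.

```python
import pandas as pd

def add_readmission(df, window_days=180):
    """Hypothetical helper: running count of surgeries that fall within
    window_days of the same patient's previous surgery."""
    out = df.copy()
    dates = pd.to_datetime(out['Surgery_Date'], format='%d/%m/%Y')

    # Sort by patient and date; the original index is kept for re-alignment
    tmp = pd.DataFrame({'pid': out['Patient_ID'], 'date': dates})
    tmp = tmp.sort_values(['pid', 'date'])

    # Gap to the same patient's previous surgery (NaT for the first one)
    gap = tmp.groupby('pid')['date'].diff()

    # NaT compares as False, so every patient's first surgery is False
    within = gap < pd.Timedelta(days=window_days)

    # Every False starts a new streak; count the True values inside each streak
    streak_id = (~within).cumsum()
    counts = within.astype(int).groupby(streak_id).cumsum()

    # Assignment aligns on the original index, so row order is unchanged
    out['Readmission'] = counts
    return out

data = {'Patient_ID': [12, 1352, 55, 1352, 12, 6, 1352, 100, 100, 100, 100],
        'Surgery_Date': ['25/01/2009', '28/01/2009', '29/01/2009', '12/12/2008',
                         '23/02/2008', '2/02/2009', '12/01/2009', '01/01/2009',
                         '01/02/2009', '01/01/2010', '01/02/2010']}
df = pd.DataFrame(data, columns=['Patient_ID', 'Surgery_Date'])
result = add_readmission(df)
```

On the sample data this should give the same Readmission column as the loop version. Because a patient's first surgery always starts a new streak, the count never carries over from one patient to the next.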
