Python Pandas: Replace Values Multiple Columns Matching Multiple Columns From Another Dataframe
I searched a lot for an answer, the closest question was Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python, but the answer to this p
Solution 1:
You can use the update function (this requires setting the matching columns as the index). I've modified your sample data to allow some mismatch.
# your data
# =====================
# df1 pos is modified from 10020 to 10010
print(df1)
chr snp x pos a1 a2
0 1 1-10020 0 10010 G A
1 1 1-10056 0 10056 C G
2 1 1-10108 0 10108 C G
3 1 1-10109 0 10109 C G
4 1 1-10139 0 10139 C T
print(df2)
ID CHR STOP OCHR OSTOP
0 rs376643643 1 10040 1 10020
1 rs373328635 1 10066 1 10056
2 rs62651026 1 10208 1 10108
3 rs376007522 1 10209 1 10109
4 rs368469931 3 30247 1 10139
# processing
# ==========================
# set matching columns to multi-level index
x1 = df1.set_index(['chr', 'pos'])['snp']
x2 = df2.set_index(['OCHR', 'OSTOP'])['ID']
# call update function, this is inplace
x1.update(x2)
# replace the values in original df1
df1['snp'] = x1.values
print(df1)
chr snp x pos a1 a2
0 1 1-10020 0 10010 G A
1 1 rs373328635 0 10056 C G
2 1 rs62651026 0 10108 C G
3 1 rs376007522 0 10109 C G
4 1 rs368469931 0 10139 C T
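For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The frames are rebuilt by hand from the printed sample data above, and it assumes the (OCHR, OSTOP) pairs in df2 are unique and that df1 keeps its row order.

```python
import pandas as pd

# Sample data rebuilt from the printout above; row 0 of df1 has pos 10010,
# so it has no partner in df2 and should keep its original snp value.
df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# Index both Series by the matching key pair so update() can align them.
x1 = df1.set_index(["chr", "pos"])["snp"]
x2 = df2.set_index(["OCHR", "OSTOP"])["ID"]

# Series.update modifies x1 in place; keys missing from x2 are left untouched.
x1.update(x2)

# x1 is still in df1's row order, so its values can be written straight back.
df1["snp"] = x1.values
print(df1)
```

Only snp is replaced here; chr and pos are used purely as lookup keys.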
Solution 2:
Start by renaming the columns you want to merge on in df2
df2.rename(columns={'OCHR':'chr','OSTOP':'pos'},inplace=True)
Now merge on these columns
df_merged = pd.merge(df1, df2, how='inner', on=['chr', 'pos']) # you might have to preserve the df1 index at this stage, not sure
Next, build the update frame from the merged result
df1_updater = df_merged[['ID','CHR','STOP']] # this will be your update frame
df1_updater = df1_updater.rename(columns={'ID':'snp','CHR':'chr','STOP':'pos'}) # rename columns to match df1
Finally, update df1:
df1.update(df1_updater) # updates in place
# chr snp x pos a1 a2
#0 1 rs376643643 0 10040 G A
#1 1 rs373328635 0 10066 C G
#2 1 rs62651026 0 10208 C G
#3 1 rs376007522 0 10209 C G
#4 3 rs368469931 0 30247 C T
update works by matching on index and column labels, so you might have to carry the index of df1 through the entire process, then do df1_updater.reindex(...
before df1.update(df1_updater)
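One way to carry the df1 index through the merge is sketched below. This is an assumption about how the index concern could be handled, not part of the answer above: reset_index() exposes the original row labels as a column (named "index" by default when the index is unnamed), and set_index("index") restores them on the update frame. The sample frames are rebuilt from the data in Solution 1, where row 0 has no match.

```python
import pandas as pd

df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# Rename the lookup keys on a copy so df2 itself is left unchanged.
keys = df2.rename(columns={"OCHR": "chr", "OSTOP": "pos"})

# reset_index() turns df1's row labels into an "index" column that survives the merge.
merged = df1.reset_index().merge(keys, how="inner", on=["chr", "pos"])

# Restore the original labels and rename the replacement columns to df1's names.
df1_updater = (
    merged.set_index("index")[["ID", "CHR", "STOP"]]
    .rename(columns={"ID": "snp", "CHR": "chr", "STOP": "pos"})
)

# update() aligns on index and column labels; unmatched rows stay as they were.
# Note: depending on the pandas version, integer columns may come back as floats.
df1.update(df1_updater)
print(df1)
```

With this data the unmatched row 0 keeps its original snp, chr and pos, while rows 1 to 4 take the values from df2.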