Pandas Join Vs Add Column
I have 2 dataframes (df1 and df2) with the same MultiIndex. df1 has column A, df2 has column B. I found 2 ways of 'joining' these dataframes: df_joined = df1.join(df2, how='inner')
Solution 1:
There is no faster way than df1['B'] = df2['B'] if the indices are aligned. Assigning a Series as a DataFrame column is already well optimised in pandas.
join takes longer than assignment because it explicitly lines up df1.index and df2.index, which is expensive. It does not assume that the indices are in a consistent order. As per the pd.DataFrame.join documentation, if no column is specified, the join takes place on the dataframes' respective indices.
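To make the comparison concrete, here is a minimal sketch of both approaches. The frames, index levels, and values are made up for illustration and are not from the question. The timings are printed rather than stated because they will vary with data size and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical data: two frames that share the same MultiIndex
idx = pd.MultiIndex.from_product([["x", "y"], [1, 2, 3]], names=["key", "num"])
df1 = pd.DataFrame({"A": range(6)}, index=idx)
df2 = pd.DataFrame({"B": range(10, 16)}, index=idx)

# Approach 1: join on the index
df_joined = df1.join(df2, how="inner")

# Approach 2: add the column by assignment (on a copy, so df1 stays untouched)
df_assigned = df1.copy()
df_assigned["B"] = df2["B"]

# Rough timing of each approach; assign() returns a new frame with the extra column
t_join = timeit.timeit(lambda: df1.join(df2, how="inner"), number=200)
t_assign = timeit.timeit(lambda: df1.assign(B=df2["B"]), number=200)
print(f"join:   {t_join:.4f}s")
print(f"assign: {t_assign:.4f}s")
```

With identical indices, both approaches should give the same columns and values, so the choice comes down to speed and readability.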
I would be surprised if you find this is a bottleneck in your workflow. If it is, then I suggest you drop down to numpy arrays and avoid pandas altogether.
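As a rough sketch of what dropping down to numpy could look like, assuming both frames have exactly the same index in the same row order (the data here is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two frames with an identical MultiIndex
idx = pd.MultiIndex.from_product([["x", "y"], [1, 2, 3]])
df1 = pd.DataFrame({"A": np.arange(6)}, index=idx)
df2 = pd.DataFrame({"B": np.arange(10, 16)}, index=idx)

# Positional operations are only safe when the row order matches exactly
assert df1.index.equals(df2.index)

# Pull out the underlying arrays and combine them without any index handling
a = df1["A"].to_numpy()
b = df2["B"].to_numpy()
combined = np.column_stack([a, b])  # 2-D array, one row per index entry

# Alternatively, assign the raw array; this is positional, with no index alignment
df1["B"] = b
```

The trade-off is that nothing checks the labels any more, so a mismatch in row order would silently pair the wrong values.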