
Pandas Join Vs Add Column

I have two DataFrames (df1 and df2) with the same MultiIndex. df1 has column A and df2 has column B. I found two ways of 'joining' these DataFrames: df_joined = df1.join(df2, how='inner') and df1['B'] = df2['B'].
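To make the comparison concrete, here is a minimal, hypothetical reproduction of the setup. The level names (letter, number) and the values are made up for illustration. With identical indices, both approaches should produce the same frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1 and df2 share the same two-level MultiIndex.
index = pd.MultiIndex.from_product(
    [["x", "y"], [1, 2, 3]], names=["letter", "number"]
)
df1 = pd.DataFrame({"A": np.arange(6)}, index=index)
df2 = pd.DataFrame({"B": np.arange(6) * 10}, index=index)

# Way 1: inner join on the shared index (no 'on' column given).
df_joined = df1.join(df2, how="inner")

# Way 2: add df2's column by assignment (on a copy, so df1 itself stays unchanged).
df_assigned = df1.copy()
df_assigned["B"] = df2["B"]

# With identical indices the two results match in index, columns, dtypes and values.
pd.testing.assert_frame_equal(df_joined, df_assigned)
print(df_joined)
```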

Solution 1:

If the indices are aligned (the same labels in the same order), there is no faster way than df1['B'] = df2['B'].

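Assigning a Series to a DataFrame column is already well optimised in pandas. Assignment still aligns the Series on df1's index, but when the two indices are equal pandas skips the reindexing step.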
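Whether assignment is actually faster on your data is easy to check. The rough benchmark below times both approaches on a hypothetical 100,000-row MultiIndex; the sizes, names and best-of-5 repeat count are arbitrary choices. The absolute numbers, and even the size of the gap, depend on your pandas version, hardware and index, so treat them as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: a 200 x 500 = 100,000-row MultiIndex shared by both frames.
n_outer, n_inner = 200, 500
index = pd.MultiIndex.from_product(
    [range(n_outer), range(n_inner)], names=["outer", "inner"]
)
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"A": rng.random(len(index))}, index=index)
df2 = pd.DataFrame({"B": rng.random(len(index))}, index=index)


def via_join():
    return df1.join(df2, how="inner")


def via_assignment():
    # Copy first so repeated runs do not mutate df1; the copy is included in the timing.
    out = df1.copy()
    out["B"] = df2["B"]
    return out


timings = {}
for name, fn in [("join", via_join), ("assignment", via_assignment)]:
    # Best of 5 single runs, to reduce noise from other processes.
    timings[name] = min(timeit.repeat(fn, number=1, repeat=5))
    print(f"{name:>10}: {timings[name]:.4f} s")
```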

join takes longer than assignment because it explicitly lines up df1.index with df2.index, which is expensive: it does not assume the two indices are in the same order. As per the pd.DataFrame.join documentation, if no column is specified with on, the join is performed on the DataFrames' respective indices.
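The caveat about aligned indices matters because both operations match rows by label, and they give different results when the labels differ. In this hypothetical example df2 has its rows in a different order and lacks one label: assignment reindexes df2['B'] onto df1.index and fills the gap with NaN, while the inner join keeps only the labels present in both frames.

```python
import pandas as pd

names = ["letter", "number"]
df1 = pd.DataFrame(
    {"A": range(6)},
    index=pd.MultiIndex.from_product([["x", "y"], [1, 2, 3]], names=names),
)
# Hypothetical df2: rows in a different order, and no ("y", 3) row at all.
df2 = pd.DataFrame(
    {"B": [50, 40, 30, 20, 10]},
    index=pd.MultiIndex.from_tuples(
        [("y", 2), ("y", 1), ("x", 3), ("x", 2), ("x", 1)], names=names
    ),
)

# Assignment: df2["B"] is reindexed to df1.index, so row order follows df1 and the
# missing label becomes NaN (which upcasts B from int64 to float64).
assigned = df1.copy()
assigned["B"] = df2["B"]

# Inner join: only labels present in both frames survive, so ("y", 3) is dropped.
joined = df1.join(df2, how="inner")

print(assigned)
print(joined)
```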

I would be surprised if this turned out to be a bottleneck in your workflow. If it does, I suggest dropping down to NumPy arrays and avoiding pandas altogether.
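One way to read that advice is sketched below, assuming the rows of df1 and df2 are in exactly the same order: check that once with Index.equals, then work on the underlying NumPy arrays, where rows are paired purely by position. The data is hypothetical, and skipping the check would silently mis-pair rows if the order ever differed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: identical MultiIndexes (same labels, same order).
index = pd.MultiIndex.from_product(
    [["x", "y"], [1, 2, 3]], names=["letter", "number"]
)
df1 = pd.DataFrame({"A": np.linspace(0.0, 0.5, 6)}, index=index)
df2 = pd.DataFrame({"B": np.linspace(1.0, 1.5, 6)}, index=index)

# Positional pairing is only correct if the row order really matches,
# so verify it once up front instead of re-aligning on every operation.
if not df1.index.equals(df2.index):
    raise ValueError("indices differ: align them with pandas before dropping to NumPy")

# From here on: plain arrays, no labels, no alignment, just positions.
a = df1["A"].to_numpy()
b = df2["B"].to_numpy()
ab = np.column_stack([a, b])  # shape (n_rows, 2); row i pairs a[i] with b[i]

# Optionally wrap the result back into a DataFrame if the labels are needed again.
df_joined = pd.DataFrame(ab, index=df1.index, columns=["A", "B"])
print(df_joined)
```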
