Average Entries With Duplicate First Element In 2d Numpy Array

I have an array that looks like this arr = np.array([[0, 1], [0, 2], [1, 3], [1, 3], [1, 4], [2, 3]]) and I would like to take the average of the 'entries' that have the same fi

Solution 1:

Here's a vectorized NumPy solution using np.unique and np.bincount. It handles the general case where the first column is not necessarily sorted -

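# Unique first-column values; ID maps each row to its group, counts holds group sizes
unqa, ID, counts = np.unique(arr[:,0], return_inverse=True, return_counts=True)
# Sum the second column per group (weighted bincount), then divide by group size
out = np.column_stack(( unqa , np.bincount(ID, arr[:,1])/counts ))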

Sample run -

In [4]: arr
Out[4]: 
array([[5, 1],
       [5, 2],
       [1, 3],
       [1, 3],
       [5, 4],
       [2, 3]])

In [5]: unqa,ID,counts = np.unique(arr[:,0],return_inverse=True,return_counts=True)
   ...: out = np.column_stack(( unqa , np.bincount(ID,arr[:,1])/counts ))
   ...: 

In [6]: out
Out[6]: 
array([[ 1.        ,  3.        ],
       [ 2.        ,  3.        ],
       [ 5.        ,  2.33333333]])
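
To make the mechanics of Solution 1 easier to follow, here is a minimal self-contained sketch that wraps the same np.unique / np.bincount approach in a function and applies it to the array from the question. The function name group_mean and the intermediate variable names are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np

def group_mean(arr):
    """Average second-column values per distinct first-column value.

    Hypothetical helper name; the technique is the one from Solution 1.
    """
    # keys: sorted unique first-column values
    # inverse: for each row, the index of its key in `keys`
    # counts: number of rows belonging to each key
    keys, inverse, counts = np.unique(
        arr[:, 0], return_inverse=True, return_counts=True
    )
    # With weights, bincount sums the second column within each group
    sums = np.bincount(inverse, weights=arr[:, 1])
    return np.column_stack((keys, sums / counts))

# Array from the question
arr = np.array([[0, 1], [0, 2], [1, 3], [1, 3], [1, 4], [2, 3]])
result = group_mean(arr)
print(result)
```

Because np.unique returns its values sorted, the rows of the result come out ordered by key regardless of the order of the input rows.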

Solution 2:

You can use a dictionary to group your items, then use np.mean() within a list comprehension to get the expected result:

>>> d = {}
>>> for i, j in arr:
...     d.setdefault(i, []).append(j)
...
>>> d
{0: [1, 2], 1: [3, 3, 4], 2: [3]}
>>> [[i, np.mean(j)] for i, j in d.items()]
[[0, 1.5], [1, 3.3333333333333335], [2, 3.0]]

Or, if you want the averages rounded to two decimal places:

>>> [[i,round(np.mean(j),2)] for i,j in d.items()]
[[0, 1.5], [1, 3.33], [2, 3.0]]
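
The dictionary approach can also be written as a standalone script. The following is only a sketch under a few assumptions: the function name group_mean_dict is hypothetical, collections.defaultdict is swapped in for setdefault, rows are converted with tolist() so the keys are plain Python ints, and the groups are sorted by key to make the output order deterministic.

```python
from collections import defaultdict

import numpy as np

def group_mean_dict(arr):
    """Group second-column values by first-column value, then average each group.

    Hypothetical helper name; defaultdict replaces the setdefault call.
    """
    groups = defaultdict(list)
    # tolist() yields plain Python numbers instead of NumPy scalars
    for key, value in arr.tolist():
        groups[key].append(value)
    # Sorting by key is an added choice, not part of the original answer
    return [[key, float(np.mean(values))] for key, values in sorted(groups.items())]

# Array from the question
arr = np.array([[0, 1], [0, 2], [1, 3], [1, 3], [1, 4], [2, 3]])
averages = group_mean_dict(arr)
print(averages)
```

This version loops in Python, so it is likely to be slower than the np.bincount solution on large arrays, but it is arguably easier to read and adapts readily to other per-group statistics.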
