Using Numpy To Find Median Of Second Element Of List Of Tuples

Let's say I have a list of tuples, as follows: list = [(a,1), (b,3), (c,5)] My goal is to obtain the first element of the median of the list of tuples, using the tuples' second el

Solution 1:

You could calculate the median like this:

np.median(list(dict(list_of_tuples).values()))
# Python 3.x, with the list renamed to list_of_tuples so it does not shadow the built-in list;
# in Python 2.7, `np.median(dict(list_of_tuples).values())` is enough

That converts your list to a dictionary first and then calculates the median of its values.

When you want to get the actual key, you can do it like this:

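dl = dict(list_of_tuples)  # {'a': 1, 'b': 3, 'c': 5}

# In Python 3, keys() and values() are views, so convert them to lists before indexing
list(dl.keys())[list(dl.values()).index(np.median(list(dl.values())))]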

which evaluates to 'b'. This assumes that the median is itself one of the values; if it is not, a ValueError is raised. You can handle that case with try/except, shown here with the example from @Anand S Kumar's answer:

import numpy as np

l = [('a',1), ('b',3), ('c',5), ('d',22),('e',11),('f',3)]

# l = [('a',1), ('b',3), ('c',5)]

dl = dict(l)
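keys, values = list(dl.keys()), list(dl.values())
med = np.median(values)
try:
    print(keys[values.index(med)])
except ValueError:
    print('The median is not in this list. Its value is', med)
    # Find the value closest to the median, then look up the first key that has it
    closest = min(values, key=lambda x: abs(x - med))
    print('The closest key is', keys[values.index(closest)])

For the first list you will then obtain (on Python 3.7+, where dicts keep insertion order):

The median is not in this list. Its value is 4.0

The closest key is b

(3 and 5 are equally close to 4.0; min returns the first of them, and 'b' is the first key with the value 3.)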

for your example it just prints:

b
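
The "closest key" lookup can also be done without building a dictionary, which avoids losing tuples that share a first element. The following is only a sketch of that idea, not part of the answer above; the names pairs, labels and closest are made up for illustration.

```python
import numpy as np

pairs = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]
labels = [label for label, _ in pairs]
values = np.array([value for _, value in pairs])

med = np.median(values)
# argmin returns the first index with the smallest distance,
# so a tie between equally close values goes to the earliest tuple
closest = int(np.argmin(np.abs(values - med)))
print(labels[closest], med)
```

Here 3 and 5 are equally far from the median 4.0, so the earliest tuple with one of those values is reported.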

Solution 2:

np.median does not accept an argument called key. Instead, you can use a list comprehension to take just the second element of each tuple. Example:

In [3]: l = [('a',1), ('b',3), ('c',5)]

In [4]: np.median([x[1] for x in l])
Out[4]: 3.0

In [5]: l = [('a',1), ('b',3), ('c',5), ('d',22),('e',11),('f',3)]

In [6]: np.median([x[1] for x in l])
Out[6]: 4.0

Also, unless this is only for the sake of the example, do not use list as a variable name; it shadows the built-in function list.
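
The list comprehension gives the median value, but the question asks for the tuple (or its first element). One way to get that in plain Python is to sort by the second element and take the middle item or items. This is a minimal sketch; median_tuples is a hypothetical helper name, not a NumPy function.

```python
def median_tuples(pairs, key=lambda t: t[1]):
    """Return the middle tuple (odd length) or the two middle tuples (even length)."""
    ordered = sorted(pairs, key=key)  # stable sort: ties keep their original order
    n = len(ordered)
    if n == 0:
        return []
    if n % 2:
        return [ordered[n // 2]]
    return ordered[n // 2 - 1 : n // 2 + 1]


l = [('a', 1), ('b', 3), ('c', 5)]
print([t[0] for t in median_tuples(l)])
```

For an even number of tuples, two candidates come back, and it is up to the caller to decide which one counts as "the" median element.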

Solution 3:

np.median does not accept a 'key' argument, and it does not return the index of what it finds. Also, when there is an even number of items (along the axis), it returns the mean of the two center items.
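
To make the even-length behaviour concrete, here is a small check using the six values from the other answers. It is only an illustration; the variable names are arbitrary.

```python
import numpy as np

values = [1, 3, 5, 22, 11, 3]
ordered = sorted(values)          # [1, 3, 3, 5, 11, 22]
mid = len(ordered) // 2

# Mean of the two center items, computed by hand
manual = (ordered[mid - 1] + ordered[mid]) / 2

print(manual, np.median(values))
print(manual in values)           # the median need not be an element of the data
```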

But np.partition, which median uses to find the center items, does take structured array field name(s). So if we turn the list of tuples into a structured array, we can easily select the middle item(s).

The list:

In [1001]: ll
Out[1001]: [('a', 1), ('b', 3), ('c', 5)]

as structured array:

In [1002]: la1 = np.array(ll,dtype='S1,i')   # 'S1' is the byte-string code; 'a1' is an older alias for it
In [1003]: la1
Out[1003]: 
array([(b'a', 1), (b'b', 3), (b'c', 5)], 
     dtype=[('f0', 'S1'), ('f1', '<i4')])

we can get the middle item (1 for size 3) with:

In [1115]: np.partition(la1, (1), order='f1')[[1]]
Out[1115]: 
array([(b'b', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

And allowing for even number of items (with code cribbed from np.median):

def mymedian1(arr, field):
    # return the middle items of arr, selected by field
    sz = arr.shape[0]  # 1d for now
    if sz % 2 == 0:
        ind = ((sz // 2)-1, sz // 2)
    else:
        ind = ((sz - 1) // 2,)
    return np.partition(arr, ind, order=field)[list(ind)]

for the 3 item array:

In [1123]: mymedian1(la1,'f1')
Out[1123]: 
array([(b'b', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

for a 6 item array:

In [1124]: la2
Out[1124]: 
array([(b'a', 1), (b'b', 3), (b'c', 5), (b'd', 22), (b'e', 11), (b'f', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

In [1125]: mymedian1(la2,'f1')
Out[1125]: 
array([(b'f', 3), (b'c', 5)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

See my edit history for an earlier version using np.argpartition.


It even works for the 1st field (the characters):

In [1132]: mymedian1(la2,'f0')
Out[1132]: 
array([(b'c', 5), (b'd', 22)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])
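
The np.argpartition version mentioned above is not shown here. As a rough sketch of how such an approach could look (this is not the earlier version from the edit history, and median_indices is a hypothetical name), one can partition only the second elements and use the resulting indices to pick tuples from the original list. It assumes a non-empty 1d input.

```python
import numpy as np

def median_indices(values):
    """Indices of the middle item(s) of a non-empty 1d sequence."""
    values = np.asarray(values)
    sz = values.shape[0]
    if sz % 2 == 0:
        ind = [sz // 2 - 1, sz // 2]
    else:
        ind = [(sz - 1) // 2]
    # argpartition puts the indices of the kth-smallest items at positions ind
    return np.argpartition(values, ind)[ind]


pairs = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]
idx = median_indices([v for _, v in pairs])
middle = [pairs[i] for i in idx]
print(middle)
```

Because 'b' and 'f' share the value 3, which of the two appears as the lower middle item is not guaranteed; only the values (3 and 5) are.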
