Using Numpy To Find Median Of Second Element Of List Of Tuples
Solution 1:
You could calculate the median like this:
np.median(list(dict(list_of_tuples).values()))
# Python 3; in Python 2.7, `np.median(dict(list_of_tuples).values())` is enough
That converts your list to a dictionary first and then calculates the median of its values.
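One thing to watch with the dictionary route: repeated first elements collapse into a single key, so some values can silently drop out of the median. A minimal sketch, using made-up data with a duplicated label (the name `pairs` is only for illustration):

```python
import numpy as np

# Hypothetical data: the label 'a' appears twice.
pairs = [('a', 1), ('a', 3), ('b', 5), ('c', 7)]

# dict() keeps only the last value for 'a', so the median is taken over 3, 5, 7.
via_dict = np.median(list(dict(pairs).values()))

# Taking the second element of every tuple keeps all four values 1, 3, 5, 7.
via_list = np.median([value for _, value in pairs])

print(via_dict)  # 5.0
print(via_list)  # 4.0
```

If the first elements are guaranteed to be unique, both routes give the same result.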
When you want to get the actual key, you can do it like this:
dl = dict(list_of_tuples)  # {'a': 1, 'b': 3, 'c': 5}
list(dl)[list(dl.values()).index(np.median(list(dl.values())))]  # Python 3; in Python 2.7 the list() calls are unnecessary
which gives 'b'. That assumes that the median is in the list; if it is not, a ValueError is raised. You could therefore use a try/except like this, using the example from @Anand S Kumar's answer:
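import numpy as np

l = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]
# l = [('a', 1), ('b', 3), ('c', 5)]
dl = dict(l)
keys, values = list(dl), list(dl.values())  # Python 3: dict views are not indexable
med = np.median(values)
try:
    # list.index raises ValueError if the median is not one of the values
    print(keys[values.index(med)])
except ValueError:
    print('The median is not in this list. Its value is', med)
    # key whose value is nearest to the median; the first one wins on ties
    print('The closest key is', min(dl, key=lambda k: abs(dl[k] - med)))
For the first list you will then obtain (on Python 3.7+, where dicts keep insertion order; 'b', 'c' and 'f' are all equally close to 4.0):
The median is not in this list. Its value is 4.0
The closest key is b
For your example it just prints:
b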
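Because several tuples can be equally close to the median, it may be more useful to return all of them instead of a single key. A rough sketch that works on the list directly, without going through a dict (the function name `closest_to_median` is made up for this example):

```python
import numpy as np

pairs = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]

def closest_to_median(pairs):
    """Return the median of the second elements and every tuple nearest to it."""
    med = np.median([value for _, value in pairs])
    # Smallest distance between any second element and the median.
    best = min(abs(value - med) for _, value in pairs)
    # Keep every tuple at that distance, in the original order.
    return med, [p for p in pairs if abs(p[1] - med) == best]

med, nearest = closest_to_median(pairs)
print(med)      # 4.0
print(nearest)  # [('b', 3), ('c', 5), ('f', 3)]
```

When the median is itself one of the values, the smallest distance is zero and only the exact matches come back.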
Solution 2:
np.median does not accept any argument called key. Instead, you can use a list comprehension to take just the second element from each inner tuple. Example:
In [3]: l = [('a',1), ('b',3), ('c',5)]
In [4]: np.median([x[1] for x in l])
Out[4]: 3.0
In [5]: l = [('a',1), ('b',3), ('c',5), ('d',22),('e',11),('f',3)]
In [6]: np.median([x[1] for x in l])
Out[6]: 4.0
Also, unless it is only for the sake of the example, do not use list as a variable name; it shadows the built-in list.
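If you need the tuple(s) sitting in the middle instead of the numeric median, one option that needs no NumPy at all is to sort by the second element and slice out the center. This is only a sketch; `middle_tuples` is a made-up helper name, and it returns two tuples when the list has an even length:

```python
from operator import itemgetter

pairs = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]

def middle_tuples(pairs):
    """Return the one (odd length) or two (even length) central tuples by second element."""
    ordered = sorted(pairs, key=itemgetter(1))  # stable sort on the second element
    n = len(ordered)
    # For odd n both bounds select the single middle item; for even n, the two center items.
    return ordered[(n - 1) // 2 : n // 2 + 1]

print(middle_tuples(pairs))      # [('f', 3), ('c', 5)]
print(middle_tuples(pairs[:3]))  # [('b', 3)]
```

Averaging the second elements of the result (here 3 and 5) gives the same 4.0 that np.median reports.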
Solution 3:
np.median does not accept some sort of 'key' argument, and does not return the index of what it finds. Also, when there are an even number of items (along the axis), it returns the mean of the 2 center items.
But np.partition, which median uses to find the center items, does take structured array field name(s). So if we turn the list of tuples into a structured array, we can easily select the middle item(s).
The list:
In [1001]: ll
Out[1001]: [('a', 1), ('b', 3), ('c', 5)]
as structured array:
In [1002]: la1 = np.array(ll,dtype='a1,i')
In [1003]: la1
Out[1003]:
array([(b'a', 1), (b'b', 3), (b'c', 5)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
we can get the middle item (index 1 for size 3) with:
In [1115]: np.partition(la1, (1), order='f1')[[1]]
Out[1115]:
array([(b'b', 3)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
And allowing for an even number of items (with code cribbed from np.median):
def mymedian1(arr, field):
    # return the middle items of arr, selected by field
    sz = arr.shape[0]  # 1d for now
    if sz % 2 == 0:
        ind = ((sz // 2) - 1, sz // 2)
    else:
        ind = ((sz - 1) // 2,)
    return np.partition(arr, ind, order=field)[list(ind)]
for the 3 item array:
In [1123]: mymedian1(la1,'f1')
Out[1123]:
array([(b'b', 3)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
for a 6 item array:
In [1124]: la2
Out[1124]:
array([(b'a', 1), (b'b', 3), (b'c', 5), (b'd', 22), (b'e', 11), (b'f', 3)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
In [1125]: mymedian1(la2,'f1')
Out[1125]:
array([(b'f', 3), (b'c', 5)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
See my edit history for an earlier version using np.argpartition.
It even works for the 1st field (the characters):
In [1132]: mymedian1(la2,'f0')
Out[1132]:
array([(b'c', 5), (b'd', 22)],
dtype=[('f0', 'S1'), ('f1', '<i4')])
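For readers who want positions in the original array instead of the records themselves, an np.argpartition variant could look roughly like the following. This is not the earlier version from the edit history, only a sketch along the same lines; `median_indices` is a made-up name, and the array is built with a 'U1,i4' dtype so the labels are plain strings on Python 3.

```python
import numpy as np

def median_indices(arr, field):
    """Return the index/indices of the middle item(s) of a 1-D structured array, ranked by field."""
    values = arr[field]
    sz = values.shape[0]
    # Same center positions as in mymedian1: two for even sizes, one for odd sizes.
    ind = (sz // 2 - 1, sz // 2) if sz % 2 == 0 else ((sz - 1) // 2,)
    # argpartition puts the indices of the kth-smallest values at the kth positions.
    part = np.argpartition(values, ind)
    return part[list(ind)]

la2 = np.array([('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)],
               dtype='U1,i4')
idx = median_indices(la2, 'f1')
print(la2[idx])  # the records holding the two center values, 3 and 5
```

With tied values (here 'b' and 'f' both carry 3), which of the tied indices is returned is not guaranteed, so treat the labels as one valid answer among several.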