Finding Matching Keys In Two Large Dictionaries And Doing It Fast
Solution 1:
Use sets, because they have a built-in intersection method which ought to be quick:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }
rdpSet = set(myRDP)
namesSet = set(myNames)
for name in rdpSet.intersection(namesSet):
    print name, myNames[name]
# Prints: Actinobacter 8924342
Solution 2:
You could do this:
for key in myRDP:
    if key in myNames:
        print key, myNames[key]
Your first attempt was slow because you were comparing every key in myRDP with every key in myNames. In algorithmic jargon, if myRDP has n elements and myNames has m elements, then that algorithm would take O(n×m) operations. For 600k elements each, this is 360,000,000,000 comparisons!
But testing whether a particular element is a key of a dictionary is fast -- in fact, this is one of the defining characteristics of dictionaries. In algorithmic terms, the key in dict test is O(1) on average, or constant-time. So my algorithm will take O(n) time, which is one 600,000th of the time.
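To make the difference concrete, here is a minimal sketch that counts the work done by each approach. The function names and the stand-in data (1,000 generated keys per dictionary, half of them shared) are assumptions made for illustration, not part of the original question.

```python
def common_keys_nested(a, b):
    """Quadratic approach: compare every key of a with every key of b."""
    matches = []
    comparisons = 0
    for key_a in a:
        for key_b in b:
            comparisons += 1
            if key_a == key_b:
                matches.append(key_a)
    return matches, comparisons


def common_keys_lookup(a, b):
    """Linear approach: one hash-based membership test in b per key of a."""
    matches = []
    lookups = 0
    for key in a:
        lookups += 1
        if key in b:
            matches.append(key)
    return matches, lookups


# Hypothetical stand-in data: 1,000 keys each, 500 of them shared.
rdp = {"seq%d" % i: "ACGT" for i in range(1000)}
names = {"seq%d" % i: str(i) for i in range(500, 1500)}

nested_matches, nested_count = common_keys_nested(rdp, names)
lookup_matches, lookup_count = common_keys_lookup(rdp, names)
```

Both functions find the same 500 shared keys, but the nested version performs 1,000,000 comparisons while the lookup version performs 1,000 membership tests. The gap grows with the size of the dictionaries.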
Solution 3:
In Python 3, dictionary key views support set operations, so you can just do:
myNames.keys() & myRDP.keys()
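As a short illustration of that expression, the sketch below uses the sample dictionaries from Solution 1 and adds one extra entry ('Other sp.', a made-up name) so that the two key sets differ in both directions. It assumes Python 3, where keys() returns a set-like view.

```python
myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
# 'Other sp.' is a hypothetical entry added only for this example.
myNames = {'Actinobacter': '8924342', 'Other sp.': '1111111'}

# The & operator on two key views returns an ordinary set of shared keys.
common = myNames.keys() & myRDP.keys()

# Build a dictionary of the matching names and their IDs.
matched = {key: myNames[key] for key in common}
```

Here common is a plain set, so it can be iterated, sorted, or combined with other sets like any other.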
Solution 4:
for key in myRDP:
    name = myNames.get(key, None)
    # Compare against None so that empty values still count as matches.
    if name is not None:
        print key, name
dict.get returns the default value you give it (in this case, None) if the key doesn't exist.
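One subtlety with this approach: a key that exists but maps to a falsy value (such as an empty string) looks the same as a missing key under a plain truthiness test. The sketch below contrasts the two checks. The empty ID for 'subtilus sp.' is hypothetical data added for illustration.

```python
myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
# Hypothetical data: 'subtilus sp.' is present but has an empty ID.
myNames = {'Actinobacter': '8924342', 'subtilus sp.': ''}

truthy_matches = []
explicit_matches = []
for key in myRDP:
    name = myNames.get(key)  # the default is None when the key is absent
    if name:
        # Skips keys whose value is empty, even though the key exists.
        truthy_matches.append(key)
    if name is not None:
        # Counts every key that exists, whatever its value.
        explicit_matches.append(key)
```

With this data the truthiness test reports only 'Actinobacter', while the explicit comparison against None reports both keys. If None itself can be a stored value, a key in myNames test is the safer check.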
Solution 5:
You could start by finding the common keys and then iterating over them. Set operations should be fast because they are implemented in C, at least in modern versions of Python.
common_keys = set(myRDP).intersection(myNames)
for key in common_keys:
    print key, myNames[key]