Huge Difference In Timing Between Sorting A Set Vs Sorting A List In Python
Solution 1:
I extended your code a bit and hope that this will give you insight into what is happening:
import numpy
import uuid
import random
import time
defsorter(x):
t1 = time.time()
for i inrange(10000):
sorted(x)
return time.time() - t1
defpr(name, x):
print('sorter {:<12s} {:<11} (length {:>4})'.format(
name, '{:.8}'.format(sorter(x)), len(x)))
a2sizes = []
b2sizes = []
for x inrange(1000):
two = numpy.random.randint(1, 1000, 1000)
a2 = list(two)
b2 = set(two)
a2sizes.append(len(a2))
b2sizes.append(len(b2))
print'average number of elements in a2', sum(a2sizes)/len(a2sizes)
n = sum(b2sizes)/len(b2sizes)
print'average number of elements in b2', n
this prints out:
average numberof elements in a2 1000
average numberof elements in b2 632
This is because of the collision in the random number range
printpr('a2', a2)
# making a list of set gives you already sorted elements
y = list(b2)
pr('y', y)
random.shuffle(y)
pr('shuffled y ', y)
pr('b2', b2)
gives as output:
sorter a2 2.492537 (length1000)
sorter b2 0.25028086 (length633)
sorter y0.19689608 (length633)
sorter shuffled y1.4935901 (length633)
That b2
would be faster because there are less elements is logical, but that this is even faster if you first make a list of the set must have some reason. That it again is slower if you shuffle that list is again logical and the shuffled result is rather close to the result for a2 when compensated for the length of the lists.
So lets try to put something else in the list:
b3 = set()
for x in range(1000):
b3.add(uuid.uuid4())
print'\nuuid elements', len(b3)
a3 = list(b3)
pr('a3', a3)
random.shuffle(a3)
pr('shuffled a3', a3)
pr('b3', b3)
gives (I would have been rather surprised to have less than 1000 elements):
uuidelements1000sortera332.437758(length1000)sortershuffleda332.178433(length1000)sorterb332.163802(length1000)
So it must have something to do with having numbers in the set:
previous = -1
ordered = Truefor popped in b2:
if popped < previous:
print'popped', popped, previous
ordered = False
previous = popped
print'\nOrdered', ordered
gives you:
Ordered True
Instead of iterating , a set has a pop()
function you can try and use:
pop()
Remove and return an arbitrary element from the set. Raises KeyError if the set is empty.
So lets arbitrarily retrieve elements from the set b2
and see if there is something special:
previous = -1
ordered = Truewhile(b2):
popped = b2.pop()
if popped < previous:
print'popped', popped, previous
ordered = False
previous = popped
print'\nOrdered', ordered
gives the same result:
Ordered True
So arbitrarily retrieving elements of set of number retrieves those numbers in order, independent off the ordering how these numbers were put in.
As the iteration is how the list-making retrieves an element at a time to append to the list, the result of list(b2)
is an ordered list and that gets sorted quite fast using the Timsort algorithm used in Python.
Post a Comment for "Huge Difference In Timing Between Sorting A Set Vs Sorting A List In Python"