How To Count The Presence Of A Set Of Numbers In A Set Of Intervals Efficiently
Solution 1:
Put your integers, start points, and end points in a single list of pairs. Make the first element of each pair the value of the integer, start point, or end point, and the second element of each pair be 0, -1, or 1 depending on whether it's an integer, start point, or end point.
Next, sort the list.
Now, you can go through the list, maintaining a running sum of the second elements of the pairs. When you see a pair with second element 0, record the running sum (negated) for that integer.
This runs in O((N+M)log(N+M)) time in the worst case (and in practice I guess it'll be linear if the queries and intervals are mostly sorted, thanks to timsort).
For example:
Input intervals: [(1, 3), (5, 6), (6, 9)]
Input integers: [2, 4, 6, 8]
Unifiedlist(sorted):
[(1,-1), (2,0), (3,1), (4,0), (5,-1), (6, -1), (6,0), (6,1), (8,0), (9,1)]
Running sum:
[-1 , -1, 0, 0, -1, -2, 0, -1, -1, 0]
Values for integers:2:1,4:0,6:2,8,1
Example code:
defquery(qs, intervals):
xs = [(q, 0) for q in qs] + [(x, -1) for x, _ in intervals] + [(x, 1) for _, x in intervals]
S, r = 0, dict()
for v, s insorted(xs):
if s == 0:
r[v] = S
S -= s
return r
intervals = [(3, 3), (22, 30), (17, 29), (7, 12), (12, 34), (18, 38), (30, 40), (5, 27), (19, 26), (27, 27), (1, 31), (17, 17), (22, 25), (6, 14), (5, 7), (9, 19), (24, 28), (19, 40), (9, 36), (2, 32)]
queries = [16, 18, 39, 40, 27, 28, 4, 23, 15, 24, 2, 6, 32, 17, 21, 29, 31, 7, 20, 10]
print(query(queries, intervals))
Output:
{2:2, 4:2, 6:5, 7:6, 10:7, 15:6, 16:6, 17:8, 18:8, 20:9, 21:9, 23:11, 24:12, 27:11, 28:9, 29:8, 31:7, 32:6, 39:2, 40:2}
Solution 2:
You can presort the integers
and then use bisect_left
on the lower bound. Sorting has O(M*log(M)) complexity while bisect has O(log(M)). So effectively you have O(max(M, N) * log(M)).
import bisect
from collections import defaultdict
result = defaultdict(int)
integers = sorted(integers)
for low, high in intervals:
index = bisect.bisect_left(integers, low)
whileindex < len(integers) and integers[index] <= high:
result[integers[index]] += 1index += 1
Solution 3:
Depending on the use case and context, something simple may be adequate:
from collections import Counter
from itertools import chain
counts = Counter(chain.from_iterable(range(f, t+1) for f,t in input_intervals))
result = {k:counts[k] for k in input_numbers}
O(n*k + m) where n
is the number of intervals, k
is the average size of an interval, and m
is the number of integers.
Post a Comment for "How To Count The Presence Of A Set Of Numbers In A Set Of Intervals Efficiently"