Find Median from Data Stream
Problem Statement
Design a data structure that supports adding integers from a continuous stream and retrieving the median of all integers seen so far at any point.
The median is the middle value in a sorted list of numbers. If the list has an even number of elements, the median is the average of the two middle values.
For example:
[3, 5, 7] has median 5
[3, 5] has median (3 + 5) / 2 = 4.0
Implement the MedianFinder class:
MedianFinder() initializes the data structure.
void addNum(int num) adds the integer num to the data structure.
float findMedian() returns the median of all elements added so far. Answers within 10^-5 of the actual answer will be accepted.
Additional information
-100,000 <= num <= 100,000
findMedian will only be called after at least one integer has been added.
- At most
5 * 10^4 calls will be made to addNum and findMedian.
Example 1:
Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [5], [10], [], [3], []]
Output: [null, null, null, 7.5, null, 5.0]
Explanation:
addNum(5) -- stream is [5]
addNum(10) -- stream is [5, 10]
findMedian() -- sorted stream is [5, 10], median is (5 + 10) / 2 = 7.5
addNum(3) -- stream is [3, 5, 10]
findMedian() -- sorted stream is [3, 5, 10], median is 5.0
Example 2:
Input:
["MedianFinder", "addNum", "findMedian"]
[[], [42], []]
Output: [null, null, 42.0]
Brute Force (Sorted Insert)
Intuition
The most straightforward approach is to keep all numbers in a sorted array. When we need the median, we just look at the middle element(s). The problem is that inserting into a sorted array requires shifting elements, which takes O(n) time per insertion.
Algorithm
- Maintain a sorted list
data.
- On
addNum(num): use binary search to find the correct position and insert num.
- On
findMedian(): if the length is odd, return the middle element. If even, return the average of the two middle elements.
import bisect
class MedianFinder:
def __init__(self):
self.data = []
def addNum(self, num):
bisect.insort(self.data, num)
def findMedian(self):
n = len(self.data)
if n % 2 == 1:
return float(self.data[n // 2])
return (self.data[n // 2 - 1] + self.data[n // 2]) / 2.0
Time & Space Complexity
Time complexity: O(n) per addNum, O(1) per findMedian
Reason: Binary search finds the insertion point in O(log n), but shifting elements after insertion takes O(n).
Space complexity: O(n)
Reason: We store all n numbers in the array.
Two Heaps (Optimal)
Intuition
We can split the stream into two halves: a lower half and an upper half. The median always sits at the boundary between them.
A max-heap gives instant access to the largest element in the lower half. A min-heap gives instant access to the smallest element in the upper half. By keeping these two heaps balanced (sizes differ by at most 1), the median is always at one or both of their tops.
Algorithm
- Maintain two heaps:
small -- a max-heap storing the smaller half (Python uses negation to simulate max-heap with heapq).
large -- a min-heap storing the larger half.
- On
addNum(num):
- Push
num onto small (as negative for max-heap behavior).
- Pop the max from
small and push it onto large (ensures small's max <= large's min).
- If
large has more elements than small, pop from large and push onto small.
- On
findMedian():
- If
small has more elements, return -small[0] (the max of the lower half).
- Otherwise, return
(-small[0] + large[0]) / 2.0.
import heapq
class MedianFinder:
def __init__(self):
self.small = [] # max-heap (negated values)
self.large = [] # min-heap
def addNum(self, num):
heapq.heappush(self.small, -num)
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def findMedian(self):
if len(self.small) > len(self.large):
return float(-self.small[0])
return (-self.small[0] + self.large[0]) / 2.0
Time & Space Complexity
Time complexity: O(log n) per addNum, O(1) per findMedian
Reason: Each addNum performs at most 3 heap operations, each costing O(log n). findMedian just reads the tops of the heaps.
Space complexity: O(n)
Reason: Both heaps together store all n elements.