Create a class called MedianTracker
. The purpose of the class is to take in new numbers, and to return the median number that you've seen so far. Write methods for initializing the class, adding a new number, and returning the median number you've seen so far.
The median is the middle number you've seen, in sorted order. If there are 2 middle numbers, then the median is the average of the two numbers.
Example:
To solve this problem, you need to find the median element many times.
To do this optimally, you should realize that each time you find the median, many of the numbers will be the same. For example, here are typical numbers that you might track over time:
Since many of the same numbers appear each time, you should be able to optimize by remembering something about your past computations.
To do this, we should use a Heap. You can store two Heaps: one that has the bigger half of all the numbers, and one that has the smaller half.
This lets you quickly find the middle numbers - they're just the root nodes of the heaps.
We'll make sure the top heap always gets the extra element, if there are an odd number of elements. This means the median is the root node of the top heap if there are an odd number of elements. It also means the median is the average of the 2 root nodes if there are an even number of elements. Here's the code for this:
We have to code the add
function. There are 2 cases we have to consider:
Case 1. You have an odd number of elements.
In this case, the top heap will start with an extra element (that's one of our assumptions).
We need to add num
to one of the heaps, and then rebalance the heaps so they're the same size again. Here are some examples you should consider to figure out how to do this:
We can generalize this with code, like this:
Case 2. You have an even number of elements.
In this case, both heaps will be the same size:
We need to make sure the top heap gets an extra number (by assumption). To do this, you need to consider examples:
Here's the code that generalizes this:
And if we start out with no elements, we should add the first element we see to the top heap. Here's the full code:
Add Function Time Complexity O(logn) for every number added. This is because the heaps contain O(n/2)=O(n) numbers, which means they both have a height of O(log n). So adding/removing from the heaps is O(logn). When we add a new number, we add/remove from the heaps something like 3 times worst-case, so the total time complexity is O(3logn). But remember, big O notation ignores factors of 3, so this is the same as O(logn).
Find Median Time Complexity O(1). The median comes directly from the 2 root nodes of the heaps, which we can access immediately.
Space Complexity O(n) to store the heap.
Other solutions:
You might think to store all the numbers you've seen in a Sorted Array so you can find the median by looking at the middle number(s). However, shift costs which would slow this down, yielding an O(n) Add complexity. LinkedLists are just as bad (you can prove that any ordered data structure in general comes with O(n) cost). You can turn the Sorted Array into a BST to truly get the optimal solution, but you need to make sure the BST is always balanced which is a hard problem on its own. This Heap solution is much simpler.