Binary Heap: Min-Heap and Max-Heap Implementation
Key Insights
- Binary heaps use array storage with simple index arithmetic (parent = (i - 1) // 2, left = 2i + 1, right = 2i + 2) to represent a complete binary tree without pointer overhead.
- The heap property (parent ≤ children for min-heap, parent ≥ children for max-heap) is maintained through two operations: sift-up after insertion and sift-down after extraction.
- Building a heap from an existing array takes O(n) time using Floyd’s bottom-up algorithm—not O(n log n) as naive insertion would suggest.
Introduction to Binary Heaps
A binary heap is a complete binary tree that satisfies the heap property. “Complete” means every level is fully filled except possibly the last, which fills left to right. The heap property defines the ordering: in a min-heap, every parent is smaller than or equal to its children; in a max-heap, every parent is larger than or equal to its children.
This structure gives you O(1) access to the minimum or maximum element and O(log n) insertion and extraction. That’s the sweet spot for priority queues, where you constantly need the highest-priority item.
You’ll find heaps everywhere: priority queues in task schedulers, Dijkstra’s shortest path algorithm, heap sort, finding the k largest elements in a stream, and median maintenance in streaming data. The heap is one of those fundamental structures that shows up repeatedly once you know to look for it.
Array-Based Representation
Here’s where heaps get elegant. Because a binary heap is always complete, you can store it in a contiguous array with zero wasted space. No node pointers, no memory fragmentation, excellent cache locality.
The index relationships are simple arithmetic:
- Parent of node at index `i`: `(i - 1) // 2`
- Left child of node at index `i`: `2 * i + 1`
- Right child of node at index `i`: `2 * i + 2`
The root sits at index 0. Its children are at indices 1 and 2. Their children are at 3, 4, 5, 6. The pattern continues level by level.
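These relationships can be checked directly. A quick sketch (the sample values are arbitrary, chosen only to illustrate the layout):

```python
# A complete binary tree stored level by level in a flat list.
heap = [1, 3, 2, 7, 4, 5, 6]

def parent(i): return (i - 1) // 2
def left(i): return 2 * i + 1
def right(i): return 2 * i + 2

# The root (index 0) has children at indices 1 and 2.
assert left(0) == 1 and right(0) == 2
# The node at index 2 has children at 5 and 6, and its parent is the root.
assert left(2) == 5 and right(2) == 6 and parent(2) == 0
# Every node on the last level maps back to its parent with the same formula.
assert parent(5) == 2 and parent(6) == 2
```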
```python
class BinaryHeap:
    def __init__(self):
        self._data = []

    def _parent(self, i: int) -> int:
        return (i - 1) // 2

    def _left_child(self, i: int) -> int:
        return 2 * i + 1

    def _right_child(self, i: int) -> int:
        return 2 * i + 2

    def _has_left_child(self, i: int) -> bool:
        return self._left_child(i) < len(self._data)

    def _has_right_child(self, i: int) -> bool:
        return self._right_child(i) < len(self._data)

    def _swap(self, i: int, j: int) -> None:
        self._data[i], self._data[j] = self._data[j], self._data[i]
```
These helper methods form the foundation. Every heap operation builds on this index arithmetic.
Core Operations: Insert (Push)
Insertion follows a two-step process: append the new element to the end of the array (maintaining completeness), then restore the heap property by “sifting up.”
Sifting up means comparing the new element with its parent. If it violates the heap property, swap them and repeat with the new position. Continue until the element finds its correct spot or reaches the root.
```python
class MinHeap(BinaryHeap):
    def push(self, value) -> None:
        self._data.append(value)
        self._sift_up(len(self._data) - 1)

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = self._parent(i)
            if self._data[i] < self._data[parent]:
                self._swap(i, parent)
                i = parent
            else:
                break


class MaxHeap(BinaryHeap):
    def push(self, value) -> None:
        self._data.append(value)
        self._sift_up(len(self._data) - 1)

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = self._parent(i)
            if self._data[i] > self._data[parent]:
                self._swap(i, parent)
                i = parent
            else:
                break
```
The only difference between min-heap and max-heap is the comparison operator. Time complexity is O(log n) because the tree height is logarithmic in the number of elements.
Core Operations: Extract (Pop)
Extraction removes and returns the root element. The challenge is maintaining both completeness and the heap property afterward.
The solution: move the last element to the root position (maintaining completeness), then sift it down to restore the heap property. Sifting down means comparing with children and swapping with the smaller child (min-heap) or larger child (max-heap) until the element settles.
```python
from typing import Any


class MinHeap(BinaryHeap):
    # ... push and _sift_up from above ...

    def pop(self) -> Any:
        if not self._data:
            raise IndexError("pop from empty heap")
        result = self._data[0]
        last = self._data.pop()
        if self._data:
            self._data[0] = last
            self._sift_down(0)
        return result

    def _sift_down(self, i: int) -> None:
        while self._has_left_child(i):
            smallest = self._left_child(i)
            if (self._has_right_child(i) and
                    self._data[self._right_child(i)] < self._data[smallest]):
                smallest = self._right_child(i)
            if self._data[i] <= self._data[smallest]:
                break
            self._swap(i, smallest)
            i = smallest


class MaxHeap(BinaryHeap):
    # ... push and _sift_up from above ...

    def pop(self) -> Any:
        if not self._data:
            raise IndexError("pop from empty heap")
        result = self._data[0]
        last = self._data.pop()
        if self._data:
            self._data[0] = last
            self._sift_down(0)
        return result

    def _sift_down(self, i: int) -> None:
        while self._has_left_child(i):
            largest = self._left_child(i)
            if (self._has_right_child(i) and
                    self._data[self._right_child(i)] > self._data[largest]):
                largest = self._right_child(i)
            if self._data[i] >= self._data[largest]:
                break
            self._swap(i, largest)
            i = largest
```
The edge case handling matters: when popping from a single-element heap, we remove it directly without sifting.
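The same push/pop mechanics can be sanity-checked with Python's standard-library heapq module, which implements a min-heap over a plain list (used here only so the example is self-contained, not as part of the implementation above):

```python
import heapq

# heapq mirrors the operations above: heappush appends then sifts up,
# heappop takes the root and sifts the last element down from index 0.
h = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(h, value)

popped = [heapq.heappop(h) for _ in range(len(h))]
assert popped == [1, 2, 3, 4, 5]  # values emerge in ascending order
```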
Building a Heap from an Array
Given an unsorted array, you could insert elements one by one for O(n log n) total time. But there’s a better way.
Floyd’s algorithm builds a heap in O(n) time by working bottom-up. The insight: leaf nodes (the bottom half of the array) already satisfy the heap property trivially. Start from the last non-leaf node and sift down each element.
```python
class MinHeap(BinaryHeap):
    # ... previous methods ...

    @classmethod
    def from_array(cls, arr: list) -> 'MinHeap':
        heap = cls()
        heap._data = arr.copy()
        heap._heapify()
        return heap

    def _heapify(self) -> None:
        # Start from the last non-leaf node and work backwards
        start = self._parent(len(self._data) - 1)
        for i in range(start, -1, -1):
            self._sift_down(i)
```
Why is this O(n) instead of O(n log n)? Most nodes are near the bottom where sift-down operations are cheap. The mathematical analysis shows the total work sums to O(n). This matters when you’re building heaps frequently or from large datasets.
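The standard library's heapq.heapify uses the same bottom-up strategy, so it makes a convenient check that the result actually satisfies the heap property at every node:

```python
import heapq
import random

data = [random.randrange(1000) for _ in range(101)]
heapq.heapify(data)  # Floyd's bottom-up build, O(n)

# Verify the min-heap property: every parent <= both of its children.
n = len(data)
for i in range(n):
    for child in (2 * i + 1, 2 * i + 2):
        if child < n:
            assert data[i] <= data[child]
```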
Complete Implementation
Here’s a production-ready implementation with comparator support, allowing the same class to function as either a min-heap or max-heap:
```python
from typing import TypeVar, Generic, Callable, Optional, List

T = TypeVar('T')


class Heap(Generic[T]):
    def __init__(self, comparator: Optional[Callable[[T, T], bool]] = None):
        """
        Create a heap with custom ordering.

        comparator(a, b) should return True if a should be above b in the heap.
        Default is min-heap behavior (smaller values on top).
        """
        self._data: List[T] = []
        self._compare = comparator or (lambda a, b: a < b)

    def _parent(self, i: int) -> int:
        return (i - 1) // 2

    def _left_child(self, i: int) -> int:
        return 2 * i + 1

    def _right_child(self, i: int) -> int:
        return 2 * i + 2

    def _has_left_child(self, i: int) -> bool:
        return self._left_child(i) < len(self._data)

    def _has_right_child(self, i: int) -> bool:
        return self._right_child(i) < len(self._data)

    def _swap(self, i: int, j: int) -> None:
        self._data[i], self._data[j] = self._data[j], self._data[i]

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = self._parent(i)
            if self._compare(self._data[i], self._data[parent]):
                self._swap(i, parent)
                i = parent
            else:
                break

    def _sift_down(self, i: int) -> None:
        while self._has_left_child(i):
            target = self._left_child(i)
            if (self._has_right_child(i) and
                    self._compare(self._data[self._right_child(i)], self._data[target])):
                target = self._right_child(i)
            if not self._compare(self._data[target], self._data[i]):
                break
            self._swap(i, target)
            i = target

    def push(self, value: T) -> None:
        self._data.append(value)
        self._sift_up(len(self._data) - 1)

    def pop(self) -> T:
        if not self._data:
            raise IndexError("pop from empty heap")
        result = self._data[0]
        last = self._data.pop()
        if self._data:
            self._data[0] = last
            self._sift_down(0)
        return result

    def peek(self) -> T:
        if not self._data:
            raise IndexError("peek at empty heap")
        return self._data[0]

    def __len__(self) -> int:
        return len(self._data)

    def __bool__(self) -> bool:
        return bool(self._data)

    @classmethod
    def from_array(cls, arr: List[T],
                   comparator: Optional[Callable[[T, T], bool]] = None) -> 'Heap[T]':
        heap = cls(comparator)
        heap._data = arr.copy()
        if heap._data:
            start = heap._parent(len(heap._data) - 1)
            for i in range(start, -1, -1):
                heap._sift_down(i)
        return heap
```
```python
# Usage examples
min_heap = Heap[int]()                    # Default min-heap
max_heap = Heap[int](lambda a, b: a > b)  # Max-heap

# Priority queue with tuples (priority, value)
task_queue = Heap[tuple](lambda a, b: a[0] < b[0])
task_queue.push((1, "high priority"))
task_queue.push((5, "low priority"))
task_queue.push((3, "medium priority"))

while task_queue:
    print(task_queue.pop())  # Processes in priority order
```
Performance Analysis and Applications
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Push | O(log n) | O(1) |
| Pop | O(log n) | O(1) |
| Peek | O(1) | O(1) |
| Build from array | O(n) | O(1) if in-place |
| Search | O(n) | O(1) |
Heaps excel when you need repeated access to the extreme element. They’re the wrong choice when you need arbitrary element access, searching, or sorted traversal—use a balanced BST or sorted array instead.
Heap sort uses a max-heap to sort in-place with O(n log n) time and O(1) space. Build the heap, then repeatedly extract the maximum and place it at the end.
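That procedure can be sketched as a standalone function (the name heap_sort and the nested sift_down helper are mine, not part of the classes above):

```python
def heap_sort(arr):
    """In-place heap sort: build a max-heap, then repeatedly move the max to the end."""
    n = len(arr)

    def sift_down(i, end):
        # Restore the max-heap property for the subtree at i, within arr[:end].
        while 2 * i + 1 < end:
            largest = 2 * i + 1
            if largest + 1 < end and arr[largest + 1] > arr[largest]:
                largest += 1
            if arr[i] >= arr[largest]:
                break
            arr[i], arr[largest] = arr[largest], arr[i]
            i = largest

    # Floyd's O(n) build: sift down every internal node, bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Swap the max (root) into its final slot and shrink the heap region.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)
    return arr

assert heap_sort([5, 2, 9, 1, 5, 6]) == [1, 2, 5, 5, 6, 9]
```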
Dijkstra’s algorithm uses a min-heap as a priority queue to always process the closest unvisited vertex. The heap operations dominate the complexity at O((V + E) log V).
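A minimal sketch of that pattern, using the standard-library heapq as the priority queue and a stale-entry check instead of a decrease-key operation (the graph shape and names are assumptions for illustration):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> list of (neighbor, weight)."""
    dist = {source: 0}
    pq = [(0, source)]  # (distance, node): the min-heap surfaces the closest vertex
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Hypothetical weighted graph for illustration.
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
assert dijkstra(g, "a") == {"a": 0, "b": 1, "c": 3}
```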
Finding k largest elements in a stream: maintain a min-heap of size k. For each new element, if it’s larger than the heap minimum, replace the minimum. The heap always contains the k largest seen so far.
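A sketch of the size-k min-heap technique, again on top of heapq (the function name is mine):

```python
import heapq

def k_largest_stream(stream, k):
    """Maintain a min-heap of size k; its minimum guards entry to the top-k."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the minimum and push x in one sift-down
    return sorted(heap, reverse=True)

assert k_largest_stream([5, 1, 9, 3, 7, 2, 8], 3) == [9, 8, 7]
```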
Median maintenance: keep a max-heap for the lower half and a min-heap for the upper half. Balance their sizes after each insertion. The median is always accessible from one or both heap roots.
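The two-heap scheme can be sketched with heapq by negating values to simulate the max-heap (the class name MedianTracker is mine):

```python
import heapq

class MedianTracker:
    """Two-heap median: max-heap (negated values) for the lower half, min-heap for the upper."""
    def __init__(self):
        self.lo = []  # max-heap via negation; holds the smaller half
        self.hi = []  # min-heap; holds the larger half

    def add(self, x):
        # Push through the lower heap so its maximum is compared first,
        # then rebalance so lo is never smaller than hi.
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]          # odd count: lo's root is the median
        return (-self.lo[0] + self.hi[0]) / 2  # even count: average the two roots

m = MedianTracker()
for x in [5, 2, 8, 1]:
    m.add(x)
assert m.median() == 3.5  # median of {1, 2, 5, 8}
```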
Binary heaps are simple, cache-friendly, and exactly right for priority queue workloads. Understand the sift-up and sift-down operations, and you understand heaps.