Priority Queue: Binary Heap Implementation

Key Insights

  • Binary heaps provide O(log n) insertion and extraction with minimal overhead, making them the default choice for priority queue implementations in production systems.
  • The array-based representation eliminates pointer overhead and leverages CPU cache locality—a simple index formula replaces explicit parent-child references.
  • Building a heap from an unsorted array takes O(n) time using bottom-up heapification, not O(n log n)—this counterintuitive result matters for batch processing scenarios.

Introduction to Priority Queues

A priority queue is an abstract data type where each element has an associated priority, and elements are served based on priority rather than insertion order. Unlike a standard queue’s FIFO behavior, a priority queue always yields the highest (or lowest) priority element first.

You’ll find priority queues everywhere in systems programming. Operating system schedulers use them to manage process priorities. Dijkstra’s shortest path algorithm relies on them to always process the closest unvisited node. Event-driven simulations process events in chronological order using priority queues. Huffman coding builds optimal prefix trees by repeatedly combining the two lowest-frequency nodes.

While you could implement a priority queue with a sorted array (O(1) extraction, O(n) insertion) or an unsorted array (O(1) insertion, O(n) extraction), binary heaps hit the sweet spot: O(log n) for both operations with minimal constant factors. That’s why every major standard library—Python’s heapq, Java’s PriorityQueue, C++’s priority_queue—uses binary heaps under the hood.
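You can see this array-backed heap directly in Python's heapq, which operates on a plain list:

```python
import heapq

# heapq maintains the min-heap invariant over an ordinary Python list.
nums = [5, 3, 8, 1, 2]
heapq.heapify(nums)             # O(n) bottom-up heapification
heapq.heappush(nums, 0)         # O(log n) insertion
smallest = heapq.heappop(nums)  # O(log n) extraction of the minimum
print(smallest)  # 0
```

The list itself is the heap; there is no wrapper object, which keeps the constant factors low.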

Binary Heap Fundamentals

A binary heap is a complete binary tree that satisfies the heap property. In a min-heap, every parent node is less than or equal to its children. In a max-heap, every parent is greater than or equal to its children. This property applies recursively, guaranteeing the minimum (or maximum) element sits at the root.

The “complete binary tree” constraint means all levels are fully filled except possibly the last, which fills left-to-right. This structure enables a brilliant optimization: we can represent the entire tree as a contiguous array with no pointers.

The mapping is straightforward. For a node at index i (using zero-based indexing):

def parent(i):
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

The root lives at index 0. Its children occupy indices 1 and 2. Their children occupy 3, 4, 5, and 6. The pattern continues level by level. This layout eliminates pointer overhead and keeps related nodes close in memory, improving cache performance significantly.

For a heap with n elements, the tree height is ⌊log₂ n⌋. This logarithmic height bounds the cost of maintaining the heap property during modifications.
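A quick sanity check of the index arithmetic, restating the three formulas so the snippet runs standalone:

```python
def parent(i): return (i - 1) // 2
def left_child(i): return 2 * i + 1
def right_child(i): return 2 * i + 2

# Level-order array [10, 20, 30, 40, 50]: 10 is the root,
# 20 and 30 are its children, 40 and 50 are children of 20.
assert left_child(0) == 1 and right_child(0) == 2
assert left_child(1) == 3 and right_child(1) == 4
assert parent(4) == 1 and parent(3) == 1 and parent(2) == 0
```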

Core Operations: Insert (Enqueue)

Insertion follows a two-step process: append the new element to the end of the array (maintaining completeness), then restore the heap property by “bubbling up.”

The bubble-up operation compares the inserted element with its parent. If the heap property is violated, swap them and repeat with the new parent. Continue until the element reaches its correct position or becomes the root.

class MinHeap:
    def __init__(self):
        self.heap = []
    
    def insert(self, value):
        self.heap.append(value)
        self._bubble_up(len(self.heap) - 1)
    
    def _bubble_up(self, index):
        while index > 0:
            parent_idx = (index - 1) // 2
            if self.heap[index] < self.heap[parent_idx]:
                self.heap[index], self.heap[parent_idx] = (
                    self.heap[parent_idx], self.heap[index]
                )
                index = parent_idx
            else:
                break

The worst case occurs when inserting a new minimum—it bubbles all the way to the root, traversing the tree’s height. This gives O(log n) time complexity. The average case is actually O(1) for random insertions, since most elements settle near the bottom of the tree, but we typically quote the worst case.
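A minimal function-based sketch of the same insert logic, tracing how a new minimum bubbles to the root:

```python
def heap_insert(heap, value):
    """Append value, then bubble it up until the min-heap property holds."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break

h = []
for v in [5, 3, 8, 1]:
    heap_insert(h, v)
print(h)  # [1, 3, 8, 5] -- inserting 1 took two swaps to reach the root
```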

Core Operations: Extract (Dequeue)

Extraction removes and returns the root element (the minimum in a min-heap). The challenge is maintaining both the heap property and completeness after removal.

The algorithm swaps the root with the last element, removes the last element (now containing the old minimum), then restores the heap property by “bubbling down” the new root.

Bubbling down compares a node with both children and swaps with the smaller child if the heap property is violated. This continues until the element settles into a valid position or reaches a leaf.

def extract_min(self):
    if not self.heap:
        raise IndexError("Extract from empty heap")
    
    if len(self.heap) == 1:
        return self.heap.pop()
    
    min_value = self.heap[0]
    self.heap[0] = self.heap.pop()  # Move last element to root
    self._bubble_down(0)
    return min_value

def _bubble_down(self, index):
    size = len(self.heap)
    
    while True:
        smallest = index
        left = 2 * index + 1
        right = 2 * index + 2
        
        if left < size and self.heap[left] < self.heap[smallest]:
            smallest = left
        if right < size and self.heap[right] < self.heap[smallest]:
            smallest = right
        
        if smallest == index:
            break
        
        self.heap[index], self.heap[smallest] = (
            self.heap[smallest], self.heap[index]
        )
        index = smallest

Like insertion, extraction is O(log n) in the worst case—the swapped element might bubble all the way down to a leaf. Unlike insertion, there’s no favorable average case; extraction consistently requires logarithmic work.
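A standalone sketch mirroring the method above shows that repeated extraction yields elements in sorted order, which is the core of heapsort:

```python
def extract_min(heap):
    """Pop the root, move the last element up, and bubble it down."""
    if not heap:
        raise IndexError("extract from empty heap")
    if len(heap) == 1:
        return heap.pop()
    root = heap[0]
    heap[0] = heap.pop()
    i, size = 0, len(heap)
    while True:
        smallest, left, right = i, 2 * i + 1, 2 * i + 2
        if left < size and heap[left] < heap[smallest]:
            smallest = left
        if right < size and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
    return root

h = [1, 2, 8, 5, 3]  # already a valid min-heap
out = [extract_min(h) for _ in range(5)]
print(out)  # [1, 2, 3, 5, 8]
```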

Building a Heap from an Array

Given an unsorted array, how do you transform it into a valid heap? The naive approach inserts elements one by one, yielding O(n log n) total time. But there’s a better way.

Bottom-up heapification starts from the last non-leaf node and applies bubble-down to each node moving toward the root. Since leaves (roughly half the nodes) are already valid single-element heaps, we skip them entirely.

def build_heap(self, array):
    self.heap = array[:]  # Copy the array
    
    # Start from last non-leaf node
    start = len(self.heap) // 2 - 1
    
    for i in range(start, -1, -1):
        self._bubble_down(i)

Why is this O(n) instead of O(n log n)? The key insight is that most nodes are near the bottom of the tree. Nodes at the bottom level do zero work. Nodes one level up do at most one swap. Only the root does O(log n) work.

Summing across all levels: roughly n/2 nodes do 0 work, n/4 nodes do 1 swap, n/8 nodes do 2 swaps, and so on. This geometric series converges to O(n). The formal proof uses the fact that Σ(k/2^k) for k from 0 to infinity equals 2.
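Written out: at most n/2^(h+1) nodes sit at height h, and each does at most h swaps, so the total work is bounded by

```latex
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{n}{2^{h+1}} \cdot h
  \;\le\; \frac{n}{2} \sum_{h=0}^{\infty} \frac{h}{2^h}
  \;=\; \frac{n}{2} \cdot 2 \;=\; n \;=\; O(n)
```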

This matters in practice. If you’re building a priority queue from a known dataset, use build_heap rather than repeated insertions.
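With Python's heapq, this is the difference between one heapify call (O(n)) and a loop of heappush calls (O(n log n)); both produce a valid heap:

```python
import heapq

data = [7, 2, 9, 4, 1, 8]

# O(n log n): insert one element at a time
pushed = []
for x in data:
    heapq.heappush(pushed, x)

# O(n): bottom-up heapification in place
heaped = data[:]
heapq.heapify(heaped)

# Both satisfy the heap property; internal order may differ.
assert pushed[0] == heaped[0] == 1
```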

Complete Implementation

Here’s a fuller implementation supporting custom keys and max-heap ordering:

class PriorityQueue:
    def __init__(self, items=None, key=None, reverse=False):
        """
        Initialize priority queue.
        
        Args:
            items: Optional initial items
            key: Function to extract comparison key (like sorted())
            reverse: If True, creates a max-heap instead of min-heap.
                Requires numeric keys, since ordering is flipped by negation.
        """
        self._heap = []
        self._key = key or (lambda x: x)
        self._multiplier = -1 if reverse else 1
        
        if items:
            self._heap = list(items)
            self._heapify()
    
    def _compare(self, i, j):
        """Returns True if element at i should be higher priority than j."""
        key_i = self._multiplier * self._key(self._heap[i])
        key_j = self._multiplier * self._key(self._heap[j])
        return key_i < key_j
    
    def _bubble_up(self, index):
        while index > 0:
            parent = (index - 1) // 2
            if self._compare(index, parent):
                self._heap[index], self._heap[parent] = (
                    self._heap[parent], self._heap[index]
                )
                index = parent
            else:
                break
    
    def _bubble_down(self, index):
        size = len(self._heap)
        while True:
            priority = index
            left, right = 2 * index + 1, 2 * index + 2
            
            if left < size and self._compare(left, priority):
                priority = left
            if right < size and self._compare(right, priority):
                priority = right
            
            if priority == index:
                break
            
            self._heap[index], self._heap[priority] = (
                self._heap[priority], self._heap[index]
            )
            index = priority
    
    def _heapify(self):
        for i in range(len(self._heap) // 2 - 1, -1, -1):
            self._bubble_down(i)
    
    def push(self, item):
        self._heap.append(item)
        self._bubble_up(len(self._heap) - 1)
    
    def pop(self):
        if not self._heap:
            raise IndexError("Pop from empty priority queue")
        if len(self._heap) == 1:
            return self._heap.pop()
        
        result = self._heap[0]
        self._heap[0] = self._heap.pop()
        self._bubble_down(0)
        return result
    
    def peek(self):
        if not self._heap:
            raise IndexError("Peek at empty priority queue")
        return self._heap[0]
    
    def __len__(self):
        return len(self._heap)
    
    def __bool__(self):
        return bool(self._heap)

Usage examples:

# Basic min-heap
pq = PriorityQueue([5, 3, 8, 1, 2])
while pq:
    print(pq.pop())  # 1, 2, 3, 5, 8

# Max-heap
pq = PriorityQueue([5, 3, 8, 1, 2], reverse=True)
print(pq.pop())  # 8

# Custom objects with key function
tasks = [("email", 2), ("deploy", 1), ("meeting", 3)]
pq = PriorityQueue(tasks, key=lambda x: x[1])
print(pq.pop())  # ("deploy", 1)

Performance Summary and When to Use

Operation     Time Complexity   Space Complexity
Insert        O(log n)          O(1)
Extract       O(log n)          O(1)
Peek          O(1)              O(1)
Build heap    O(n)              O(1) if in-place
Search        O(n)              O(1)

Binary heaps are the right choice for most priority queue needs. They’re simple to implement, have excellent cache locality, and provide consistent logarithmic performance.

Consider alternatives only in specific scenarios. Fibonacci heaps offer O(1) amortized insertion and decrease-key operations, making them theoretically superior for Dijkstra’s algorithm on dense graphs—but their constant factors and complexity rarely justify the overhead in practice. Balanced BSTs like red-black trees support O(log n) arbitrary deletion and search, useful when you need to remove elements by value rather than priority.

For 95% of use cases, stick with the binary heap. It’s what the standard libraries use, and it’s what you should reach for first.
