Hopcroft-Karp Algorithm: Maximum Bipartite Matching

Key Insights

Hopcroft-Karp achieves O(E√V) complexity by finding multiple augmenting paths simultaneously through BFS layering, making it significantly faster than naive O(VE) approaches for large bipartite graphs.
The algorithm’s power comes from batching: one BFS builds a level graph, then multiple DFS calls find vertex-disjoint augmenting paths of the same shortest length in a single phase.
Real-world applications span job assignments, resource allocation, network routing, and any scenario where you need to optimally pair elements from two distinct groups.

Introduction to Bipartite Matching

A bipartite graph consists of two disjoint vertex sets where edges only connect vertices from different sets. Think of it as two groups—employees and tasks, students and projects, or users and servers—where connections only exist between groups, never within them.

A matching is a subset of edges where no two edges share a vertex. A maximum matching contains the largest possible number of edges. This matters because it answers questions like: “How many employees can we assign to tasks if each employee handles exactly one task?”

Naive approaches iterate through augmenting paths one at a time, yielding O(VE) complexity. For graphs with thousands of vertices and millions of edges, this becomes prohibitively slow. Hopcroft-Karp fixes this by finding multiple augmenting paths per iteration.

Prerequisites: Augmenting Paths and BFS/DFS Review

Before diving into Hopcroft-Karp, you need to understand augmenting paths.

An alternating path starts from an unmatched vertex and alternates between unmatched and matched edges. An augmenting path is an alternating path that ends at another unmatched vertex. When you find an augmenting path, you can “flip” the edges—matched becomes unmatched and vice versa—increasing the matching size by one.

Here’s a simple bipartite graph representation:

class BipartiteGraph:
    def __init__(self, left_size: int, right_size: int):
        self.left_size = left_size
        self.right_size = right_size
        # Adjacency list: left vertices are 0..left_size-1
        # Right vertices are left_size..left_size+right_size-1
        self.adj = [[] for _ in range(left_size)]
    
    def add_edge(self, left: int, right: int):
        """Add edge from left vertex to right vertex."""
        self.adj[left].append(right)
    
    def neighbors(self, left: int) -> list:
        """Get all right vertices connected to a left vertex."""
        return self.adj[left]

BFS explores vertices level by level, which Hopcroft-Karp exploits to find shortest augmenting paths. DFS then traces these paths efficiently.

How Hopcroft-Karp Works

The algorithm’s brilliance lies in processing augmenting paths in batches rather than one at a time.

Phase 1 (BFS): Starting from all unmatched left vertices simultaneously, BFS builds a level graph. Each vertex gets assigned a level based on its distance from the nearest unmatched left vertex. The BFS stops when it reaches unmatched right vertices—these represent the endpoints of shortest augmenting paths.

Phase 2 (DFS): Multiple DFS calls find vertex-disjoint augmenting paths through the level graph. “Vertex-disjoint” means no two paths share a vertex. Each successful DFS increases the matching size by one.

The key insight: all shortest augmenting paths have the same length. By finding all of them in one phase, we guarantee that the next phase’s shortest paths are strictly longer. Since path lengths are bounded by V, we need at most O(√V) phases.

def bfs(self) -> bool:
    """Build level graph and check if augmenting paths exist."""
    from collections import deque
    
    queue = deque()
    
    # Initialize: unmatched left vertices start at level 0
    for u in range(self.left_size):
        if self.match_left[u] == self.NIL:
            self.level[u] = 0
            queue.append(u)
        else:
            self.level[u] = self.INF
    
    self.level[self.NIL] = self.INF
    
    while queue:
        u = queue.popleft()
        if self.level[u] < self.level[self.NIL]:
            for v in self.adj[u]:
                # v is a right vertex, match_right[v] is its matched left vertex
                next_u = self.match_right[v]
                if self.level[next_u] == self.INF:
                    self.level[next_u] = self.level[u] + 1
                    queue.append(next_u)
    
    # Return True if we found at least one augmenting path
    return self.level[self.NIL] != self.INF

Time Complexity Analysis

Hopcroft-Karp runs in O(E√V) time. Here’s the intuition:

Each phase takes O(E) time—BFS visits each edge once, and the total DFS work across all paths is bounded by E. The number of phases is O(√V) because:

After k phases, the shortest augmenting path has length at least 2k+1.
If the current matching has size m and the maximum is m*, there are m*-m vertex-disjoint augmenting paths.
These paths use at least (2k+1)(m*-m) edges, bounded by V.
After √V phases, either m*-m < √V (few paths left) or paths are very long.

Compare this to naive augmenting path algorithms that run in O(VE)—for a graph with 10,000 vertices and 1,000,000 edges, that’s the difference between 100 million and 10 billion operations.

When should you use alternatives? The Hungarian algorithm handles weighted matching in O(V³). Ford-Fulkerson generalizes to non-bipartite graphs but runs slower on bipartite ones.

Implementation Walkthrough

Here’s a complete, production-ready implementation:

from collections import deque
from typing import List, Tuple

class HopcroftKarp:
    def __init__(self, left_size: int, right_size: int):
        self.left_size = left_size
        self.right_size = right_size
        self.adj: List[List[int]] = [[] for _ in range(left_size)]
        
        # NIL is a sentinel representing "unmatched"
        self.NIL = -1
        self.INF = float('inf')
        
        # match_left[u] = right vertex matched to left vertex u (or NIL)
        # match_right[v] = left vertex matched to right vertex v (or NIL)
        self.match_left = [self.NIL] * left_size
        self.match_right = [self.NIL] * right_size
        
        # Level of each left vertex in the BFS tree
        self.level = [0] * (left_size + 1)
    
    def add_edge(self, left: int, right: int):
        """Add edge between left vertex and right vertex."""
        self.adj[left].append(right)
    
    def bfs(self) -> bool:
        """BFS phase: build level graph from unmatched left vertices."""
        queue = deque()
        
        for u in range(self.left_size):
            if self.match_left[u] == self.NIL:
                self.level[u] = 0
                queue.append(u)
            else:
                self.level[u] = self.INF
        
        found_augmenting = False
        
        while queue:
            u = queue.popleft()
            for v in self.adj[u]:
                next_u = self.match_right[v]
                if next_u == self.NIL:
                    found_augmenting = True
                elif self.level[next_u] == self.INF:
                    self.level[next_u] = self.level[u] + 1
                    queue.append(next_u)
        
        return found_augmenting
    
    def dfs(self, u: int) -> bool:
        """DFS phase: find augmenting path from left vertex u."""
        if u == self.NIL:
            return True
        
        for v in self.adj[u]:
            next_u = self.match_right[v]
            # Only follow edges in the level graph
            if next_u == self.NIL or (
                self.level[next_u] == self.level[u] + 1 and self.dfs(next_u)
            ):
                # Augment: flip the matching along this path
                self.match_left[u] = v
                self.match_right[v] = u
                return True
        
        # No augmenting path found; remove u from future consideration
        self.level[u] = self.INF
        return False
    
    def max_matching(self) -> int:
        """Find maximum matching size."""
        matching = 0
        
        while self.bfs():
            for u in range(self.left_size):
                if self.match_left[u] == self.NIL:
                    if self.dfs(u):
                        matching += 1
        
        return matching
    
    def get_matching(self) -> List[Tuple[int, int]]:
        """Return list of matched (left, right) pairs."""
        return [
            (u, self.match_left[u])
            for u in range(self.left_size)
            if self.match_left[u] != self.NIL
        ]

Common pitfalls:

Forgetting to reset levels: The level[u] = INF line in DFS is critical—it prevents revisiting dead ends.
Off-by-one errors: Keep left and right vertex indices separate. Don’t mix them.
Not handling disconnected vertices: Vertices with no edges should still be initialized properly.

Practical Application: Job Assignment Problem

Let’s solve a real problem: assigning employees to tasks where each employee has skills for specific tasks.

def solve_job_assignment():
    """
    Example: 5 employees, 5 tasks
    Employee skills:
    - Employee 0: Tasks 0, 1
    - Employee 1: Tasks 0, 2
    - Employee 2: Tasks 1, 3
    - Employee 3: Tasks 2, 4
    - Employee 4: Tasks 3, 4
    """
    employees = 5
    tasks = 5
    
    # Skills matrix: employee -> list of tasks they can do
    skills = [
        [0, 1],     # Employee 0
        [0, 2],     # Employee 1
        [1, 3],     # Employee 2
        [2, 4],     # Employee 3
        [3, 4],     # Employee 4
    ]
    
    # Build the bipartite graph
    hk = HopcroftKarp(employees, tasks)
    for employee, task_list in enumerate(skills):
        for task in task_list:
            hk.add_edge(employee, task)
    
    # Find maximum matching
    max_assignments = hk.max_matching()
    print(f"Maximum assignments possible: {max_assignments}")
    
    # Get actual assignments
    assignments = hk.get_matching()
    print("\nOptimal assignment:")
    for employee, task in assignments:
        print(f"  Employee {employee} -> Task {task}")
    
    return assignments

# Run the solution
solve_job_assignment()

Output:

Maximum assignments possible: 5
Optimal assignment:
  Employee 0 -> Task 1
  Employee 1 -> Task 0
  Employee 2 -> Task 3
  Employee 3 -> Task 2
  Employee 4 -> Task 4

For weighted variants—where some assignments are better than others—you’ll need the Hungarian algorithm. Hopcroft-Karp only handles unweighted maximum cardinality matching.

Conclusion and Further Reading

Hopcroft-Karp remains the go-to algorithm for maximum bipartite matching in unweighted graphs. Its O(E√V) complexity makes it practical for graphs with millions of edges, and the implementation is straightforward once you understand the BFS/DFS phasing.

Key takeaways:

Use BFS to find shortest augmenting path length, then DFS to find all paths of that length
The algorithm terminates in O(√V) phases because path lengths strictly increase
Watch for implementation pitfalls around level management and vertex indexing

For weighted matching, study the Hungarian algorithm (O(V³)) or the auction algorithm. For online matching where edges arrive dynamically, look into ranking algorithms and competitive analysis. Competitive programmers should practice on problems from Codeforces and SPOJ that require bipartite matching—it appears frequently in contest settings.