Kruskal's Algorithm: Minimum Spanning Tree

Key Insights

  • Kruskal’s algorithm builds a minimum spanning tree by greedily selecting the smallest edges that don’t create cycles, making it intuitive and efficient for sparse graphs.
  • The Union-Find data structure with path compression and union by rank enables near-constant-time cycle detection, which is the critical operation that makes Kruskal’s practical.
  • With O(E log E) time complexity dominated by edge sorting, Kruskal’s algorithm outperforms Prim’s on sparse graphs but may lose efficiency on dense graphs where Prim’s with a Fibonacci heap excels.

Introduction to Minimum Spanning Trees

A minimum spanning tree (MST) is a subset of edges from a connected, weighted, undirected graph that connects all vertices with the minimum possible total edge weight—and without forming any cycles. Think of it as the cheapest way to wire up every node in a network.

MSTs appear everywhere in practical engineering:

  • Network design: Laying fiber optic cables or electrical wiring to connect buildings with minimal cost
  • Clustering: Grouping data points by removing the longest edges from an MST
  • Circuit design: Minimizing wire length on PCBs
  • Approximation algorithms: MSTs provide the backbone for approximating solutions to NP-hard problems like the Traveling Salesman Problem

Two classical algorithms dominate MST construction: Prim’s and Kruskal’s. Prim’s grows a single tree from a starting vertex, adding the cheapest edge that connects to an unvisited vertex. Kruskal’s takes a different approach—it considers edges globally, sorted by weight, and builds a forest that eventually merges into a single tree.

Kruskal’s algorithm shines when you need simplicity and when your graph is sparse. Let’s dig into how it works.

How Kruskal’s Algorithm Works

Kruskal’s algorithm follows a greedy strategy that’s refreshingly straightforward:

  1. Sort all edges by weight in ascending order
  2. Initialize each vertex as its own separate component
  3. For each edge (in sorted order), add it to the MST if it connects two different components
  4. Stop when you’ve added V-1 edges (where V is the vertex count)

The key insight is that adding an edge between two vertices in the same component would create a cycle. By only connecting different components, we guarantee a tree structure.

Consider this simple graph:

    A ---4--- B
    |         |
    2         3
    |         |
    C ---1--- D ---5--- E

Edges sorted by weight: (C,D,1), (A,C,2), (B,D,3), (A,B,4), (D,E,5)

Step-by-step execution:

  1. (C,D,1): C and D are in different components. Add edge. Components: {A}, {B}, {C,D}, {E}
  2. (A,C,2): A and C are in different components. Add edge. Components: {A,C,D}, {B}, {E}
  3. (B,D,3): B and D are in different components. Add edge. Components: {A,B,C,D}, {E}
  4. (A,B,4): A and B are in the same component. Skip (would create cycle).
  5. (D,E,5): D and E are in different components. Add edge. Components: {A,B,C,D,E}

MST edges: (C,D), (A,C), (B,D), (D,E) with total weight 11.
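The merge sequence above can be traced in a few lines of union-find bookkeeping. This is a minimal sketch (vertex labels A–E mapped to indices 0–4, path halving instead of full path compression):

```python
# Trace Kruskal's component merges on the example graph.
parent = list(range(5))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

# Edges already sorted by weight, as in the walkthrough.
edges = [("C", "D", 1), ("A", "C", 2), ("B", "D", 3), ("A", "B", 4), ("D", "E", 5)]
index = {name: i for i, name in enumerate("ABCDE")}

mst, total = [], 0
for u, v, w in edges:
    ru, rv = find(index[u]), find(index[v])
    if ru != rv:              # different components: keep the edge
        parent[ru] = rv       # merge the two components
        mst.append((u, v))
        total += w
print(mst, total)  # [('C', 'D'), ('A', 'C'), ('B', 'D'), ('D', 'E')] 11
```

The (A,B) edge is the only one skipped, exactly as in step 4 above.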

Union-Find Data Structure

The algorithm’s efficiency hinges on quickly answering one question: are these two vertices in the same component? A naive approach using BFS or DFS for each edge would be expensive. Enter Union-Find (also called Disjoint Set Union or DSU).

Union-Find supports two operations:

  • Find(x): Return the representative (root) of the set containing x
  • Union(x, y): Merge the sets containing x and y

Two optimizations make these operations blazingly fast:

Path compression: During Find, make every node along the path point directly to the root. This flattens the tree structure, making future queries faster.

Union by rank: When merging sets, attach the root of the lower-rank (shorter) tree under the root of the higher-rank one. This keeps trees shallow instead of degenerating into long chains.

With both optimizations, each operation runs in O(α(n)) amortized time, where α is the inverse Ackermann function—effectively constant for any practical input size.

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n
    
    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]
    
    def union(self, x: int, y: int) -> bool:
        """
        Merge sets containing x and y.
        Returns True if they were in different sets (edge added to MST).
        Returns False if they were already in the same set (cycle detected).
        """
        root_x, root_y = self.find(x), self.find(y)
        
        if root_x == root_y:
            return False  # Same set, would create cycle
        
        # Union by rank
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        
        return True

The union method returning a boolean is a deliberate design choice—it directly tells us whether an edge should be included in the MST.
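To see that boolean contract in action, here is a quick exercise of the class on four vertices (the class body is repeated verbatim so the snippet runs on its own):

```python
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False  # same set: adding this edge would create a cycle
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return True

uf = UnionFind(4)
print(uf.union(0, 1))  # True: 0 and 1 were separate
print(uf.union(2, 3))  # True: 2 and 3 were separate
print(uf.union(1, 3))  # True: merges {0, 1} with {2, 3}
print(uf.union(0, 2))  # False: already in one component (cycle)
```

The final call returns False without modifying anything, which is exactly the signal Kruskal's uses to skip an edge.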

Implementation

Here’s a complete implementation of Kruskal’s algorithm, building on the UnionFind class above:

from typing import List, Tuple, NamedTuple

class Edge(NamedTuple):
    weight: int
    u: int
    v: int

def kruskal_mst(num_vertices: int, edges: List[Edge]) -> Tuple[List[Edge], int]:
    """
    Compute the minimum spanning tree using Kruskal's algorithm.
    
    Args:
        num_vertices: Number of vertices (labeled 0 to n-1)
        edges: List of Edge(weight, u, v) tuples
    
    Returns:
        Tuple of (MST edges, total weight)
    
    Raises:
        ValueError: If graph is disconnected
    """
    # Sort edges by weight (tuples compare field by field, so weight comes first)
    sorted_edges = sorted(edges)
    
    uf = UnionFind(num_vertices)
    mst_edges = []
    total_weight = 0
    
    for edge in sorted_edges:
        if uf.union(edge.u, edge.v):
            mst_edges.append(edge)
            total_weight += edge.weight
            
            # Early termination: MST has exactly V-1 edges
            if len(mst_edges) == num_vertices - 1:
                break
    
    if len(mst_edges) != num_vertices - 1:
        raise ValueError("Graph is disconnected; no spanning tree exists")
    
    return mst_edges, total_weight

The use of NamedTuple for edges gives us automatic comparison (sorting by weight first), immutability, and clear field access. The early termination check prevents unnecessary iterations once we’ve built the complete MST.
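Running the function on the five-vertex walkthrough graph reproduces the hand-traced result. A self-contained sketch (the supporting definitions are restated, with A–E mapped to 0–4):

```python
from typing import List, NamedTuple, Tuple

class Edge(NamedTuple):
    weight: int
    u: int
    v: int

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def kruskal_mst(num_vertices: int, edges: List[Edge]) -> Tuple[List[Edge], int]:
    uf = UnionFind(num_vertices)
    mst_edges, total_weight = [], 0
    for edge in sorted(edges):
        if uf.union(edge.u, edge.v):
            mst_edges.append(edge)
            total_weight += edge.weight
            if len(mst_edges) == num_vertices - 1:
                break
    if len(mst_edges) != num_vertices - 1:
        raise ValueError("Graph is disconnected; no spanning tree exists")
    return mst_edges, total_weight

# The A-E graph from the walkthrough: A=0, B=1, C=2, D=3, E=4.
graph = [Edge(4, 0, 1), Edge(2, 0, 2), Edge(3, 1, 3), Edge(1, 2, 3), Edge(5, 3, 4)]
mst, weight = kruskal_mst(5, graph)
print(weight)  # 11
```

The returned MST contains four edges (V - 1 for five vertices), and the (A,B) edge of weight 4 is the one skipped.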

Complexity Analysis

Time complexity: O(E log E)

  • Sorting edges: O(E log E)
  • Union-Find operations: O(E × α(V)) ≈ O(E)
  • Total: O(E log E), dominated by sorting

Since E ≤ V², we have log E ≤ 2 log V, so O(E log E) and O(E log V) describe the same bound—you’ll see both forms in the literature.

Space complexity: O(V + E)

  • Edge list storage: O(E)
  • Union-Find arrays: O(V)
  • MST edge storage: O(V)

Kruskal’s vs. Prim’s:

    Aspect            Kruskal’s       Prim’s (binary heap)    Prim’s (Fibonacci heap)
    Time              O(E log E)      O(E log V)              O(E + V log V)
    Best for          Sparse graphs   Dense graphs            Very dense graphs
    Implementation    Simpler         Moderate                Complex

For sparse graphs (E ≈ V), Kruskal’s and Prim’s perform similarly. For dense graphs (E ≈ V²), Prim’s with a Fibonacci heap wins theoretically, though the constant factors often favor simpler implementations in practice.

Practical Example

Let’s solve a real problem: connecting five office buildings with network cables at minimum cost.

def solve_network_cabling():
    """
    Connect 5 office buildings with minimum cable cost.
    
    Buildings: HQ(0), Engineering(1), Sales(2), Support(3), Warehouse(4)
    
    Available cable routes with costs (in thousands):
    HQ-Engineering: 4, HQ-Support: 2, Engineering-Sales: 3,
    Support-Sales: 1, Sales-Warehouse: 5, Engineering-Warehouse: 6
    """
    buildings = ["HQ", "Engineering", "Sales", "Support", "Warehouse"]
    
    edges = [
        Edge(4, 0, 1),  # HQ - Engineering
        Edge(2, 0, 3),  # HQ - Support
        Edge(3, 1, 2),  # Engineering - Sales
        Edge(1, 3, 2),  # Support - Sales
        Edge(5, 2, 4),  # Sales - Warehouse
        Edge(6, 1, 4),  # Engineering - Warehouse
    ]
    
    mst_edges, total_cost = kruskal_mst(len(buildings), edges)
    
    print("Cable installation plan:")
    print("-" * 40)
    for edge in mst_edges:
        print(f"  {buildings[edge.u]} <-> {buildings[edge.v]}: ${edge.weight}k")
    print("-" * 40)
    print(f"Total cost: ${total_cost}k")
    
    return mst_edges, total_cost

# Output:
# Cable installation plan:
# ----------------------------------------
#   Support <-> Sales: $1k
#   HQ <-> Support: $2k
#   Engineering <-> Sales: $3k
#   Sales <-> Warehouse: $5k
# ----------------------------------------
# Total cost: $11k

The algorithm selected edges in order: (Support-Sales, $1k), (HQ-Support, $2k), (Engineering-Sales, $3k), skipped (HQ-Engineering, $4k) because it would create a cycle, then added (Sales-Warehouse, $5k).

Edge Cases and Variations

Disconnected graphs: When no spanning tree exists, Kruskal’s produces a spanning forest—a collection of MSTs for each connected component. The implementation above throws an exception, but you could modify it to return the forest instead.
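One way to sketch that forest-returning variant: drop the V-1 requirement and simply collect whatever edges the union accepts. The helper name kruskal_forest is hypothetical, and a minimal parent-only union-find is used to keep the snippet self-contained:

```python
from typing import List, NamedTuple, Tuple

class Edge(NamedTuple):
    weight: int
    u: int
    v: int

def kruskal_forest(num_vertices: int, edges: List[Edge]) -> Tuple[List[Edge], int]:
    """Minimum spanning forest: one MST per connected component, no exception."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest, total = [], 0
    for e in sorted(edges):
        ru, rv = find(e.u), find(e.v)
        if ru != rv:
            parent[ru] = rv
            forest.append(e)
            total += e.weight
    return forest, total

# Two disconnected components: {0, 1, 2} and {3, 4}.
edges = [Edge(1, 0, 1), Edge(2, 1, 2), Edge(9, 0, 2), Edge(4, 3, 4)]
forest, cost = kruskal_forest(5, edges)
print(len(forest), cost)  # 3 7
```

With two components, the forest has V - 2 edges instead of V - 1; the weight-9 edge is skipped as a cycle, exactly as in the connected case.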

Equal edge weights: Kruskal’s works correctly but may produce different valid MSTs depending on how ties are broken during sorting. The Edge NamedTuple above already breaks ties deterministically, since tuples compare field by field (weight, then u, then v); if you sort by weight alone, add a secondary key such as the edge endpoints.

Maximum spanning tree: Simply negate all edge weights before running the algorithm, or sort edges in descending order. This is useful for finding the “most reliable” path in a network where edge weights represent reliability.

def maximum_spanning_tree(num_vertices: int, edges: List[Edge]) -> Tuple[List[Edge], int]:
    """Compute maximum spanning tree by negating weights."""
    negated = [Edge(-e.weight, e.u, e.v) for e in edges]
    mst_edges, neg_total = kruskal_mst(num_vertices, negated)
    
    # Restore original weights
    original_edges = [Edge(-e.weight, e.u, e.v) for e in mst_edges]
    return original_edges, -neg_total

Kruskal’s algorithm embodies the power of greedy algorithms backed by the right data structure. The Union-Find enables what would otherwise be an expensive cycle-detection step to run in near-constant time. When you need an MST and your graph fits in memory, Kruskal’s is often the most straightforward choice.
