Centroid Decomposition: Divide and Conquer on Trees

Key Insights

  • Centroid decomposition guarantees O(log n) recursion depth on any tree by always splitting at the “balance point,” making divide-and-conquer efficient regardless of tree shape
  • Every path in the original tree passes through exactly one centroid in the decomposition hierarchy, enabling efficient path queries by only considering paths through each centroid
  • The technique transforms tree problems into a logarithmic number of subproblems, achieving O(n log n) or O(n log² n) complexity for problems that would otherwise require O(n²)

Introduction: Why Trees Need Special Divide and Conquer

Standard divide and conquer works beautifully on arrays because splitting in half guarantees O(log n) depth. Trees don’t offer this luxury. A naive approach—picking an arbitrary node and recursing on subtrees—fails catastrophically on skewed trees. A linear chain of 100,000 nodes gives you 100,000 levels of recursion.

Centroid decomposition solves this by always choosing the optimal split point: the centroid. This guarantees O(log n) depth regardless of tree structure, transforming intractable O(n²) problems into manageable O(n log n) solutions.

The technique shines in network analysis, game trees, and competitive programming. Any problem involving path queries, distance calculations, or aggregating information across tree paths becomes tractable with centroid decomposition.

Understanding Centroids

A centroid of a tree is a node whose removal splits the tree into components, each containing at most n/2 nodes. Think of it as the tree’s center of mass—the point that most evenly divides the structure.

Existence proof: Start at any node. If some subtree has more than n/2 nodes, move toward it. Each step reduces the “heavy” side, so you eventually reach a node where no subtree exceeds n/2.

Uniqueness: A tree has at most two centroids, and if two exist, they’re adjacent. For our purposes, we pick one arbitrarily.

Here’s how to find a centroid:

#include <vector>
using namespace std;

class CentroidFinder {
    vector<vector<int>> adj;
    vector<bool> removed;
    vector<int> subtree_size;
    int n;

public:
    CentroidFinder(int n) : adj(n), removed(n, false), subtree_size(n), n(n) {}

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    int compute_subtree_size(int node, int parent) {
        subtree_size[node] = 1;
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                subtree_size[node] += compute_subtree_size(child, node);
            }
        }
        return subtree_size[node];
    }

    int find_centroid(int node, int parent, int tree_size) {
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                // If this subtree has more than half the nodes, centroid is deeper
                if (subtree_size[child] > tree_size / 2) {
                    return find_centroid(child, node, tree_size);
                }
            }
        }
        return node;
    }

    int get_centroid(int root) {
        int tree_size = compute_subtree_size(root, -1);
        return find_centroid(root, -1, tree_size);
    }
};

The key insight: after computing subtree sizes, we walk toward any “heavy” subtree (size > n/2). Since at most one such subtree exists at each node, we find the centroid in O(n) time.
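As a quick sanity check, the two passes can be exercised on a path graph, the very shape that defeats naive recursion. The standalone sketch below restates the same logic with illustrative free-function names (calc, centroid, path_centroid are not part of the class above):

```cpp
#include <cassert>
#include <vector>
using namespace std;

vector<vector<int>> adj;
vector<int> sz;

int calc(int u, int p) {
    sz[u] = 1;
    for (int v : adj[u])
        if (v != p) sz[u] += calc(v, u);
    return sz[u];
}

int centroid(int u, int p, int t) {
    for (int v : adj[u])
        if (v != p && sz[v] > t / 2) return centroid(v, u, t);
    return u;
}

// Build a path 0-1-...-(n-1) and return its centroid.
int path_centroid(int n) {
    adj.assign(n, {});
    sz.assign(n, 0);
    for (int i = 0; i + 1 < n; i++) {
        adj[i].push_back(i + 1);
        adj[i + 1].push_back(i);
    }
    int total = calc(0, -1);
    return centroid(0, -1, total);
}
```

On a path the centroid is the middle node: path_centroid(5) returns 2 and path_centroid(7) returns 3. On even-length paths either of the two middle nodes is a valid centroid, and the walk settles on whichever it reaches first.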

Building the Centroid Tree

The decomposition process is recursive:

  1. Find the centroid of the current tree
  2. Mark it as “removed” (logically delete it)
  3. Recurse on each remaining subtree
  4. Connect centroids in a new “centroid tree” structure

The centroid tree has a crucial property: its height is O(log n). Each recursion operates on subtrees of at most half the original size, giving us the logarithmic guarantee.

#include <vector>
using namespace std;

class CentroidDecomposition {
    vector<vector<int>> adj;
    vector<bool> removed;
    vector<int> subtree_size;
    vector<int> centroid_parent;  // Parent in centroid tree
    vector<int> centroid_depth;   // Depth in centroid tree
    int n;

    int compute_size(int node, int parent) {
        subtree_size[node] = 1;
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                subtree_size[node] += compute_size(child, node);
            }
        }
        return subtree_size[node];
    }

    int find_centroid(int node, int parent, int tree_size) {
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                if (subtree_size[child] > tree_size / 2) {
                    return find_centroid(child, node, tree_size);
                }
            }
        }
        return node;
    }

    void decompose(int node, int parent_centroid, int depth) {
        int tree_size = compute_size(node, -1);
        int centroid = find_centroid(node, -1, tree_size);

        removed[centroid] = true;
        centroid_parent[centroid] = parent_centroid;
        centroid_depth[centroid] = depth;

        for (int child : adj[centroid]) {
            if (!removed[child]) {
                decompose(child, centroid, depth + 1);
            }
        }
    }

public:
    CentroidDecomposition(int n) : adj(n), removed(n, false), subtree_size(n),
        centroid_parent(n, -1), centroid_depth(n), n(n) {}

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    void build() {
        decompose(0, -1, 0);
    }

    int get_centroid_parent(int node) { return centroid_parent[node]; }
    int get_centroid_depth(int node) { return centroid_depth[node]; }
};

After decomposition, every node has at most O(log n) ancestors in the centroid tree. This property is the foundation for efficient queries.
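To see the height guarantee concretely, here is a standalone sketch (illustrative names, same logic as the class above) that decomposes a path, the worst case for naive top-down recursion, and records each node's depth in the centroid tree:

```cpp
#include <cassert>
#include <vector>
using namespace std;

vector<vector<int>> adj;
vector<bool> removed;
vector<int> sz, cdepth;

int calc(int u, int p) {
    sz[u] = 1;
    for (int v : adj[u])
        if (v != p && !removed[v]) sz[u] += calc(v, u);
    return sz[u];
}

int centroid(int u, int p, int t) {
    for (int v : adj[u])
        if (v != p && !removed[v] && sz[v] > t / 2) return centroid(v, u, t);
    return u;
}

void decompose(int u, int depth) {
    int c = centroid(u, -1, calc(u, -1));
    removed[c] = true;
    cdepth[c] = depth;
    for (int v : adj[c])
        if (!removed[v]) decompose(v, depth + 1);
}

// Decompose a path of n nodes; return each node's centroid-tree depth.
vector<int> path_centroid_depths(int n) {
    adj.assign(n, {});
    removed.assign(n, false);
    sz.assign(n, 0);
    cdepth.assign(n, 0);
    for (int i = 0; i + 1 < n; i++) {
        adj[i].push_back(i + 1);
        adj[i + 1].push_back(i);
    }
    decompose(0, 0);
    return cdepth;
}
```

On a 7-node path the centroid tree comes out perfectly balanced: node 3 at depth 0, nodes 1 and 5 at depth 1, and the remaining nodes at depth 2. Naive recursion on the same path would be 7 levels deep.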

Classic Application: Path Queries

Consider this problem: count all pairs of nodes at distance exactly k. The brute force O(n²) approach checks every pair. Centroid decomposition reduces this to O(n log n).

The key observation: every path in the tree passes through exactly one centroid in our decomposition. At each centroid, we count paths passing through it, then recurse on subtrees.

#include <vector>
#include <map>
using namespace std;

class PathCounter {
    vector<vector<int>> adj;
    vector<bool> removed;
    vector<int> subtree_size;
    int n, target_k;
    long long answer;

    int compute_size(int node, int parent) {
        subtree_size[node] = 1;
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                subtree_size[node] += compute_size(child, node);
            }
        }
        return subtree_size[node];
    }

    int find_centroid(int node, int parent, int tree_size) {
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                if (subtree_size[child] > tree_size / 2) {
                    return find_centroid(child, node, tree_size);
                }
            }
        }
        return node;
    }

    void collect_distances(int node, int parent, int dist, vector<int>& distances) {
        distances.push_back(dist);
        for (int child : adj[node]) {
            if (child != parent && !removed[child]) {
                collect_distances(child, node, dist + 1, distances);
            }
        }
    }

    long long count_paths_through_centroid(int centroid) {
        map<int, long long> distance_count;
        distance_count[0] = 1;  // The centroid itself at distance 0
        long long result = 0;

        for (int child : adj[centroid]) {
            if (removed[child]) continue;

            vector<int> distances;
            collect_distances(child, centroid, 1, distances);

            // Count pairs: one node from this subtree, one from previous subtrees
            for (int d : distances) {
                int needed = target_k - d;
                if (distance_count.count(needed)) {
                    result += distance_count[needed];
                }
            }

            // Add this subtree's distances to the map
            for (int d : distances) {
                distance_count[d]++;
            }
        }

        return result;
    }

    void solve(int node) {
        int tree_size = compute_size(node, -1);
        int centroid = find_centroid(node, -1, tree_size);

        removed[centroid] = true;
        answer += count_paths_through_centroid(centroid);

        for (int child : adj[centroid]) {
            if (!removed[child]) {
                solve(child);
            }
        }
    }

public:
    PathCounter(int n, int k) : adj(n), removed(n, false), subtree_size(n),
        n(n), target_k(k), answer(0) {}

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    long long count_paths() {
        solve(0);
        return answer;
    }
};

The algorithm processes each node O(log n) times (once per centroid ancestor). With the std::map used above this gives O(n log² n); replacing it with a hash map or an array indexed by distance brings the total down to O(n log n).
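As a sanity check, here is a compact standalone restatement of the same counting scheme (illustrative names). On a path of n nodes there are exactly n - k pairs at distance k, which makes a convenient test case:

```cpp
#include <cassert>
#include <map>
#include <vector>
using namespace std;

vector<vector<int>> adj;
vector<bool> removed;
vector<int> sz;
int K;
long long total;

int calc(int u, int p) {
    sz[u] = 1;
    for (int v : adj[u])
        if (v != p && !removed[v]) sz[u] += calc(v, u);
    return sz[u];
}

int centroid(int u, int p, int t) {
    for (int v : adj[u])
        if (v != p && !removed[v] && sz[v] > t / 2) return centroid(v, u, t);
    return u;
}

void collect(int u, int p, int d, vector<int>& ds) {
    ds.push_back(d);
    for (int v : adj[u])
        if (v != p && !removed[v]) collect(v, u, d + 1, ds);
}

// Count paths of length K through centroid c, pairing each subtree
// against the subtrees processed before it.
long long through(int c) {
    map<int, long long> cnt{{0, 1LL}};  // the centroid itself at distance 0
    long long res = 0;
    for (int v : adj[c]) {
        if (removed[v]) continue;
        vector<int> ds;
        collect(v, c, 1, ds);
        for (int d : ds)
            if (cnt.count(K - d)) res += cnt[K - d];
        for (int d : ds) cnt[d]++;
    }
    return res;
}

void solve(int u) {
    int c = centroid(u, -1, calc(u, -1));
    removed[c] = true;
    total += through(c);
    for (int v : adj[c])
        if (!removed[v]) solve(v);
}

// Count pairs of nodes at distance exactly k on a path of n nodes.
long long pairs_on_path(int n, int k) {
    adj.assign(n, {});
    removed.assign(n, false);
    sz.assign(n, 0);
    K = k;
    total = 0;
    for (int i = 0; i + 1 < n; i++) {
        adj[i].push_back(i + 1);
        adj[i + 1].push_back(i);
    }
    solve(0);
    return total;
}
```

For instance, pairs_on_path(4, 2) counts the pairs (0,2) and (1,3) and returns 2.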

Handling Updates: Dynamic Problems

Centroid decomposition truly shines with dynamic queries. Consider: maintain a set of marked nodes, supporting “mark node x” and “find nearest marked node to x.”

The insight: climbing the centroid tree from any node visits O(log n) ancestors. At each ancestor, we maintain the minimum distance to any marked node in its subtree.

#include <vector>
#include <set>
#include <climits>
using namespace std;

class NearestMarkedNode {
    vector<vector<int>> adj;
    vector<int> centroid_parent;
    vector<int> depth;  // Depth in original tree for LCA/distance
    vector<int> parent; // Parent in original tree
    vector<set<pair<int,int>>> marked_distances; // {distance, node} at each centroid
    int n;

    // Naive O(n) distance in the original tree. For production, use
    // depth[u] + depth[v] - 2 * depth[lca(u, v)] with a precomputed
    // LCA structure for O(log n) per query.
    int dist_dfs(int u, int p, int target, int d) {
        if (u == target) return d;
        for (int v : adj[u]) {
            if (v == p) continue;
            int r = dist_dfs(v, u, target, d + 1);
            if (r >= 0) return r;
        }
        return -1;
    }

    int get_distance(int u, int v) { return dist_dfs(u, -1, v, 0); }

    // ... centroid decomposition setup code ...

public:
    void mark(int node) {
        int current = node;
        while (current != -1) {
            int dist = get_distance(node, current);
            marked_distances[current].insert({dist, node});
            current = centroid_parent[current];
        }
    }

    void unmark(int node) {
        int current = node;
        while (current != -1) {
            int dist = get_distance(node, current);
            marked_distances[current].erase({dist, node});
            current = centroid_parent[current];
        }
    }

    int query_nearest(int node) {
        int result = INT_MAX;
        int current = node;
        while (current != -1) {
            if (!marked_distances[current].empty()) {
                int dist_to_centroid = get_distance(node, current);
                auto it = marked_distances[current].begin();
                result = min(result, dist_to_centroid + it->first);
            }
            current = centroid_parent[current];
        }
        return result == INT_MAX ? -1 : result;
    }
};

Updates and queries both traverse O(log n) centroid ancestors. With O(log n) distance computation via LCA, we achieve O(log² n) per operation.
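For completeness, here is a self-contained sketch of the structure above with the elided pieces filled in under simplifying assumptions: distances use a naive O(n) DFS rather than LCA (so operations are slower than the stated bounds here), and the per-centroid sets shrink to multisets of distances only. The name NearestMarked and its helpers are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <set>
#include <vector>
using namespace std;

struct NearestMarked {
    int n;
    vector<vector<int>> adj;
    vector<bool> removed;
    vector<int> sz, cpar;        // subtree sizes; parent in centroid tree
    vector<multiset<int>> best;  // distances to marked nodes, per centroid

    NearestMarked(int n) : n(n), adj(n), removed(n, false),
        sz(n), cpar(n, -1), best(n) {}

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    int calc(int u, int p) {
        sz[u] = 1;
        for (int v : adj[u])
            if (v != p && !removed[v]) sz[u] += calc(v, u);
        return sz[u];
    }

    int centroid(int u, int p, int t) {
        for (int v : adj[u])
            if (v != p && !removed[v] && sz[v] > t / 2)
                return centroid(v, u, t);
        return u;
    }

    void build(int u = 0, int p = -1) {
        int c = centroid(u, -1, calc(u, -1));
        removed[c] = true;
        cpar[c] = p;
        for (int v : adj[c])
            if (!removed[v]) build(v, c);
    }

    // Naive O(n) distance in the ORIGINAL tree (removed flags ignored).
    int dist(int u, int p, int target, int d) {
        if (u == target) return d;
        for (int v : adj[u]) {
            if (v == p) continue;
            int r = dist(v, u, target, d + 1);
            if (r >= 0) return r;
        }
        return -1;
    }

    void mark(int x) {
        for (int c = x; c != -1; c = cpar[c])
            best[c].insert(dist(x, -1, c, 0));
    }

    int query(int x) {
        int res = INT_MAX;
        for (int c = x; c != -1; c = cpar[c])
            if (!best[c].empty())
                res = min(res, dist(x, -1, c, 0) + *best[c].begin());
        return res == INT_MAX ? -1 : res;
    }
};
```

On a path 0-1-2-3-4, after mark(0) the nearest marked node to node 4 is at distance 4; after mark(3) it drops to 1. Each operation climbs only the O(log n) centroid ancestors.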

Complexity Analysis and Comparison

Time complexity:

  • Decomposition: O(n log n) — each node processed O(log n) times
  • Path queries: O(n log n) with proper data structures
  • Dynamic updates: O(log² n) per operation

Space complexity: O(n log n) in the worst case when storing per-centroid data structures.

Centroid vs. Heavy-Light Decomposition:

Aspect        Centroid decomposition             Heavy-light decomposition (HLD)
Best for      Path counting, distance queries    Path updates, range queries on paths
Structure     Tree of centroids                  Chains with segment trees
Update type   Point updates                      Range updates on paths
Trade-off     Often simpler to implement         More versatile

Use centroid decomposition when you need to aggregate over all paths or find paths with specific properties. Use HLD when you need to update or query specific paths with range operations.

Common pitfalls:

  1. Forgetting to check removed[child] in every DFS
  2. Not recomputing subtree sizes after marking nodes as removed
  3. Confusing original tree depth with centroid tree depth

Practice Problems and Further Reading

Beginner:

  • CSES: Finding a Centroid (direct application)
  • Codeforces 321C: Ciel the Commander (greedy + centroid)

Intermediate:

  • SPOJ: QTREE5 (nearest marked node)
  • Codeforces 342E: Xenia and Tree (classic dynamic problem)

Advanced:

  • IOI 2011: Race (path of exact length with minimum edges)
  • Codeforces 715C: Digit Tree (paths forming numbers divisible by k)

For further study, explore virtual trees (auxiliary trees) for handling queries on subsets of nodes, and consider combining centroid decomposition with persistent data structures for version-controlled tree queries.
