Tarjan's Algorithm: Strongly Connected Components
A strongly connected component (SCC) is a maximal subgraph where every vertex can reach every other vertex through directed edges. 'Maximal' means you can't add another vertex without breaking this...
Key Insights
- Tarjan’s algorithm finds all strongly connected components in a single DFS pass using discovery times and low-link values, achieving O(V + E) time complexity
- The low-link value tracks the smallest discovery time reachable from a node’s subtree, and when it equals the node’s discovery time, that node is an SCC root
- Unlike Kosaraju’s two-pass approach, Tarjan’s algorithm uses a stack to track nodes in the current DFS path, making it more cache-friendly and often faster in practice
Introduction to Strongly Connected Components
A strongly connected component (SCC) is a maximal subgraph where every vertex can reach every other vertex through directed edges. “Maximal” means you can’t add another vertex without breaking this property.
SCCs appear everywhere in software systems. Package managers use them to detect circular dependencies—if packages A, B, and C form an SCC, you’ve got a dependency cycle that needs breaking. Compilers identify SCCs in control flow graphs to optimize loops. Social networks cluster users into tightly-knit communities. Circuit designers analyze feedback loops in digital systems.
For small graphs, naive approaches work fine. But production systems deal with dependency graphs containing millions of nodes. You need an algorithm that scales linearly with graph size. Tarjan’s algorithm delivers exactly that: O(V + E) time complexity in a single depth-first traversal.
Prerequisites: Graph Representation and DFS
Before diving into Tarjan’s algorithm, let’s establish our foundation. We’ll represent directed graphs using adjacency lists—the standard choice for sparse graphs.
from collections import defaultdict
from typing import List, Set
class DirectedGraph:
def __init__(self):
self.adj: dict[int, List[int]] = defaultdict(list)
self.vertices: Set[int] = set()
def add_edge(self, u: int, v: int) -> None:
self.adj[u].append(v)
self.vertices.add(u)
self.vertices.add(v)
def dfs_iterative(self, start: int) -> List[int]:
"""Basic iterative DFS for reference."""
visited = set()
result = []
stack = [start]
while stack:
node = stack.pop()
if node in visited:
continue
visited.add(node)
result.append(node)
# Add neighbors in reverse for consistent ordering
for neighbor in reversed(self.adj[node]):
if neighbor not in visited:
stack.append(neighbor)
return result
DFS assigns each node two timestamps: discovery time (when first visited) and finish time (when all descendants are processed). Tarjan’s algorithm exploits the discovery time to identify SCC boundaries.
Core Concepts: Discovery Time and Low-Link Values
Tarjan’s algorithm tracks two values for each node:
Discovery time (disc): A counter incremented each time we visit a new node. If we visit nodes in order A → B → C, they get discovery times 0, 1, 2.
Low-link value (low): The smallest discovery time reachable from this node through its DFS subtree, including back edges to ancestors still on the stack.
The key insight: when a node’s low-link value equals its discovery time, that node is the root of an SCC. It can’t reach any node discovered earlier, meaning it and its descendants form a closed component.
class TarjanState:
"""Tracks algorithm state during traversal."""
def __init__(self):
self.disc: dict[int, int] = {} # Discovery times
self.low: dict[int, int] = {} # Low-link values
self.on_stack: dict[int, bool] = {} # Currently on DFS stack
self.stack: List[int] = [] # The DFS stack
self.time: int = 0 # Global timestamp counter
self.sccs: List[List[int]] = [] # Discovered SCCs
def initialize_node(self, node: int) -> None:
"""Set initial values when discovering a node."""
self.disc[node] = self.time
self.low[node] = self.time
self.time += 1
self.stack.append(node)
self.on_stack[node] = True
def update_low(self, node: int, neighbor: int) -> None:
"""Update low-link after processing a neighbor."""
if self.on_stack.get(neighbor, False):
# Back edge to ancestor on stack
self.low[node] = min(self.low[node], self.disc[neighbor])
elif neighbor in self.low:
# Tree edge to already-processed descendant
self.low[node] = min(self.low[node], self.low[neighbor])
The on_stack check is crucial. We only consider back edges to nodes currently in our DFS path—not to nodes in already-completed SCCs.
Tarjan’s Algorithm: Step-by-Step Walkthrough
Here’s the complete implementation. I’m using an iterative approach to avoid stack overflow on deep graphs:
from typing import List, Tuple
from enum import Enum, auto
class NodeState(Enum):
UNVISITED = auto()
PROCESSING = auto()
DONE = auto()
def tarjan_scc(graph: DirectedGraph) -> List[List[int]]:
"""
Find all strongly connected components using Tarjan's algorithm.
Returns list of SCCs, each SCC is a list of node IDs.
"""
disc: dict[int, int] = {}
low: dict[int, int] = {}
on_stack: dict[int, bool] = defaultdict(bool)
stack: List[int] = []
time_counter: List[int] = [0] # Mutable container for closure
sccs: List[List[int]] = []
state: dict[int, NodeState] = defaultdict(lambda: NodeState.UNVISITED)
def strongconnect(start: int) -> None:
# Iterative DFS using explicit call stack
# Each frame: (node, neighbor_index, phase)
# phase 0: initial visit, phase 1: processing neighbors
call_stack: List[Tuple[int, int]] = [(start, 0)]
while call_stack:
node, neighbor_idx = call_stack.pop()
neighbors = graph.adj[node]
if neighbor_idx == 0 and state[node] == NodeState.UNVISITED:
# First visit: initialize node
disc[node] = time_counter[0]
low[node] = time_counter[0]
time_counter[0] += 1
stack.append(node)
on_stack[node] = True
state[node] = NodeState.PROCESSING
# Process neighbors starting from neighbor_idx
found_unvisited = False
for i in range(neighbor_idx, len(neighbors)):
neighbor = neighbors[i]
if state[neighbor] == NodeState.UNVISITED:
# Push current node back with next index, then recurse
call_stack.append((node, i + 1))
call_stack.append((neighbor, 0))
found_unvisited = True
break
elif on_stack[neighbor]:
# Back edge to node in current SCC
low[node] = min(low[node], disc[neighbor])
if found_unvisited:
continue
# All neighbors processed - update parent's low-link
if call_stack:
parent, _ = call_stack[-1]
low[parent] = min(low[parent], low[node])
# Check if node is SCC root
if low[node] == disc[node]:
scc = []
while True:
w = stack.pop()
on_stack[w] = False
state[w] = NodeState.DONE
scc.append(w)
if w == node:
break
sccs.append(scc)
# Run from each unvisited vertex
for v in graph.vertices:
if state[v] == NodeState.UNVISITED:
strongconnect(v)
return sccs
The algorithm maintains a stack of nodes in the current DFS path. When we find an SCC root (low-link equals discovery time), we pop nodes from the stack until we reach the root—all those nodes belong to the same SCC.
Visual Example: Tracing the Algorithm
Consider this graph with 6 nodes:
0 → 1 → 2 → 0 (cycle: 0,1,2)
↓
3 → 4 → 5 → 3 (cycle: 3,4,5)
Let’s trace the algorithm:
| Step | Action | Stack | disc | low | Notes |
|---|---|---|---|---|---|
| 1 | Visit 0 | [0] | {0:0} | {0:0} | Start DFS |
| 2 | Visit 1 | [0,1] | {0:0,1:1} | {0:0,1:1} | Follow edge 0→1 |
| 3 | Visit 2 | [0,1,2] | {0:0,1:1,2:2} | {0:0,1:1,2:2} | Follow edge 1→2 |
| 4 | Back edge 2→0 | [0,1,2] | same | {0:0,1:1,2:0} | low[2] = min(2, disc[0]) = 0 |
| 5 | Return to 1 | [0,1,2] | same | {0:0,1:0,2:0} | low[1] = min(1, low[2]) = 0 |
| 6 | Visit 3 | [0,1,2,3] | +{3:3} | +{3:3} | Follow edge 1→3 |
| 7 | Visit 4 | [0,1,2,3,4] | +{4:4} | +{4:4} | Continue DFS |
| 8 | Visit 5 | [0,1,2,3,4,5] | +{5:5} | +{5:5} | Continue DFS |
| 9 | Back edge 5→3 | same | same | {5:3} | low[5] = min(5, disc[3]) = 3 |
| 10 | Return, pop SCC | [0,1,2] | same | {4:3,3:3} | low[3]==disc[3], SCC: [5,4,3] |
| 11 | Return to 1 | [0,1,2] | same | same | low[1] unchanged (3 not on stack) |
| 12 | Return to 0 | [0,1,2] | same | {0:0,1:0} | low[0] = min(0, low[1]) = 0 |
| 13 | Pop SCC | [] | same | same | low[0]==disc[0], SCC: [2,1,0] |
Final SCCs: [[5,4,3], [2,1,0]]
Complexity Analysis and Comparison with Kosaraju’s
Tarjan’s Algorithm:
- Time: O(V + E) — single DFS pass
- Space: O(V) — stack, disc, low arrays
Kosaraju’s Algorithm:
- Time: O(V + E) — but requires two DFS passes
- Space: O(V + E) — needs to store the transposed graph
Both achieve linear time, but Tarjan’s has practical advantages:
- Single pass: Better cache locality, fewer memory accesses
- No graph transposition: Kosaraju’s must reverse all edges, which costs O(E) time and space
- Online processing: Tarjan’s can output SCCs as they’re found
Choose Kosaraju’s when you already have the transposed graph or when code simplicity matters more than performance. Choose Tarjan’s for production systems where every microsecond counts.
Practical Applications
Here’s a real-world example: detecting circular dependencies in a module system.
def find_circular_dependencies(modules: dict[str, List[str]]) -> List[List[str]]:
"""
Given a dict of module -> [dependencies], find circular dependency groups.
Example:
modules = {
"auth": ["database", "utils"],
"database": ["config"],
"api": ["auth", "database"],
"config": ["utils"],
"utils": ["config"], # Circular: utils -> config -> utils
}
"""
# Build graph with string-to-int mapping
name_to_id: dict[str, int] = {}
id_to_name: dict[int, str] = {}
for i, name in enumerate(modules.keys()):
name_to_id[name] = i
id_to_name[i] = name
graph = DirectedGraph()
for module, deps in modules.items():
for dep in deps:
if dep in name_to_id: # Only internal dependencies
graph.add_edge(name_to_id[module], name_to_id[dep])
sccs = tarjan_scc(graph)
# Filter to SCCs with more than one node (actual cycles)
cycles = [
[id_to_name[node] for node in scc]
for scc in sccs
if len(scc) > 1
]
return cycles
# Usage
modules = {
"auth": ["database", "utils"],
"database": ["config"],
"api": ["auth", "database"],
"config": ["utils"],
"utils": ["config"],
}
cycles = find_circular_dependencies(modules)
print(f"Circular dependencies found: {cycles}")
# Output: Circular dependencies found: [['config', 'utils']]
Beyond dependency analysis, Tarjan’s algorithm powers:
- 2-SAT solvers: Each SCC in the implication graph represents equivalent literals
- Compiler optimization: Identifying loops for strength reduction and invariant code motion
- Deadlock detection: Cycles in resource allocation graphs indicate potential deadlocks
Tarjan’s algorithm is one of those foundational tools every systems programmer should internalize. It’s elegant, efficient, and solves problems you’ll encounter repeatedly throughout your career.