Floyd-Warshall Algorithm: All-Pairs Shortest Path

Key Insights

  • Floyd-Warshall uses dynamic programming to compute shortest paths between all vertex pairs in O(V³) time, making it ideal for dense graphs where you need the complete distance matrix.
  • The algorithm’s correctness depends on the loop ordering: the intermediate vertex k must be the outermost loop, allowing paths to build incrementally through vertices 0, 1, 2, …, k.
  • Unlike Dijkstra’s algorithm, Floyd-Warshall handles negative edge weights gracefully and can detect negative cycles by checking for negative values on the matrix diagonal.

The All-Pairs Shortest Path Problem

Sometimes you need more than the shortest path from a single source. Routing protocols need distance tables between all nodes. Social network analysis requires computing closeness centrality for every user. Game engines precompute pathfinding data for all landmark pairs. These scenarios demand the all-pairs shortest path solution.

The naive approach runs Dijkstra’s algorithm from every vertex. With a binary heap, that’s O(V × (V + E) log V). For dense graphs where E ≈ V², this becomes O(V³ log V). Floyd-Warshall solves the same problem in O(V³) with a simpler implementation and better cache behavior. For sparse graphs with non-negative weights, repeated Dijkstra wins. For everything else, Floyd-Warshall is your tool.
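For a concrete baseline, here is a minimal sketch of the repeated-Dijkstra approach (my own illustration, assuming the same adjacency-matrix convention used later in this article, with float('inf') for missing edges):

```python
import heapq

def all_pairs_dijkstra(graph):
    """Baseline: run Dijkstra once per source vertex (non-negative weights only)."""
    n = len(graph)
    INF = float('inf')
    result = []
    for src in range(n):
        dist = [INF] * n
        dist[src] = 0
        heap = [(0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # Stale heap entry; a shorter path was already found
            for v in range(n):
                w = graph[u][v]
                if w != INF and d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        result.append(dist)
    return result
```

Note that with an adjacency-matrix input, each relaxation step scans all V potential neighbors regardless of how many edges exist, so the sparse-graph advantage only materializes with an adjacency list.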

Core Intuition: Building Paths Through Intermediate Vertices

Floyd-Warshall’s elegance comes from a single insight: any shortest path from vertex i to vertex j either goes directly, or passes through some intermediate vertex k. If we systematically consider all possible intermediate vertices, we’ll find the optimal path.

The algorithm builds solutions incrementally. First, it considers paths that only use vertex 0 as an intermediate. Then paths using vertices {0, 1}. Then {0, 1, 2}, and so on. By the time we’ve considered all vertices as potential intermediates, we’ve found every shortest path.

The recurrence relation captures this:

dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

For each pair (i, j), we ask: “Is the current best path from i to j shorter than going through k?” If routing through k is better, we update our distance.

Algorithm Implementation

Here’s the core algorithm. Pay attention to the loop ordering—it’s the most common source of bugs.

def floyd_warshall(graph):
    """
    Compute all-pairs shortest paths.
    
    Args:
        graph: 2D list where graph[i][j] is the edge weight from i to j.
               Use float('inf') for no direct edge.
    
    Returns:
        2D distance matrix where dist[i][j] is shortest path from i to j.
    """
    n = len(graph)
    
    # Initialize distance matrix with direct edge weights
    dist = [[float('inf')] * n for _ in range(n)]
    
    for i in range(n):
        for j in range(n):
            if i == j:
                dist[i][j] = 0
            elif graph[i][j] != float('inf'):
                dist[i][j] = graph[i][j]
    
    # Consider each vertex as intermediate
    for k in range(n):  # k MUST be outermost
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    
    return dist

Why must k be the outermost loop? When we process intermediate vertex k, we need dist[i][k] and dist[k][j] to already reflect the shortest paths using intermediates {0, 1, …, k-1}. If we put i or j outermost, we’d be using stale values that haven’t considered all previous intermediates.

Think of it this way: we’re building a sequence of matrices D⁰, D¹, D², …, Dⁿ. Matrix Dᵏ contains shortest paths using only vertices 0 through k-1 as intermediates. Each matrix depends on the previous one being complete.
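This ordering constraint is easy to demonstrate empirically. In the sketch below (an illustrative chain graph of my own, not one from this article), moving k to the innermost loop silently loses a path:

```python
INF = float('inf')

def fw_correct(graph):
    n = len(graph)
    d = [row[:] for row in graph]
    for k in range(n):          # Intermediate vertex outermost: correct
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def fw_wrong(graph):
    n = len(graph)
    d = [row[:] for row in graph]
    for i in range(n):          # Intermediate vertex innermost: WRONG
        for j in range(n):
            for k in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Chain 0 -> 3 -> 2 -> 1, each edge weight 1; the shortest 0 -> 1 path costs 3.
g = [[0, INF, INF, 1],
     [INF, 0, INF, INF],
     [INF, 1, 0, INF],
     [INF, INF, 1, 0]]

print(fw_correct(g)[0][1])  # 3
print(fw_wrong(g)[0][1])    # inf: row 0 was finalized before rows 2 and 3
```

The buggy version finalizes row 0 before discovering that vertex 3 can reach vertex 1 through vertex 2, so the path 0→3→2→1 is never assembled.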

Worked Example: Tracing the Algorithm

Let’s trace through a concrete example. Consider this 4-vertex graph:

def trace_floyd_warshall():
    """Step-by-step trace on a 4-node graph."""
    INF = float('inf')
    
    # Graph edges (weight shown on the arrow):
    #   0 --2--> 1    0 --7--> 3
    #   1 --3--> 2
    #   2 --1--> 3
    #   3 --1--> 1
    
    graph = [
        [0,   2,   INF, 7  ],  # From vertex 0
        [INF, 0,   3,   INF],  # From vertex 1
        [INF, INF, 0,   1  ],  # From vertex 2
        [INF, 1,   INF, 0  ],  # From vertex 3
    ]
    
    n = 4
    dist = [row[:] for row in graph]  # Copy initial matrix
    
    print("Initial matrix (direct edges only):")
    print_matrix(dist)
    
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] != INF and dist[k][j] != INF:
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        
        print(f"\nAfter considering vertex {k} as intermediate:")
        print_matrix(dist)
    
    return dist

def print_matrix(matrix):
    for row in matrix:
        print([f"{x:3}" if x != float('inf') else "INF" for x in row])

# Running trace_floyd_warshall() produces:
# Initial matrix: direct edges only
# After k=0: paths through vertex 0 (no changes - 0 has no useful shortcuts)
# After k=1: dist[0][2] = 5 (0->1->2), dist[3][2] = 4 (3->1->2)
# After k=2: dist[0][3] = 6 (0->1->2->3), dist[1][3] = 4 (1->2->3)
# After k=3: dist[2][1] = 2 (2->3->1); dist[0][1] stays 2 (direct beats 6+1 via vertex 3)

Watch how paths emerge. Initially, we only know direct edges. After considering vertex 1, we discover 0→1→2. After vertex 2, we find 0→1→2→3. The algorithm systematically explores all possible shortcuts.

Path Reconstruction

Distances alone aren’t always enough. To reconstruct actual paths, maintain a predecessor matrix alongside distances.

def floyd_warshall_with_paths(graph):
    """
    Compute all-pairs shortest paths with path reconstruction.
    
    Returns:
        dist: Distance matrix
        next_hop: next_hop[i][j] is the first vertex after i on shortest path to j
    """
    n = len(graph)
    INF = float('inf')
    
    dist = [[INF] * n for _ in range(n)]
    next_hop = [[None] * n for _ in range(n)]
    
    # Initialize
    for i in range(n):
        for j in range(n):
            if i == j:
                dist[i][j] = 0
                next_hop[i][j] = j
            elif graph[i][j] != INF:
                dist[i][j] = graph[i][j]
                next_hop[i][j] = j  # Direct edge: next hop is destination
    
    # Main algorithm
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] != INF and dist[k][j] != INF:
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
                        next_hop[i][j] = next_hop[i][k]  # Go toward k first
    
    return dist, next_hop


def reconstruct_path(next_hop, start, end):
    """Reconstruct the shortest path from start to end."""
    if next_hop[start][end] is None:
        return None  # No path exists
    
    path = [start]
    current = start
    
    while current != end:
        current = next_hop[current][end]
        if current is None:
            return None  # Broken path (shouldn't happen with valid input)
        path.append(current)
    
    return path


# Usage example (using the 4-vertex graph from the trace above)
dist, next_hop = floyd_warshall_with_paths(graph)
path = reconstruct_path(next_hop, 0, 3)
print(f"Shortest path 0→3: {path}")  # Output: [0, 1, 2, 3]
print(f"Distance: {dist[0][3]}")     # Output: 6

The key insight: when we update dist[i][j] to go through k, we set next_hop[i][j] = next_hop[i][k]. We don’t need to store the entire path—just the first step. Reconstruction follows the chain of next hops.

Handling Edge Cases

Negative Cycle Detection

Negative cycles make shortest paths undefined (you can always go around again for a shorter path). Floyd-Warshall detects them elegantly: after the algorithm completes, check the diagonal.

def detect_negative_cycle(dist):
    """
    Check for negative cycles after running Floyd-Warshall.
    
    A negative cycle exists if any vertex has negative distance to itself.
    """
    n = len(dist)
    for i in range(n):
        if dist[i][i] < 0:
            return True
    return False


def floyd_warshall_safe(graph):
    """Floyd-Warshall with negative cycle detection."""
    dist = floyd_warshall(graph)
    
    if detect_negative_cycle(dist):
        raise ValueError("Graph contains a negative cycle")
    
    return dist

Why does this work? If vertex i lies on a negative cycle, the algorithm finds a closed walk i→…→i with negative total weight, so the diagonal entry dist[i][i], initialized to 0, ends up negative.
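As a quick sanity check, here is a self-contained sketch with its own tiny graph (my own example): a two-edge negative cycle between vertices 1 and 2 shows up on the diagonal.

```python
INF = float('inf')

# Edges 1 -> 2 (weight -2) and 2 -> 1 (weight 1) form a cycle of total weight -1
graph = [[0, 1, INF],
         [INF, 0, -2],
         [INF, 1, 0]]

n = len(graph)
dist = [row[:] for row in graph]
for k in range(n):
    for i in range(n):
        for j in range(n):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]

# Vertices 1 and 2 sit on the cycle, so their diagonal entries go negative
has_negative_cycle = any(dist[i][i] < 0 for i in range(n))
print(has_negative_cycle)  # True: dist[1][1] = -2 + 1 = -1
```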

Disconnected Graphs

For disconnected graphs, some vertex pairs have no path. Python's float('inf') handles this naturally: infinity plus anything stays infinity, so unreachable pairs simply remain infinite. In languages that use a large integer sentinel instead of a true infinity, guard the addition explicitly; otherwise sentinel + weight can overflow, or, with negative weights, masquerade as a valid shorter path.

# Safe addition that handles infinity
def safe_add(a, b):
    INF = float('inf')
    if a == INF or b == INF:
        return INF
    return a + b

Complexity Analysis and Algorithm Selection

Floyd-Warshall runs in O(V³) time with O(V²) space. The triple nested loop is cache-friendly since we access the matrix in predictable patterns.

When should you use each algorithm?

Scenario                                      Best Algorithm                  Time Complexity
All-pairs, dense graph, negative edges OK     Floyd-Warshall                  O(V³)
All-pairs, sparse graph, non-negative edges   Dijkstra × V (Fibonacci heap)   O(V² log V + VE)
All-pairs, sparse graph, negative edges       Johnson’s algorithm             O(V² log V + VE)
Single source, non-negative edges             Dijkstra                        O((V + E) log V)
Single source, negative edges                 Bellman-Ford                    O(VE)

For graphs under 1000 vertices, Floyd-Warshall’s simplicity often beats theoretically faster alternatives. The constant factors are tiny, and the implementation fits in 15 lines. For larger sparse graphs, invest in Johnson’s algorithm, which runs Bellman-Ford once to reweight edges, then Dijkstra from each vertex.

Floyd-Warshall shines in specific domains: network routing protocols precomputing forwarding tables, transitive closure computation (use boolean OR instead of addition), and any scenario where you need the complete distance matrix upfront rather than on-demand queries.
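The transitive-closure variant mentioned above is a near-mechanical substitution: replace min with boolean OR and addition with AND. A minimal sketch:

```python
def transitive_closure(graph):
    """Warshall's algorithm: reach[i][j] is True iff j is reachable from i.

    graph is an adjacency matrix of booleans (True where an edge exists).
    """
    n = len(graph)
    # Every vertex reaches itself; otherwise start from the direct edges
    reach = [[graph[i][j] or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Reachable if already reachable, or if i reaches k and k reaches j
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

# Edges 0->1 and 1->2: vertex 2 is reachable from 0, but 0 is not reachable from 2
edges = [[False, True, False],
         [False, False, True],
         [False, False, False]]
closure = transitive_closure(edges)
print(closure[0][2])  # True
print(closure[2][0])  # False
```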
