Linear Algebra: QR Decomposition Explained
Key Insights
- QR decomposition factors any matrix into an orthogonal matrix Q and upper triangular matrix R, providing superior numerical stability compared to methods like normal equations for solving linear systems
- The Gram-Schmidt process offers an intuitive approach to QR decomposition, but Householder reflections are preferred in production libraries due to better numerical properties and computational efficiency
- QR decomposition excels at solving least squares problems and is the foundation of modern eigenvalue algorithms, making it essential for regression analysis and machine learning applications
Introduction to QR Decomposition
QR decomposition is a matrix factorization technique that breaks down any matrix A into the product of two matrices: Q (an orthogonal matrix) and R (an upper triangular matrix), such that A = QR. This seemingly simple factorization is one of the most powerful tools in numerical linear algebra.
Why does QR decomposition matter? Three reasons: numerical stability, versatility, and reliability. When solving linear systems or least squares problems, QR decomposition avoids the numerical instabilities that plague other methods. Unlike LU decomposition, which can fail without pivoting, QR always exists for any matrix. Compared to Singular Value Decomposition (SVD), QR is computationally cheaper while still providing excellent numerical properties for many common tasks.
The key advantage over normal equations (A^T A x = A^T b) for solving least squares is that QR doesn’t square the condition number of the matrix. This means you maintain precision even with ill-conditioned matrices—a critical property for real-world data analysis.
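This squaring effect is easy to check numerically, since the 2-norm condition number of A^T A equals cond(A)² exactly (its singular values are the squares of A's). A quick sketch; the nearly dependent column and the 1e-4 noise level are arbitrary choices for illustration:

```python
import numpy as np

# A matrix with one nearly dependent column, so it is ill-conditioned.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
A[:, 2] = A[:, 0] + 1e-4 * rng.standard_normal(50)

cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)

print(f"cond(A)     = {cond_A:.2e}")
print(f"cond(A^T A) = {cond_AtA:.2e}")  # roughly cond(A) squared
```

With roughly 16 significant digits in double precision, squaring the condition number can turn a solvable problem into numerical noise.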
Mathematical Foundation
To understand QR decomposition, you need to grasp two concepts: orthogonal matrices and upper triangular matrices.
An orthogonal matrix Q satisfies Q^T Q = I, where I is the identity matrix. This means Q’s columns form an orthonormal basis—they’re unit vectors that are mutually perpendicular. Geometrically, orthogonal matrices represent rotations and reflections, transformations that preserve lengths and angles.
An upper triangular matrix R has all zeros below the main diagonal. These matrices are computationally convenient because systems involving them can be solved efficiently through back substitution.
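Back substitution solves the last equation first and works upward, since each row introduces only one new unknown. A minimal sketch (`back_substitution` is an illustrative helper, not a library routine):

```python
import numpy as np

def back_substitution(R, b):
    """Solve R x = b for upper triangular R, starting from the last row."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already-solved unknowns,
        # then divide by the diagonal entry.
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

R = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
b = np.array([5.0, 13.0, 8.0])
print(back_substitution(R, b))  # same answer as np.linalg.solve(R, b)
```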
Here’s how to verify the orthogonality property in practice:
```python
import numpy as np

# Generate a random matrix
A = np.random.rand(4, 3)

# Compute the (reduced) QR decomposition
Q, R = np.linalg.qr(A)

# Verify Q has orthonormal columns: Q^T Q should equal the identity
result = Q.T @ Q
print("Q^T Q:")
print(result)
print("\nColumns orthonormal?", np.allclose(result, np.eye(Q.shape[1])))

# Verify reconstruction: A should equal Q @ R
reconstruction = Q @ R
print("\nReconstruction error:", np.linalg.norm(A - reconstruction))
```
This verification demonstrates the fundamental property that makes QR decomposition so useful: Q preserves geometric properties while R captures the scaling and relationships between variables.
The Gram-Schmidt Process
The Gram-Schmidt process is the most intuitive method for computing QR decomposition. It systematically orthogonalizes a set of vectors by projecting out components that aren’t perpendicular.
Here’s the classical algorithm: take each column of A, subtract its projections onto all previous orthogonal vectors, then normalize. Let’s implement this from scratch:
```python
def gram_schmidt(A):
    """
    Perform QR decomposition using the classical Gram-Schmidt process.
    Args:
        A: m x n matrix (m >= n) with linearly independent columns
    Returns:
        Q: m x n matrix with orthonormal columns
        R: n x n upper triangular matrix
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        # Start with the j-th column of A
        v = A[:, j].copy()
        # Subtract projections onto all previous Q columns
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        # Normalize
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

# Test with a simple 3x3 matrix
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q_custom, R_custom = gram_schmidt(A)
Q_numpy, R_numpy = np.linalg.qr(A)
print("Custom Q:")
print(Q_custom)
print("\nNumPy Q:")
print(Q_numpy)
print("\nReconstruction error (custom):", np.linalg.norm(A - Q_custom @ R_custom))
```
The modified Gram-Schmidt algorithm improves numerical stability by computing each projection coefficient against the partially orthogonalized vector v (R[i, j] = Q[:, i]^T v) instead of the original column A[:, j], so rounding errors introduced by earlier projections are not re-amplified. Even modified Gram-Schmidt can lose orthogonality on ill-conditioned matrices, which is why production libraries use Householder reflections.
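For comparison, here is a sketch of modified Gram-Schmidt; the only change from the classical version is which vector the projection coefficient is computed against:

```python
import numpy as np

def modified_gram_schmidt(A):
    """QR via modified Gram-Schmidt: project against the running residual v."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # The one change from classical GS: use the updated v, not A[:, j]
            R[i, j] = Q[:, i] @ v
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
Q, R = modified_gram_schmidt(A)
print("orthogonality error: ", np.linalg.norm(Q.T @ Q - np.eye(4)))
print("reconstruction error:", np.linalg.norm(A - Q @ R))
```

On a well-conditioned matrix like this one, both variants agree to machine precision; the difference shows up as the columns of A become nearly dependent.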
Householder Reflections Method
Householder reflections provide a more numerically stable approach to QR decomposition. Instead of building orthogonal vectors incrementally, Householder transformations use reflection matrices to zero out elements below the diagonal in one column at a time.
A Householder reflection is defined by a vector v: H = I - 2(vv^T)/(v^T v). This matrix reflects points across the hyperplane perpendicular to v. Here’s a simple demonstration:
```python
def householder_reflection(x):
    """
    Compute the Householder reflection that zeros out all but the first element of x.
    Returns:
        H: Householder matrix
        v: unit Householder vector
    """
    v = x.astype(float).copy()
    # Match the sign of x[0] to avoid cancellation; treat sign(0) as +1
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * np.linalg.norm(x)
    v /= np.linalg.norm(v)
    H = np.eye(len(x)) - 2 * np.outer(v, v)
    return H, v

# Example: reflect a vector onto the first coordinate axis
x = np.array([3.0, 4.0, 0.0])
H, v = householder_reflection(x)
result = H @ x
print("Original vector:", x)
print("Reflected vector:", result)
print("Expected: [-5, 0, 0]")  # H maps x to -sign(x[0]) * ||x|| * e1
```
The full Householder QR algorithm applies successive reflections to each column, building up Q as the product of these reflections. This approach is O(n³) but with better constants and numerical properties than Gram-Schmidt.
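A full Householder QR can be sketched by applying one reflection per column to a working copy of A and accumulating the reflections into Q. This is a teaching sketch assuming a real matrix with m >= n (`householder_qr` is an illustrative name); production code stores the reflectors compactly rather than forming Q explicitly:

```python
import numpy as np

def householder_qr(A):
    """QR via successive Householder reflections applied to trailing submatrices."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        # Sign choice avoids cancellation when forming v
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:
            continue  # column already zero below the diagonal
        v /= norm_v
        # Apply H = I - 2 v v^T to the trailing block of R, and accumulate into Q
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Q, R = householder_qr(A)
print("reconstruction error:", np.linalg.norm(A - Q @ R))
print("below-diagonal max:  ", np.max(np.abs(np.tril(R, -1))))
```

Because each column update only touches the trailing submatrix, Q stays orthogonal to machine precision regardless of A's conditioning, which is the key advantage over Gram-Schmidt.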
Practical Applications
QR decomposition shines in solving least squares problems. Consider linear regression: given data matrix X and target vector y, we want to find coefficients β that minimize ||Xβ - y||².
The normal equations approach computes β = (X^T X)^(-1) X^T y, but this squares the condition number. Using QR decomposition, we solve Rβ = Q^T y directly:
```python
# Generate synthetic regression data
np.random.seed(42)
n_samples, n_features = 100, 5
X = np.random.randn(n_samples, n_features)
true_coef = np.random.randn(n_features)
y = X @ true_coef + 0.1 * np.random.randn(n_samples)

# Method 1: Normal equations (numerically unstable)
coef_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Method 2: QR decomposition (stable)
Q, R = np.linalg.qr(X)
coef_qr = np.linalg.solve(R, Q.T @ y)

# Method 3: NumPy's lstsq (uses SVD, most stable but slower)
coef_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("True coefficients:", true_coef)
print("Normal equations:", coef_normal)
print("QR decomposition:", coef_qr)
print("lstsq (SVD):", coef_lstsq)
print("\nQR error:", np.linalg.norm(true_coef - coef_qr))
print("Normal equations error:", np.linalg.norm(true_coef - coef_normal))
```
For well-conditioned problems, all methods agree. But as condition numbers grow, normal equations deteriorate while QR remains stable.
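This breakdown can be provoked deliberately with a design matrix containing a nearly duplicated column; the 1e-6 perturbation below is an arbitrary choice that makes X ill-conditioned while staying numerically full rank:

```python
import numpy as np

# Design matrix with a nearly duplicated column -> large condition number
rng = np.random.default_rng(0)
n = 200
base = rng.standard_normal(n)
X = np.column_stack([base, np.ones(n), base + 1e-6 * rng.standard_normal(n)])
beta_true = np.array([1.0, 2.0, 3.0])
y = X @ beta_true

print("cond(X):    ", np.linalg.cond(X))
print("cond(X^T X):", np.linalg.cond(X.T @ X))

# Normal equations square the conditioning; QR works with X directly
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print("normal equations error:", np.linalg.norm(beta_normal - beta_true))
print("QR error:              ", np.linalg.norm(beta_qr - beta_true))
```

Here the system is consistent by construction, yet the normal-equations solution loses several more digits than the QR solution because it inherits the squared condition number.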
Implementation and Performance
Modern numerical libraries provide highly optimized QR implementations. NumPy uses LAPACK’s dgeqrf routine, which employs blocked Householder reflections for cache efficiency.
Let’s benchmark QR decomposition across different matrix sizes:
```python
import time

def benchmark_qr(sizes):
    """Benchmark QR decomposition for different matrix sizes."""
    results = []
    for n in sizes:
        A = np.random.randn(n, n)
        start = time.perf_counter()  # monotonic clock, better for timing than time()
        Q, R = np.linalg.qr(A)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed))
        print(f"n={n:4d}: {elapsed:.4f} seconds")
    return results

sizes = [100, 200, 400, 800, 1600]
results = benchmark_qr(sizes)

# Verify O(n³) complexity
for i in range(1, len(results)):
    n_ratio = results[i][0] / results[i-1][0]
    time_ratio = results[i][1] / results[i-1][1]
    print(f"Size ratio: {n_ratio:.1f}x, Time ratio: {time_ratio:.1f}x "
          f"(expected ~{n_ratio**3:.1f}x for O(n³))")
```
When should you use QR decomposition? Choose QR when you need:
- Stable solutions to least squares problems
- Orthonormal bases (Gram-Schmidt applications)
- Foundation for iterative eigenvalue algorithms
Use alternatives when:
- You need the full spectrum (use SVD)
- Matrix is sparse (use iterative methods)
- You’re solving multiple systems with the same matrix (LU with pivoting may be faster)
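As a taste of the eigenvalue connection above, the unshifted QR iteration repeatedly factors A_k = Q_k R_k and forms A_{k+1} = R_k Q_k, a similarity transform that drives the iterates toward triangular (here, diagonal) form for suitable matrices. A bare-bones sketch without the shifts and deflation that real eigensolvers use (`qr_eigenvalues` is an illustrative name):

```python
import numpy as np

def qr_eigenvalues(A, iters=500):
    """Unshifted QR iteration: each A_{k+1} = R_k Q_k is similar to A."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q  # equals Q.T @ Ak @ Q, so eigenvalues are preserved
    return np.sort(np.diag(Ak))

# Build a symmetric matrix with known eigenvalues so the result is checkable
rng = np.random.default_rng(1)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
S = Q0 @ np.diag([1.0, 2.0, 5.0, 10.0]) @ Q0.T

print("QR iteration:       ", qr_eigenvalues(S))  # approaches [1, 2, 5, 10]
print("np.linalg.eigvalsh: ", np.sort(np.linalg.eigvalsh(S)))
```

Convergence is only linear here, governed by ratios of adjacent eigenvalue magnitudes; the shifted, deflated version of this idea is what production eigensolvers actually run.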
Conclusion
QR decomposition is a workhorse of numerical linear algebra, offering an optimal balance between stability, versatility, and computational cost. Its ability to maintain numerical precision while solving least squares problems makes it indispensable for data analysis, while its role in eigenvalue algorithms cements its importance in scientific computing.
The key takeaway: when you’re solving overdetermined systems or need orthogonal bases, reach for QR decomposition. Use library implementations like NumPy’s qr() for production code—they incorporate decades of numerical analysis research. Understanding the underlying Gram-Schmidt and Householder methods helps you reason about numerical behavior and debug unexpected results.
For deeper exploration, study the QR algorithm for eigenvalues, investigate rank-revealing QR factorizations, and explore how QR decomposition integrates into modern machine learning libraries. The principles you’ve learned here form the foundation for understanding more advanced matrix factorizations and their applications.