NumPy - Cholesky Decomposition
Key Insights
- Cholesky decomposition factors a positive definite matrix into L·L^T, where L is lower triangular, providing a computationally efficient alternative to LU decomposition with half the operations
- NumPy’s numpy.linalg.cholesky() performs the decomposition in roughly n³/3 floating-point operations, making it ideal for solving linear systems, generating correlated random variables, and speeding up matrix operations in machine learning
- The decomposition requires strict positive definiteness (a symmetric matrix with strictly positive eigenvalues) and fails on ill-conditioned or near-singular matrices without proper numerical handling
Understanding Cholesky Decomposition
Cholesky decomposition transforms a symmetric positive definite matrix A into the product of a lower triangular matrix L and its transpose: A = L·L^T. This factorization is unique when A is positive definite and provides significant computational advantages over general matrix factorizations.
import numpy as np
# Create a positive definite matrix
A = np.array([[4, 2, 1],
              [2, 5, 3],
              [1, 3, 6]], dtype=float)
# Perform Cholesky decomposition
L = np.linalg.cholesky(A)
print("Original matrix A:")
print(A)
print("\nLower triangular matrix L:")
print(L)
print("\nVerification L @ L.T:")
print(L @ L.T)
The decomposition requires exactly half the operations of LU decomposition because it exploits the symmetry of the input matrix. For an n×n matrix, Cholesky needs approximately n³/3 floating-point operations compared to 2n³/3 for LU.
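The factorization itself follows a simple recurrence over rows and columns. A minimal textbook implementation, for illustration only (np.linalg.cholesky is what you should use in practice), shows where the symmetry and the positivity requirement enter:

```python
import numpy as np

def cholesky_naive(A):
    """Textbook Cholesky recurrence: each L[i, j] uses only earlier entries."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i + 1):
            s = A[i, j] - L[i, :j] @ L[j, :j]
            if i == j:
                L[i, j] = np.sqrt(s)   # diagonal: needs s > 0, i.e. positive definiteness
            else:
                L[i, j] = s / L[j, j]  # below the diagonal
    return L

A = np.array([[4, 2, 1], [2, 5, 3], [1, 3, 6]], dtype=float)
print(np.allclose(cholesky_naive(A), np.linalg.cholesky(A)))  # True
```

Only the lower triangle of A is ever read, which is exactly why the operation count is halved relative to LU.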
Solving Linear Systems
Cholesky decomposition excels at solving systems of linear equations Ax = b when A is positive definite. The process involves two triangular solves instead of one general matrix inversion.
# Solve Ax = b using Cholesky decomposition
A = np.array([[4, 2, 1],
              [2, 5, 3],
              [1, 3, 6]], dtype=float)
b = np.array([7, 13, 10], dtype=float)
# Decompose A = L @ L.T
L = np.linalg.cholesky(A)
# Solve L @ y = b (forward substitution)
y = np.linalg.solve(L, b)
# Solve L.T @ x = y (backward substitution)
x = np.linalg.solve(L.T, y)
print("Solution x:", x)
print("Verification A @ x:", A @ x)
print("Expected b:", b)
print("Error:", np.linalg.norm(A @ x - b))
This approach is numerically stable and faster than direct inversion. For repeated solves with the same matrix but different right-hand sides, compute L once and reuse it:
# Multiple right-hand sides
L = np.linalg.cholesky(A)
b_vectors = np.array([[7, 13, 10],
                      [1, 2, 3],
                      [5, 8, 11]]).T
solutions = []
for b in b_vectors.T:
    y = np.linalg.solve(L, b)
    x = np.linalg.solve(L.T, y)
    solutions.append(x)
solutions = np.array(solutions)
print("Solutions:\n", solutions)
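The Python loop is not actually required: np.linalg.solve accepts a matrix right-hand side, so all systems can be solved with two calls, one column per right-hand side:

```python
import numpy as np

A = np.array([[4, 2, 1], [2, 5, 3], [1, 3, 6]], dtype=float)
B = np.array([[7, 13, 10], [1, 2, 3], [5, 8, 11]], dtype=float).T  # columns are RHS vectors

L = np.linalg.cholesky(A)
Y = np.linalg.solve(L, B)    # forward substitution, all columns at once
X = np.linalg.solve(L.T, Y)  # backward substitution
print(np.allclose(A @ X, B))  # True
```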
Generating Correlated Random Variables
Cholesky decomposition lets you draw samples from a multivariate normal distribution with a specified covariance structure, which is essential for Monte Carlo simulation and financial modeling.
# Generate correlated random variables
np.random.seed(42)
# Desired covariance matrix
cov_matrix = np.array([[1.0, 0.6, 0.3],
                       [0.6, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])
# Cholesky decomposition of covariance
L = np.linalg.cholesky(cov_matrix)
# Generate uncorrelated standard normal samples
n_samples = 1000
uncorrelated = np.random.standard_normal((3, n_samples))
# Transform to correlated samples
correlated = L @ uncorrelated
# Verify the covariance structure
empirical_cov = np.cov(correlated)
print("Target covariance:\n", cov_matrix)
print("\nEmpirical covariance:\n", empirical_cov)
print("\nCovariance error:\n", np.abs(cov_matrix - empirical_cov))
This technique works because if Z ~ N(0, I) and Y = LZ, then Cov(Y) = L·Cov(Z)·L^T = L·I·L^T = LL^T = Σ, where Σ is the target covariance matrix.
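Recent NumPy versions expose this same transform through the Generator API: numpy.random.Generator.multivariate_normal accepts a method argument, and method="cholesky" uses exactly the factorization described above (checked here against the empirical covariance; the tolerance is a judgment call for the sample size):

```python
import numpy as np

cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
rng = np.random.default_rng(42)

# Sample via the Cholesky factor of cov, handled internally by NumPy
samples = rng.multivariate_normal(np.zeros(3), cov, size=100_000, method="cholesky")
print(np.round(np.cov(samples.T), 2))
```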
Matrix Inversion and Determinants
Cholesky decomposition simplifies computing matrix inverses and determinants for positive definite matrices.
# Compute inverse using Cholesky
A = np.array([[4, 2, 1],
              [2, 5, 3],
              [1, 3, 6]], dtype=float)
L = np.linalg.cholesky(A)
# Inverse of A = (L.T)^-1 @ L^-1
L_inv = np.linalg.inv(L)
A_inv = L_inv.T @ L_inv
print("A inverse via Cholesky:\n", A_inv)
print("Direct inverse:\n", np.linalg.inv(A))
print("Verification A @ A_inv:\n", A @ A_inv)
# Determinant: det(A) = det(L)^2 = (product of diagonal elements)^2
det_A = np.prod(np.diag(L)) ** 2
print("\nDeterminant via Cholesky:", det_A)
print("Direct determinant:", np.linalg.det(A))
The determinant calculation is numerically stable because it only involves multiplication of diagonal elements, avoiding the subtraction operations that cause numerical cancellation in direct methods.
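For large matrices the product of diagonal entries can still overflow or underflow even though each factor is well-behaved; summing logarithms avoids this, and np.linalg.slogdet provides a built-in cross-check:

```python
import numpy as np

A = np.array([[4, 2, 1], [2, 5, 3], [1, 3, 6]], dtype=float)
L = np.linalg.cholesky(A)

# log det(A) = 2 * sum(log(diag(L))): safe for large n, never forms the raw product
log_det = 2.0 * np.sum(np.log(np.diag(L)))
sign, log_det_ref = np.linalg.slogdet(A)
print(log_det, log_det_ref)  # both approximately log(67)
```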
Handling Numerical Issues
Real-world matrices often have numerical issues. Implementing robust checks prevents silent failures.
def safe_cholesky(A, epsilon=1e-10):
    """Perform Cholesky with numerical checks."""
    # Check symmetry
    if not np.allclose(A, A.T):
        raise ValueError("Matrix is not symmetric")
    # Check positive definiteness via eigenvalues
    eigenvalues = np.linalg.eigvalsh(A)
    if np.any(eigenvalues < -epsilon):
        raise ValueError(f"Matrix is not positive definite. Min eigenvalue: {eigenvalues.min()}")
    # Add regularization for near-singular matrices
    if eigenvalues.min() < epsilon:
        print("Warning: Matrix is ill-conditioned. Adding regularization.")
        A = A + epsilon * np.eye(A.shape[0])
    try:
        return np.linalg.cholesky(A)
    except np.linalg.LinAlgError as e:
        raise ValueError(f"Cholesky decomposition failed: {e}")
# Test with an ill-conditioned matrix
A_ill = np.array([[1.0, 0.999],
                  [0.999, 1.0]])
L = safe_cholesky(A_ill)
print("Decomposition successful")
print("Condition number:", np.linalg.cond(A_ill))
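Without wrapper checks like these, np.linalg.cholesky signals failure on an indefinite input by raising numpy.linalg.LinAlgError, which is worth catching explicitly:

```python
import numpy as np

A_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])  # symmetric but indefinite (eigenvalues 3 and -1)
try:
    np.linalg.cholesky(A_bad)
except np.linalg.LinAlgError:
    print("Matrix is not positive definite")
```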
Application: Gaussian Process Regression
Cholesky decomposition is fundamental to Gaussian process computations, where covariance matrices are inherently positive definite.
def gaussian_process_predict(X_train, y_train, X_test, kernel_func, noise=1e-6):
    """Simple GP regression using Cholesky decomposition."""
    n_train = X_train.shape[0]
    # Compute kernel matrices
    K = kernel_func(X_train, X_train) + noise * np.eye(n_train)
    K_star = kernel_func(X_test, X_train)
    # Cholesky decomposition of K
    L = np.linalg.cholesky(K)
    # Solve K @ alpha = y via two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    # Predictive mean
    mu = K_star @ alpha
    return mu
# RBF kernel
def rbf_kernel(X1, X2, length_scale=1.0):
    """Radial basis function kernel."""
    X1_sq = np.sum(X1**2, axis=1).reshape(-1, 1)
    X2_sq = np.sum(X2**2, axis=1).reshape(1, -1)
    distances_sq = X1_sq + X2_sq - 2 * X1 @ X2.T
    return np.exp(-distances_sq / (2 * length_scale**2))
# Generate test data
X_train = np.linspace(0, 10, 20).reshape(-1, 1)
y_train = np.sin(X_train).ravel() + np.random.normal(0, 0.1, 20)
X_test = np.linspace(0, 10, 100).reshape(-1, 1)
predictions = gaussian_process_predict(X_train, y_train, X_test, rbf_kernel)
print(f"Predictions shape: {predictions.shape}")
print(f"Mean prediction: {predictions.mean():.4f}")
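The same Cholesky factor also yields the predictive variance without ever forming K⁻¹ explicitly. A self-contained sketch of this standard extension (with a minimal RBF kernel, hypothetical names, and a noise-free training signal for simplicity):

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * ls**2))

def gp_predict_with_variance(X_train, y_train, X_test, noise=1e-6):
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf(X_test, X_train)
    L = np.linalg.cholesky(K)
    # Predictive mean, exactly as before
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_star @ alpha
    # v = L^-1 K_star^T, so diag(K_star K^-1 K_star^T) = column sums of v**2
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(rbf(X_test, X_test)) - np.sum(v**2, axis=0)
    return mu, var

X_train = np.linspace(0, 10, 20).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
X_test = np.linspace(0, 10, 50).reshape(-1, 1)
mu, var = gp_predict_with_variance(X_train, y_train, X_test)
print(mu.shape, var.shape)  # variance collapses toward zero near training points
```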
Performance Considerations
Cholesky decomposition’s efficiency makes it preferable for large positive definite systems.
import time
# Compare solving methods
sizes = [100, 500, 1000]
for n in sizes:
    # Generate a random positive definite matrix
    A_rand = np.random.randn(n, n)
    A = A_rand.T @ A_rand + np.eye(n)
    b = np.random.randn(n)
    # Method 1: Direct solve
    start = time.time()
    x1 = np.linalg.solve(A, b)
    time_direct = time.time() - start
    # Method 2: Cholesky solve
    start = time.time()
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, b)
    x2 = np.linalg.solve(L.T, y)
    time_cholesky = time.time() - start
    print(f"\nSize {n}x{n}:")
    print(f"  Direct solve: {time_direct*1000:.2f} ms")
    print(f"  Cholesky solve: {time_cholesky*1000:.2f} ms")
    print(f"  Speedup: {time_direct/time_cholesky:.2f}x")
    print(f"  Solution error: {np.linalg.norm(x1 - x2):.2e}")
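One caveat: np.linalg.solve does not exploit the triangular structure of L, so the two-solve path above pays for a general factorization of each triangular system. If SciPy is available (an assumption, not part of the NumPy-only examples above), scipy.linalg.cho_factor and cho_solve dispatch to dedicated LAPACK triangular routines and are the idiomatic choice for repeated solves:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve  # assumes SciPy is installed

rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)   # well-conditioned positive definite matrix
b = rng.standard_normal(n)

c, low = cho_factor(A)        # factor once, reuse for every right-hand side
x = cho_solve((c, low), b)    # two triangular solves (LAPACK potrs)
print(np.linalg.norm(A @ x - b))  # small residual
```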
Cholesky decomposition provides both theoretical elegance and practical performance benefits for positive definite matrices. Its applications span numerical linear algebra, statistics, optimization, and machine learning, making it an essential tool for computational applications requiring efficient matrix operations.