NumPy - Singular Value Decomposition (SVD)
Key Insights
- SVD decomposes any matrix A into three matrices (U, Σ, V^T) where U and V are orthogonal and Σ is diagonal, enabling dimensionality reduction, noise filtering, and solving linear systems
- Unlike eigendecomposition, which applies only to square matrices, SVD works for any m×n matrix and remains numerically stable for rank-deficient or ill-conditioned matrices
- Practical applications include image compression (reducing storage by 90%+), recommender systems (collaborative filtering), and principal component analysis for high-dimensional data
Understanding SVD Fundamentals
Singular Value Decomposition factorizes an m×n matrix A into three component matrices:
A = UΣV^T
Where:
- U is an m×m orthogonal matrix (left singular vectors)
- Σ is an m×n rectangular diagonal matrix (singular values on the diagonal, in descending order)
- V^T is an n×n orthogonal matrix (right singular vectors)
import numpy as np
# Create a sample matrix
A = np.array([[3, 1, 1],
              [-1, 3, 1],
              [1, 1, 3],
              [1, -1, 1]])
# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(f"Original shape: {A.shape}")
print(f"U shape: {U.shape}")
print(f"s shape: {s.shape}")
print(f"Vt shape: {Vt.shape}")
# Reconstruct Σ as an m×n matrix with the singular values on its diagonal
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
# Verify reconstruction
A_reconstructed = U @ Sigma @ Vt
print(f"\nReconstruction error: {np.linalg.norm(A - A_reconstructed)}")
Note that np.linalg.svd returns singular values as a 1D array, not a matrix. You must construct the diagonal matrix manually for full reconstruction.
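The orthogonality claims above are easy to verify directly. A quick sanity check on the factors from the example:

```python
import numpy as np

A = np.array([[3, 1, 1],
              [-1, 3, 1],
              [1, 1, 3],
              [1, -1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# U and V are orthogonal: transpose equals inverse
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))

# Singular values come back sorted in descending order
assert np.all(s[:-1] >= s[1:])
```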
Low-Rank Approximation and Compression
SVD enables optimal low-rank approximations. By keeping only the k largest singular values, you minimize the Frobenius norm of the approximation error.
def low_rank_approximation(A, k):
    """
    Approximate matrix A using only k singular values.

    Args:
        A: Input matrix (m x n)
        k: Number of singular values to retain

    Returns:
        A_k: Rank-k approximation of A
        s: Singular values (entries beyond the k-th zeroed out)
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Zero out all but the k largest singular values
    s[k:] = 0
    # Reconstruct
    Sigma_k = np.diag(s)
    A_k = U @ Sigma_k @ Vt
    return A_k, s
# Example with random data
np.random.seed(42)
A = np.random.randn(100, 50)
for k in [5, 10, 20]:
    A_approx, singular_values = low_rank_approximation(A, k)
    error = np.linalg.norm(A - A_approx, 'fro')
    compression_ratio = (k * (A.shape[0] + A.shape[1] + 1)) / (A.shape[0] * A.shape[1])
    print(f"k={k}: Error={error:.4f}, Compression={compression_ratio:.2%}")
This technique is fundamental for data compression. Instead of storing m×n values, you store k(m+n)+k values, where k ≪ min(m,n).
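To make the storage figure concrete, a rank-k factorization can be kept as three arrays; counting their elements reproduces the k(m+n)+k formula:

```python
import numpy as np

m, n, k = 100, 50, 10
A = np.random.randn(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the factors needed for a rank-k reconstruction
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
stored = U_k.size + s_k.size + Vt_k.size
print(f"Full matrix: {m * n} values, rank-{k} factors: {stored} values")
```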
Image Compression with SVD
SVD provides lossy image compression by treating grayscale images as matrices.
from PIL import Image
import matplotlib.pyplot as plt
def compress_image(image_path, k_values):
    """
    Compress a grayscale image using SVD with different ranks.

    Args:
        image_path: Path to image file
        k_values: List of ranks to test
    """
    # Load and convert to grayscale
    img = Image.open(image_path).convert('L')
    img_array = np.array(img, dtype=np.float64)
    # Perform SVD
    U, s, Vt = np.linalg.svd(img_array, full_matrices=False)
    results = {}
    for k in k_values:
        # Reconstruct using k components
        S_k = np.diag(s[:k])
        compressed = U[:, :k] @ S_k @ Vt[:k, :]
        # Calculate metrics (8 bits per stored value, matching the source image)
        original_size = img_array.size * 8  # bits
        compressed_size = (U[:, :k].size + k + Vt[:k, :].size) * 8
        compression_ratio = compressed_size / original_size
        mse = np.mean((img_array - compressed) ** 2)
        psnr = 10 * np.log10(255**2 / mse)
        results[k] = {
            'image': compressed,
            'compression_ratio': compression_ratio,
            'psnr': psnr,
        }
        print(f"k={k}: Compression={compression_ratio:.2%}, PSNR={psnr:.2f}dB")
    return results
# Usage example (requires actual image file)
# results = compress_image('sample.jpg', [10, 25, 50, 100])
For a 512×512 image, using k=50 typically achieves 90%+ compression while maintaining visual quality (PSNR > 30dB).
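You can estimate a good k before compressing by checking how fast the singular values decay. This sketch substitutes a synthetic low-rank-plus-noise matrix for a real image, so the exact k it reports is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "image": rank-20 structure plus small noise
img = rng.standard_normal((512, 20)) @ rng.standard_normal((20, 512))
img += 0.01 * rng.standard_normal((512, 512))

s = np.linalg.svd(img, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)

# Smallest k capturing 99% of the spectral energy
k = int(np.searchsorted(energy, 0.99)) + 1
print(f"k={k} captures {energy[k-1]:.2%} of the energy")
```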
Solving Linear Systems and Pseudoinverse
SVD provides the most numerically stable method for computing the Moore-Penrose pseudoinverse, essential for solving overdetermined or underdetermined linear systems.
def solve_with_svd(A, b, rcond=1e-15):
    """
    Solve Ax = b using SVD, handling singular/rectangular matrices.

    Args:
        A: Coefficient matrix (m x n)
        b: Right-hand side (m,)
        rcond: Cutoff for small singular values

    Returns:
        x: Solution vector
        residual: Residual norm
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert only singular values above the cutoff, avoiding
    # division by (near-)zero
    threshold = rcond * s[0]
    s_inv = np.zeros_like(s)
    np.divide(1.0, s, out=s_inv, where=s > threshold)
    # Compute pseudoinverse: A+ = V @ S+ @ U^T
    A_pinv = Vt.T @ np.diag(s_inv) @ U.T
    # Solve
    x = A_pinv @ b
    residual = np.linalg.norm(A @ x - b)
    return x, residual
# Overdetermined system (more equations than unknowns)
A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])
b = np.array([1, 2, 3, 4])
x, res = solve_with_svd(A, b)
print(f"Solution: {x}")
print(f"Residual: {res:.6f}")
# Compare with NumPy's lstsq
x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"NumPy lstsq solution: {x_lstsq}")
print(f"Difference: {np.linalg.norm(x - x_lstsq):.10f}")
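NumPy's built-in np.linalg.pinv implements the same SVD-based pseudoinverse, so it should agree with the manual construction to machine precision:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
b = np.array([1., 2., 3., 4.])

# Manual SVD pseudoinverse: A+ = V @ S+ @ U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_manual = Vt.T @ np.diag(1 / s) @ U.T @ b

# Built-in pseudoinverse (also SVD-based)
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x_manual, x_pinv)
```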
Principal Component Analysis
SVD is the computational backbone of PCA, providing a more numerically stable alternative to eigendecomposition of the covariance matrix.
def pca_with_svd(X, n_components):
    """
    Perform PCA using SVD.

    Args:
        X: Data matrix (n_samples x n_features)
        n_components: Number of principal components

    Returns:
        X_transformed: Transformed data
        components: Principal components
        explained_variance_ratio: Fraction of variance explained by each component
    """
    # Center the data
    X_centered = X - np.mean(X, axis=0)
    # Perform SVD
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Principal components are rows of Vt
    components = Vt[:n_components]
    # Transform data
    X_transformed = X_centered @ components.T
    # Explained variance
    explained_variance = (s ** 2) / (X.shape[0] - 1)
    explained_variance_ratio = explained_variance / explained_variance.sum()
    return X_transformed, components, explained_variance_ratio[:n_components]
# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 5)
X[:, 1] = X[:, 0] * 2 + np.random.randn(100) * 0.1 # Correlated feature
# Apply PCA
X_pca, components, var_ratio = pca_with_svd(X, n_components=3)
print(f"Transformed shape: {X_pca.shape}")
print(f"Explained variance ratio: {var_ratio}")
print(f"Cumulative variance: {np.cumsum(var_ratio)}")
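The equivalence with the covariance-matrix route can be verified numerically: the eigenvalues of the sample covariance matrix should match s²/(n−1) from the SVD of the centered data:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))
Xc = X - X.mean(axis=0)

# SVD route: variances are squared singular values over (n - 1)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
var_svd = s**2 / (X.shape[0] - 1)

# Eigendecomposition of the covariance matrix, sorted descending
cov = np.cov(Xc, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
assert np.allclose(var_svd, eigvals)
```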
Condition Number and Numerical Stability
SVD provides insight into matrix conditioning, critical for understanding numerical stability in computations.
def analyze_conditioning(A):
    """
    Analyze matrix conditioning using SVD.

    Args:
        A: Input matrix

    Returns:
        Dictionary with conditioning metrics
    """
    U, s, Vt = np.linalg.svd(A)
    # Condition number (ratio of largest to smallest singular value)
    cond = s[0] / s[-1] if s[-1] > 0 else np.inf
    # Effective rank (singular values above a scale-aware tolerance)
    tol = s[0] * max(A.shape) * np.finfo(float).eps
    rank = np.sum(s > tol)
    # Null space dimension
    null_space_dim = A.shape[1] - rank
    return {
        'condition_number': cond,
        'rank': rank,
        'null_space_dim': null_space_dim,
        'singular_values': s,
        'log_singular_values': np.log10(s + 1e-16),
    }
# Well-conditioned matrix
A_good = np.diag([10, 5, 3, 1])
print("Well-conditioned matrix:")
print(analyze_conditioning(A_good))
# Ill-conditioned matrix
A_bad = np.array([[1, 1], [1, 1.0001]])
print("\nIll-conditioned matrix:")
print(analyze_conditioning(A_bad))
When the condition number exceeds 10^15, expect significant numerical errors. SVD-based methods handle ill-conditioned matrices better than direct methods like Gaussian elimination.
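A classic illustration is the Hilbert matrix, whose condition number explodes with size; at n=10 a direct solve already loses most of its accurate digits:

```python
import numpy as np

# Hilbert matrix: H[i, j] = 1 / (i + j + 1), notoriously ill-conditioned
n = 10
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)

s = np.linalg.svd(H, compute_uv=False)
print(f"Condition number: {s[0] / s[-1]:.2e}")

# Solve Hx = b with a known solution and measure the damage
x_true = np.ones(n)
b = H @ x_true
x = np.linalg.solve(H, b)
print(f"Max error in solution: {np.abs(x - x_true).max():.2e}")
```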
Performance Considerations
SVD computation is O(min(mn^2, m^2n)) for an m×n matrix. Use full_matrices=False when you don’t need the complete orthogonal matrices.
import time
def benchmark_svd(sizes):
    """Compare full vs reduced SVD performance."""
    for m, n in sizes:
        A = np.random.randn(m, n)
        # Full SVD
        start = time.time()
        U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)
        time_full = time.time() - start
        # Reduced SVD
        start = time.time()
        U_red, s_red, Vt_red = np.linalg.svd(A, full_matrices=False)
        time_reduced = time.time() - start
        print(f"{m}×{n}: Full={time_full:.4f}s, Reduced={time_reduced:.4f}s, "
              f"Speedup={time_full/time_reduced:.2f}x")
benchmark_svd([(500, 100), (1000, 200), (2000, 400)])
For large-scale problems, consider truncated SVD solvers such as scipy.sparse.linalg.svds (Lanczos/ARPACK-based) or randomized SVD (e.g. sklearn.utils.extmath.randomized_svd), which compute only the k largest singular values in roughly O(mnk) time.
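A sketch of computing only the top-k singular triplets with scipy.sparse.linalg.svds; note that, unlike np.linalg.svd, it returns singular values in ascending order:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 400))

# Compute only the 10 largest singular values/vectors
U, s, Vt = svds(A, k=10)

# svds returns singular values in ascending order; flip to match np.linalg.svd
s = s[::-1]
s_full = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, s_full[:10])
```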