Linear Algebra: SVD Explained
Key Insights
- SVD decomposes any matrix into three components (U, Σ, V^T) representing rotation, scaling, and another rotation—making it more versatile than eigendecomposition which only works on square matrices
- Truncated SVD enables powerful applications like image compression and recommender systems by approximating matrices with lower-rank versions that capture most of the variance
- Unlike PCA, which requires mean-centering, SVD works directly on the raw data matrix; because it avoids explicitly forming the covariance matrix, it is also numerically more stable, making it the workhorse behind most dimensionality-reduction implementations
Introduction to SVD
Singular Value Decomposition (SVD) is one of the most important matrix factorization techniques in applied mathematics. Whether you’re building recommender systems, compressing images, or reducing dimensionality in machine learning pipelines, SVD provides a mathematically rigorous foundation for these tasks.
The core idea is deceptively simple: any m×n matrix A can be decomposed into three matrices:
A = UΣV^T
Where U is an m×m orthogonal matrix, Σ is an m×n diagonal matrix containing singular values, and V^T is the transpose of an n×n orthogonal matrix. This decomposition always exists for any real or complex matrix, making it more general than eigendecomposition.
Let’s start with a basic example:
import numpy as np
# Create a sample matrix
A = np.array([[3, 1, 1],
              [-1, 3, 1]])
# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print("Original matrix A:")
print(A)
print("\nU (left singular vectors):")
print(U)
print("\nSingular values:")
print(s)
print("\nV^T (right singular vectors transposed):")
print(Vt)
# Reconstruct the original matrix
Sigma = np.zeros_like(A, dtype=float)
np.fill_diagonal(Sigma, s)
A_reconstructed = U @ Sigma @ Vt
print("\nReconstructed A:")
print(A_reconstructed)
print("\nReconstruction error:", np.linalg.norm(A - A_reconstructed))
Mathematical Foundation
Understanding what U, Σ, and V^T represent is crucial for applying SVD effectively.
The U matrix contains the left singular vectors as columns. These are orthonormal vectors that form a basis for the column space of A. Each column of U represents a direction in the output space.
The Σ matrix contains singular values on its diagonal, arranged in descending order. These values represent the “importance” or “energy” of each component. Large singular values indicate directions where the matrix stretches the input significantly.
The V^T matrix (or V) contains the right singular vectors as rows (columns in V). These orthonormal vectors form a basis for the row space of A and represent directions in the input space.
Geometrically, the decomposition represents a sequence of transformations: V^T rotates the input space, Σ scales along principal axes, and U rotates the result into the output space.
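A complementary view is the outer-product expansion A = Σᵢ σᵢ uᵢ vᵢᵀ: the matrix is a sum of rank-1 pieces, each weighted by its singular value, which makes the "energy" interpretation above concrete. A minimal sketch, reusing the example matrix from the introduction:

```python
import numpy as np

# The example matrix from the introduction
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of rank-1 outer products sigma_i * u_i * v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print("Outer-product reconstruction error:", np.linalg.norm(A - A_sum))
```

Truncating this sum after the k largest terms is exactly the low-rank approximation used later for image compression.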
# Verify orthogonality of U and V
def verify_orthogonality(matrix, name):
    """Check that the matrix has orthonormal columns (M^T M = I)."""
    product = matrix.T @ matrix
    identity = np.eye(matrix.shape[1])
    error = np.linalg.norm(product - identity)
    print(f"{name} orthogonality error: {error:.2e}")
    return error < 1e-10
verify_orthogonality(U, "U")
verify_orthogonality(Vt.T, "V")
# Visualize a 2D transformation
import matplotlib.pyplot as plt
def plot_transformation():
    # Create a simple 2x2 matrix
    A_2d = np.array([[3, 1],
                     [1, 2]])
    U_2d, s_2d, Vt_2d = np.linalg.svd(A_2d)
    # Create unit circle points
    theta = np.linspace(0, 2*np.pi, 100)
    circle = np.array([np.cos(theta), np.sin(theta)])
    # Apply transformation
    transformed = A_2d @ circle
    plt.figure(figsize=(12, 4))
    plt.subplot(131)
    plt.plot(circle[0], circle[1], 'b-', label='Original circle')
    plt.axis('equal')
    plt.grid(True)
    plt.title('Input Space')
    plt.subplot(132)
    rotated = Vt_2d @ circle
    plt.plot(rotated[0], rotated[1], 'g-', label='After V^T rotation')
    plt.axis('equal')
    plt.grid(True)
    plt.title('After V^T')
    plt.subplot(133)
    plt.plot(transformed[0], transformed[1], 'r-', label='Final transformation')
    plt.axis('equal')
    plt.grid(True)
    plt.title('After UΣV^T')
    plt.tight_layout()
    plt.savefig('svd_transformation.png', dpi=150, bbox_inches='tight')
plot_transformation()
Computing SVD
SVD can be understood (and, for small problems, computed) through its relationship with eigenvalue decomposition. The key insight is:
- The columns of V are eigenvectors of A^T A
- The columns of U are eigenvectors of AA^T
- The singular values are square roots of the eigenvalues of A^T A (or AA^T)
Here’s a manual implementation for educational purposes:
def manual_svd(A):
    """
    Compute SVD manually using eigendecomposition.
    Educational only: forming A^T A squares the condition number,
    so use np.linalg.svd in practice.
    """
    m, n = A.shape
    # Compute A^T A and its eigendecomposition
    ATA = A.T @ A
    eigenvalues, V = np.linalg.eigh(ATA)
    # eigh returns ascending order; sort descending
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    V = V[:, idx]
    # Singular values are square roots of the (clamped) eigenvalues
    singular_values = np.sqrt(np.maximum(eigenvalues, 0))
    # Compute U from A and V: u_i = A v_i / sigma_i
    U = np.zeros((m, m))
    for i in range(min(m, n)):
        if singular_values[i] > 1e-10:
            U[:, i] = (A @ V[:, i]) / singular_values[i]
    # Complete U to an orthonormal basis if needed, orthogonalizing
    # random vectors against the existing columns via one QR factorization
    if m > n:
        Q, _ = np.linalg.qr(np.hstack([U[:, :n], np.random.randn(m, m - n)]))
        U[:, n:] = Q[:, n:]
    # Return only the leading min(m, n) singular values, matching np.linalg.svd
    return U, singular_values[:min(m, n)], V.T
# Test manual implementation
A_test = np.array([[3, 1, 1],
                   [-1, 3, 1]], dtype=float)
U_manual, s_manual, Vt_manual = manual_svd(A_test)
U_numpy, s_numpy, Vt_numpy = np.linalg.svd(A_test, full_matrices=True)
print("Manual singular values:", s_manual)
print("NumPy singular values:", s_numpy)
print("Difference:", np.linalg.norm(s_manual - s_numpy))
Practical Applications
The real power of SVD emerges in its applications. Let’s explore image compression, one of the most intuitive use cases.
from PIL import Image
import matplotlib.pyplot as plt
def compress_image_svd(image_path, ranks):
    """
    Compress image using truncated SVD at different ranks
    """
    # Load and convert to grayscale
    img = Image.open(image_path).convert('L')
    A = np.array(img, dtype=float)
    # Perform SVD
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    results = {}
    for rank in ranks:
        # Truncate to rank k
        U_k = U[:, :rank]
        s_k = s[:rank]
        Vt_k = Vt[:rank, :]
        # Reconstruct
        A_k = U_k @ np.diag(s_k) @ Vt_k
        A_k = np.clip(A_k, 0, 255)
        # Calculate compression ratio: store k*(m + n + 1) numbers vs m*n
        original_size = A.shape[0] * A.shape[1]
        compressed_size = rank * (A.shape[0] + A.shape[1] + 1)
        compression_ratio = original_size / compressed_size
        results[rank] = {
            'image': A_k,
            'compression_ratio': compression_ratio,
            'variance_explained': np.sum(s_k**2) / np.sum(s**2)
        }
    return A, results
# Example usage (assuming you have an image file)
# original, compressed = compress_image_svd('sample.jpg', [5, 20, 50, 100])
#
# for rank, data in compressed.items():
#     print(f"Rank {rank}:")
#     print(f"  Compression ratio: {data['compression_ratio']:.2f}x")
#     print(f"  Variance explained: {data['variance_explained']*100:.2f}%")
For recommender systems, SVD helps discover latent factors in user-item matrices:
def collaborative_filtering_svd(ratings_matrix, k=10):
    """
    Simple collaborative filtering using SVD
    ratings_matrix: users × items matrix with ratings
    k: number of latent factors
    """
    # Handle missing values by replacing with row means
    row_means = np.nanmean(ratings_matrix, axis=1, keepdims=True)
    ratings_filled = np.where(np.isnan(ratings_matrix),
                              row_means,
                              ratings_matrix)
    # Perform SVD
    U, s, Vt = np.linalg.svd(ratings_filled, full_matrices=False)
    # Truncate to k latent factors
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]
    # Reconstruct predictions
    predictions = U_k @ np.diag(s_k) @ Vt_k
    return predictions
# Example with synthetic data
n_users, n_items = 100, 50
ratings = np.random.rand(n_users, n_items) * 5
# Mask 30% as missing
mask = np.random.rand(n_users, n_items) < 0.3
ratings[mask] = np.nan
predictions = collaborative_filtering_svd(ratings, k=10)
SVD vs. Other Decomposition Methods
SVD vs. PCA: PCA is essentially SVD applied to mean-centered data: the right singular vectors of the centered data matrix are the principal components, and the squared singular values are proportional to the variance each component explains. Use PCA when you need interpretable principal components and variance explained. Use SVD directly when working with sparse matrices or when mean-centering is problematic (for example, because it would destroy sparsity).
SVD vs. Eigendecomposition: Eigendecomposition only works on square matrices, while SVD works on any matrix. For symmetric positive semidefinite matrices the two coincide; for a general symmetric matrix the singular values are the absolute values of the eigenvalues. SVD is also numerically more stable.
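The symmetric case is easy to check directly: the singular values come out as the absolute values of the eigenvalues, since Σ must be non-negative. A quick sketch:

```python
import numpy as np

# A symmetric matrix with one negative eigenvalue
S = np.array([[1.0, 2.0],
              [2.0, 1.0]])  # eigenvalues are 3 and -1

eigvals = np.linalg.eigvalsh(S)
singvals = np.linalg.svd(S, compute_uv=False)

print("Eigenvalues:    ", np.sort(eigvals))    # [-1.  3.]
print("Singular values:", np.sort(singvals))   # [1.  3.]
```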
from sklearn.decomposition import PCA
# Generate sample data
X = np.random.randn(100, 5)
# PCA approach
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)
# SVD approach (manual PCA)
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_svd = U[:, :3] @ np.diag(s[:3])
print("PCA components shape:", X_pca.shape)
print("SVD components shape:", X_svd.shape)
# Compare absolute values: singular vectors are only defined up to sign
print("Difference:", np.linalg.norm(np.abs(X_pca) - np.abs(X_svd)))
Implementation Considerations
SVD computational complexity is O(min(mn², m²n)) for an m×n matrix. For large matrices, this becomes prohibitive. Use truncated SVD when you only need the top k singular values:
from scipy.sparse.linalg import svds
from scipy.sparse import csr_matrix
import time
def benchmark_svd(sizes):
    """
    Benchmark SVD performance across different matrix sizes
    """
    results = []
    for m, n in sizes:
        A = np.random.randn(m, n)
        # Full SVD
        start = time.time()
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        full_time = time.time() - start
        # Truncated SVD (top 10 components)
        k = min(10, min(m, n) - 1)
        start = time.time()
        U_k, s_k, Vt_k = svds(A, k=k)
        truncated_time = time.time() - start
        results.append({
            'size': f"{m}×{n}",
            'full_svd': full_time,
            'truncated_svd': truncated_time,
            'speedup': full_time / truncated_time
        })
    return results
# Benchmark
sizes = [(100, 100), (500, 500), (1000, 1000), (2000, 500)]
benchmark_results = benchmark_svd(sizes)
for result in benchmark_results:
    print(f"Matrix {result['size']}:")
    print(f"  Full SVD: {result['full_svd']:.3f}s")
    print(f"  Truncated SVD: {result['truncated_svd']:.3f}s")
    print(f"  Speedup: {result['speedup']:.2f}x\n")
For sparse matrices, always use scipy.sparse.linalg.svds() instead of converting to dense format. The memory savings are substantial:
from scipy.sparse import random as sparse_random
# Create sparse matrix (1% density)
sparse_matrix = sparse_random(1000, 1000, density=0.01, format='csr')
# Compute truncated SVD on sparse matrix
U_sparse, s_sparse, Vt_sparse = svds(sparse_matrix, k=10)
# Count all three CSR arrays, not just the nonzero values
total_bytes = (sparse_matrix.data.nbytes + sparse_matrix.indices.nbytes
               + sparse_matrix.indptr.nbytes)
print(f"Sparse matrix memory: {total_bytes / 1024:.2f} KB")
print(f"Dense equivalent would be: {1000*1000*8 / 1024:.2f} KB")
SVD is numerically stable and handles ill-conditioned matrices better than direct eigendecomposition. Modern implementations use iterative algorithms like the Golub-Kahan-Lanczos bidiagonalization for large matrices.
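One concrete payoff of this stability: the singular values give the 2-norm condition number directly, as σ_max/σ_min, which is what np.linalg.cond computes by default. A small sketch on a nearly rank-deficient matrix:

```python
import numpy as np

# A nearly rank-deficient (ill-conditioned) matrix
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-8]])

s = np.linalg.svd(A, compute_uv=False)
cond_from_svd = s[0] / s[-1]

print("Singular values:", s)
print("Condition number (sigma_max / sigma_min):", cond_from_svd)
print("np.linalg.cond agrees:", np.isclose(cond_from_svd, np.linalg.cond(A)))
```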
When choosing between libraries, NumPy’s implementation is sufficient for small to medium matrices. For large-scale problems, use SciPy’s sparse implementations or specialized libraries like ARPACK (wrapped by SciPy). For machine learning workflows, scikit-learn’s TruncatedSVD provides a clean interface that integrates with pipelines.
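As a sketch of that TruncatedSVD interface (assuming scikit-learn is installed; it accepts dense arrays and scipy.sparse matrices alike):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))

# Keep only the top 5 components; no mean-centering is performed
svd = TruncatedSVD(n_components=5, random_state=0)
X_reduced = svd.fit_transform(X)

print("Reduced shape:", X_reduced.shape)  # (100, 5)
print("Variance retained:", svd.explained_variance_ratio_.sum())
```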
The key is understanding your matrix properties: size, sparsity, and whether you need all singular values or just the top k. This determines both your implementation choice and computational feasibility.