Linear Algebra: SVD Explained
Key Insights
- SVD decomposes any matrix into three components (U, Σ, V^T) representing rotation, scaling, and another rotation—making it more versatile than eigendecomposition which only works on square matrices
- Truncated SVD enables powerful applications like image compression and recommender systems by approximating matrices with lower-rank versions that capture most of the variance
- Unlike PCA, which requires mean-centering, SVD works directly on the raw data matrix; because it avoids explicitly forming the covariance matrix, it is also numerically more stable, making it the workhorse behind most dimensionality-reduction implementations
Introduction to SVD
Singular Value Decomposition (SVD) is one of the most important matrix factorization techniques in applied mathematics. Whether you’re building recommender systems, compressing images, or reducing dimensionality in machine learning pipelines, SVD provides a mathematically rigorous foundation for these tasks.
The core idea is deceptively simple: any m×n matrix A can be decomposed into three matrices:
A = UΣV^T
Where U is an m×m orthogonal matrix, Σ is an m×n diagonal matrix containing singular values, and V^T is the transpose of an n×n orthogonal matrix. This decomposition always exists for any real or complex matrix, making it more general than eigendecomposition.
Let’s start with a basic example:
import numpy as np
# Create a sample matrix
A = np.array([[3, 1, 1],
              [-1, 3, 1]])
# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print("Original matrix A:")
print(A)
print("\nU (left singular vectors):")
print(U)
print("\nSingular values:")
print(s)
print("\nV^T (right singular vectors transposed):")
print(Vt)
# Reconstruct the original matrix
Sigma = np.zeros_like(A, dtype=float)
np.fill_diagonal(Sigma, s)
A_reconstructed = U @ Sigma @ Vt
print("\nReconstructed A:")
print(A_reconstructed)
print("\nReconstruction error:", np.linalg.norm(A - A_reconstructed))
Mathematical Foundation
Understanding what U, Σ, and V^T represent is crucial for applying SVD effectively.
The U matrix contains the left singular vectors as columns. These are orthonormal vectors that form a basis for the column space of A. Each column of U represents a direction in the output space.
The Σ matrix contains singular values on its diagonal, arranged in descending order. These values represent the “importance” or “energy” of each component. Large singular values indicate directions where the matrix stretches the input significantly.
The V^T matrix (or V) contains the right singular vectors as rows (columns in V). These orthonormal vectors form a basis for the row space of A and represent directions in the input space.
Geometrically, the decomposition represents a sequence of transformations: V^T rotates the input space, Σ scales along principal axes, and U rotates the result into the output space.
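A complementary view is the outer-product expansion A = Σᵢ σᵢ uᵢ vᵢᵀ: the matrix is a sum of rank-1 pieces, each weighted by its singular value, which makes the "energy" interpretation above concrete. A minimal sketch, reusing the example matrix from the introduction:

```python
import numpy as np

# The example matrix from the introduction
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of rank-1 outer products sigma_i * u_i * v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print("Outer-product reconstruction error:", np.linalg.norm(A - A_sum))
```

Truncating this sum after the k largest terms is exactly the low-rank approximation used later for image compression.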
# Verify orthogonality of U and V
def verify_orthogonality(matrix, name):
    """Check that the matrix has orthonormal columns (M^T M = I)."""
    product = matrix.T @ matrix
    identity = np.eye(matrix.shape[1])
    error = np.linalg.norm(product - identity)
    print(f"{name} orthogonality error: {error:.2e}")
    return error < 1e-10
verify_orthogonality(U, "U")
verify_orthogonality(Vt.T, "V")
# Visualize a 2D transformation
import matplotlib.pyplot as plt
def plot_transformation():
    # Create a simple 2x2 matrix
    A_2d = np.array([[3, 1],
                     [1, 2]])
    U_2d, s_2d, Vt_2d = np.linalg.svd(A_2d)
    # Create unit circle points
    theta = np.linspace(0, 2*np.pi, 100)
    circle = np.array([np.cos(theta), np.sin(theta)])
    # Apply transformation
    transformed = A_2d @ circle
    plt.figure(figsize=(12, 4))
    plt.subplot(131)
    plt.plot(circle[0], circle[1], 'b-', label='Original circle')
    plt.axis('equal')
    plt.grid(True)
    plt.title('Input Space')
    plt.subplot(132)
    rotated = Vt_2d @ circle
    plt.plot(rotated[0], rotated[1], 'g-', label='After V^T rotation')
    plt.axis('equal')
    plt.grid(True)
    plt.title('After V^T')
    plt.subplot(133)
    plt.plot(transformed[0], transformed[1], 'r-', label='Final transformation')
    plt.axis('equal')
    plt.grid(True)
    plt.title('After UΣV^T')
    plt.tight_layout()
    plt.savefig('svd_transformation.png', dpi=150, bbox_inches='tight')
plot_transformation()
Computing SVD
SVD can be understood (and, for small problems, computed) through its relationship with eigenvalue decomposition. The key insight is:
- The columns of V are eigenvectors of A^T A
- The columns of U are eigenvectors of AA^T
- The singular values are square roots of the eigenvalues of A^T A (or AA^T)
Here’s a manual implementation for educational purposes:
def manual_svd(A):
    """
    Compute SVD manually using eigendecomposition.
    Educational only: forming A^T A squares the condition number,
    so use np.linalg.svd in practice.
    """
    m, n = A.shape
    # Compute A^T A and its eigendecomposition
    ATA = A.T @ A
    eigenvalues, V = np.linalg.eigh(ATA)
    # eigh returns ascending order; sort descending
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    V = V[:, idx]
    # Singular values are square roots of the (clamped) eigenvalues
    singular_values = np.sqrt(np.maximum(eigenvalues, 0))
    # Compute U from A and V: u_i = A v_i / sigma_i
    U = np.zeros((m, m))
    for i in range(min(m, n)):
        if singular_values[i] > 1e-10:
            U[:, i] = (A @ V[:, i]) / singular_values[i]
    # Complete U to an orthonormal basis if needed, orthogonalizing
    # random vectors against the existing columns via one QR factorization
    if m > n:
        Q, _ = np.linalg.qr(np.hstack([U[:, :n], np.random.randn(m, m - n)]))
        U[:, n:] = Q[:, n:]
    # Return only the leading min(m, n) singular values, matching np.linalg.svd
    return U, singular_values[:min(m, n)], V.T
# Test manual implementation
A_test = np.array([[3, 1, 1],
                   [-1, 3, 1]], dtype=float)
U_manual, s_manual, Vt_manual = manual_svd(A_test)
U_numpy, s_numpy, Vt_numpy = np.linalg.svd(A_test, full_matrices=True)
print("Manual singular values:", s_manual)
print("NumPy singular values:", s_numpy)
print("Difference:", np.linalg.norm(s_manual - s_numpy))
Practical Applications
The real power of SVD emerges in its applications. Let’s explore image compression, one of the most intuitive use cases.
from PIL import Image
import matplotlib.pyplot as plt
def compress_image_svd(image_path, ranks):
    """
    Compress image using truncated SVD at different ranks
    """
    # Load and convert to grayscale
    img = Image.open(image_path).convert('L')
    A = np.array(img, dtype=float)
    # Perform SVD
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    results = {}
    for rank in ranks:
        # Truncate to rank k
        U_k = U[:, :rank]
        s_k = s[:rank]
        Vt_k = Vt[:rank, :]
        # Reconstruct
        A_k = U_k @ np.diag(s_k) @ Vt_k
        A_k = np.clip(A_k, 0, 255)
        # Calculate compression ratio: store k*(m + n + 1) numbers vs m*n
        original_size = A.shape[0] * A.shape[1]
        compressed_size = rank * (A.shape[0] + A.shape[1] + 1)
        compression_ratio = original_size / compressed_size
        results[rank] = {
            'image': A_k,
            'compression_ratio': compression_ratio,
            'variance_explained': np.sum(s_k**2) / np.sum(s**2)
        }
    return A, results
# Example usage (assuming you have an image file)
# original, compressed = compress_image_svd('sample.jpg', [5, 20, 50, 100])
#
# for rank, data in compressed.items():
#     print(f"Rank {rank}:")
#     print(f"  Compression ratio: {data['compression_ratio']:.2f}x")
#     print(f"  Variance explained: {data['variance_explained']*100:.2f}%")
For recommender systems, SVD helps discover latent factors in user-item matrices:
def collaborative_filtering_svd(ratings_matrix, k=10):
    """
    Simple collaborative filtering using SVD
    ratings_matrix: users × items matrix with ratings
    k: number of latent factors
    """
    # Handle missing values by replacing with row means
    row_means = np.nanmean(ratings_matrix, axis=1, keepdims=True)
    ratings_filled = np.where(np.isnan(ratings_matrix),
                              row_means,
                              ratings_matrix)
    # Perform SVD
    U, s, Vt = np.linalg.svd(ratings_filled, full_matrices=False)
    # Truncate to k latent factors
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]
    # Reconstruct predictions
    predictions = U_k @ np.diag(s_k) @ Vt_k
    return predictions
# Example with synthetic data
n_users, n_items = 100, 50
ratings = np.random.rand(n_users, n_items) * 5
# Mask 30% as missing
mask = np.random.rand(n_users, n_items) < 0.3
ratings[mask] = np.nan
predictions = collaborative_filtering_svd(ratings, k=10)
SVD vs. Other Decomposition Methods
SVD vs. PCA: PCA is essentially SVD applied to mean-centered data: the right singular vectors of the centered data matrix are the principal components, and the squared singular values are proportional to the variance each component explains. Use PCA when you need interpretable principal components and variance explained. Use SVD directly when working with sparse matrices or when mean-centering is problematic (for example, because it would destroy sparsity).
SVD vs. Eigendecomposition: Eigendecomposition only works on square matrices, while SVD works on any matrix. For symmetric positive semidefinite matrices the two coincide; for a general symmetric matrix the singular values are the absolute values of the eigenvalues. SVD is also numerically more stable.
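The symmetric case is easy to check directly: the singular values come out as the absolute values of the eigenvalues, since Σ must be non-negative. A quick sketch:

```python
import numpy as np

# A symmetric matrix with one negative eigenvalue
S = np.array([[1.0, 2.0],
              [2.0, 1.0]])  # eigenvalues are 3 and -1

eigvals = np.linalg.eigvalsh(S)
singvals = np.linalg.svd(S, compute_uv=False)

print("Eigenvalues:    ", np.sort(eigvals))    # [-1.  3.]
print("Singular values:", np.sort(singvals))   # [1.  3.]
```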
from sklearn.decomposition import PCA
# Generate sample data
X = np.random.randn(100, 5)
# PCA approach
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)
# SVD approach (manual PCA)
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_svd = U[:, :3] @ np.diag(s[:3])
print("PCA components shape:", X_pca.shape)
print("SVD components shape:", X_svd.shape)
# Compare absolute values: singular vectors are only defined up to sign
print("Difference:", np.linalg.norm(np.abs(X_pca) - np.abs(X_svd)))
Implementation Considerations
SVD computational complexity is O(min(mn², m²n)) for an m×n matrix. For large matrices, this becomes prohibitive. Use truncated SVD when you only need the top k singular values:
from scipy.sparse.linalg import svds
from scipy.sparse import csr_matrix
import time
def benchmark_svd(sizes):
    """
    Benchmark SVD performance across different matrix sizes
    """
    results = []
    for m, n in sizes:
        A = np.random.randn(m, n)
        # Full SVD
        start = time.time()
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        full_time = time.time() - start
        # Truncated SVD (top 10 components)
        k = min(10, min(m, n) - 1)
        start = time.time()
        U_k, s_k, Vt_k = svds(A, k=k)
        truncated_time = time.time() - start
        results.append({
            'size': f"{m}×{n}",
            'full_svd': full_time,
            'truncated_svd': truncated_time,
            'speedup': full_time / truncated_time
        })
    return results
# Benchmark
sizes = [(100, 100), (500, 500), (1000, 1000), (2000, 500)]
benchmark_results = benchmark_svd(sizes)
for result in benchmark_results:
    print(f"Matrix {result['size']}:")
    print(f"  Full SVD: {result['full_svd']:.3f}s")
    print(f"  Truncated SVD: {result['truncated_svd']:.3f}s")
    print(f"  Speedup: {result['speedup']:.2f}x\n")
For sparse matrices, always use scipy.sparse.linalg.svds() instead of converting to dense format. The memory savings are substantial:
from scipy.sparse import random as sparse_random
# Create sparse matrix (1% density)
sparse_matrix = sparse_random(1000, 1000, density=0.01, format='csr')
# Compute truncated SVD on sparse matrix
U_sparse, s_sparse, Vt_sparse = svds(sparse_matrix, k=10)
# Count all three CSR arrays, not just the nonzero values
total_bytes = (sparse_matrix.data.nbytes + sparse_matrix.indices.nbytes
               + sparse_matrix.indptr.nbytes)
print(f"Sparse matrix memory: {total_bytes / 1024:.2f} KB")
print(f"Dense equivalent would be: {1000*1000*8 / 1024:.2f} KB")
SVD is numerically stable and handles ill-conditioned matrices better than direct eigendecomposition. Modern implementations use iterative algorithms like the Golub-Kahan-Lanczos bidiagonalization for large matrices.
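One concrete payoff of this stability: the singular values give the 2-norm condition number directly, as σ_max/σ_min, which is what np.linalg.cond computes by default. A small sketch on a nearly rank-deficient matrix:

```python
import numpy as np

# A nearly rank-deficient (ill-conditioned) matrix
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-8]])

s = np.linalg.svd(A, compute_uv=False)
cond_from_svd = s[0] / s[-1]

print("Singular values:", s)
print("Condition number (sigma_max / sigma_min):", cond_from_svd)
print("np.linalg.cond agrees:", np.isclose(cond_from_svd, np.linalg.cond(A)))
```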
When choosing between libraries, NumPy’s implementation is sufficient for small to medium matrices. For large-scale problems, use SciPy’s sparse implementations or specialized libraries like ARPACK (wrapped by SciPy). For machine learning workflows, scikit-learn’s TruncatedSVD provides a clean interface that integrates with pipelines.
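As a sketch of that TruncatedSVD interface (assuming scikit-learn is installed; it accepts dense arrays and scipy.sparse matrices alike):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))

# Keep only the top 5 components; no mean-centering is performed
svd = TruncatedSVD(n_components=5, random_state=0)
X_reduced = svd.fit_transform(X)

print("Reduced shape:", X_reduced.shape)  # (100, 5)
print("Variance retained:", svd.explained_variance_ratio_.sum())
```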
The key is understanding your matrix properties: size, sparsity, and whether you need all singular values or just the top k. This determines both your implementation choice and computational feasibility.