How to Perform Singular Value Decomposition (SVD) in Python

Key Insights

  • SVD decomposes any matrix into three components (U, Σ, V^T) that reveal its fundamental structure, enabling applications from image compression to recommendation systems with a single NumPy function call.
  • For practical applications, you rarely need all singular values—selecting the top k values gives you controllable tradeoffs between accuracy and computational efficiency, with 10-20% of values often preserving 90%+ of the information.
  • Use NumPy’s svd() for dense matrices up to a few thousand rows and columns, scipy.sparse.linalg.svds() for sparse matrices, and scikit-learn’s TruncatedSVD for production pipelines requiring consistent dimensionality reduction.

Introduction to SVD

Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes any m×n matrix A into three matrices: A = UΣV^T. Here, U is an m×m orthogonal matrix, Σ is an m×n diagonal matrix containing singular values in descending order, and V^T is an n×n orthogonal matrix.

The power of SVD lies in its ability to identify the most important features of your data. The singular values in Σ represent the “strength” of each component, letting you rank features by importance. This makes SVD invaluable for dimensionality reduction, where you keep only the top k singular values and their corresponding vectors.

Common applications include image compression (representing images with fewer values), recommendation systems (filling in missing user preferences), noise reduction (filtering out weak signals), and text analysis (finding semantic relationships in documents). Python’s numerical libraries make SVD implementation straightforward, with NumPy providing the core functionality and SciPy/scikit-learn offering specialized variants.

Basic SVD Implementation with NumPy

NumPy’s linalg.svd() function handles the heavy lifting. Let’s decompose a simple matrix and verify the reconstruction:

import numpy as np

# Create a sample matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

# Perform SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(f"Original shape: {A.shape}")
print(f"U shape: {U.shape}")
print(f"s shape: {s.shape}")
print(f"Vt shape: {Vt.shape}")

# Reconstruct the original matrix
S = np.diag(s)
A_reconstructed = U @ S @ Vt

print(f"\nReconstruction error: {np.linalg.norm(A - A_reconstructed)}")

The full_matrices=False parameter returns the reduced ("economy") SVD, which is usually what you want. The singular values s come back as a 1D array rather than a full matrix, so you need to convert them to a diagonal matrix for reconstruction.

The singular values tell you how much variance each component explains. You can calculate the explained variance ratio:

variance_explained = s**2 / np.sum(s**2)
cumulative_variance = np.cumsum(variance_explained)

for i, (var, cum_var) in enumerate(zip(variance_explained, cumulative_variance)):
    print(f"Component {i}: {var:.3f} ({cum_var:.3f} cumulative)")
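The same machinery shows how reconstruction quality depends on how many components you keep. A short sketch, reusing the 4×3 matrix from above, compares the rank-k approximation error for each k (this matrix has rank 2, so the error should hit zero by k=2):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep only the top k singular triplets
for k in range(1, len(s) + 1):
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    err = np.linalg.norm(A - A_k)
    print(f"k={k}: reconstruction error = {err:.6f}")
```

The errors decrease monotonically with k, which is exactly the property the image-compression example below exploits.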

Practical Application: Image Compression

SVD excels at image compression because images often have redundant information. By keeping only the largest singular values, you can reconstruct a visually similar image with significantly less data:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Load and prepare image
img = Image.open('example.jpg').convert('L')  # Convert to grayscale
img_array = np.array(img, dtype=float)

# Perform SVD
U, s, Vt = np.linalg.svd(img_array, full_matrices=False)

# Compress with different k values
k_values = [5, 20, 50, 100]
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original image
axes[0, 0].imshow(img_array, cmap='gray')
axes[0, 0].set_title('Original')
axes[0, 0].axis('off')

for idx, k in enumerate(k_values, 1):
    # Reconstruct using only top k singular values
    S_k = np.diag(s[:k])
    img_compressed = U[:, :k] @ S_k @ Vt[:k, :]
    
    # Calculate compression ratio
    original_size = img_array.size
    compressed_size = U[:, :k].size + k + Vt[:k, :].size
    compression_ratio = original_size / compressed_size
    
    row, col = divmod(idx, 3)
    axes[row, col].imshow(img_compressed, cmap='gray')
    axes[row, col].set_title(f'k={k} (ratio: {compression_ratio:.1f}x)')
    axes[row, col].axis('off')

plt.tight_layout()
plt.savefig('svd_compression.png', dpi=150, bbox_inches='tight')

The compression ratio improves dramatically with lower k values, but visual quality degrades. For most natural images, k=50-100 provides excellent quality at 10-20x compression.
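Those ratios follow from simple arithmetic: storing a rank-k approximation of an m×n image takes k(m + n + 1) numbers (U[:, :k], s[:k], and Vt[:k, :]) instead of mn. A quick sanity check, using a hypothetical 2048×2048 image:

```python
m, n = 2048, 2048  # hypothetical image dimensions

for k in [5, 20, 50, 100]:
    original = m * n               # pixels in the full image
    compressed = k * (m + n + 1)   # U[:, :k] + s[:k] + Vt[:k, :]
    print(f"k={k}: {original / compressed:.1f}x compression")
```

At this size, k=50 gives roughly 20x compression and k=100 roughly 10x, consistent with the 10-20x range quoted above; smaller images compress less at the same k.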

SVD for Dimensionality Reduction

SVD is the mathematical foundation of Principal Component Analysis (PCA). When you perform PCA, you’re essentially doing SVD on the centered data matrix. Scikit-learn’s TruncatedSVD is particularly useful for large datasets:

from sklearn.decomposition import TruncatedSVD
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load dataset
digits = load_digits()
X = digits.data
y = digits.target

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply Truncated SVD
n_components = 2
svd = TruncatedSVD(n_components=n_components, random_state=42)
X_reduced = svd.fit_transform(X_scaled)

# Visualize
plt.figure(figsize=(10, 8))
scatter = plt.scatter(X_reduced[:, 0], X_reduced[:, 1], 
                     c=y, cmap='tab10', alpha=0.6)
plt.colorbar(scatter)
plt.xlabel(f'First Component ({svd.explained_variance_ratio_[0]:.2%} variance)')
plt.ylabel(f'Second Component ({svd.explained_variance_ratio_[1]:.2%} variance)')
plt.title('Digits Dataset - SVD Dimensionality Reduction')
plt.savefig('svd_dimensionality_reduction.png')

print(f"Total variance explained: {svd.explained_variance_ratio_.sum():.2%}")

TruncatedSVD doesn’t center the data (unlike PCA), making it suitable for sparse matrices where centering would destroy sparsity.
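The SVD-PCA equivalence is easy to verify with plain NumPy: on a centered data matrix, the squared singular values divided by n − 1 equal the eigenvalues of the covariance matrix. A minimal sketch, using randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center the data, as PCA does internally
X_centered = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Eigenvalues of the covariance matrix, sorted descending
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(X_centered, rowvar=False)))[::-1]

# Squared singular values / (n - 1) match the covariance eigenvalues
print(np.allclose(s**2 / (X.shape[0] - 1), cov_eigvals))  # True
```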

Advanced: Sparse Matrix SVD

For large sparse matrices like document-term matrices, using dense SVD would be computationally prohibitive. SciPy’s svds() function computes only the k largest singular values:

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds
import time

# Create a large sparse matrix (10000 x 5000, 1% density)
n_rows, n_cols = 10000, 5000
density = 0.01
sparse_matrix = sparse_random(n_rows, n_cols, density=density, format='csr')

# Sparse SVD (only compute top 50 singular values)
k = 50
start = time.time()
U_sparse, s_sparse, Vt_sparse = svds(sparse_matrix, k=k)
sparse_time = time.time() - start

print(f"Sparse SVD time: {sparse_time:.2f}s")
print(f"Matrix size: {n_rows}x{n_cols}")
print(f"Memory usage: {sparse_matrix.data.nbytes / 1024**2:.2f} MB")

# svds does not guarantee descending order, so sort explicitly
idx = np.argsort(s_sparse)[::-1]
s_sparse = s_sparse[idx]
U_sparse = U_sparse[:, idx]
Vt_sparse = Vt_sparse[idx, :]

print(f"Top 5 singular values: {s_sparse[:5]}")

The performance difference is dramatic. Dense SVD on this matrix would require ~400MB just to store the matrix, while the sparse version uses ~4MB.
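Those memory figures come straight from the storage formats: a dense float64 matrix needs 8 bytes per entry, while CSR stores only the nonzero values (plus index arrays that add a little overhead). A back-of-the-envelope check, assuming float64 data:

```python
n_rows, n_cols, density = 10000, 5000, 0.01

dense_mb = n_rows * n_cols * 8 / 1024**2   # every entry stored
nnz = int(n_rows * n_cols * density)       # nonzero entries only
sparse_mb = nnz * 8 / 1024**2              # value array; index arrays add a bit more

print(f"Dense: {dense_mb:.0f} MB, sparse values: {sparse_mb:.1f} MB")
```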

Real-World Use Case: Collaborative Filtering

SVD powers recommendation systems by factorizing user-item rating matrices. Missing ratings can be predicted from the low-rank approximation:

import numpy as np
from numpy.linalg import svd

# User-item rating matrix (users x movies)
# 0 indicates missing rating
ratings = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 0],
    [0, 0, 5, 4, 0],
    [0, 1, 4, 0, 3],
])

# Create mask for known ratings
mask = ratings > 0

# Replace zeros with mean rating for SVD
mean_rating = ratings[mask].mean()
ratings_filled = np.where(mask, ratings, mean_rating)

# Perform SVD with k=2 latent factors
k = 2
U, s, Vt = svd(ratings_filled, full_matrices=False)
S_k = np.diag(s[:k])
predicted_ratings = U[:, :k] @ S_k @ Vt[:k, :]

# Clip predictions to the valid rating range
predicted_ratings = np.clip(predicted_ratings, 1, 5)

print("Original ratings:")
print(ratings)
print("\nPredicted ratings:")
print(predicted_ratings.round(2))

# Show predictions for missing ratings
for i in range(ratings.shape[0]):
    for j in range(ratings.shape[1]):
        if ratings[i, j] == 0:
            print(f"User {i}, Item {j}: Predicted rating = {predicted_ratings[i, j]:.2f}")

This simple approach forms the basis of more sophisticated methods like matrix factorization with gradient descent, which handle missing data more elegantly.
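One step toward those methods is iterative SVD imputation: repeatedly fill the missing entries with the current low-rank estimate, re-factorize, and stop once the estimate settles. A sketch of that loop on the same ratings matrix (the tolerance and iteration cap are arbitrary choices):

```python
import numpy as np

ratings = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 0],
    [0, 0, 5, 4, 0],
    [0, 1, 4, 0, 3],
], dtype=float)
mask = ratings > 0

# Start from the mean-filled matrix
filled = np.where(mask, ratings, ratings[mask].mean())
k = 2

for _ in range(100):  # iteration cap (arbitrary)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Keep known ratings fixed; update only the missing entries
    updated = np.where(mask, ratings, approx)
    if np.linalg.norm(updated - filled) < 1e-6:  # convergence tolerance
        break
    filled = updated

predicted = np.clip(approx, 1, 5)
print(predicted.round(2))
```

Compared with the one-shot mean-fill above, each pass lets the low-rank structure refine the imputed entries instead of anchoring them to the global mean.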

Best Practices and Troubleshooting

Choose the right implementation: Use numpy.linalg.svd() for dense matrices up to a few thousand rows and columns. For larger dense matrices, or when you only need the top k components, use TruncatedSVD from scikit-learn. For sparse matrices, always use scipy.sparse.linalg.svds().

Determine optimal k: Plot cumulative explained variance and look for an “elbow” where adding more components gives diminishing returns. For many applications, 90-95% explained variance is sufficient.
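That rule of thumb is easy to automate: pick the smallest k whose cumulative explained variance crosses a threshold. A small helper along these lines (the 0.95 default is just a common convention):

```python
import numpy as np

def choose_k(singular_values, threshold=0.95):
    """Smallest k whose components explain at least `threshold` of the variance."""
    variance = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(variance)
    return int(np.searchsorted(cumulative, threshold) + 1)

s = np.array([10.0, 5.0, 2.0, 0.5, 0.1])
print(choose_k(s))  # → 2
```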

Handle numerical instability: Very small singular values (< 1e-10) often represent numerical noise. You can safely threshold them to zero. If you encounter convergence issues with sparse SVD, try increasing the number of iterations or using a different random seed.
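A defensible way to pick that threshold is relative to the largest singular value and machine precision, similar to how numpy.linalg.matrix_rank decides rank by default. A sketch on a deliberately rank-deficient matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])  # rank-1 matrix
U, s, Vt = np.linalg.svd(A)

# Zero out singular values indistinguishable from numerical noise
tol = s.max() * max(A.shape) * np.finfo(float).eps
s_clean = np.where(s > tol, s, 0.0)

print(s)        # second value is zero up to floating-point noise
print(s_clean)  # noise thresholded to exactly zero
```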

Memory management: Full SVD of an m×n matrix requires O(min(m,n)²) memory. For large matrices, this becomes prohibitive. Always prefer truncated SVD when you don’t need all components.

Common errors: If reconstruction error is unexpectedly large, verify you’re using all singular values and that matrix dimensions align correctly. The full_matrices=False parameter prevents dimension mismatches in most cases.

SVD is a fundamental tool that deserves a place in every data scientist’s toolkit. Master these implementations, and you’ll find applications everywhere from computer vision to natural language processing.
