How to Perform Matrix Factorization in Python
Key Insights
- Matrix factorization decomposes matrices into products of simpler matrices, enabling efficient solutions to linear systems, dimensionality reduction, and data compression problems
- Different factorization methods serve distinct purposes: LU for linear systems, QR for least squares, SVD for general-purpose decomposition, NMF for non-negative data, and eigendecomposition for square matrices
- Performance varies dramatically by method and matrix properties—sparse matrix implementations can reduce computation time by orders of magnitude for large-scale problems
Introduction to Matrix Factorization
Matrix factorization breaks down a matrix into a product of two or more matrices with specific properties. This decomposition reveals the underlying structure of data and enables efficient computation for numerous tasks in machine learning, scientific computing, and data analysis.
The primary applications include recommender systems (predicting user preferences), dimensionality reduction (compressing high-dimensional data), solving linear equations, and image compression. Each factorization method makes different assumptions about the data and produces matrices with distinct mathematical properties.
This article covers five essential factorization techniques with practical Python implementations: LU decomposition for linear systems, QR decomposition for least squares problems, Singular Value Decomposition (SVD) for general-purpose analysis, Non-Negative Matrix Factorization (NMF) for parts-based learning, and eigendecomposition for spectral analysis.
LU Decomposition
LU decomposition factors a matrix A into the product of a lower triangular matrix L and an upper triangular matrix U, where A = PLU (P is a permutation matrix for numerical stability). This method excels at solving systems of linear equations efficiently, especially when solving multiple systems with the same coefficient matrix.
Use LU decomposition when you need to solve Ax = b for multiple different b vectors, compute matrix determinants, or invert matrices. The factorization cost is O(n³), but subsequent solves are only O(n²).
import numpy as np
from scipy.linalg import lu, solve
# Create a sample matrix
A = np.array([[4, 3, -1],
              [2, 1, 3],
              [-1, 2, 2]], dtype=float)
# Perform LU decomposition
P, L, U = lu(A)
print("Original matrix A:")
print(A)
print("\nLower triangular L:")
print(L)
print("\nUpper triangular U:")
print(U)
# Verify A = PLU
reconstructed = P @ L @ U
print("\nReconstructed matrix (PLU):")
print(reconstructed)
print(f"\nReconstruction error: {np.linalg.norm(A - reconstructed):.2e}")
# Solve a linear system efficiently
b = np.array([1, 2, 3])
x = solve(A, b)
print(f"\nSolution to Ax=b: {x}")
print(f"Verification Ax: {A @ x}")
The permutation matrix P reorders rows for numerical stability during Gaussian elimination. For well-conditioned matrices, LU decomposition provides accurate solutions with minimal computational overhead.
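The factor-once-solve-many efficiency claimed above can be sketched with SciPy's lu_factor and lu_solve, which keep the factorization in LAPACK's packed form and reuse it for each new right-hand side (the sample matrix mirrors the one in this section):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4, 3, -1],
              [2, 1, 3],
              [-1, 2, 2]], dtype=float)

# Factor once: O(n^3)
lu_piv = lu_factor(A)

# Each subsequent solve reuses the factors: O(n^2)
for b in (np.array([1., 2., 3.]), np.array([0., 1., 0.])):
    x = lu_solve(lu_piv, b)
    print(f"b = {b} -> x = {x}, residual = {np.linalg.norm(A @ x - b):.2e}")
```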
QR Decomposition
QR decomposition factors a matrix A into an orthogonal matrix Q and an upper triangular matrix R, where A = QR. The orthogonal property (Q^T Q = I) makes this factorization numerically stable and ideal for least squares problems.
QR decomposition is the workhorse for solving overdetermined linear systems (more equations than unknowns), computing eigenvalues via the QR algorithm, and orthogonalizing vectors in the Gram-Schmidt process.
import numpy as np
# Create a rectangular matrix (more rows than columns)
A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]], dtype=float)
# Perform QR decomposition
Q, R = np.linalg.qr(A)
print("Original matrix A:")
print(A)
print(f"\nShape: {A.shape}")
print("\nOrthogonal matrix Q:")
print(Q)
print("\nUpper triangular R:")
print(R)
# Verify orthogonality: Q^T Q should be identity
orthogonality_check = Q.T @ Q
print("\nQ^T Q (should be identity):")
print(orthogonality_check)
# Verify A = QR
reconstructed = Q @ R
print("\nReconstructed A:")
print(reconstructed)
print(f"Reconstruction error: {np.linalg.norm(A - reconstructed):.2e}")
# Solve least squares problem: minimize ||Ax - b||
b = np.array([1, 2, 3, 4])
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(f"\nLeast squares solution: {x_ls}")
The orthogonality of Q ensures numerical stability even for ill-conditioned problems. This makes QR decomposition more robust than normal equations (A^T A x = A^T b) for least squares.
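A minimal sketch of solving least squares directly from the QR factors, rather than through np.linalg.lstsq: since Q has orthonormal columns, minimizing ||Ax - b|| reduces to back-substituting the triangular system R x = Q^T b:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]], dtype=float)
b = np.array([1., 2., 3., 4.])

# Reduced QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular
Q, R = np.linalg.qr(A)

# Back-substitute R x = Q^T b (R is upper triangular, so this is O(n^2))
x = solve_triangular(R, Q.T @ b)
print(f"QR least squares solution: {x}")
```

For a full-rank A this agrees with np.linalg.lstsq, but makes the role of the QR factors explicit.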
Singular Value Decomposition (SVD)
SVD is the most powerful matrix factorization technique, decomposing any m×n matrix A into A = UΣV^T, where U and V are orthogonal matrices and Σ is diagonal with non-negative singular values. SVD works for any matrix—square, rectangular, singular, or non-singular.
Applications include image compression, principal component analysis (PCA), noise filtering, and computing matrix pseudo-inverses. SVD reveals the rank and fundamental subspaces of a matrix.
import numpy as np
# Create a synthetic grayscale image for demonstration
image = np.random.rand(100, 100) * 255
image = image.astype(np.uint8)
# Perform SVD
U, s, Vt = np.linalg.svd(image, full_matrices=False)
print(f"Original image shape: {image.shape}")
print(f"U shape: {U.shape}, s shape: {s.shape}, Vt shape: {Vt.shape}")
print(f"Singular values (first 10): {s[:10]}")
# Reconstruct with different ranks
def reconstruct_image(U, s, Vt, k):
    """Reconstruct the image using the k largest singular values."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
# Compare compression levels
ranks = [5, 10, 20, 50]
for k in ranks:
    reconstructed = reconstruct_image(U, s, Vt, k)
    compression_ratio = (k * (image.shape[0] + image.shape[1]) + k) / image.size
    error = np.linalg.norm(image - reconstructed) / np.linalg.norm(image)
    print(f"Rank {k}: Compression ratio: {compression_ratio:.2%}, "
          f"Relative error: {error:.2%}")
# Full reconstruction
full_reconstruction = U @ np.diag(s) @ Vt
print(f"\nFull reconstruction error: {np.linalg.norm(image - full_reconstruction):.2e}")
Low-rank approximations capture the most important features while discarding noise. This principle underlies PCA and many dimensionality reduction techniques.
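One application mentioned above, the pseudo-inverse, follows directly from the SVD: transpose the orthogonal factors and invert the nonzero singular values. A minimal sketch on an assumed random test matrix, checked against NumPy's built-in pinv:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Pseudo-inverse from SVD: A+ = V diag(1/s) U^T
# (in practice, singular values near zero would be dropped, not inverted)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

print(f"A+ shape: {A_pinv.shape}")
print(f"Matches np.linalg.pinv: {np.allclose(A_pinv, np.linalg.pinv(A))}")
```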
Non-Negative Matrix Factorization (NMF)
NMF decomposes a non-negative matrix V into two non-negative matrices W and H such that V ≈ WH. The non-negativity constraint produces parts-based representations where components add together rather than cancel out.
NMF excels at topic modeling (documents as mixtures of topics), collaborative filtering (users and items as latent factors), and audio source separation. The interpretability of non-negative factors makes NMF valuable for exploratory data analysis.
import numpy as np
from sklearn.decomposition import NMF
# Create a sample document-term matrix
# Rows: documents, Columns: terms, Values: term frequencies
doc_term_matrix = np.array([
    [3, 4, 0, 0, 1],  # Document 1
    [2, 3, 0, 0, 0],  # Document 2
    [0, 0, 5, 4, 2],  # Document 3
    [0, 0, 4, 3, 1],  # Document 4
    [1, 1, 2, 2, 3],  # Document 5
], dtype=float)
# Perform NMF with 2 topics
n_components = 2
model = NMF(n_components=n_components, init='random', random_state=42, max_iter=500)
W = model.fit_transform(doc_term_matrix)
H = model.components_
print("Original matrix V:")
print(doc_term_matrix)
print(f"\nDocument-topic matrix W (shape {W.shape}):")
print(W)
print(f"\nTopic-term matrix H (shape {H.shape}):")
print(H)
# Reconstruct
reconstructed = W @ H
print("\nReconstructed matrix:")
print(reconstructed)
print(f"Reconstruction error: {np.linalg.norm(doc_term_matrix - reconstructed):.3f}")
# Interpret topics
print("\nTopic interpretation:")
for i in range(n_components):
    print(f"Topic {i}: {H[i, :]}")
Unlike SVD, NMF produces additive models where each component contributes positively. This matches human intuition for many domains—a document contains multiple topics, not positive and negative topic contributions.
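In the collaborative-filtering and topic-modeling settings above, a fitted model is typically applied to unseen data. A sketch, assuming the same toy document-term matrix: scikit-learn's NMF.transform projects a new document onto the learned topics while holding H fixed:

```python
import numpy as np
from sklearn.decomposition import NMF

V = np.array([
    [3, 4, 0, 0, 1],
    [2, 3, 0, 0, 0],
    [0, 0, 5, 4, 2],
    [0, 0, 4, 3, 1],
    [1, 1, 2, 2, 3],
], dtype=float)

model = NMF(n_components=2, init='random', random_state=42, max_iter=500)
W = model.fit_transform(V)

# Project an unseen document onto the learned topics (H stays fixed)
new_doc = np.array([[2, 2, 0, 0, 1]], dtype=float)
w_new = model.transform(new_doc)
print(f"Topic weights for new document: {w_new}")
```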
Eigenvalue Decomposition
Eigendecomposition factors a square matrix A into A = QΛQ^(-1), where Q contains eigenvectors and Λ is a diagonal matrix of eigenvalues. This reveals the fundamental directions (eigenvectors) and scaling factors (eigenvalues) of the linear transformation represented by A.
Applications include PCA (eigenvectors of covariance matrix), stability analysis (eigenvalues indicate system behavior), and Markov chain analysis (steady-state distributions). Eigendecomposition only exists for square matrices and may involve complex numbers.
import numpy as np
# Create a symmetric matrix (guarantees real eigenvalues)
A = np.array([[4, 2, 1],
              [2, 5, 3],
              [1, 3, 6]], dtype=float)
# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
# Sort eigenvalues (and matching eigenvectors) in descending order
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
print("Original matrix A:")
print(A)
print(f"\nEigenvalues: {eigenvalues}")
print("\nEigenvectors (columns):")
print(eigenvectors)
# Verify A v = λ v for each eigenvector
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lhs = A @ v
    rhs = eigenvalues[i] * v
    print(f"\nEigenvector {i}: ||Av - λv|| = {np.linalg.norm(lhs - rhs):.2e}")
# Reconstruct matrix
Lambda = np.diag(eigenvalues)
reconstructed = eigenvectors @ Lambda @ np.linalg.inv(eigenvectors)
print("\nReconstructed matrix:")
print(reconstructed)
print(f"Reconstruction error: {np.linalg.norm(A - reconstructed):.2e}")
For symmetric matrices, use np.linalg.eigh() instead—it’s faster and guarantees real eigenvalues with orthogonal eigenvectors.
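A sketch of that suggestion on the same symmetric matrix: eigh returns real eigenvalues (in ascending order) and orthonormal eigenvectors, so the reconstruction needs only a transpose instead of a matrix inverse:

```python
import numpy as np

A = np.array([[4, 2, 1],
              [2, 5, 3],
              [1, 3, 6]], dtype=float)

# eigh: real eigenvalues in ascending order, orthonormal eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(f"Eigenvalues: {eigenvalues}")

# Orthonormal Q means Q^{-1} = Q^T, so no explicit inversion is needed
reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
print(f"Reconstruction error: {np.linalg.norm(A - reconstructed):.2e}")
```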
Practical Considerations and Performance Tips
Choose your factorization based on the problem structure. Use LU for solving multiple linear systems with the same coefficient matrix. Use QR for least squares and when numerical stability is critical. Use SVD for general-purpose decomposition, rank estimation, and when the matrix isn’t square. Use NMF when data is non-negative and interpretability matters. Use eigendecomposition for spectral analysis of square matrices.
Computational complexity varies significantly. LU, QR, and eigendecomposition are O(n³) for n×n matrices. SVD is O(min(mn², m²n)) for m×n matrices. For large matrices, consider truncated methods that compute only the k largest components.
Sparse matrices require specialized implementations. SciPy’s sparse linear algebra routines can reduce memory usage and computation time by orders of magnitude:
import numpy as np
from scipy.sparse import random, linalg as sparse_linalg
import time
# Create a large sparse matrix
n = 5000
density = 0.01
A_sparse = random(n, n, density=density, format='csr')
A_dense = A_sparse.toarray()
# Compare SVD performance (computing only top 10 components)
k = 10
# Sparse SVD
start = time.time()
u_sparse, s_sparse, vt_sparse = sparse_linalg.svds(A_sparse, k=k)
sparse_time = time.time() - start
# Dense SVD would be prohibitively slow for this size
print(f"Sparse SVD (top {k} components): {sparse_time:.3f} seconds")
print(f"Matrix size: {n}×{n} with {density:.1%} density")
sparse_bytes = A_sparse.data.nbytes + A_sparse.indices.nbytes + A_sparse.indptr.nbytes
print(f"Memory saved: {(1 - sparse_bytes / A_dense.nbytes):.1%}")
For matrices larger than a few thousand dimensions, always use sparse implementations when possible. The performance difference becomes dramatic as matrix size increases—sparse methods can be 100x faster and use 99% less memory for typical sparse matrices.
When working with real-world data, preprocessing matters. Center and scale your data before PCA/eigendecomposition. Check for and handle missing values before factorization. Consider regularization for ill-conditioned problems. Always validate reconstruction error to ensure the factorization captures sufficient information for your application.
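A sketch of the centering-and-scaling advice, on assumed synthetic data whose features have wildly different scales; without standardization, the largest-scale feature would dominate the eigendecomposition regardless of the data's actual structure:

```python
import numpy as np

rng = np.random.default_rng(1)
# Three features with very different scales and offsets
X = rng.standard_normal((200, 3)) * np.array([10.0, 1.0, 0.1]) + np.array([5.0, -2.0, 0.0])

# Center and scale each feature before PCA-style eigendecomposition
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the (now correlation-like) covariance matrix
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(f"Variance per component (descending): {eigenvalues[::-1]}")
```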