How to Calculate Eigenvalues in NumPy
Key Insights
- NumPy provides three main eigenvalue functions: `np.linalg.eig()` for general matrices, `np.linalg.eigvals()` when you only need eigenvalues, and `np.linalg.eigh()` for symmetric/Hermitian matrices with better performance and numerical stability.
- Always use `np.linalg.eigh()` for symmetric matrices—it’s faster, more numerically stable, and guarantees real eigenvalues in sorted order.
- Eigenvalue calculations are foundational to many algorithms you already use, including PCA, Google’s PageRank, and stability analysis in control systems.
Introduction to Eigenvalues
Eigenvalues are scalar values that characterize how a linear transformation stretches or compresses space along specific directions. For a square matrix A, an eigenvalue λ and its corresponding eigenvector v satisfy the equation Av = λv. The eigenvector points in a direction that remains unchanged by the transformation (except for scaling), and the eigenvalue tells you the scaling factor.
This isn’t just abstract linear algebra. Eigenvalues power critical algorithms across engineering and data science:
- Principal Component Analysis (PCA): Eigenvalues of the covariance matrix determine how much variance each principal component captures.
- Stability analysis: In control systems and differential equations, eigenvalues determine whether a system converges, diverges, or oscillates.
- Graph algorithms: The eigenvalues of adjacency and Laplacian matrices reveal community structure, connectivity, and centrality measures.
- Quantum mechanics: Energy levels of quantum systems are eigenvalues of the Hamiltonian operator.
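The dominant-eigenvector idea behind PageRank can be sketched with power iteration: repeatedly multiplying a vector by the matrix and renormalizing converges to the eigenvector of the largest-magnitude eigenvalue. The 2×2 matrix below is purely illustrative, not a real link graph.

```python
import numpy as np

# Power iteration: repeated multiplication by A converges to the
# eigenvector of the largest-magnitude eigenvalue.
# This small matrix is illustrative only.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

v = np.ones(2)
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)  # renormalize to avoid overflow

dominant = v @ A @ v        # Rayleigh quotient of the converged unit vector
print(dominant)             # ≈ 3.618, matching max(np.linalg.eigvals(A))
```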
NumPy makes eigenvalue computation straightforward, but choosing the right function and understanding the output requires some care.
NumPy’s Eigenvalue Functions Overview
NumPy’s linalg module provides three primary functions for eigenvalue computation. Each serves a different use case:
```python
import numpy as np

# Sample matrix
A = np.array([[4, -2],
              [1, 1]])

# General eigenvalue decomposition (eigenvalues + eigenvectors)
eigenvalues, eigenvectors = np.linalg.eig(A)

# Eigenvalues only (slightly faster if you don't need eigenvectors)
eigenvalues_only = np.linalg.eigvals(A)

# For symmetric/Hermitian matrices (faster, more stable)
S = np.array([[4, 2],
              [2, 1]])
eigenvalues_sym, eigenvectors_sym = np.linalg.eigh(S)

print(f"eig() eigenvalues: {eigenvalues}")
print(f"eigvals() eigenvalues: {eigenvalues_only}")
print(f"eigh() eigenvalues: {eigenvalues_sym}")
```
When to use each:
| Function | Use Case | Returns |
|---|---|---|
| `np.linalg.eig()` | General square matrices | Eigenvalues and eigenvectors |
| `np.linalg.eigvals()` | When you only need eigenvalues | Eigenvalues only |
| `np.linalg.eigh()` | Symmetric or Hermitian matrices | Real eigenvalues (sorted) and orthonormal eigenvectors |
The `eigh()` function is the workhorse for most data science applications because covariance matrices, correlation matrices, and graph Laplacians are all symmetric.
Computing Eigenvalues and Eigenvectors
Let’s walk through a complete example with `np.linalg.eig()` and verify our results:
```python
import numpy as np

# Define a 3x3 matrix
A = np.array([
    [2, 0, 0],
    [1, 2, 1],
    [-1, 0, 1]
])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)
print("\nEigenvectors (as columns):")
print(eigenvectors)

# Verify: Av should equal λv for each eigenvalue-eigenvector pair
print("\nVerification (Av = λv):")
for i in range(len(eigenvalues)):
    λ = eigenvalues[i]
    v = eigenvectors[:, i]  # i-th column is the i-th eigenvector
    Av = A @ v
    λv = λ * v
    print(f"\nEigenvalue {i+1}: λ = {λ:.4f}")
    print(f"Av = {Av}")
    print(f"λv = {λv}")
    print(f"Match: {np.allclose(Av, λv)}")
```
Critical detail: eigenvectors are returned as columns, not rows. The i-th eigenvector, corresponding to `eigenvalues[i]`, is `eigenvectors[:, i]`. This trips up many developers.
The eigenvectors are also normalized to unit length. If you need a different normalization (like having the first component equal to 1), you’ll need to rescale them yourself.
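A sketch of such rescaling, using the largest-magnitude component of each eigenvector as the pivot (unlike the first component, it is never zero):

```python
import numpy as np

# Rescale each eigenvector column so its largest-magnitude entry equals 1.
# The 2x2 matrix here is illustrative; its eigenvalues are real (3 and 2).
A = np.array([[4, -2],
              [1, 1]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Pick the largest-|entry| index per column, then divide column-wise
idx = np.argmax(np.abs(eigenvectors), axis=0)
rescaled = eigenvectors / eigenvectors[idx, np.arange(eigenvectors.shape[1])]

# Rescaled columns are still eigenvectors: Av = λv is scale-invariant
for i, lam in enumerate(eigenvalues):
    assert np.allclose(A @ rescaled[:, i], lam * rescaled[:, i])
```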
Special Cases: Symmetric and Hermitian Matrices
Symmetric matrices (where A = Aᵀ) have special properties: their eigenvalues are always real, and their eigenvectors are orthogonal. NumPy’s eigh() exploits these properties for significant performance and accuracy gains.
```python
import numpy as np
import time

# Create a symmetric covariance-like matrix
np.random.seed(42)
X = np.random.randn(100, 50)
cov_matrix = X.T @ X / 100  # 50x50 symmetric positive semi-definite

# Compare eig() vs eigh()
start = time.perf_counter()
for _ in range(100):
    vals_eig, vecs_eig = np.linalg.eig(cov_matrix)
time_eig = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):
    vals_eigh, vecs_eigh = np.linalg.eigh(cov_matrix)
time_eigh = time.perf_counter() - start

print(f"eig() time: {time_eig:.4f}s")
print(f"eigh() time: {time_eigh:.4f}s")
print(f"Speedup: {time_eig/time_eigh:.2f}x")

# Check eigenvalue types
print(f"\neig() eigenvalue dtype: {vals_eig.dtype}")
print(f"eigh() eigenvalue dtype: {vals_eigh.dtype}")

# eigh() returns sorted eigenvalues (ascending order)
print(f"\neigh() eigenvalues sorted: {np.all(np.diff(vals_eigh) >= -1e-10)}")

# Verify orthogonality of eigenvectors from eigh()
orthogonality_check = vecs_eigh.T @ vecs_eigh
print(f"Eigenvectors orthonormal: {np.allclose(orthogonality_check, np.eye(50))}")
```
Key differences with `eigh()`:
- Returns real eigenvalues (float64 instead of complex128)
- Eigenvalues are sorted in ascending order
- Eigenvectors are guaranteed orthonormal
- Typically 2-3x faster than `eig()` for the same matrix size
Always check whether your matrix is symmetric before choosing a function. A quick test: `np.allclose(A, A.T)`.
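That check can be wrapped in a small dispatcher. The helper below is a hypothetical convenience, not part of NumPy, and the tolerance is an assumption you should tune for your data:

```python
import numpy as np

def safe_eigendecomposition(A, tol=1e-10):
    """Hypothetical helper: use eigh() for symmetric input, eig() otherwise."""
    if np.allclose(A, A.T, atol=tol):
        return np.linalg.eigh(A)  # real, sorted ascending, orthonormal vectors
    return np.linalg.eig(A)       # possibly complex, no ordering guarantee

S = np.array([[4.0, 2.0], [2.0, 1.0]])
vals, vecs = safe_eigendecomposition(S)
print(vals.dtype)  # float64: the symmetric eigh() path was taken
```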
Practical Application: Principal Component Analysis
PCA is the most common real-world application of eigenvalue decomposition. Here’s a from-scratch implementation:
```python
import numpy as np

def pca_from_scratch(X, n_components):
    """
    Perform PCA using eigendecomposition.

    Parameters:
    -----------
    X : ndarray of shape (n_samples, n_features)
    n_components : int, number of principal components to keep

    Returns:
    --------
    X_transformed : projected data
    components : principal component directions
    explained_variance_ratio : proportion of variance explained
    """
    # Step 1: Center the data
    X_centered = X - X.mean(axis=0)

    # Step 2: Compute covariance matrix
    n_samples = X.shape[0]
    cov_matrix = (X_centered.T @ X_centered) / (n_samples - 1)

    # Step 3: Eigendecomposition (use eigh since covariance is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Step 4: Sort by eigenvalue (descending) - eigh returns ascending
    sorted_idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[sorted_idx]
    eigenvectors = eigenvectors[:, sorted_idx]

    # Step 5: Select top n_components
    components = eigenvectors[:, :n_components].T  # Shape: (n_components, n_features)

    # Step 6: Project data
    X_transformed = X_centered @ components.T

    # Calculate explained variance ratio
    explained_variance_ratio = eigenvalues[:n_components] / eigenvalues.sum()

    return X_transformed, components, explained_variance_ratio

# Demo with synthetic data
np.random.seed(42)

# Create data with clear principal directions
n_samples = 200
t = np.linspace(0, 4*np.pi, n_samples)
X = np.column_stack([
    3 * np.cos(t) + np.random.randn(n_samples) * 0.5,
    2 * np.sin(t) + np.random.randn(n_samples) * 0.3,
    0.5 * t + np.random.randn(n_samples) * 0.2
])

# Apply PCA
X_pca, components, var_ratio = pca_from_scratch(X, n_components=2)

print("Original shape:", X.shape)
print("Transformed shape:", X_pca.shape)
print("Explained variance ratio:", var_ratio)
print("Total variance explained:", var_ratio.sum())
```
# Demo with synthetic data
np.random.seed(42)
# Create data with clear principal directions
n_samples = 200
t = np.linspace(0, 4*np.pi, n_samples)
X = np.column_stack([
3 * np.cos(t) + np.random.randn(n_samples) * 0.5,
2 * np.sin(t) + np.random.randn(n_samples) * 0.3,
0.5 * t + np.random.randn(n_samples) * 0.2
])
# Apply PCA
X_pca, components, var_ratio = pca_from_scratch(X, n_components=2)
print("Original shape:", X.shape)
print("Transformed shape:", X_pca.shape)
print("Explained variance ratio:", var_ratio)
print("Total variance explained:", var_ratio.sum())
This implementation mirrors what scikit-learn’s PCA does internally (though sklearn uses SVD for numerical stability with wide matrices).
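To see why the SVD route is equivalent, here is a sketch on random data: the right singular vectors of the centered data matrix are the covariance eigenvectors, and the eigenvalues relate to singular values via λᵢ = sᵢ²/(n − 1), with no covariance matrix ever formed.

```python
import numpy as np

# Compare the eigendecomposition and SVD routes to PCA eigenvalues
# on small random data (illustrative only).
np.random.seed(42)
X = np.random.randn(200, 3)
X_centered = X - X.mean(axis=0)
n_samples = X.shape[0]

# Eigendecomposition route, sorted descending to match SVD ordering
cov = X_centered.T @ X_centered / (n_samples - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# SVD route: singular values come back in descending order
_, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
eigvals_svd = s**2 / (n_samples - 1)

print(np.allclose(eigvals, eigvals_svd))  # → True
```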
Common Pitfalls and Best Practices
Eigenvalue computation has several gotchas that can silently corrupt your results:
```python
import numpy as np

# Pitfall 1: Floating-point comparison
A = np.array([[1, 2], [2, 1]])
eigenvalues, _ = np.linalg.eigh(A)

# Wrong: exact comparison
if eigenvalues[0] == -1.0:  # Don't do this
    print("Exact match")

# Right: use tolerance
if np.isclose(eigenvalues[0], -1.0, rtol=1e-10):
    print("Match within tolerance")

# Pitfall 2: Complex eigenvalues from non-symmetric matrices
B = np.array([[0, -1], [1, 0]])  # Rotation matrix
eigenvalues_b = np.linalg.eigvals(B)
print(f"\nRotation matrix eigenvalues: {eigenvalues_b}")
print(f"Dtype: {eigenvalues_b.dtype}")  # complex128!

# Pitfall 3: Eigenvalue ordering differs between eig() and eigh()
C = np.array([[3, 1], [1, 3]])
vals_eig, _ = np.linalg.eig(C)
vals_eigh, _ = np.linalg.eigh(C)
print(f"\neig() order: {vals_eig}")   # No guaranteed order
print(f"eigh() order: {vals_eigh}")   # Always ascending

# Pitfall 4: Near-singular matrices
D = np.array([[1, 1], [1, 1.0000001]])  # Nearly singular
eigenvalues_d, eigenvectors_d = np.linalg.eig(D)
print(f"\nNear-singular eigenvalues: {eigenvalues_d}")
# Small eigenvalue indicates near-singularity

# Best practice: Check condition number before eigendecomposition
cond_number = np.linalg.cond(D)
print(f"Condition number: {cond_number:.2e}")
if cond_number > 1e10:
    print("Warning: Matrix is poorly conditioned")
```
Best practices summary:
- Always use `np.allclose()` or `np.isclose()` for comparing eigenvalues
- Check matrix symmetry before choosing `eig()` vs `eigh()`
- Monitor condition numbers for numerical stability
- Remember that `eig()` returns complex dtype even for real matrices with real eigenvalues
- For large sparse matrices, use `scipy.sparse.linalg.eigs()` instead—it’s orders of magnitude faster
Conclusion
NumPy’s eigenvalue functions cover most practical needs:
- Use `np.linalg.eigh()` for symmetric matrices (covariance, correlation, Laplacians)—it’s faster, more stable, and returns sorted real eigenvalues.
- Use `np.linalg.eig()` for general square matrices when you need both eigenvalues and eigenvectors.
- Use `np.linalg.eigvals()` when you only need eigenvalues and want to skip eigenvector computation.
For matrices larger than a few thousand dimensions, consider SciPy’s sparse eigensolvers (`scipy.sparse.linalg.eigs` and `eigsh`), which can compute only the k largest or smallest eigenvalues without forming the full decomposition. For GPU acceleration, CuPy provides NumPy-compatible implementations of `eigh` and `eigvalsh` for symmetric/Hermitian matrices.
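A minimal sketch of the sparse route, assuming SciPy is installed. The diagonal test matrix is purely illustrative (its eigenvalues are simply its diagonal entries), so the result is easy to check:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Compute only the k largest eigenvalues of a large sparse symmetric
# matrix, without forming the full decomposition.
n = 2000
M = diags(np.arange(1.0, n + 1.0), format="csr")  # sparse diagonal matrix

k = 5
vals, vecs = eigsh(M, k=k, which="LA")  # "LA" = largest algebraic

print(np.sort(vals))  # → [1996. 1997. 1998. 1999. 2000.]
```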
Eigenvalue decomposition is one of those fundamental operations that appears everywhere once you start looking. Understanding when and how to compute eigenvalues efficiently will serve you well across machine learning, scientific computing, and engineering applications.