Linear Algebra: Rank and Nullity Explained
Key Insights
- The Rank-Nullity Theorem (rank + nullity = number of columns) is fundamental to understanding how linear transformations compress or preserve information, directly impacting dimensionality reduction techniques like PCA.
- Matrix rank tells you the number of truly independent features in your data—critical for detecting multicollinearity in regression and determining whether linear systems have unique solutions.
- Computing rank reliably requires understanding numerical precision: never check if floating-point values equal exactly zero; always use tolerance thresholds with methods like SVD.
Introduction to Rank and Nullity
Matrix rank and nullity are two sides of the same coin. The rank of a matrix is the dimension of its column space—essentially, how many linearly independent columns it contains. The nullity is the dimension of its null space—the number of independent solutions to the equation Ax = 0.
These concepts are connected by the Rank-Nullity Theorem: for any m×n matrix A, rank(A) + nullity(A) = n. This isn’t just mathematical elegance; it tells you how much information a linear transformation preserves versus destroys.
In practical terms, rank determines whether you can solve linear systems uniquely, whether your regression model has multicollinearity issues, and how many principal components you can extract from your data. Let’s see this with a simple example:
```python
import numpy as np

# Create a 3x3 matrix
A = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 2]
])

# Calculate rank using NumPy
rank = np.linalg.matrix_rank(A)
n_cols = A.shape[1]
nullity = n_cols - rank

print(f"Matrix A:\n{A}")
print(f"Rank: {rank}")
print(f"Nullity: {nullity}")
print(f"Rank + Nullity = {rank + nullity} (should equal {n_cols})")
```
This matrix has rank 2, not 3, because the second row is twice the first. One column is redundant—it can be expressed as a linear combination of the others. The nullity of 1 tells us there’s one dimension of “freedom” in solutions to Ax = 0.
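It is easy to confirm this redundancy numerically. Reusing the same matrix, the check below verifies that the third column is the sum of the first two, which is exactly why x = [1, 1, -1] solves Ax = 0:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 2]
], dtype=float)

# The third column is the sum of the first two...
col_sum = A[:, 0] + A[:, 1]
print(np.allclose(A[:, 2], col_sum))  # True

# ...so x = [1, 1, -1] is a non-trivial solution of Ax = 0
x = np.array([1.0, 1.0, -1.0])
print(np.allclose(A @ x, 0))  # True
```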
Understanding Matrix Rank
An m×n matrix can have rank at most min(m, n). When rank equals this maximum, the matrix is full rank. Otherwise, it’s rank-deficient.
Here’s a crucial fact: row rank always equals column rank. You can count independent rows or independent columns—you’ll get the same number. This means rank represents fundamental dimensionality, not just a property of how you arrange the matrix.
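This equality is easy to verify numerically. A quick sketch: a 5×4 matrix built as the product of 5×2 and 2×4 random factors has rank 2 (the seed and shapes here are arbitrary choices for illustration), and counting independent rows or independent columns gives the same answer:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary

# A 5x4 matrix built as a product of 5x2 and 2x4 factors has rank 2
B = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

print(np.linalg.matrix_rank(B))    # column rank: 2
print(np.linalg.matrix_rank(B.T))  # row rank: 2
```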
Geometrically, rank tells you how many dimensions the matrix’s column vectors span. Three linearly independent 3D vectors span all of 3D space. Two independent 3D vectors span a plane. One spans a line.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib

# Full rank matrix - 3 independent columns
A_full = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1]
])

# Rank-deficient matrix - columns lie in a plane
A_deficient = np.array([
    [1, 2, 3],
    [0, 1, 2],
    [0, 0, 0]
])

print("Full rank matrix:")
print(f"Rank: {np.linalg.matrix_rank(A_full)}")
print("\nRank-deficient matrix:")
print(f"Rank: {np.linalg.matrix_rank(A_deficient)}")

# Visualize the column vectors
fig = plt.figure(figsize=(12, 5))

# Full rank columns
ax1 = fig.add_subplot(121, projection='3d')
colors = ['r', 'g', 'b']
for i in range(3):
    ax1.quiver(0, 0, 0, A_full[0, i], A_full[1, i], A_full[2, i],
               color=colors[i], arrow_length_ratio=0.1, linewidth=2)
ax1.set_title('Full Rank: Spans 3D Space')
ax1.set_xlim([0, 2]); ax1.set_ylim([0, 2]); ax1.set_zlim([0, 2])

# Rank-deficient columns
ax2 = fig.add_subplot(122, projection='3d')
for i in range(3):
    ax2.quiver(0, 0, 0, A_deficient[0, i], A_deficient[1, i], A_deficient[2, i],
               color=colors[i], arrow_length_ratio=0.1, linewidth=2)
ax2.set_title('Rank-Deficient: Spans 2D Plane')
ax2.set_xlim([0, 4]); ax2.set_ylim([0, 3]); ax2.set_zlim([0, 1])

plt.tight_layout()
plt.savefig('rank_visualization.png', dpi=150, bbox_inches='tight')
```
The rank-deficient matrix has all column vectors lying in the xy-plane: its third row is all zeros, so every column has a z-component of 0. No matter how you combine these vectors, you can’t escape that plane.
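The claim is easy to check directly: since the third row of the matrix is zero, any linear combination of its columns has an exactly zero z-component. A quick check with arbitrary random coefficients:

```python
import numpy as np

A_deficient = np.array([
    [1, 2, 3],
    [0, 1, 2],
    [0, 0, 0]
], dtype=float)

rng = np.random.default_rng(1)  # seed is arbitrary
for _ in range(5):
    x = rng.standard_normal(3)  # random combination coefficients
    v = A_deficient @ x         # a vector in the column space
    print(v[2])                 # exactly 0.0: the zero third row forces it
```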
Understanding Nullity and Null Space
The null space of matrix A contains all vectors x where Ax = 0. The nullity is the dimension of this space—how many independent vectors it takes to span all solutions.
If nullity is 0, only x = 0 solves Ax = 0, meaning the columns are linearly independent. If nullity > 0, there are non-trivial solutions, indicating redundancy in your columns.
```python
import numpy as np
from scipy.linalg import null_space

# Matrix with non-trivial null space
A = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 2]
], dtype=float)

# Find null space basis
null_basis = null_space(A)

print(f"Matrix rank: {np.linalg.matrix_rank(A)}")
print(f"Nullity: {null_basis.shape[1]}")
print(f"\nNull space basis vector(s):\n{null_basis}")

# Verify it's actually in the null space
if null_basis.shape[1] > 0:
    result = A @ null_basis
    print(f"\nA @ null_basis =\n{result}")
    print("(Should be approximately zero)")
    print(f"Max absolute value: {np.max(np.abs(result))}")
```
The null space basis vector shows you the “directions” of redundancy. In regression, if your design matrix has nullity > 0, you have perfect multicollinearity—some predictors are exact linear combinations of others.
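One practical consequence: when Ax = b is solvable, adding any null-space vector to a solution produces another solution, which is exactly why a rank-deficient system has no unique answer. A sketch with the same matrix (the right-hand side is constructed to make the system consistent):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 2]
], dtype=float)

# Build a consistent right-hand side so Ax = b has solutions
b = A @ np.array([1.0, 1.0, 1.0])

# One particular solution (minimum-norm least squares)
x_p, *_ = np.linalg.lstsq(A, b, rcond=None)

# Shifting by any null-space vector yields another exact solution
n = null_space(A)[:, 0]
print(np.allclose(A @ x_p, b))            # True
print(np.allclose(A @ (x_p + 5 * n), b))  # True
```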
Practical Applications
Principal Component Analysis
PCA relies heavily on rank. When you compute principal components, you’re finding the directions of maximum variance. The number of non-zero eigenvalues equals the rank of your covariance matrix.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import make_classification

# Create dataset with redundant features
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)

# Check rank of the data matrix
rank_X = np.linalg.matrix_rank(X)
print(f"Data matrix rank: {rank_X} out of {X.shape[1]} features")

# Fit PCA
pca = PCA()
pca.fit(X)

# Explained variance ratio
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(range(1, 11), pca.explained_variance_ratio_, 'bo-')
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.title('Scree Plot')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(range(1, 11), np.cumsum(pca.explained_variance_ratio_), 'ro-')
plt.axhline(y=0.95, color='k', linestyle='--', label='95% threshold')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('Cumulative Variance')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.savefig('pca_rank.png', dpi=150, bbox_inches='tight')

# Count effective components
n_components_95 = np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.95) + 1
print(f"\nComponents needed for 95% variance: {n_components_95}")
```
If your data matrix is rank-deficient, some eigenvalues will be (numerically) zero. You can’t extract more meaningful components than the rank.
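To see this concretely, here is a hypothetical dataset where one feature is an exact linear combination of two others; the smallest eigenvalue of the covariance matrix comes out numerically zero, and the count of non-negligible eigenvalues matches the rank of the data matrix:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is arbitrary
X = rng.standard_normal((100, 3))
# Append a redundant fourth feature: an exact combination of the first two
X = np.column_stack([X, X[:, 0] + X[:, 1]])

cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(eigvals)  # smallest is numerically zero

# Eigenvalues above a tolerance match the rank of the data matrix
n_nonzero = np.sum(eigvals > 1e-10)
print(n_nonzero)                    # 3
print(np.linalg.matrix_rank(X))     # 3
```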
Detecting Multicollinearity
In regression, rank-deficiency in your design matrix means you can’t uniquely estimate coefficients.
```python
import numpy as np

# Create design matrix with perfect multicollinearity
X_multicol = np.array([
    [1, 2, 3, 5],   # last column = col 2 + col 3
    [1, 3, 4, 7],
    [1, 1, 5, 6],
    [1, 4, 2, 6]
])

print(f"Design matrix rank: {np.linalg.matrix_rank(X_multicol)}")
print(f"Number of columns: {X_multicol.shape[1]}")
print(f"Rank-deficient: {np.linalg.matrix_rank(X_multicol) < X_multicol.shape[1]}")

# (X'X) is singular here, so inverting it directly would fail or be unstable
XtX = X_multicol.T @ X_multicol
print(f"\nCondition number of X'X: {np.linalg.cond(XtX):.2e}")
print("(Values > 1e10 indicate severe multicollinearity)")
```
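Rather than inverting X'X, the safer route is a least-squares solver. One sketch, reusing the same rank-deficient design matrix (the response values `y` here are made up purely for illustration): `np.linalg.lstsq` reports the effective rank and returns a minimum-norm solution, matching what the pseudoinverse gives.

```python
import numpy as np

# The same rank-deficient design matrix (last column = col 2 + col 3)
X = np.array([
    [1, 2, 3, 5],
    [1, 3, 4, 7],
    [1, 1, 5, 6],
    [1, 4, 2, 6]
], dtype=float)
y = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative response values

# lstsq never forms (X'X)^(-1); it returns a minimum-norm solution
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(f"Rank reported by lstsq: {rank}")  # 3, not 4

# The pseudoinverse yields the same minimum-norm solution
beta_pinv = np.linalg.pinv(X) @ y
print(np.allclose(beta, beta_pinv, atol=1e-6))
```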
Computing Rank and Nullity
Singular Value Decomposition Approach
SVD is the most numerically stable way to compute rank. It decomposes A = UΣV^T where Σ contains singular values. The rank equals the number of non-zero singular values.
```python
import numpy as np

def compute_rank_svd(A, tol=1e-10):
    """Compute matrix rank using SVD with a specified tolerance."""
    # Compute singular values
    singular_values = np.linalg.svd(A, compute_uv=False)
    # Count singular values above tolerance
    rank = np.sum(singular_values > tol)
    return rank, singular_values

# Test matrix
A = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 2]
], dtype=float)

rank, sv = compute_rank_svd(A)
print(f"Singular values: {sv}")
print(f"Rank (tol=1e-10): {rank}")
print(f"NumPy rank: {np.linalg.matrix_rank(A)}")

# Compare with different tolerances
for tol in [1e-5, 1e-10, 1e-15]:
    rank_tol, _ = compute_rank_svd(A, tol=tol)
    print(f"Rank with tolerance {tol}: {rank_tol}")
```
Common Pitfalls and Best Practices
Floating-Point Precision
Never check if a floating-point number equals exactly zero when computing rank. Numerical errors accumulate.
```python
import numpy as np

# Demonstrate precision issues
A_theory = np.array([
    [1, 2],
    [2, 4]
], dtype=float)

# Add tiny numerical noise
np.random.seed(42)
A_noisy = A_theory + np.random.normal(0, 1e-14, A_theory.shape)

print("Theoretical rank-deficient matrix:")
print(f"Rank: {np.linalg.matrix_rank(A_theory)}")
print("\nSame matrix with tiny noise:")
print(f"Rank (default tol): {np.linalg.matrix_rank(A_noisy)}")

# Show singular values
sv_theory = np.linalg.svd(A_theory, compute_uv=False)
sv_noisy = np.linalg.svd(A_noisy, compute_uv=False)
print(f"\nSingular values (theory): {sv_theory}")
print(f"Singular values (noisy):  {sv_noisy}")

# Use appropriate tolerance
print(f"\nRank with tol=1e-10: {np.linalg.matrix_rank(A_noisy, tol=1e-10)}")
```
The noisy matrix might appear full rank with default tolerance, even though theoretically it should be rank 1. Always choose tolerance relative to your data scale and precision requirements.
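For reference, NumPy's documented default threshold in `matrix_rank` scales with the data: the largest singular value times the larger matrix dimension times machine epsilon. Computing it by hand shows what "default tolerance" actually means:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
sv = np.linalg.svd(A, compute_uv=False)

# NumPy's documented default threshold for matrix_rank:
# largest singular value * max(matrix dimensions) * machine epsilon
default_tol = sv.max() * max(A.shape) * np.finfo(A.dtype).eps
print(default_tol)
print(np.sum(sv > default_tol))  # singular values above the threshold
print(np.linalg.matrix_rank(A))  # agrees with the manual count
```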
Best Practices
- Use SVD for rank computation in production code—it’s more stable than row reduction.
- Set explicit tolerances based on your data’s scale. For normalized data, 1e-10 is reasonable. For large-scale data, you might need 1e-6 or larger.
- Check rank before solving linear systems. If your matrix is rank-deficient, use least-squares or regularization instead of direct inversion.
- Monitor condition numbers. A condition number above 1e10 indicates near-singularity, even if rank appears full.
- In machine learning pipelines, check for rank-deficiency after feature engineering. Adding interaction terms or polynomial features can create linear dependencies.
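As a sketch of the regularization route mentioned above: adding a small ridge penalty makes X'X + λI invertible for any λ > 0, so the solve succeeds even when X'X itself is singular (the design matrix is the multicollinearity example from earlier; `lam` and the response `y` are illustrative choices):

```python
import numpy as np

# Rank-deficient design matrix from the multicollinearity example
X = np.array([
    [1, 2, 3, 5],
    [1, 3, 4, 7],
    [1, 1, 5, 6],
    [1, 4, 2, 6]
], dtype=float)
y = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative response values

# Ridge regularization: X'X + lambda*I is invertible for any lambda > 0
lam = 1e-3
XtX_reg = X.T @ X + lam * np.eye(X.shape[1])
beta_ridge = np.linalg.solve(XtX_reg, X.T @ y)

print(np.linalg.cond(XtX_reg))  # finite and manageable, unlike cond(X'X)
print(beta_ridge)
```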
Understanding rank and nullity isn’t just academic—it’s essential for building robust numerical algorithms and diagnosing problems in data analysis pipelines. When your regression fails to converge or PCA produces unexpected results, rank is often the first thing to check.