How to Calculate the Rank of a Matrix in Python

Key Insights

  • Matrix rank reveals the number of linearly independent rows or columns, which is critical for determining if linear systems have unique solutions and whether matrices are invertible.
  • NumPy’s linalg.matrix_rank() uses Singular Value Decomposition (SVD) internally and allows tolerance adjustment for handling floating-point precision issues in near-singular matrices.
  • In data science, checking the rank of feature matrices or correlation matrices helps identify multicollinearity and redundant variables that can destabilize statistical models.

Introduction to Matrix Rank

Matrix rank is one of the most fundamental concepts in linear algebra. It represents the maximum number of linearly independent row vectors (or equivalently, column vectors) in a matrix. A matrix with m rows and n columns can have a rank at most min(m, n).

Understanding rank is essential for several reasons. First, it determines whether a system of linear equations has a unique solution, infinitely many solutions, or no solution at all. Second, a square matrix is invertible if and only if it has full rank. Third, in machine learning and statistics, rank helps identify redundant features and multicollinearity problems that can break regression models.
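The invertibility connection is easy to demonstrate: a rank-deficient square matrix is singular, and NumPy refuses to invert it.

```python
import numpy as np

# A rank-deficient square matrix is singular, so inversion fails
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # second row = 2 * first row

print(np.linalg.matrix_rank(M))  # Output: 1

try:
    np.linalg.inv(M)
except np.linalg.LinAlgError:
    print("Inversion failed: matrix is singular")
```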

When you encounter a dataset with perfectly correlated features, the resulting design matrix will be rank-deficient. This causes numerical instability in algorithms that require matrix inversion, like ordinary least squares regression. Checking rank early in your data pipeline can save hours of debugging cryptic numerical errors.

Using NumPy’s linalg.matrix_rank()

The standard way to calculate matrix rank in Python is NumPy’s linalg.matrix_rank() function. This function provides a clean interface and handles the numerical complexities behind the scenes.

import numpy as np

# Full rank 3x3 matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10]
])

rank_A = np.linalg.matrix_rank(A)
print(f"Rank of A: {rank_A}")  # Output: 3

# Rank-deficient matrix (third row is sum of first two)
B = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [5, 7, 9]
])

rank_B = np.linalg.matrix_rank(B)
print(f"Rank of B: {rank_B}")  # Output: 2

The function accepts a tolerance parameter (tol) that determines the threshold below which singular values are considered zero. This is crucial when working with floating-point arithmetic, where numerical errors can make theoretically zero values appear as very small non-zero numbers.

# Near-singular matrix with small numerical noise
C = np.array([
    [1.0, 2.0],
    [2.0, 4.0000001]
])

# Default tolerance
print(np.linalg.matrix_rank(C))  # Might output: 2

# Stricter tolerance
print(np.linalg.matrix_rank(C, tol=1e-5))  # Output: 1

Understanding Rank Through SVD

NumPy calculates matrix rank using Singular Value Decomposition (SVD), which decomposes any matrix A into three matrices: A = UΣV^T, where Σ is a diagonal matrix containing singular values. The rank equals the number of non-zero singular values.

This approach is numerically stable and works for rectangular matrices. Understanding SVD helps you debug rank calculations and implement custom tolerance schemes.

# Manual rank calculation using SVD
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Perform SVD
U, singular_values, Vt = np.linalg.svd(A)

print("Singular values:", singular_values)
# Output: [1.68481034e+01 1.06836951e+00 4.41842475e-16]

# Count non-zero singular values (with tolerance)
tolerance = 1e-10
rank = np.sum(singular_values > tolerance)
print(f"Rank: {rank}")  # Output: 2

# Compare with built-in function
print(f"NumPy rank: {np.linalg.matrix_rank(A)}")  # Output: 2

The third singular value (4.4e-16) is effectively zero—it’s just floating-point roundoff error. The matrix has rank 2 because the third row is a linear combination of the first two rows (specifically, row 3 = 2 × row 2 - row 1).
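You can confirm the dependency numerically:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Row 3 is a linear combination of the first two rows: 2 * row 2 - row 1
print(np.allclose(A[2], 2 * A[1] - A[0]))  # Output: True
```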

Practical Examples and Edge Cases

Let’s examine common matrix types and how their rank behaves.

# Identity matrix - always full rank
I = np.eye(4)
print(f"Identity matrix rank: {np.linalg.matrix_rank(I)}")  # Output: 4

# Zero matrix - always rank 0
Z = np.zeros((3, 3))
print(f"Zero matrix rank: {np.linalg.matrix_rank(Z)}")  # Output: 0

# Matrix with linearly dependent columns
D = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [3, 6, 9]
])
print(f"Rank of D: {np.linalg.matrix_rank(D)}")  # Output: 1
# All rows are multiples of [1, 2, 3]

# Rectangular matrix
E = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(f"Rank of E: {np.linalg.matrix_rank(E)}")  # Output: 2

Floating-point precision becomes critical when matrices are nearly singular:

# Create a matrix that's numerically close to rank-deficient
F = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0 + 1e-14]  # Tiny perturbation
])

# Larger tolerances treat the tiny third singular value as zero
print(np.linalg.matrix_rank(F, tol=1e-10))  # Output: 2
print(np.linalg.matrix_rank(F, tol=1e-12))  # Output: 2

# An extremely tight tolerance may count it as non-zero
print(np.linalg.matrix_rank(F, tol=1e-16))  # Might output: 3

# Default tolerance (based on singular values and matrix size)
print(np.linalg.matrix_rank(F))  # Output: 2

Alternative Methods Using SciPy

SciPy provides complementary tools for rank analysis. The scipy.linalg.orth() function computes an orthonormal basis for the range of a matrix, and the number of basis vectors equals the rank.

from scipy.linalg import orth

G = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Get orthonormal basis
basis = orth(G)
rank_G = basis.shape[1]  # Number of columns in basis

print(f"Rank via orth(): {rank_G}")  # Output: 2
print(f"Rank via matrix_rank(): {np.linalg.matrix_rank(G)}")  # Output: 2
print(f"Basis shape: {basis.shape}")  # Output: (3, 2)

This method is useful when you need both the rank and an actual basis for the column space. However, for simply checking rank, np.linalg.matrix_rank() is more direct and efficient.
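As a quick sanity check, the basis returned by orth() is orthonormal, and projecting G onto it reproduces G, confirming that it spans the column space:

```python
import numpy as np
from scipy.linalg import orth

G = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
])

basis = orth(G)

# The basis columns are orthonormal: basis.T @ basis is the identity
print(np.allclose(basis.T @ basis, np.eye(basis.shape[1])))  # Output: True

# Projecting G onto the basis recovers G exactly
print(np.allclose(basis @ (basis.T @ G), G))  # Output: True
```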

Real-World Application: Feature Selection in Data Science

One of the most practical uses of matrix rank is detecting multicollinearity in datasets. When features are perfectly correlated, the feature matrix becomes rank-deficient, causing problems for many machine learning algorithms.

import pandas as pd

# Create a dataset with multicollinearity
data = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],  # Perfectly correlated with feature1
    'feature3': [1, 3, 5, 7, 9],
    'feature4': [3, 7, 11, 15, 19]  # feature2 + feature3
})

# Check rank of feature matrix
feature_matrix = data.values
n_features = feature_matrix.shape[1]
rank = np.linalg.matrix_rank(feature_matrix)

print(f"Number of features: {n_features}")  # Output: 4
print(f"Rank of feature matrix: {rank}")  # Output: 2
print(f"Rank deficiency: {n_features - rank}")  # Output: 2

# Check correlation matrix rank: feature2, feature3, and feature4 are all
# affine functions of feature1, so every pairwise correlation equals 1
corr_matrix = data.corr().values
print(f"Correlation matrix rank: {np.linalg.matrix_rank(corr_matrix)}")  # Output: 1

When the rank is less than the number of features, you have redundant variables. You should either remove correlated features or use regularization techniques like ridge regression that handle multicollinearity.
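To see why regularization helps, here is a minimal sketch using NumPy's normal equations (an illustration, not a production estimator): with a rank-deficient design matrix, the ordinary least squares system is singular, while ridge regression's added lambda * I term restores full rank.

```python
import numpy as np

# Rank-deficient design matrix: second column = 2 * first column
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

# Ordinary least squares normal equations fail: X.T @ X is singular
try:
    np.linalg.solve(X.T @ X, X.T @ y)
except np.linalg.LinAlgError:
    print("OLS normal equations failed: singular matrix")

# Ridge regression adds lam * I, making the system full rank and solvable
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(f"Ridge solution: {w}")
```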

# Identify a maximal set of linearly independent columns using
# QR decomposition with column pivoting (singular value indices do
# not correspond to feature indices, so SVD alone cannot do this)
from scipy.linalg import qr

_, _, pivots = qr(feature_matrix, pivoting=True)
independent_features = sorted(pivots[:rank])
print(f"Keep features at indices: {independent_features}")

Performance Considerations and Best Practices

For large matrices, SVD-based rank calculation has O(min(m²n, mn²)) complexity. If you’re working with massive datasets, consider these strategies:

Use appropriate tolerance: The default tolerance is max(M.shape) * eps * max(singular_values), where eps is machine epsilon. For most applications, this works well, but domain knowledge might suggest different thresholds.
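You can reproduce the default threshold by hand from the singular values:

```python
import numpy as np

A = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
])

# Recompute NumPy's default tolerance: max(shape) * eps * largest singular value
s = np.linalg.svd(A, compute_uv=False)
default_tol = max(A.shape) * np.finfo(A.dtype).eps * s.max()

print(f"Default tolerance: {default_tol:.2e}")
print(np.sum(s > default_tol))       # Output: 2
print(np.linalg.matrix_rank(A))      # Output: 2 -- same result
```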

Check rank before expensive operations: Before attempting matrix inversion or solving linear systems, verify full rank to avoid wasted computation and numerical errors.

def safe_inverse(A):
    """Safely invert a matrix after checking rank."""
    if A.shape[0] != A.shape[1]:
        raise ValueError("Matrix must be square")
    
    rank = np.linalg.matrix_rank(A)
    if rank < A.shape[0]:
        raise np.linalg.LinAlgError(
            f"Matrix is rank-deficient: rank {rank} < size {A.shape[0]}"
        )
    
    return np.linalg.inv(A)

For sparse matrices: Use scipy.sparse.linalg.svds() instead of dense SVD for better performance on large sparse matrices.
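As a sketch of that idea (the matrix, density, and choice of k below are illustrative assumptions): compute only the k largest singular values and count those above a threshold. This bounds the rank from below unless k exceeds the true rank.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Illustrative large sparse matrix; size and density are assumptions
S = sparse_random(1000, 500, density=0.01, random_state=0).tocsr()

k = 10  # number of leading singular values to compute; must be < min(S.shape)
singular_values = svds(S, k=k, return_singular_vectors=False)

# Count values above a tolerance patterned on matrix_rank's default
tolerance = max(S.shape) * np.finfo(float).eps * singular_values.max()
estimated = int(np.sum(singular_values > tolerance))
print(f"{estimated} of the top {k} singular values exceed the tolerance")
```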

The recommended approach is straightforward: use np.linalg.matrix_rank() for most cases, adjust tolerance when dealing with numerically sensitive problems, and understand that SVD provides the theoretical foundation. For data science applications, always check feature matrix rank before training models that assume full rank, and use rank analysis to guide feature engineering decisions.
