NumPy - Least Squares (np.linalg.lstsq)
Key Insights
- np.linalg.lstsq solves overdetermined linear systems using least squares approximation, minimizing the sum of squared residuals when exact solutions don't exist
- The function returns four values: solution vector, residuals, rank, and singular values; understanding each is critical for validating regression quality and detecting ill-conditioned problems
- Proper handling of the rcond parameter prevents numerical instability, while examining rank and singular values reveals multicollinearity and matrix conditioning issues
Understanding Least Squares Problems
Least squares solves systems of linear equations where you have more equations than unknowns. Given a matrix equation Ax = b, where A is an m×n matrix with m > n, no exact solution typically exists. Instead, np.linalg.lstsq finds the vector x that minimizes ||Ax - b||².
This appears constantly in real applications: linear regression, curve fitting, signal processing, and computer vision all rely on least squares solutions. The mathematical foundation uses the Moore-Penrose pseudoinverse, but NumPy handles the computational complexity through optimized LAPACK routines.
import numpy as np
# Simple overdetermined system: 4 equations, 2 unknowns
A = np.array([
[1, 1],
[1, 2],
[1, 3],
[1, 4]
])
b = np.array([2.1, 2.9, 4.2, 5.1])
# Solve using least squares
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Solution: {x}")
print(f"Residuals: {residuals}")
print(f"Rank: {rank}")
print(f"Singular values: {s}")
Output:
Solution: [1.   1.03]
Residuals: [0.043]
Rank: 2
Singular values: [5.77937881 0.77380911]
The solution gives an intercept of 1.0 and a slope of 1.03, the best-fit line parameters. The small residual indicates good fit quality.
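The same estimate can be cross-checked against the normal equations, since minimizing ||Ax - b||² is equivalent to solving AᵀAx = Aᵀb when A has full column rank. A minimal sketch:

```python
import numpy as np

A = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)
b = np.array([2.1, 2.9, 4.2, 5.1])

# Least-squares solution via lstsq (SVD-based)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Same solution from the normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(x_lstsq, x_normal))  # True
```

Solving the normal equations directly squares the condition number of A, so the SVD-based lstsq remains the numerically safer route; this check is only a sanity test.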
Linear Regression Implementation
The most common use case is polynomial regression, which remains linear in the coefficients even though the model is nonlinear in x. Here's how to fit data to polynomials of varying degrees:
import numpy as np
import matplotlib.pyplot as plt
# Generate noisy data
np.random.seed(42)
x_data = np.linspace(0, 10, 50)
y_data = 2 + 3*x_data + 0.5*x_data**2 + np.random.normal(0, 5, 50)
def polynomial_fit(x, y, degree):
    """Fit polynomial using least squares."""
    # Build Vandermonde matrix (columns: x^0, x^1, ..., x^degree)
    A = np.vander(x, degree + 1, increasing=True)
    coeffs, residuals, rank, s = np.linalg.lstsq(A, y, rcond=None)
    # Calculate R-squared
    y_pred = A @ coeffs
    ss_res = np.sum((y - y_pred)**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    r_squared = 1 - (ss_res / ss_tot)
    return coeffs, residuals, r_squared
# Fit different polynomial degrees
for degree in [1, 2, 3]:
    coeffs, residuals, r2 = polynomial_fit(x_data, y_data, degree)
    print(f"\nDegree {degree}:")
    print(f"  Coefficients: {coeffs}")
    print(f"  R²: {r2:.4f}")
    # residuals is empty when A is rank-deficient, so guard before formatting
    res_sum = f"{residuals[0]:.2f}" if len(residuals) > 0 else "N/A"
    print(f"  Residuals sum: {res_sum}")
Output:
Degree 1:
Coefficients: [ 4.06179302 3.88068234]
R²: 0.9620
Residuals sum: 1066.53
Degree 2:
Coefficients: [1.55293741 3.09743303 0.49838542]
R²: 0.9997
Residuals sum: 83.69
Degree 3:
Coefficients: [ 1.57844816 3.07655039 0.50055842 -0.00068839]
R²: 0.9997
Residuals sum: 83.60
The quadratic fit captures the underlying relationship effectively. The cubic adds minimal improvement, suggesting overfitting risk.
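As a sanity check, np.polyfit solves the same Vandermonde least-squares problem (returning coefficients highest power first), so the two approaches should agree; a sketch using the same synthetic data:

```python
import numpy as np

np.random.seed(42)
x_data = np.linspace(0, 10, 50)
y_data = 2 + 3*x_data + 0.5*x_data**2 + np.random.normal(0, 5, 50)

# Quadratic fit via lstsq on an increasing-power Vandermonde matrix
A = np.vander(x_data, 3, increasing=True)
coeffs, *_ = np.linalg.lstsq(A, y_data, rcond=None)

# np.polyfit returns coefficients highest power first, so reverse to compare
poly_coeffs = np.polyfit(x_data, y_data, 2)
print(np.allclose(coeffs, poly_coeffs[::-1]))  # True
```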
Handling Multiple Right-Hand Sides
np.linalg.lstsq efficiently solves multiple related systems simultaneously by passing a 2D array for b:
# Multiple dependent variables sharing same design matrix
A = np.array([
[1, 0],
[1, 1],
[1, 2],
[1, 3]
])
# Two separate datasets
b = np.array([
[1.1, 2.2],
[2.9, 4.1],
[5.2, 6.0],
[7.1, 7.9]
])
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Solutions:\n{x}")
print(f"\nResiduals:\n{residuals}")
# Verify solutions
for i in range(b.shape[1]):
    predicted = A @ x[:, i]
    print(f"\nDataset {i+1} predictions: {predicted}")
    print(f"Actual values: {b[:, i]}")
This vectorized approach is significantly faster than solving each system independently, critical for batch processing in machine learning pipelines.
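To confirm the batched call matches per-column solves, the two can be compared directly; a small sketch:

```python
import numpy as np

A = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([[1.1, 2.2], [2.9, 4.1], [5.2, 6.0], [7.1, 7.9]])

# One batched solve: each column of b gets its own solution column in x
x_batched, *_ = np.linalg.lstsq(A, b, rcond=None)

# Equivalent loop over individual right-hand sides
x_looped = np.column_stack([
    np.linalg.lstsq(A, b[:, i], rcond=None)[0] for i in range(b.shape[1])
])

print(np.allclose(x_batched, x_looped))  # True
```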
Rank Deficiency and Conditioning
Rank deficiency occurs when columns of A are linearly dependent. The rank value reveals this:
# Create rank-deficient matrix
A_deficient = np.array([
[1, 2, 3],
[2, 4, 6],
[3, 6, 9],
[4, 8, 12]
])
b = np.array([1, 2, 3, 4])
x, residuals, rank, s = np.linalg.lstsq(A_deficient, b, rcond=None)
print(f"Matrix shape: {A_deficient.shape}")
print(f"Expected rank: 3")
print(f"Actual rank: {rank}")
print(f"Singular values: {s}")
print(f"Solution: {x}")
# Check condition number
condition_number = s[0] / s[-1]
print(f"\nCondition number: {condition_number:.2e}")
Output:
Matrix shape: (4, 3)
Expected rank: 3
Actual rank: 1
Singular values: [2.04939015e+01 2.62956739e-15 1.11527450e-16]
Solution: [0.07142857 0.14285714 0.21428571]

Condition number: 1.84e+17
The rank of 1 (instead of 3) and near-zero singular values expose linear dependence. The condition number approaches infinity, indicating severe numerical instability.
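Among the infinitely many least-squares solutions of a rank-deficient system, lstsq returns the minimum-norm one, which is exactly what the Moore-Penrose pseudoinverse produces; a sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x, x_pinv))  # True: both give the minimum-norm solution

# Any null-space vector can be added without changing the residual:
# column 2 equals 2 * column 1, so [2, -1, 0] maps to zero
null_vec = np.array([2.0, -1.0, 0.0])
print(np.allclose(A @ null_vec, 0))  # True
```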
The rcond Parameter
The rcond parameter controls which singular values are considered effectively zero:
# Nearly rank-deficient system
A = np.array([
[1, 1.0001],
[2, 2.0002],
[3, 3.0003]
])
b = np.array([2, 4, 6])
# Different rcond values
for rcond_val in [None, 1e-3, 1e-5]:
    x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=rcond_val)
    print(f"\nrcond={rcond_val}:")
    print(f"  Rank: {rank}")
    print(f"  Solution: {x}")
    print(f"  Singular values: {s}")
With rcond=None, the cutoff is machine epsilon times the larger matrix dimension, measured relative to the largest singular value. For ill-conditioned problems, increase rcond to treat near-zero singular values as zero, trading some fitting accuracy for a stabler solution.
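The default cutoff can be reproduced by hand: with rcond=None, singular values below max(M, N) times machine epsilon, relative to the largest singular value, are treated as zero. A sketch of the equivalence using the matrix above:

```python
import numpy as np

A = np.array([[1, 1.0001], [2, 2.0002], [3, 3.0003]])
b = np.array([2.0, 4.0, 6.0])

x, res, rank, s = np.linalg.lstsq(A, b, rcond=None)

# Reconstruct the default cutoff: eps * max(M, N), relative to s[0]
cutoff = np.finfo(A.dtype).eps * max(A.shape) * s[0]
print(rank == np.sum(s > cutoff))  # True: rank counts singular values above cutoff
```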
Weighted Least Squares
Standard least squares treats all observations equally. Weight observations by transforming the system:
def weighted_least_squares(A, b, weights):
    """Solve weighted least squares problem."""
    # Scale rows by sqrt(weights): solve (W^(1/2) A) x = W^(1/2) b
    W = np.diag(np.sqrt(weights))
    A_weighted = W @ A
    b_weighted = W @ b
    x, residuals, rank, s = np.linalg.lstsq(A_weighted, b_weighted, rcond=None)
    return x, residuals
# Example: exponential decay with heteroscedastic noise
x_data = np.linspace(0, 5, 30)
y_true = 10 * np.exp(-0.5 * x_data)
noise = np.random.normal(0, 0.2 * y_true)
y_data = y_true + noise
# Design matrix for exponential fit (linearized)
A = np.column_stack([np.ones_like(x_data), -x_data])
b = np.log(y_data + 1e-10) # Avoid log(0)
# Weights inversely proportional to variance
weights = 1 / (0.2 * y_data)**2
# Compare unweighted vs weighted
x_unweighted, _ = weighted_least_squares(A, b, np.ones_like(weights))
x_weighted, _ = weighted_least_squares(A, b, weights)
print(f"True parameters: [ln(10)={np.log(10):.3f}, 0.5]")
print(f"Unweighted: {x_unweighted}")
print(f"Weighted: {x_weighted}")
Weighted least squares provides better parameter estimates when measurement uncertainty varies across observations.
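In matrix form, the row-scaling trick solves the weighted normal equations (AᵀWA)x = AᵀWb with W = diag(weights). A sketch verifying the equivalence; the data and weights here are illustrative, not from the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.column_stack([np.ones(20), np.linspace(0, 5, 20)])
b = A @ np.array([2.0, -0.5]) + rng.normal(0, 0.1, 20)
weights = rng.uniform(0.5, 2.0, 20)  # illustrative per-observation weights

# Row scaling by sqrt(weights), then ordinary least squares
sw = np.sqrt(weights)
x_wls, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)

# Same estimate from the weighted normal equations (A^T W A) x = A^T W b
W = np.diag(weights)
x_ne = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

print(np.allclose(x_wls, x_ne))  # True
```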
Practical Considerations
Always check the rank against expected values. Rank deficiency indicates multicollinearity or insufficient data. Examine singular values—large gaps suggest numerical issues.
For very large systems, consider iterative methods like conjugate gradient instead of direct solvers. np.linalg.lstsq loads entire matrices into memory, limiting scalability.
When residuals seem high, investigate outliers or model misspecification rather than blindly increasing polynomial degree. Use cross-validation to prevent overfitting.
The function assumes A has full column rank for a unique solution. With rank deficiency, it returns the minimum-norm solution among infinitely many possibilities. If you need a different solution, add regularization through ridge regression or use a constrained solver such as scipy.optimize.lsq_linear.
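One way to obtain a unique, regularized solution while staying inside lstsq is row augmentation: minimizing ||Ax - b||² + λ||x||² equals ordinary least squares on A stacked above √λ·I. A sketch with an illustrative λ:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0])
lam = 0.1  # illustrative ridge strength; tune via cross-validation in practice

# Augmented system: ||A_aug x - b_aug||^2 == ||Ax - b||^2 + lam * ||x||^2
n = A.shape[1]
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])

x_ridge, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
print(x_ridge)  # unique solution even though A itself is rank-deficient
```

This matches the closed-form ridge solution x = (AᵀA + λI)⁻¹Aᵀb and sidesteps the non-uniqueness of the rank-deficient problem.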