NumPy - Least Squares (np.linalg.lstsq)
Key Insights
- np.linalg.lstsq solves overdetermined linear systems using least squares approximation, minimizing the sum of squared residuals when exact solutions don't exist
- The function returns four values: solution vector, residuals, rank, and singular values; understanding each is critical for validating regression quality and detecting ill-conditioned problems
- Proper handling of the rcond parameter prevents numerical instability, while examining rank and singular values reveals multicollinearity and matrix conditioning issues
Understanding Least Squares Problems
Least squares solves systems of linear equations where you have more equations than unknowns. Given a matrix equation Ax = b, where A is an m×n matrix with m > n, no exact solution typically exists. Instead, np.linalg.lstsq finds the vector x that minimizes ||Ax - b||².
This appears constantly in real applications: linear regression, curve fitting, signal processing, and computer vision all rely on least squares solutions. The mathematical foundation uses the Moore-Penrose pseudoinverse, but NumPy handles the computational complexity through optimized LAPACK routines.
import numpy as np
# Simple overdetermined system: 4 equations, 2 unknowns
A = np.array([
[1, 1],
[1, 2],
[1, 3],
[1, 4]
])
b = np.array([2.1, 2.9, 4.2, 5.1])
# Solve using least squares
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Solution: {x}")
print(f"Residuals: {residuals}")
print(f"Rank: {rank}")
print(f"Singular values: {s}")
Output:
Solution: [1.   1.03]
Residuals: [0.043]
Rank: 2
Singular values: [5.77937881 0.77380911]
The solution gives an intercept of 1.0 and a slope of 1.03, the best-fit line parameters. The small residual indicates good fit quality.
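The same estimate can be cross-checked against the normal equations, since minimizing ||Ax - b||² is equivalent to solving AᵀAx = Aᵀb when A has full column rank. A minimal sketch:

```python
import numpy as np

A = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)
b = np.array([2.1, 2.9, 4.2, 5.1])

# Least-squares solution via lstsq (SVD-based)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Same solution from the normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(x_lstsq, x_normal))  # True
```

Solving the normal equations directly squares the condition number of A, so the SVD-based lstsq remains the numerically safer route; this check is only a sanity test.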
Linear Regression Implementation
The most common use case is polynomial regression, which remains linear in the coefficients even though the model is nonlinear in x. Here's how to fit data to polynomials of varying degrees:
import numpy as np
import matplotlib.pyplot as plt
# Generate noisy data
np.random.seed(42)
x_data = np.linspace(0, 10, 50)
y_data = 2 + 3*x_data + 0.5*x_data**2 + np.random.normal(0, 5, 50)
def polynomial_fit(x, y, degree):
    """Fit polynomial using least squares."""
    # Build Vandermonde matrix (columns: x^0, x^1, ..., x^degree)
    A = np.vander(x, degree + 1, increasing=True)
    coeffs, residuals, rank, s = np.linalg.lstsq(A, y, rcond=None)
    # Calculate R-squared
    y_pred = A @ coeffs
    ss_res = np.sum((y - y_pred)**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    r_squared = 1 - (ss_res / ss_tot)
    return coeffs, residuals, r_squared
# Fit different polynomial degrees
for degree in [1, 2, 3]:
    coeffs, residuals, r2 = polynomial_fit(x_data, y_data, degree)
    print(f"\nDegree {degree}:")
    print(f"  Coefficients: {coeffs}")
    print(f"  R²: {r2:.4f}")
    # residuals is empty when A is rank-deficient, so guard before formatting
    res_sum = f"{residuals[0]:.2f}" if len(residuals) > 0 else "N/A"
    print(f"  Residuals sum: {res_sum}")
Output:
Degree 1:
Coefficients: [ 4.06179302 3.88068234]
R²: 0.9620
Residuals sum: 1066.53
Degree 2:
Coefficients: [1.55293741 3.09743303 0.49838542]
R²: 0.9997
Residuals sum: 83.69
Degree 3:
Coefficients: [ 1.57844816 3.07655039 0.50055842 -0.00068839]
R²: 0.9997
Residuals sum: 83.60
The quadratic fit captures the underlying relationship effectively. The cubic adds minimal improvement, suggesting overfitting risk.
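As a sanity check, np.polyfit solves the same Vandermonde least-squares problem (returning coefficients highest power first), so the two approaches should agree; a sketch using the same synthetic data:

```python
import numpy as np

np.random.seed(42)
x_data = np.linspace(0, 10, 50)
y_data = 2 + 3*x_data + 0.5*x_data**2 + np.random.normal(0, 5, 50)

# Quadratic fit via lstsq on an increasing-power Vandermonde matrix
A = np.vander(x_data, 3, increasing=True)
coeffs, *_ = np.linalg.lstsq(A, y_data, rcond=None)

# np.polyfit returns coefficients highest power first, so reverse to compare
poly_coeffs = np.polyfit(x_data, y_data, 2)
print(np.allclose(coeffs, poly_coeffs[::-1]))  # True
```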
Handling Multiple Right-Hand Sides
np.linalg.lstsq efficiently solves multiple related systems simultaneously by passing a 2D array for b:
# Multiple dependent variables sharing same design matrix
A = np.array([
[1, 0],
[1, 1],
[1, 2],
[1, 3]
])
# Two separate datasets
b = np.array([
[1.1, 2.2],
[2.9, 4.1],
[5.2, 6.0],
[7.1, 7.9]
])
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Solutions:\n{x}")
print(f"\nResiduals:\n{residuals}")
# Verify solutions
for i in range(b.shape[1]):
    predicted = A @ x[:, i]
    print(f"\nDataset {i+1} predictions: {predicted}")
    print(f"Actual values: {b[:, i]}")
This vectorized approach is significantly faster than solving each system independently, critical for batch processing in machine learning pipelines.
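To confirm the batched call matches per-column solves, the two can be compared directly; a small sketch:

```python
import numpy as np

A = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([[1.1, 2.2], [2.9, 4.1], [5.2, 6.0], [7.1, 7.9]])

# One batched solve: each column of b gets its own solution column in x
x_batched, *_ = np.linalg.lstsq(A, b, rcond=None)

# Equivalent loop over individual right-hand sides
x_looped = np.column_stack([
    np.linalg.lstsq(A, b[:, i], rcond=None)[0] for i in range(b.shape[1])
])

print(np.allclose(x_batched, x_looped))  # True
```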
Rank Deficiency and Conditioning
Rank deficiency occurs when columns of A are linearly dependent. The rank value reveals this:
# Create rank-deficient matrix
A_deficient = np.array([
[1, 2, 3],
[2, 4, 6],
[3, 6, 9],
[4, 8, 12]
])
b = np.array([1, 2, 3, 4])
x, residuals, rank, s = np.linalg.lstsq(A_deficient, b, rcond=None)
print(f"Matrix shape: {A_deficient.shape}")
print(f"Expected rank: 3")
print(f"Actual rank: {rank}")
print(f"Singular values: {s}")
print(f"Solution: {x}")
# Check condition number
condition_number = s[0] / s[-1]
print(f"\nCondition number: {condition_number:.2e}")
Output:
Matrix shape: (4, 3)
Expected rank: 3
Actual rank: 1
Singular values: [2.04939015e+01 2.62956739e-15 1.11527450e-16]
Solution: [0.07142857 0.14285714 0.21428571]

Condition number: 1.84e+17
The rank of 1 (instead of 3) and near-zero singular values expose linear dependence. The condition number approaches infinity, indicating severe numerical instability.
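Among the infinitely many least-squares solutions of a rank-deficient system, lstsq returns the minimum-norm one, which is exactly what the Moore-Penrose pseudoinverse produces; a sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x, x_pinv))  # True: both give the minimum-norm solution

# Any null-space vector can be added without changing the residual:
# column 2 equals 2 * column 1, so [2, -1, 0] maps to zero
null_vec = np.array([2.0, -1.0, 0.0])
print(np.allclose(A @ null_vec, 0))  # True
```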
The rcond Parameter
The rcond parameter controls which singular values are considered effectively zero:
# Nearly rank-deficient system
A = np.array([
[1, 1.0001],
[2, 2.0002],
[3, 3.0003]
])
b = np.array([2, 4, 6])
# Different rcond values
for rcond_val in [None, 1e-3, 1e-5]:
    x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=rcond_val)
    print(f"\nrcond={rcond_val}:")
    print(f"  Rank: {rank}")
    print(f"  Solution: {x}")
    print(f"  Singular values: {s}")
With rcond=None, the cutoff is machine epsilon times the larger matrix dimension, measured relative to the largest singular value. For ill-conditioned problems, increase rcond to treat near-zero singular values as zero, trading some fitting accuracy for a stabler solution.
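The default cutoff can be reproduced by hand: with rcond=None, singular values below max(M, N) times machine epsilon, relative to the largest singular value, are treated as zero. A sketch of the equivalence using the matrix above:

```python
import numpy as np

A = np.array([[1, 1.0001], [2, 2.0002], [3, 3.0003]])
b = np.array([2.0, 4.0, 6.0])

x, res, rank, s = np.linalg.lstsq(A, b, rcond=None)

# Reconstruct the default cutoff: eps * max(M, N), relative to s[0]
cutoff = np.finfo(A.dtype).eps * max(A.shape) * s[0]
print(rank == np.sum(s > cutoff))  # True: rank counts singular values above cutoff
```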
Weighted Least Squares
Standard least squares treats all observations equally. Weight observations by transforming the system:
def weighted_least_squares(A, b, weights):
    """Solve weighted least squares problem."""
    # Scale rows by sqrt(weights): solve (W^(1/2) A) x = W^(1/2) b
    W = np.diag(np.sqrt(weights))
    A_weighted = W @ A
    b_weighted = W @ b
    x, residuals, rank, s = np.linalg.lstsq(A_weighted, b_weighted, rcond=None)
    return x, residuals
# Example: exponential decay with heteroscedastic noise
x_data = np.linspace(0, 5, 30)
y_true = 10 * np.exp(-0.5 * x_data)
noise = np.random.normal(0, 0.2 * y_true)
y_data = y_true + noise
# Design matrix for exponential fit (linearized)
A = np.column_stack([np.ones_like(x_data), -x_data])
b = np.log(y_data + 1e-10) # Avoid log(0)
# Weights inversely proportional to variance
weights = 1 / (0.2 * y_data)**2
# Compare unweighted vs weighted
x_unweighted, _ = weighted_least_squares(A, b, np.ones_like(weights))
x_weighted, _ = weighted_least_squares(A, b, weights)
print(f"True parameters: [ln(10)={np.log(10):.3f}, 0.5]")
print(f"Unweighted: {x_unweighted}")
print(f"Weighted: {x_weighted}")
Weighted least squares provides better parameter estimates when measurement uncertainty varies across observations.
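In matrix form, the row-scaling trick solves the weighted normal equations (AᵀWA)x = AᵀWb with W = diag(weights). A sketch verifying the equivalence; the data and weights here are illustrative, not from the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.column_stack([np.ones(20), np.linspace(0, 5, 20)])
b = A @ np.array([2.0, -0.5]) + rng.normal(0, 0.1, 20)
weights = rng.uniform(0.5, 2.0, 20)  # illustrative per-observation weights

# Row scaling by sqrt(weights), then ordinary least squares
sw = np.sqrt(weights)
x_wls, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)

# Same estimate from the weighted normal equations (A^T W A) x = A^T W b
W = np.diag(weights)
x_ne = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

print(np.allclose(x_wls, x_ne))  # True
```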
Practical Considerations
Always check the rank against expected values. Rank deficiency indicates multicollinearity or insufficient data. Examine singular values—large gaps suggest numerical issues.
For very large systems, consider iterative methods like conjugate gradient instead of direct solvers. np.linalg.lstsq loads entire matrices into memory, limiting scalability.
When residuals seem high, investigate outliers or model misspecification rather than blindly increasing polynomial degree. Use cross-validation to prevent overfitting.
The function assumes A has full column rank for a unique solution. With rank deficiency, it returns the minimum-norm solution among infinitely many possibilities. If you need a different solution, add regularization through ridge regression or use a constrained solver such as scipy.optimize.lsq_linear.
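One way to obtain a unique, regularized solution while staying inside lstsq is row augmentation: minimizing ||Ax - b||² + λ||x||² equals ordinary least squares on A stacked above √λ·I. A sketch with an illustrative λ:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0])
lam = 0.1  # illustrative ridge strength; tune via cross-validation in practice

# Augmented system: ||A_aug x - b_aug||^2 == ||Ax - b||^2 + lam * ||x||^2
n = A.shape[1]
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])

x_ridge, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
print(x_ridge)  # unique solution even though A itself is rank-deficient
```

This matches the closed-form ridge solution x = (AᵀA + λI)⁻¹Aᵀb and sidesteps the non-uniqueness of the rank-deficient problem.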