How to Solve Least Squares Problems in Python

Key Insights

  • NumPy’s lstsq() solves linear least squares efficiently via the singular value decomposition (SVD), while SciPy’s least_squares() handles non-linear problems and supports parameter bounds
  • Always examine residuals and condition numbers—large residuals indicate poor model fit, while high condition numbers signal numerical instability
  • Weighted least squares is essential when measurement uncertainties vary across observations, preventing noisy data from dominating your fit

Introduction to Least Squares

Least squares is the workhorse of data fitting and parameter estimation. The core idea is simple: find model parameters that minimize the sum of squared differences between observed data and predicted values. This approach appears everywhere—linear regression, curve fitting, sensor calibration, computer vision, and signal processing.

Python offers multiple tools for least squares problems, each suited to different scenarios. NumPy handles standard linear problems efficiently. SciPy extends this to non-linear cases with constraints. Understanding when to use each tool and how to interpret results separates competent practitioners from those who blindly fit curves.

Mathematical Foundation

The least squares objective minimizes:

S = Σ(yᵢ - f(xᵢ, β))²

where yᵢ are observed values, f(xᵢ, β) is your model with parameters β, and the sum runs over all data points. Each term (yᵢ - f(xᵢ, β)) is a residual—the vertical distance from data to model.

For linear problems where f(x, β) = Xβ, the solution satisfies the normal equations: XᵀXβ = Xᵀy. Modern implementations solve the system via the SVD rather than forming XᵀX explicitly, since forming XᵀX squares the condition number and degrades numerical stability.
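As a quick sanity check (a minimal sketch with made-up exact data), the normal-equation solution and lstsq() agree on a small, well-conditioned problem:

```python
import numpy as np

# Exact data from y = 2x + 1, so both solvers should recover [2, 1]
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
X = np.vstack([x, np.ones(len(x))]).T

# Normal equations: solve (X^T X) beta = X^T y directly
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based solver
beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)  # ≈ [2. 1.]
print(beta_svd)     # ≈ [2. 1.]
```

The agreement breaks down as the problem becomes ill-conditioned, which is exactly when the SVD route earns its keep.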

Let’s visualize what we’re minimizing:

import numpy as np
import matplotlib.pyplot as plt

# Generate noisy data
np.random.seed(42)
x = np.linspace(0, 10, 20)
y_true = 2.5 * x + 1.0
y_noisy = y_true + np.random.normal(0, 2, len(x))

# Fit line
coeffs = np.polyfit(x, y_noisy, 1)
y_fit = coeffs[0] * x + coeffs[1]

# Plot with residuals
plt.figure(figsize=(10, 6))
plt.scatter(x, y_noisy, label='Data', alpha=0.6)
plt.plot(x, y_fit, 'r-', label='Fitted line', linewidth=2)

# Draw residual lines
for i in range(len(x)):
    plt.plot([x[i], x[i]], [y_noisy[i], y_fit[i]], 'g--', alpha=0.3)

plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Least Squares: Minimizing Residuals (green lines)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('residuals.png', dpi=150)

The green dashed lines represent residuals. Least squares finds parameters that minimize the sum of their squared lengths.

Linear Least Squares with NumPy

NumPy’s linalg.lstsq() is your first choice for linear problems. It’s fast, numerically stable, and returns diagnostic information.

Simple Linear Regression

import numpy as np

# Generate data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9, 18.2, 20.1])

# Construct design matrix for y = mx + b
A = np.vstack([x, np.ones(len(x))]).T

# Solve least squares
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
m, b = coeffs
# Note: residuals is an empty array when A is rank-deficient or under-determined
residuals_sum = residuals[0]

print(f"Slope: {m:.4f}, Intercept: {b:.4f}")
print(f"Sum of squared residuals: {residuals_sum:.4f}")
print(f"Condition number: {singular_values[0]/singular_values[-1]:.2f}")

The condition number (ratio of largest to smallest singular value) indicates numerical stability. Values above 10⁸ suggest ill-conditioning—your problem may be sensitive to small data changes.
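To see ill-conditioning concretely, here is a hypothetical design matrix with two nearly collinear columns:

```python
import numpy as np

x = np.linspace(0, 10, 50)
# Second column is the first plus tiny noise: nearly collinear
noise = 1e-9 * np.random.default_rng(0).standard_normal(len(x))
A = np.vstack([x, x + noise, np.ones(len(x))]).T

s = np.linalg.svd(A, compute_uv=False)
cond = s[0] / s[-1]
print(f"Condition number: {cond:.2e}")  # well above 1e8
```

With a condition number this large, coefficient estimates for the two collinear columns are essentially arbitrary; dropping one of them restores stability.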

Polynomial Fitting

For polynomials, np.polyfit() provides a convenient wrapper:

# Fit cubic polynomial
x = np.linspace(0, 10, 50)
y_true = 0.5 * x**3 - 2 * x**2 + x + 5
y_noisy = y_true + np.random.normal(0, 10, len(x))

# Fit different polynomial degrees
degrees = [1, 2, 3, 5]
plt.figure(figsize=(12, 8))

for i, deg in enumerate(degrees, 1):
    coeffs = np.polyfit(x, y_noisy, deg)
    y_fit = np.polyval(coeffs, x)
    
    plt.subplot(2, 2, i)
    plt.scatter(x, y_noisy, alpha=0.3, s=20)
    plt.plot(x, y_fit, 'r-', linewidth=2)
    plt.title(f'Degree {deg}')
    plt.grid(True, alpha=0.3)

plt.tight_layout()

Higher-degree polynomials fit training data better but risk overfitting. Always validate on held-out data.
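One simple form of that validation is a train/test split: fit each degree on one subset and score it on held-out points (a sketch with synthetic cubic data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 0.5 * x**3 - 2 * x**2 + x + 5 + rng.normal(0, 10, len(x))

# Random 70/30 split
idx = rng.permutation(len(x))
train_idx, test_idx = idx[:70], idx[70:]

test_mse = {}
for deg in [1, 2, 3, 5, 9]:
    coeffs = np.polyfit(x[train_idx], y[train_idx], deg)
    test_mse[deg] = np.mean((y[test_idx] - np.polyval(coeffs, x[test_idx]))**2)
    print(f"degree {deg}: held-out MSE = {test_mse[deg]:.1f}")
```

Held-out error drops sharply up to the true degree (3 here) and stops improving, or worsens, beyond it.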

SciPy’s Optimization Approach

Non-linear problems require iterative optimization. SciPy’s optimize.least_squares() implements trust-region algorithms (the default 'trf', plus 'dogbox') that support parameter bounds and robust loss functions.

Non-linear Curve Fitting

from scipy.optimize import least_squares

# Exponential decay model
def exponential_residuals(params, x, y):
    a, b, c = params
    return y - (a * np.exp(-b * x) + c)

# Generate synthetic data
x_data = np.linspace(0, 5, 50)
y_true = 3 * np.exp(-0.5 * x_data) + 1
y_data = y_true + np.random.normal(0, 0.1, len(x_data))

# Initial guess
params_init = [1.0, 1.0, 0.0]

# Solve with bounds
result = least_squares(
    exponential_residuals,
    params_init,
    args=(x_data, y_data),
    bounds=([0, 0, -np.inf], [np.inf, np.inf, np.inf])
)

a_fit, b_fit, c_fit = result.x
print(f"Fitted parameters: a={a_fit:.3f}, b={b_fit:.3f}, c={c_fit:.3f}")
print(f"Success: {result.success}, Cost: {result.cost:.6f}")

# Plot results
y_fit = a_fit * np.exp(-b_fit * x_data) + c_fit
plt.figure(figsize=(10, 6))
plt.scatter(x_data, y_data, label='Data', alpha=0.6)
plt.plot(x_data, y_fit, 'r-', label='Fit', linewidth=2)
plt.plot(x_data, y_true, 'g--', label='True', linewidth=2)
plt.legend()
plt.grid(True, alpha=0.3)

The bounds parameter prevents physically meaningless solutions (like negative decay rates). Always use bounds when you have domain knowledge about valid parameter ranges.

Practical Example: Sensor Calibration

Real sensors often have non-linear response curves. Let’s calibrate a temperature sensor with a known reference:

import numpy as np
from scipy.optimize import least_squares
import matplotlib.pyplot as plt

# Simulated sensor readings vs true temperature
true_temp = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
sensor_reading = np.array([
    -2.1, 8.5, 18.2, 28.9, 38.1, 48.5, 57.8, 67.2, 76.9, 86.1, 95.8
])

# Non-linear calibration model: T_true = a*R^2 + b*R + c
def calibration_residuals(params, readings, true_temps):
    a, b, c = params
    predicted = a * readings**2 + b * readings + c
    return true_temps - predicted

# Fit calibration curve
params_init = [0.0, 1.0, 0.0]
result = least_squares(
    calibration_residuals,
    params_init,
    args=(sensor_reading, true_temp)
)

a, b, c = result.x

# Evaluate fit quality
predicted_temp = a * sensor_reading**2 + b * sensor_reading + c
residuals = true_temp - predicted_temp
rmse = np.sqrt(np.mean(residuals**2))
max_error = np.max(np.abs(residuals))

print(f"Calibration coefficients: a={a:.6f}, b={b:.6f}, c={c:.6f}")
print(f"RMSE: {rmse:.3f}°C, Max error: {max_error:.3f}°C")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Calibration curve
readings_smooth = np.linspace(sensor_reading.min(), sensor_reading.max(), 200)
temp_smooth = a * readings_smooth**2 + b * readings_smooth + c
ax1.scatter(sensor_reading, true_temp, s=100, alpha=0.6, label='Data')
ax1.plot(readings_smooth, temp_smooth, 'r-', linewidth=2, label='Calibration')
ax1.set_xlabel('Sensor Reading')
ax1.set_ylabel('True Temperature (°C)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Residual plot
ax2.scatter(true_temp, residuals, s=100, alpha=0.6)
ax2.axhline(y=0, color='r', linestyle='--', linewidth=2)
ax2.set_xlabel('True Temperature (°C)')
ax2.set_ylabel('Residual (°C)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()

The residual plot is crucial: random scatter around zero indicates a good fit, while systematic patterns suggest model inadequacy or outliers.
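A quick numeric companion to the residual plot is the lag-1 correlation of residuals, which should be near zero for a well-specified model (a sketch on synthetic quadratic data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 0.3 * x**2 + rng.normal(0, 1, len(x))  # truly quadratic

# Fit the wrong model (a line) and the right one (a parabola)
lag1 = {}
for deg in [1, 2]:
    r = y - np.polyval(np.polyfit(x, y, deg), x)
    lag1[deg] = np.corrcoef(r[:-1], r[1:])[0, 1]
    print(f"degree {deg}: lag-1 residual correlation = {lag1[deg]:.2f}")
```

The underfit line leaves a smooth systematic bow in the residuals, so consecutive residuals are strongly correlated; the correct quadratic leaves uncorrelated noise.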

Weighted Least Squares

When measurement uncertainties vary, weighted least squares prevents noisy data from dominating. Each residual is divided by its uncertainty:

# Data with varying uncertainties
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 4.0, 5.8, 8.2, 9.9])
uncertainties = np.array([0.1, 0.2, 0.5, 0.2, 0.1])  # Measurement errors

# Construct weighted design matrix
weights = 1.0 / uncertainties
A = np.vstack([x, np.ones(len(x))]).T
A_weighted = A * weights[:, np.newaxis]
y_weighted = y * weights

# Solve weighted least squares
result = np.linalg.lstsq(A_weighted, y_weighted, rcond=None)
m, b = result[0]

print(f"Weighted fit: y = {m:.3f}x + {b:.3f}")

# Compare with unweighted
result_unweighted = np.linalg.lstsq(A, y, rcond=None)
m_uw, b_uw = result_unweighted[0]
print(f"Unweighted fit: y = {m_uw:.3f}x + {b_uw:.3f}")

Weighted least squares gives more influence to precise measurements, producing more reliable parameter estimates.
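The same weighting is available through scipy.optimize.curve_fit, whose sigma argument divides each residual by its uncertainty internally (a sketch using the same small dataset):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 5.8, 8.2, 9.9])
uncertainties = np.array([0.1, 0.2, 0.5, 0.2, 0.1])

def line(x, m, b):
    return m * x + b

# sigma weights each residual by 1/uncertainty, matching the manual approach;
# absolute_sigma=True treats the uncertainties as true measurement errors
popt, pcov = curve_fit(line, x, y, sigma=uncertainties, absolute_sigma=True)
m, b = popt
print(f"Weighted fit: y = {m:.3f}x + {b:.3f}")
print(f"Parameter std errors: {np.sqrt(np.diag(pcov))}")
```

As a bonus, the covariance matrix pcov gives standard errors on the fitted parameters, which the manual weighted-lstsq route does not provide directly.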

Best Practices and Common Pitfalls

Scale your data. Features with different magnitudes cause numerical issues. Normalize inputs to similar ranges before fitting.

Check condition numbers. High values (>10⁸) indicate near-singular systems. Consider regularization or removing redundant features.

Examine residuals. Plot residuals vs. predicted values and vs. each input variable. Patterns indicate model misspecification.

Use appropriate solvers. NumPy’s lstsq() for linear problems, SciPy’s least_squares() for non-linear or constrained problems. Don’t use iterative methods when analytical solutions exist.

Validate assumptions. Least squares assumes independent, normally distributed errors with constant variance. Violations require different approaches (robust regression, generalized least squares).

Avoid overfitting. More parameters always reduce training error. Use cross-validation or information criteria (AIC, BIC) to select model complexity.

Set sensible bounds. Physical constraints prevent nonsensical solutions and improve convergence in non-linear problems.
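The scaling advice above can be seen numerically: a feature several orders of magnitude larger than the rest inflates the condition number (a sketch with made-up features):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 100)      # feature on [0, 1]
x2 = rng.uniform(0, 1e6, 100)    # feature on [0, 1e6]
y = 3.0 * x1 + 2e-6 * x2 + rng.normal(0, 0.1, 100)

A_raw = np.column_stack([x1, x2, np.ones(100)])
A_scaled = np.column_stack([x1, x2 / 1e6, np.ones(100)])

conds = {}
for name, A in [("raw", A_raw), ("scaled", A_scaled)]:
    s = np.linalg.svd(A, compute_uv=False)
    conds[name] = s[0] / s[-1]
    print(f"{name}: condition number = {conds[name]:.2e}")
```

Rescaling x2 to the same range as x1 drops the condition number by orders of magnitude without changing what the model can express; only the coefficient units change.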

Least squares is powerful but not universal. When outliers dominate, consider robust methods like RANSAC or Huber loss. When errors aren’t normally distributed, maximum likelihood estimation may be more appropriate. But for most engineering and scientific applications, least squares remains the first tool to reach for—fast, interpretable, and remarkably effective.
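For the outlier case, least_squares can apply a Huber loss directly through its loss argument, with f_scale setting the inlier scale (a minimal sketch with one injected outlier):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.5 * x + 1.0 + rng.normal(0, 0.2, len(x))
y[15] += 25.0  # one gross outlier

def residuals(params, x, y):
    m, b = params
    return y - (m * x + b)

plain = least_squares(residuals, [1.0, 0.0], args=(x, y))
robust = least_squares(residuals, [1.0, 0.0], args=(x, y),
                       loss='huber', f_scale=1.0)

print(f"Plain: m={plain.x[0]:.3f}, b={plain.x[1]:.3f}")
print(f"Huber: m={robust.x[0]:.3f}, b={robust.x[1]:.3f}")
```

With the quadratic loss the single outlier drags the intercept noticeably; the Huber fit stays close to the true line (m=2.5, b=1.0).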
