How to Solve Systems of Linear Equations in Python

Key Insights

  • NumPy’s linalg.solve() works for square systems, but SciPy’s lstsq() handles overdetermined systems common in regression and curve fitting problems
  • For systems you need to solve repeatedly with different right-hand sides, LU decomposition reduces computation from O(n³) to O(n²) per solution
  • Sparse solvers such as spsolve() (direct) or cg() (iterative) can be orders of magnitude faster than dense methods for large systems with mostly zero coefficients

Introduction to Systems of Linear Equations

Systems of linear equations appear everywhere in data science: linear regression, optimization, computer graphics, and network analysis all rely on solving Ax = b efficiently. The equation represents a matrix A multiplied by an unknown vector x to produce a known result vector b.

Consider a simple example: you’re analyzing a dataset and need to fit coefficients for a linear model, or you’re balancing chemical equations, or calculating electrical currents in a circuit. All of these reduce to solving systems of linear equations.

Python offers multiple approaches depending on your system’s characteristics: is it square or rectangular? Dense or sparse? Do you need one solution or many? Let’s explore the practical tools you need.

import numpy as np

# A simple 2x2 system: 2x + y = 5, x - y = 1
A = np.array([[2, 1],
              [1, -1]])
b = np.array([5, 1])

# Solution: x = 2, y = 1
x = np.linalg.solve(A, b)
print(f"Solution: x = {x[0]}, y = {x[1]}")

Using NumPy’s Linear Algebra Module

For square systems where the number of equations equals the number of unknowns, numpy.linalg.solve() is your first choice. It’s fast, built on optimized LAPACK routines, and handles most standard cases.

The method assumes your matrix is non-singular (invertible). If it’s singular or nearly singular, you’ll get a LinAlgError or numerical garbage. Always validate your inputs.

import numpy as np

# A 3x3 system representing three planes intersecting at a point
# 2x + y - z = 8
# -3x - y + 2z = -11
# -2x + y + 2z = -3

A = np.array([[2, 1, -1],
              [-3, -1, 2],
              [-2, 1, 2]])
b = np.array([8, -11, -3])

try:
    x = np.linalg.solve(A, b)
    print(f"Solution: {x}")
    
    # Verify the solution
    is_valid = np.allclose(A @ x, b)
    print(f"Solution verified: {is_valid}")
    
except np.linalg.LinAlgError:
    print("Matrix is singular - no unique solution exists")

This works beautifully for well-conditioned square systems. But real-world problems often aren’t square—you might have more equations than unknowns (overdetermined) or fewer equations than unknowns (underdetermined).

Solving with SciPy for Advanced Cases

SciPy’s linear algebra module extends NumPy with better numerical stability and handles non-square systems. For overdetermined systems—common in regression where you have more data points than parameters—scipy.linalg.lstsq() finds the least squares solution that minimizes the error.

import numpy as np
from scipy import linalg

# Overdetermined system: fitting a line y = mx + b to 5 points
# More equations (5) than unknowns (2)
x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.1, 7.9, 10.2])

# Build the design matrix for y = mx + b
A = np.column_stack([x_data, np.ones(len(x_data))])
b = y_data

# Least squares solution
solution, residuals, rank, s = linalg.lstsq(A, b)
m, b_intercept = solution

print(f"Line equation: y = {m:.3f}x + {b_intercept:.3f}")
print(f"Residual sum of squares: {np.sum(residuals):.4f}")
print(f"Matrix rank: {rank}")

The lstsq() function returns four values: the solution, residuals (sum of squared errors), matrix rank, and singular values. These diagnostics help you understand solution quality.

For underdetermined systems (fewer equations than unknowns), lstsq() returns the minimum-norm solution—the solution with the smallest magnitude.
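A minimal sketch of the underdetermined case: one equation with two unknowns has infinitely many solutions, and lstsq() picks the one closest to the origin.

```python
import numpy as np
from scipy import linalg

# Underdetermined: 1 equation, 2 unknowns -> x + y = 2
# Solutions form a line; lstsq returns the point on it nearest the origin.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

x_min_norm, *_ = linalg.lstsq(A, b)
print(x_min_norm)  # [1. 1.] -- among all (2-t, t), the norm is smallest at t = 1
```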

Matrix Decomposition Methods

When you need to solve Ax = b for many different b vectors with the same A, decomposition methods become essential. Computing the full solution each time wastes effort recalculating the same matrix properties.

LU decomposition factors A into lower and upper triangular matrices. You compute this once at O(n³) cost, then solve for each new b at only O(n²) cost.

import numpy as np
from scipy import linalg

# System matrix that we'll solve with multiple right-hand sides
A = np.array([[4, 3, 2],
              [1, 2, 3],
              [2, 1, 4]], dtype=float)

# Perform LU decomposition once
lu, piv = linalg.lu_factor(A)

# Solve for multiple different b vectors efficiently
b1 = np.array([1, 2, 3])
b2 = np.array([4, 5, 6])
b3 = np.array([7, 8, 9])

x1 = linalg.lu_solve((lu, piv), b1)
x2 = linalg.lu_solve((lu, piv), b2)
x3 = linalg.lu_solve((lu, piv), b3)

print(f"Solution 1: {x1}")
print(f"Solution 2: {x2}")
print(f"Solution 3: {x3}")

# Verify
print(f"Verification: {np.allclose(A @ x1, b1)}")

QR decomposition is another option, particularly useful for least squares problems and when numerical stability is critical. It’s slower than LU but more stable for ill-conditioned matrices.
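As a sketch, here is QR applied to the same line-fitting data from the lstsq() example: factor A = QR once, then solve the triangular system Rx = QᵀB by back-substitution.

```python
import numpy as np
from scipy import linalg

# Least squares via QR: A = Q R, then solve R x = Q^T b by back-substitution
x_data = np.array([1, 2, 3, 4, 5], dtype=float)
A = np.column_stack([x_data, np.ones(len(x_data))])
b = np.array([2.1, 3.9, 6.1, 7.9, 10.2])

Q, R = linalg.qr(A, mode='economic')  # Q: 5x2 orthonormal columns, R: 2x2 upper triangular
x = linalg.solve_triangular(R, Q.T @ b)

print(f"Slope: {x[0]:.3f}, intercept: {x[1]:.3f}")
```

Because Q has orthonormal columns, this avoids forming AᵀA and keeps the original conditioning of A.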

Iterative Solvers for Sparse Systems

Large systems with mostly zero entries (sparse matrices) appear in finite element analysis, graph algorithms, and image processing. Storing and operating on these as dense matrices wastes memory and computation.

Sparse solvers exploit this structure. For a million-by-million matrix that’s 99.9% zeros, sparse methods can be orders of magnitude faster.

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve, cg

# Create a sparse tridiagonal system (common in differential equations)
n = 1000
main_diag = 4 * np.ones(n)
off_diag = -1 * np.ones(n-1)

# Build the matrix directly in sparse form -- never materialize a dense n x n array
A_sparse = diags([off_diag, main_diag, off_diag], offsets=[-1, 0, 1], format="csr")

b = np.ones(n)

# Direct sparse solver
x_direct = spsolve(A_sparse, b)

# Iterative conjugate gradient (for symmetric positive definite)
x_cg, info = cg(A_sparse, b, rtol=1e-6)  # keyword is tol= in SciPy < 1.12

print(f"Direct solver residual: {np.linalg.norm(A_sparse @ x_direct - b):.2e}")
print(f"CG solver residual: {np.linalg.norm(A_sparse @ x_cg - b):.2e}")
print(f"CG convergence info: {info}")  # 0 means successful convergence

Use spsolve() for direct sparse solving, cg() for symmetric positive definite systems, or gmres() for general non-symmetric systems. These iterative methods trade some accuracy (controlled by tolerance) for massive speed improvements on large sparse systems.
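For a non-symmetric system, gmres() follows the same calling pattern. A minimal sketch with an asymmetric tridiagonal matrix (the entries here are illustrative):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Non-symmetric tridiagonal system: sub- and super-diagonals differ
n = 1000
A = diags([-1.0 * np.ones(n - 1), 4.0 * np.ones(n), -2.0 * np.ones(n - 1)],
          offsets=[-1, 0, 1], format="csr")
b = np.ones(n)

x, info = gmres(A, b)  # default relative tolerance; info == 0 means converged
print(f"Converged: {info == 0}, residual: {np.linalg.norm(A @ x - b):.2e}")
```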

Practical Example: Linear Regression from Scratch

Linear regression is fundamentally about solving the normal equations: X^T X β = X^T y. This is a system of linear equations where β contains your regression coefficients.

import numpy as np
from scipy import linalg
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate synthetic regression data
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)

# Add intercept column
X_with_intercept = np.column_stack([np.ones(len(X)), X])

# Solve normal equations: (X^T X) β = X^T y
XtX = X_with_intercept.T @ X_with_intercept
Xty = X_with_intercept.T @ y

# Method 1: Direct solve
beta_direct = linalg.solve(XtX, Xty)

# Method 2: Least squares (more stable)
beta_lstsq, *_ = linalg.lstsq(X_with_intercept, y)

# Method 3: sklearn for comparison
model = LinearRegression()
model.fit(X, y)
beta_sklearn = np.concatenate([[model.intercept_], model.coef_])

print("Direct solve coefficients:", beta_direct)
print("Least squares coefficients:", beta_lstsq)
print("Sklearn coefficients:", beta_sklearn)

# Predictions
y_pred = X_with_intercept @ beta_lstsq
mse = np.mean((y - y_pred) ** 2)
print(f"\nMean squared error: {mse:.2f}")

The least squares approach via lstsq() is more numerically stable than forming X^T X explicitly, especially when X is ill-conditioned. This matters for real datasets with correlated features.
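The instability is easy to demonstrate: forming XᵀX squares the condition number, since cond(XᵀX) = cond(X)². A quick check on a deliberately correlated design matrix (the data here is synthetic, built just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 1e-6 * rng.normal(size=100)  # nearly identical (highly correlated) feature
X = np.column_stack([np.ones(100), x1, x2])

print(f"cond(X)     = {np.linalg.cond(X):.2e}")
print(f"cond(X^T X) = {np.linalg.cond(X.T @ X):.2e}")  # roughly the square of cond(X)
```

Working with X directly (via lstsq() or QR) keeps the milder condition number; forming XᵀX can push an already borderline problem past the limits of double precision.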

Best Practices and Performance Tips

Choose your solver based on matrix properties. Here’s a decision framework:

  • Square, well-conditioned, dense: np.linalg.solve()
  • Overdetermined (regression): scipy.linalg.lstsq()
  • Repeated solves with the same A: LU decomposition (lu_factor / lu_solve)
  • Large and sparse: scipy.sparse.linalg.spsolve() or iterative solvers
  • Ill-conditioned: regularization or SVD-based methods

Always check matrix conditioning before solving. The condition number indicates how sensitive your solution is to small input changes:

import numpy as np
from scipy import linalg

def solve_with_diagnostics(A, b):
    """Solve Ax=b with numerical stability checks"""
    
    # Check condition number
    cond = np.linalg.cond(A)
    print(f"Condition number: {cond:.2e}")
    
    if cond > 1e10:
        print("Warning: Matrix is ill-conditioned")
        print("Consider regularization or using lstsq()")
        # Use least squares for stability
        x, *_ = linalg.lstsq(A, b)
    else:
        # Standard solve is fine
        x = linalg.solve(A, b)
    
    # Verify solution quality
    residual = np.linalg.norm(A @ x - b)
    print(f"Residual norm: {residual:.2e}")
    
    return x

# Test with a well-conditioned matrix
A_good = np.array([[4, 1], [1, 3]], dtype=float)
b = np.array([1, 2])
x = solve_with_diagnostics(A_good, b)

# Test with an ill-conditioned matrix (condition number ~4e12, triggers the warning)
A_bad = np.array([[1, 1], [1, 1 + 1e-12]], dtype=float)
x = solve_with_diagnostics(A_bad, b)

A condition number above 10^12 means you’re likely to lose precision in double-precision arithmetic. Consider using lstsq() or adding regularization.
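A useful rule of thumb: in double precision (~16 significant digits), expect to lose roughly log10(cond) digits of accuracy. A sketch with a matrix whose solution we know exactly:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])
x_true = np.array([1.0, 1.0])
b = A @ x_true  # construct b so the exact solution is known

x = np.linalg.solve(A, b)
cond = np.linalg.cond(A)
print(f"Condition number: {cond:.1e}")             # ~4e10 -> expect ~10 digits lost
print(f"Error: {np.linalg.norm(x - x_true):.1e}")  # on the order of cond * machine epsilon
```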

For production code, always verify solutions by computing the residual ||Ax - b|| and checking it’s within acceptable bounds. Numerical errors accumulate, and validation catches problems before they propagate.
