How to Project a Vector onto a Subspace in Python

Key Insights

  • Vector projection onto a subspace is the foundation of linear regression, PCA, and many other statistical methods—understanding it deeply will clarify how these algorithms actually work under the hood.
  • The projection matrix P = A(A^T A)^(-1)A^T is elegant but can be numerically unstable; QR decomposition yields the same projection with much better accuracy when A^T A is ill-conditioned.
  • Implementing projection from scratch reveals that fitted values in regression are simply projections onto the column space, while residuals are projections onto the orthogonal complement—two sides of the same mathematical coin.

Introduction to Vector Projection and Subspaces

Vector projection onto a subspace is one of those fundamental operations that appears everywhere in statistics and machine learning, yet many practitioners treat it as a black box. When you fit a linear regression model, you’re projecting your response vector onto the column space of your design matrix. When you run PCA, you’re projecting data onto principal component subspaces. Understanding projection mechanically gives you insight into what these algorithms are actually doing.

A subspace is a nonempty subset of a vector space that is closed under addition and scalar multiplication. In practical terms, if you have a matrix A with linearly independent columns, those columns span a subspace. Projecting a vector onto this subspace means finding the point in the subspace that’s closest to your original vector—closest in the sense of Euclidean distance.

The key insight is that this closest point is found by dropping a perpendicular from your vector to the subspace. This orthogonality property is what makes projection so useful in statistics: it naturally decomposes vectors into signal (the projection) and noise (the residual).
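To make the “closest point” claim concrete, here is a small numeric check (the vectors are chosen arbitrarily): among all points t·a on a line, none is closer to b than the projection.

```python
import numpy as np

# Project b onto the line spanned by a (1D subspace)
a = np.array([1.0, 0.5])
b = np.array([3.0, 2.0])
proj = (a @ b) / (a @ a) * a

# No other point t*a on the line is closer to b than the projection
dist_proj = np.linalg.norm(b - proj)
for t in np.linspace(-2.0, 4.0, 25):
    assert dist_proj <= np.linalg.norm(b - t * a) + 1e-12

print(f"projection {proj} at distance {dist_proj:.4f}")
```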

The Mathematics Behind Subspace Projection

The projection of a vector b onto the column space of a matrix A (with linearly independent columns) is given by:

proj = A(A^T A)^(-1)A^T b

This formula comes from requiring orthogonality: if proj = Ax̂ for some coefficient vector x̂, the residual b - Ax̂ must be perpendicular to every column of A, so A^T(b - Ax̂) = 0. These are the normal equations, and solving them gives x̂ = (A^T A)^(-1)A^T b.

The matrix P = A(A^T A)^(-1)A^T is called the projection matrix (the “hat matrix” in regression). It has several important properties:

  • P is idempotent: P^2 = P (projecting twice is the same as projecting once)
  • P is symmetric: P^T = P
  • The residual (I - P)b is orthogonal to the column space of A
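The third property deserves a check of its own: I - P is itself a projection matrix, onto the orthogonal complement of col(A). A quick numeric verification with a random A (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T
I = np.eye(6)

print(np.allclose(P @ P, P))                  # P is idempotent
print(np.allclose(P, P.T))                    # P is symmetric
print(np.allclose((I - P) @ (I - P), I - P))  # I - P is also a projection
print(np.allclose(P @ (I - P), 0))            # the two projections are orthogonal
```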

Let’s visualize a simple 2D projection to build intuition:

import numpy as np
import matplotlib.pyplot as plt

# Define a vector to project
b = np.array([3, 2])

# Define a line (1D subspace) to project onto
a = np.array([1, 0.5])

# Compute projection using the formula: proj = (a^T b / a^T a) * a
projection = (np.dot(a, b) / np.dot(a, a)) * a

# Visualize
plt.figure(figsize=(8, 6))
plt.arrow(0, 0, b[0], b[1], head_width=0.15, head_length=0.15, 
          fc='blue', ec='blue', label='Original vector')
plt.arrow(0, 0, projection[0], projection[1], head_width=0.15, 
          head_length=0.15, fc='red', ec='red', label='Projection')
plt.arrow(projection[0], projection[1], b[0]-projection[0], 
          b[1]-projection[1], head_width=0.15, head_length=0.15, 
          fc='green', ec='green', linestyle='--', label='Residual')

# Draw the subspace line
t = np.linspace(-1, 4, 100)
plt.plot(t * a[0], t * a[1], 'k--', alpha=0.3, label='Subspace')

plt.xlim(-0.5, 4)
plt.ylim(-0.5, 3)
plt.grid(True, alpha=0.3)
plt.legend()
plt.axis('equal')
plt.title('Vector Projection onto a 1D Subspace')
plt.show()

Implementing Projection with NumPy

Now let’s implement projection for higher-dimensional cases. The direct approach uses the projection matrix formula:

import numpy as np

def project_onto_subspace(b, A):
    """
    Project vector b onto the column space of matrix A.
    
    Parameters:
    b : array-like, shape (n,)
        Vector to project
    A : array-like, shape (n, k)
        Matrix whose column space defines the subspace
        (columns must be linearly independent)
        
    Returns:
    projection : array, shape (n,)
        Projection of b onto col(A)
    """
    A = np.asarray(A)
    b = np.asarray(b)
    
    # Compute projection matrix P = A(A^T A)^(-1)A^T
    ATA = A.T @ A
    ATA_inv = np.linalg.inv(ATA)
    P = A @ ATA_inv @ A.T
    
    return P @ b

# Example: Project a 3D vector onto a 2D subspace
b = np.array([1, 2, 3])
A = np.array([
    [1, 0],
    [0, 1],
    [1, 1]
])

projection = project_onto_subspace(b, A)
residual = b - projection

print(f"Original vector: {b}")
print(f"Projection: {projection}")
print(f"Residual: {residual}")
print(f"Residual is orthogonal to A: {np.allclose(A.T @ residual, 0)}")

We can also create a function that returns the projection matrix itself:

def projection_matrix(A):
    """
    Compute the projection matrix onto the column space of A.
    
    Parameters:
    A : array-like, shape (n, k)
        Matrix defining the subspace
        
    Returns:
    P : array, shape (n, n)
        Projection matrix
    """
    A = np.asarray(A)
    ATA_inv = np.linalg.inv(A.T @ A)
    return A @ ATA_inv @ A.T

# Verify idempotency
A = np.random.randn(5, 3)
P = projection_matrix(A)
print(f"P is idempotent: {np.allclose(P @ P, P)}")
print(f"P is symmetric: {np.allclose(P, P.T)}")
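One payoff of having P explicitly: projecting many vectors onto the same subspace is a single matrix product, since each column of a stacked matrix is projected independently. A small sketch (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((5, 3))
P = A @ np.linalg.inv(A.T @ A) @ A.T  # compute the projection matrix once

B = rng.standard_normal((5, 1000))    # 1000 vectors to project, one per column
projections = P @ B                   # project all of them in one matmul

# Spot-check one column against a direct solve-based projection
single = A @ np.linalg.solve(A.T @ A, A.T @ B[:, 0])
print(np.allclose(projections[:, 0], single))
```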

Using SciPy for Orthogonal Decomposition

The direct matrix inversion approach has a problem: it’s numerically unstable when A^T A is nearly singular. QR decomposition provides a more robust alternative. If A = QR where Q has orthonormal columns, then the projection simplifies to P = QQ^T.

from scipy.linalg import qr
import numpy as np

def project_qr(b, A):
    """
    Project vector b onto col(A) using QR decomposition.
    More numerically stable than direct inversion.
    """
    Q, R = qr(A, mode='economic')
    return Q @ (Q.T @ b)

# Compare with direct method
np.random.seed(42)
A = np.random.randn(100, 10)
b = np.random.randn(100)

proj_direct = project_onto_subspace(b, A)
proj_qr = project_qr(b, A)

print(f"Methods agree: {np.allclose(proj_direct, proj_qr)}")
print(f"Max difference: {np.max(np.abs(proj_direct - proj_qr))}")
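The stability difference only becomes visible once the columns of A are nearly collinear. A sketch (the perturbation size is chosen to make A^T A badly conditioned without being exactly singular):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
base = rng.standard_normal(100)
# Two nearly collinear columns -> A^T A is badly conditioned
A = np.column_stack([base, base + 1e-6 * rng.standard_normal(100)])
b = rng.standard_normal(100)

print(f"cond(A^T A) ~ {np.linalg.cond(A.T @ A):.1e}")

# Direct inversion: accuracy degrades with the condition number
proj_direct = A @ np.linalg.inv(A.T @ A) @ (A.T @ b)

# QR: the residual stays orthogonal to col(A) to near machine precision
Q, _ = qr(A, mode='economic')
proj_qr = Q @ (Q.T @ b)

print(f"direct orthogonality error: {np.max(np.abs(A.T @ (b - proj_direct))):.1e}")
print(f"QR orthogonality error:     {np.max(np.abs(A.T @ (b - proj_qr))):.1e}")
```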

Practical Applications in Statistics

Linear regression is fundamentally a projection operation. When you fit y = Xβ, the fitted values ŷ are the projection of y onto the column space of X:

def linear_regression_projection(X, y):
    """
    Fit linear regression by projecting y onto col(X).
    Returns fitted values and residuals.
    """
    # Add intercept column
    X_with_intercept = np.column_stack([np.ones(len(X)), X])
    
    # Project y onto column space of X
    y_hat = project_qr(y, X_with_intercept)
    residuals = y - y_hat
    
    return y_hat, residuals

# Generate synthetic data
np.random.seed(42)
n = 100
X = np.random.randn(n, 2)
true_beta = np.array([3, -2, 1])  # intercept, beta1, beta2
y = np.column_stack([np.ones(n), X]) @ true_beta + np.random.randn(n) * 0.5

# Fit using projection
y_hat, residuals = linear_regression_projection(X, y)

# Verify residuals are orthogonal to column space
X_with_intercept = np.column_stack([np.ones(len(X)), X])
print(f"Residuals orthogonal to X: {np.allclose(X_with_intercept.T @ residuals, 0)}")
print(f"R-squared: {1 - np.var(residuals) / np.var(y):.4f}")
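The projection gives the fitted values, but the same QR factors recover the coefficients too: with X = QR, β̂ solves the triangular system Rβ̂ = Q^T y. A sketch on the same kind of synthetic data (the seed and coefficients here are illustrative):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(42)
n = 100
X = rng.standard_normal((n, 2))
true_beta = np.array([3.0, -2.0, 1.0])            # intercept, beta1, beta2
Xd = np.column_stack([np.ones(n), X])
y = Xd @ true_beta + 0.5 * rng.standard_normal(n)

# X = QR: fitted values are Q Q^T y, coefficients solve R beta = Q^T y
Q, R = qr(Xd, mode='economic')
beta_hat = solve_triangular(R, Q.T @ y)
y_hat = Q @ (Q.T @ y)

print(f"estimated coefficients: {beta_hat}")
print(f"fitted values match X @ beta_hat: {np.allclose(y_hat, Xd @ beta_hat)}")
```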

Let’s visualize the decomposition:

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Plot fitted values vs actual
ax1.scatter(y, y_hat, alpha=0.5)
ax1.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2)
ax1.set_xlabel('Actual y')
ax1.set_ylabel('Fitted values (projection)')
ax1.set_title('Fitted Values: Projection onto Column Space')
ax1.grid(True, alpha=0.3)

# Plot residuals
ax2.scatter(y_hat, residuals, alpha=0.5)
ax2.axhline(y=0, color='r', linestyle='--', lw=2)
ax2.set_xlabel('Fitted values')
ax2.set_ylabel('Residuals (orthogonal component)')
ax2.set_title('Residuals: Projection onto Orthogonal Complement')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Performance Optimization and Best Practices

Different projection methods have different performance characteristics. Here’s a benchmark:

import time

def benchmark_projection_methods(n, k, n_trials=10):
    """Compare performance of different projection methods."""
    results = {}
    
    for trial in range(n_trials):
        A = np.random.randn(n, k)
        b = np.random.randn(n)
        
        # Method 1: Direct inversion
        start = time.time()
        P = A @ np.linalg.inv(A.T @ A) @ A.T
        proj1 = P @ b
        results.setdefault('direct', []).append(time.time() - start)
        
        # Method 2: QR decomposition
        start = time.time()
        Q, R = qr(A, mode='economic')
        proj2 = Q @ (Q.T @ b)
        results.setdefault('qr', []).append(time.time() - start)
        
        # Method 3: Solve linear system
        start = time.time()
        proj3 = A @ np.linalg.solve(A.T @ A, A.T @ b)
        results.setdefault('solve', []).append(time.time() - start)
    
    for method, times in results.items():
        print(f"{method:10s}: {np.mean(times)*1000:.3f} ms (±{np.std(times)*1000:.3f})")

print("Projection benchmark (n=1000, k=50):")
benchmark_projection_methods(1000, 50)

Best practices:

  1. Use QR decomposition for numerical stability, especially when columns of A are nearly collinear
  2. Use np.linalg.solve instead of explicit inversion when you only need to project one vector
  3. Pre-compute the projection matrix if you’re projecting many vectors onto the same subspace
  4. Check conditioning to detect numerical issues: np.linalg.cond(A) (note that cond(A^T A) = cond(A)^2, so forming A^T A squares any ill-conditioning)
  5. Consider SVD for very ill-conditioned problems or when you need to handle rank-deficient matrices
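Point 5 deserves a sketch: when A is rank-deficient, A^T A is singular and the inverse-based formula breaks down entirely, but an SVD still yields an orthonormal basis for col(A). The rcond threshold below is an illustrative choice for deciding effective rank:

```python
import numpy as np

def project_svd(b, A, rcond=1e-10):
    """Project b onto col(A) via SVD; handles rank-deficient A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rank = np.sum(s > rcond * s[0])   # effective rank from singular values
    U_r = U[:, :rank]                 # orthonormal basis for col(A)
    return U_r @ (U_r.T @ b)

# Rank-deficient example: third column is the sum of the first two
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [0., 0., 0.]])
b = np.array([1., 2., 3., 4.])
proj = project_svd(b, A)
print(f"residual orthogonal to col(A): {np.allclose(A.T @ (b - proj), 0)}")
```

Note that np.linalg.inv(A.T @ A) would fail here, since the Gram matrix is exactly singular.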

Conclusion and Further Resources

Vector projection onto subspaces is the geometric heart of linear statistics. Once you understand that regression is projection, that residuals are orthogonal complements, and that the projection matrix encodes this entire relationship, many statistical concepts become clearer.

The same projection framework extends to PCA (projecting onto principal component subspaces), partial least squares (iterative projections), and even neural network interpretability (projecting activations onto concept subspaces). Master projection, and you’ve mastered a fundamental building block of modern data science.

For further exploration, investigate how projection relates to the Moore-Penrose pseudoinverse, how kernel methods generalize projection to infinite-dimensional spaces, and how randomized algorithms can approximate projections for massive datasets where exact computation is infeasible.
