Linear Algebra: Projections Explained
Key Insights
- Projections transform vectors into their closest representation in a subspace, forming the mathematical foundation of linear regression where predictions are projections of outcomes onto the column space of features
- The projection matrix P = A(A^T A)^(-1)A^T is idempotent (P² = P), meaning projecting twice gives the same result as projecting once—a property that ensures predictions remain in the model space
- For numerical stability with large or ill-conditioned matrices, use QR decomposition instead of computing (A^T A)^(-1) directly, since forming A^T A squares the condition number and amplifies rounding error
Introduction to Projections
Projections are fundamental operations in linear algebra that map vectors onto subspaces. When you project a vector onto a subspace, you find the closest point in that subspace to your original vector. This “closest point” is measured using the Euclidean distance.
In statistics, projections appear everywhere. Linear regression finds the projection of your outcome variable onto the space spanned by your predictors. Principal Component Analysis projects high-dimensional data onto lower-dimensional subspaces that capture maximum variance. Understanding projections deeply will make these statistical methods intuitive rather than mysterious.
The simplest case is projecting a vector onto a line. Imagine shining a light perpendicular to a line—the shadow cast by a vector is its projection onto that line.
```python
import numpy as np
import matplotlib.pyplot as plt

# Create a 2D vector and a line to project onto
v = np.array([3, 2])
u = np.array([4, 1])

# Calculate projection of v onto u
proj_v = (np.dot(v, u) / np.dot(u, u)) * u

# Visualize
fig, ax = plt.subplots(figsize=(8, 6))
ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1,
          color='blue', width=0.006, label='Original vector v')
ax.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1,
          color='green', width=0.006, label='Direction u')
ax.quiver(0, 0, proj_v[0], proj_v[1], angles='xy', scale_units='xy', scale=1,
          color='red', width=0.006, label='Projection of v onto u')
ax.plot([v[0], proj_v[0]], [v[1], proj_v[1]], 'k--', linewidth=1, alpha=0.5)
ax.set_xlim(-0.5, 4.5)
ax.set_ylim(-0.5, 3)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.legend()
ax.set_title('Vector Projection in 2D')
plt.show()

print(f"Original vector: {v}")
print(f"Projection: {proj_v}")
print(f"Projection length: {np.linalg.norm(proj_v):.3f}")
```
Mathematical Foundation
The projection formula for projecting vector v onto vector u is:
proj_u(v) = (v·u / u·u) × u
This formula gives you the component of v that lies in the direction of u. The scalar (v·u / u·u) tells you how much of u you need, and multiplying by u gives you the actual vector.
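Plugging in the vectors from the 2D example earlier makes the formula concrete:

```latex
v = \begin{pmatrix}3\\2\end{pmatrix},\quad
u = \begin{pmatrix}4\\1\end{pmatrix},\quad
v\cdot u = 3\cdot 4 + 2\cdot 1 = 14,\quad
u\cdot u = 4^2 + 1^2 = 17
```

so proj_u(v) = (14/17) × u ≈ (3.294, 0.824), matching the NumPy output above.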
For projecting onto subspaces (not just lines), we use projection matrices. If A is a matrix whose columns span the subspace, the projection matrix is:
P = A(A^T A)^(-1)A^T
This matrix has two critical properties:
- Idempotency: P² = P (projecting twice equals projecting once)
- Symmetry: P^T = P (for orthogonal projections)
```python
import numpy as np

def project_onto_vector(v, u):
    """Project vector v onto vector u."""
    return (np.dot(v, u) / np.dot(u, u)) * u

def projection_matrix(A):
    """
    Compute projection matrix onto column space of A.

    P = A(A^T A)^(-1)A^T
    """
    ATA = A.T @ A
    ATA_inv = np.linalg.inv(ATA)
    P = A @ ATA_inv @ A.T
    return P

# Example: project onto a 2D subspace in 3D
A = np.array([
    [1, 0],
    [0, 1],
    [1, 1]
])
P = projection_matrix(A)
print("Projection matrix P:")
print(P)

# Verify idempotency: P² = P
P_squared = P @ P
print("\nIdempotency check (P² - P should be ~0):")
print(np.max(np.abs(P_squared - P)))

# Verify symmetry
print("\nSymmetry check (P - P^T should be ~0):")
print(np.max(np.abs(P - P.T)))

# Project a vector
v = np.array([1, 2, 3])
projected = P @ v
print(f"\nOriginal vector: {v}")
print(f"Projected vector: {projected}")
```
Projections in Linear Regression
Linear regression is geometrically a projection operation. When you fit a model y = Xβ + ε, you’re projecting the outcome vector y onto the column space of X (the space spanned by your predictors).
The fitted values ŷ = X(X^T X)^(-1)X^T y are exactly the projection of y onto the column space of X. The matrix H = X(X^T X)^(-1)X^T is called the “hat matrix” because it puts a hat on y.
The residuals e = y - ŷ are orthogonal to the column space of X. This orthogonality is why the normal equations X^T(y - Xβ) = 0 work—the residuals must be perpendicular to every column of X.
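The orthogonality condition is exactly where the least-squares solution comes from: minimize the squared error and set the gradient to zero.

```latex
\min_{\beta}\,\|y - X\beta\|^2
\;\Rightarrow\;
\nabla_{\beta}\,\|y - X\beta\|^2 = -2X^T(y - X\beta) = 0
\;\Rightarrow\;
X^T X\,\hat\beta = X^T y
```

Solving gives β̂ = (X^T X)^(-1) X^T y, and therefore ŷ = Xβ̂ = Hy, which is precisely the projection of y onto the column space of X.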
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Generate synthetic data
np.random.seed(42)
n_samples = 100
X = np.random.randn(n_samples, 3)
true_coef = np.array([2.5, -1.3, 0.8])
y = X @ true_coef + np.random.randn(n_samples) * 0.5

# Add intercept column
X_with_intercept = np.column_stack([np.ones(n_samples), X])

# Method 1: Using projection matrix directly
H = X_with_intercept @ np.linalg.inv(X_with_intercept.T @ X_with_intercept) @ X_with_intercept.T
y_pred_projection = H @ y

# Method 2: Using sklearn (fits the intercept internally)
model = LinearRegression()
model.fit(X, y)
y_pred_sklearn = model.predict(X)

# Compare
print("Predictions match:", np.allclose(y_pred_projection, y_pred_sklearn))

# Verify residuals are orthogonal to column space
residuals = y - y_pred_projection
orthogonality_check = X_with_intercept.T @ residuals
print("\nOrthogonality check (should be ~0):")
print(orthogonality_check)

# Verify hat matrix is idempotent
H_squared = H @ H
print("\nHat matrix idempotency (H² - H should be ~0):")
print(np.max(np.abs(H_squared - H)))
```
Applications in Dimensionality Reduction
Principal Component Analysis (PCA) uses projections to reduce dimensionality while preserving maximum variance. The principal components are orthogonal directions in feature space, and PCA projects data onto the top k components.
The projection step in PCA is straightforward: multiply your centered data by the top k eigenvectors. What makes PCA special is that these directions are chosen to maximize the variance of the projected data.
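The "maximum variance" claim can be made precise: the first principal component solves a constrained optimization over the sample covariance matrix S.

```latex
w_1 = \arg\max_{\|w\| = 1}\; w^T S w
```

By the Lagrange multiplier (Rayleigh quotient) argument, the optimum satisfies S w = λ w, so w_1 is the eigenvector of S with the largest eigenvalue λ_1, and the variance of the data projected onto w_1 equals λ_1. Subsequent components repeat this maximization subject to orthogonality with the components already found.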
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Generate high-dimensional data with correlation structure
np.random.seed(42)
n_samples = 200

# Create correlated features
base = np.random.randn(n_samples, 2)
X = np.column_stack([
    base[:, 0],
    base[:, 0] + np.random.randn(n_samples) * 0.5,
    base[:, 1],
    base[:, 0] + base[:, 1] + np.random.randn(n_samples) * 0.3,
    np.random.randn(n_samples) * 0.1
])

# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Manual PCA implementation showing projection
# (X_scaled is already zero-mean after standardization; re-centering is a no-op kept for generality)
X_centered = X_scaled - X_scaled.mean(axis=0)
cov_matrix = (X_centered.T @ X_centered) / (n_samples - 1)
# eigh is the right choice for symmetric matrices: guaranteed real output, more stable than eig
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort by eigenvalue, descending
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Project onto top 2 components
n_components = 2
components = eigenvectors[:, :n_components]
X_pca_manual = X_centered @ components

# Compare with sklearn (absolute values: eigenvector signs are arbitrary)
pca = PCA(n_components=2)
X_pca_sklearn = pca.fit_transform(X_scaled)
print("Manual vs sklearn PCA match:", np.allclose(np.abs(X_pca_manual), np.abs(X_pca_sklearn)))
print(f"\nVariance explained: {pca.explained_variance_ratio_}")
print(f"Total variance retained: {pca.explained_variance_ratio_.sum():.3f}")

# Show that projection loses information
X_reconstructed = X_pca_manual @ components.T
reconstruction_error = np.mean((X_centered - X_reconstructed) ** 2)
print(f"Reconstruction error: {reconstruction_error:.6f}")
```
Practical Implementation Patterns
For numerical stability, avoid computing (A^T A)^(-1) directly when A is large or ill-conditioned. Use QR decomposition instead. If A = QR where Q has orthonormal columns, then P = QQ^T, which is much more stable.
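The identity P = QQ^T follows by substituting A = QR into the projection formula, assuming A has full column rank so that R is invertible:

```latex
P = A(A^TA)^{-1}A^T
  = QR\,(R^TQ^TQR)^{-1}R^TQ^T
  = QR\,R^{-1}(R^T)^{-1}R^TQ^T
  = QQ^T
```

using Q^T Q = I. No matrix inversion survives in the final expression, which is exactly why this route is numerically preferable.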
```python
import numpy as np
import time

def projection_matrix_direct(A):
    """Direct method - can be numerically unstable."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

def projection_matrix_qr(A):
    """QR decomposition method - numerically stable."""
    Q, R = np.linalg.qr(A)
    return Q @ Q.T

def projection_matrix_robust(A, rcond=1e-10):
    """
    Robust projection with error handling.

    Uses QR for stability and checks for edge cases,
    including rank deficiency (dependent columns).
    """
    if A.size == 0:
        raise ValueError("Matrix A cannot be empty")
    if A.ndim != 2:
        raise ValueError("Matrix A must be 2-dimensional")
    # Check for zero columns
    col_norms = np.linalg.norm(A, axis=0)
    if np.any(col_norms < rcond):
        raise ValueError("Matrix A has near-zero columns")
    # Use QR decomposition for stability
    Q, R = np.linalg.qr(A)
    # Near-zero entries on R's diagonal signal (near-)dependent columns
    diag_R = np.abs(np.diag(R))
    if np.any(diag_R < rcond * diag_R.max()):
        raise ValueError("Matrix A is rank-deficient (near-dependent columns)")
    return Q @ Q.T

# Performance comparison
n, k = 1000, 50
A = np.random.randn(n, k)

# Time direct method
start = time.time()
P_direct = projection_matrix_direct(A)
time_direct = time.time() - start

# Time QR method
start = time.time()
P_qr = projection_matrix_qr(A)
time_qr = time.time() - start

print(f"Direct method time: {time_direct:.4f}s")
print(f"QR method time: {time_qr:.4f}s")
print(f"Results match: {np.allclose(P_direct, P_qr)}")

# Test numerical stability with a rank-deficient matrix
A_ill = np.random.randn(100, 10)
A_ill[:, -1] = A_ill[:, 0] * 1e-10  # Dependent column: tiny multiple of the first

try:
    P_direct_ill = projection_matrix_direct(A_ill)
    print("\nDirect method succeeded on ill-conditioned matrix")
except np.linalg.LinAlgError:
    print("\nDirect method failed on ill-conditioned matrix")

try:
    P_robust_ill = projection_matrix_robust(A_ill)
    print("Robust method succeeded")
except ValueError as e:
    print(f"Robust method caught issue: {e}")
```
Common Pitfalls and Best Practices
Numerical precision matters. When checking properties like idempotency, use tolerance-based comparisons with np.allclose() rather than exact equality. Floating-point arithmetic introduces small errors.
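As a minimal illustration of why tolerance-based checks matter, here is a sketch comparing exact elementwise equality with `np.allclose()` on a projection matrix built via QR:

```python
import numpy as np

# Build a valid projection matrix from a random 5x2 matrix
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
P = Q @ Q.T

# Exact elementwise equality typically fails due to rounding...
print("Exact equality:", bool((P @ P == P).all()))
# ...while a tolerance-based comparison correctly reports idempotency.
print("Tolerance check:", np.allclose(P @ P, P))
```

Even though P is idempotent in exact arithmetic, the entries of P @ P usually differ from P in the last few bits, so only the tolerance-based check reports the property reliably.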
Choose the right method. For small, well-conditioned matrices, the direct formula is fine. For large matrices or when numerical stability is crucial (like in production systems), use QR decomposition.
Verify your assumptions. Check that your projection matrix is actually idempotent and symmetric. If these properties don’t hold, something is wrong with your computation.
```python
import numpy as np

# Example of numerical instability: nearly collinear columns
A = np.array([[1, 1], [1, 1.0000001], [1, 1.0000002]])

# A sufficiently strict robust check may flag the near-dependent columns
try:
    P = projection_matrix_robust(A, rcond=1e-6)
    print("Robust method accepted the matrix")
except ValueError as e:
    print(f"Caught expected error: {e}")

# The direct method may give nonsensical results
P_direct = projection_matrix_direct(A)
print(f"\nIdempotency error with direct method: {np.max(np.abs(P_direct @ P_direct - P_direct))}")

# The QR method handles it better
P_qr = projection_matrix_qr(A)
print(f"Idempotency error with QR method: {np.max(np.abs(P_qr @ P_qr - P_qr))}")

# Best practice: always verify projection properties
def verify_projection_matrix(P, tol=1e-8):
    """Verify that P is a valid projection matrix."""
    is_idempotent = np.allclose(P @ P, P, atol=tol)
    is_symmetric = np.allclose(P, P.T, atol=tol)
    if not is_idempotent:
        print(f"WARNING: Matrix is not idempotent (max error: {np.max(np.abs(P @ P - P))})")
    if not is_symmetric:
        print(f"WARNING: Matrix is not symmetric (max error: {np.max(np.abs(P - P.T))})")
    return is_idempotent and is_symmetric

print("\nVerifying QR projection matrix:")
verify_projection_matrix(P_qr)
```
Projections are the geometric heart of linear statistical methods. Master them, and you’ll understand regression, PCA, and many other techniques at a deeper level. Always prioritize numerical stability in production code, verify mathematical properties in your tests, and remember that projections are fundamentally about finding the best approximation in a subspace.