Linear Algebra: Vector Spaces Explained
Key Insights
- Vector spaces provide the mathematical foundation for machine learning algorithms, from linear regression to neural networks, by formalizing how we manipulate high-dimensional data
- Understanding subspaces and linear independence is critical for dimensionality reduction techniques like PCA, which project data onto lower-dimensional subspaces while preserving variance
- Every matrix operation in data science—from feature transformations to embedding layers—represents a linear transformation between vector spaces
Introduction to Vector Spaces
Vector spaces are the backbone of modern data science and machine learning. While the formal definition might seem abstract, every time you work with a dataset, apply a transformation, or train a neural network, you’re operating within vector spaces.
A vector space V over a field F (typically real numbers ℝ) is a set equipped with two operations: vector addition and scalar multiplication. These operations must satisfy ten specific axioms that we’ll explore shortly. Think of a vector space as a playground where vectors can be added together and scaled, with predictable, consistent rules.
Why does this matter? In machine learning, your data lives in vector spaces. A grayscale image is a vector in ℝ^(width×height). Word embeddings like Word2Vec represent words as vectors in ℝ^300 or similar. When you perform PCA, you’re finding a new basis for your vector space that captures maximum variance. Understanding vector spaces isn’t academic—it’s fundamental to reasoning about your data.
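To make the image example concrete, flattening a 2-D pixel array into a vector is a single reshape (the 28×28 size below is an arbitrary choice for illustration):

```python
import numpy as np

# A grayscale "image" is just a 2-D array of pixel intensities.
image = np.random.rand(28, 28)

# Flattening identifies it with a vector in R^(28*28) = R^784.
pixel_vector = image.reshape(-1)
print(pixel_vector.shape)  # (784,)
```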
```python
import numpy as np
import matplotlib.pyplot as plt

# Create simple 2D and 3D vectors
v1_2d = np.array([2, 3])
v2_2d = np.array([1, -1])
v1_3d = np.array([1, 2, 3])
v2_3d = np.array([4, -1, 2])

# Vector addition and scalar multiplication
v_sum = v1_2d + v2_2d
v_scaled = 2.5 * v1_2d

print(f"2D Vector 1: {v1_2d}")
print(f"2D Vector 2: {v2_2d}")
print(f"Sum: {v_sum}")
print(f"Scaled v1 by 2.5: {v_scaled}")
```
Core Properties and Axioms
Vector spaces must satisfy ten axioms. These aren’t arbitrary rules—they ensure vector spaces behave consistently and predictably. Let me break down the key ones:
Closure axioms:
- Adding two vectors yields another vector in the space
- Multiplying a vector by a scalar yields another vector in the space
Associativity and commutativity:
- (u + v) + w = u + (v + w)
- u + v = v + u
Identity and inverse elements:
- There exists a zero vector 0 such that v + 0 = v
- For every vector v, there exists -v such that v + (-v) = 0
Distributivity:
- a(u + v) = au + av
- (a + b)v = av + bv
These axioms might seem obvious for familiar spaces like ℝ^n, but they’re powerful because they apply to many non-obvious spaces too.
```python
def verify_vector_space_properties(vectors, scalars):
    """Demonstrate key vector space axioms"""
    v1, v2, v3 = vectors[:3]
    a, b = scalars[:2]

    # Closure under addition
    print("Closure under addition:")
    print(f"v1 + v2 = {v1 + v2}")

    # Commutativity
    print("\nCommutativity: v1 + v2 == v2 + v1")
    print(np.allclose(v1 + v2, v2 + v1))

    # Associativity
    print("\nAssociativity: (v1 + v2) + v3 == v1 + (v2 + v3)")
    print(np.allclose((v1 + v2) + v3, v1 + (v2 + v3)))

    # Zero vector
    zero = np.zeros_like(v1)
    print("\nZero vector: v1 + 0 == v1")
    print(np.allclose(v1 + zero, v1))

    # Additive inverse
    print("\nAdditive inverse: v1 + (-v1) == 0")
    print(np.allclose(v1 + (-v1), zero))

    # Distributivity
    print("\nDistributivity: a(v1 + v2) == av1 + av2")
    print(np.allclose(a * (v1 + v2), a * v1 + a * v2))

# Test with real vectors
vectors = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
scalars = [2.0, 3.0]
verify_vector_space_properties(vectors, scalars)
```
Subspaces, Span, and Linear Independence
A subspace is a subset of a vector space that is itself a vector space under the same operations. The most important subspaces are those generated by spanning sets.
The span of a set of vectors is all possible linear combinations of those vectors. If vectors v₁, v₂, …, vₙ span a space V, then every vector in V can be written as a₁v₁ + a₂v₂ + … + aₙvₙ for some scalars aᵢ.
Vectors are linearly independent if no vector can be written as a linear combination of the others. A basis is a linearly independent spanning set—the minimal set of vectors needed to span the space. The number of vectors in a basis is the dimension of the space.
This is crucial for machine learning. When you have highly correlated features, they’re linearly dependent—you can drop some without losing information. PCA finds a new basis of linearly independent directions (principal components) ordered by importance.
```python
def check_linear_independence(vectors):
    """Check if vectors are linearly independent using matrix rank"""
    # Stack vectors as columns
    matrix = np.column_stack(vectors)
    rank = np.linalg.matrix_rank(matrix)
    n_vectors = len(vectors)
    print(f"Number of vectors: {n_vectors}")
    print(f"Matrix rank: {rank}")
    print(f"Linearly independent: {rank == n_vectors}")
    return rank == n_vectors

# Example: linearly independent vectors
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
v3 = np.array([0, 0, 1])
print("Testing standard basis vectors:")
check_linear_independence([v1, v2, v3])

# Example: linearly dependent vectors
v1 = np.array([1, 2, 3])
v2 = np.array([2, 4, 6])  # v2 = 2*v1
v3 = np.array([1, 1, 1])
print("\nTesting dependent vectors:")
check_linear_independence([v1, v2, v3])
```
```python
# Visualize span in 2D
def visualize_span_2d(v1, v2):
    """Visualize the span of two 2D vectors"""
    fig, ax = plt.subplots(figsize=(8, 8))

    # Plot original vectors
    ax.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy',
              scale=1, color='r', width=0.006, label='v1')
    ax.quiver(0, 0, v2[0], v2[1], angles='xy', scale_units='xy',
              scale=1, color='b', width=0.006, label='v2')

    # Plot a grid of linear combinations a*v1 + b*v2
    for a in np.linspace(-2, 2, 20):
        for b in np.linspace(-2, 2, 20):
            combo = a * v1 + b * v2
            ax.plot(combo[0], combo[1], 'k.', markersize=1, alpha=0.3)

    ax.set_xlim(-10, 10)
    ax.set_ylim(-10, 10)
    ax.grid(True)
    ax.legend()
    ax.set_title('Span of two vectors')
    plt.savefig('span_visualization.png', dpi=150, bbox_inches='tight')

v1 = np.array([2, 1])
v2 = np.array([1, 3])
visualize_span_2d(v1, v2)
```
Common Vector Spaces in Practice
ℝⁿ (Euclidean spaces): The most familiar vector spaces. ℝ² is the 2D plane, ℝ³ is 3D space, and ℝⁿ extends to n dimensions. Your dataset with n features lives in ℝⁿ.
Function spaces: Functions can be vectors! The space of continuous functions on [0,1] is a vector space where you add functions pointwise: (f + g)(x) = f(x) + g(x). This perspective is fundamental in functional analysis and kernel methods.
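The pointwise operations can be sketched directly with Python callables; `sin` and `cos` here are arbitrary choices, and the comparison on sample points is only a numerical spot-check, not a proof:

```python
import numpy as np

f = np.sin
g = np.cos

# Vector addition in a function space is pointwise: (f + g)(x) = f(x) + g(x)
def add_functions(f, g):
    return lambda x: f(x) + g(x)

# Scalar multiplication is also pointwise: (a*f)(x) = a * f(x)
def scale_function(a, f):
    return lambda x: a * f(x)

h = add_functions(f, g)
x = np.linspace(0, 1, 5)
print(h(x))                      # equals sin(x) + cos(x) at each sample point
print(scale_function(2, f)(x))   # equals 2*sin(x) at each sample point
```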
Matrix spaces: The set of all m×n matrices forms a vector space of dimension m×n. This is relevant when working with convolutional filters or attention matrices.
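A minimal sketch of this structure: addition and scalar multiplication on 2×3 matrices are entrywise, and flattening identifies the space with ℝ⁶, matching the m×n dimension count (the particular matrices are arbitrary):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[0., 1., 0.],
              [1., 0., 1.]])

# Entrywise addition and scalar multiplication make
# m x n matrices a vector space.
C = A + 2 * B

# Flattening identifies the space of 2x3 matrices with R^6.
print(C.reshape(-1))  # a vector with 6 components
```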
Polynomial spaces: The space of polynomials of degree ≤ n forms an (n+1)-dimensional vector space. The polynomials 1, x, x², …, xⁿ form a basis.
```python
from sympy import symbols, expand, Poly

# Polynomial space example
x = symbols('x')

# Define polynomials (vectors in polynomial space)
p1 = 2*x**2 + 3*x + 1
p2 = x**2 - x + 4
p3 = -x**2 + 2*x - 1

# Vector addition in polynomial space
p_sum = expand(p1 + p2)
print(f"p1 + p2 = {p_sum}")

# Scalar multiplication
p_scaled = expand(3 * p1)
print(f"3 * p1 = {p_scaled}")

# Check linear independence of polynomials
def poly_to_vector(poly, degree):
    """Convert polynomial to coefficient vector (highest degree first)"""
    coeffs = [float(c) for c in Poly(poly, x).all_coeffs()]
    # all_coeffs() lists coefficients from highest degree down,
    # so pad missing high-degree terms with zeros at the front
    return np.array([0.0] * (degree + 1 - len(coeffs)) + coeffs)

vectors = [poly_to_vector(p, 2) for p in [p1, p2, p3]]
print("\nPolynomial coefficient vectors:")
for v in vectors:
    print(v)
```
Linear Transformations and Matrices
Every matrix represents a linear transformation between vector spaces. A linear transformation T: V → W satisfies:
- T(u + v) = T(u) + T(v)
- T(αv) = αT(v)
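Both conditions can be spot-checked numerically for any matrix map T(v) = Av; the random matrix and vectors below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # any matrix defines a linear map T(v) = A @ v
u = rng.standard_normal(3)
v = rng.standard_normal(3)
alpha = 2.5

# T(u + v) == T(u) + T(v)
print(np.allclose(A @ (u + v), A @ u + A @ v))        # True
# T(alpha * v) == alpha * T(v)
print(np.allclose(A @ (alpha * v), alpha * (A @ v)))  # True
```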
The kernel (null space) is the set of vectors that map to zero: {v : T(v) = 0}. The image (column space) is the set of all possible outputs: {T(v) : v ∈ V}.
These concepts are everywhere in machine learning. The kernel of your feature matrix tells you about redundant features. The image tells you what outputs are achievable.
```python
from scipy.linalg import null_space

def analyze_linear_transformation(A):
    """Analyze kernel and image of a linear transformation"""
    print("Transformation matrix A:")
    print(A)

    # Compute kernel (null space)
    kernel = null_space(A)
    print(f"\nKernel dimension: {kernel.shape[1]}")
    print(f"Kernel basis:\n{kernel}")

    # Compute image (column space) dimension via rank
    rank = np.linalg.matrix_rank(A)
    print(f"\nImage dimension (rank): {rank}")
    return kernel, rank

# Example: projection matrix onto xy-plane in 3D
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0]
])
kernel, rank = analyze_linear_transformation(A)
```
```python
# Visualize transformation
def visualize_transformation(A):
    """Visualize how a 2D transformation affects the unit circle"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Original space: the unit circle
    unit_circle = np.array([[np.cos(t), np.sin(t)]
                            for t in np.linspace(0, 2*np.pi, 100)]).T
    ax1.plot(unit_circle[0], unit_circle[1], 'b-', linewidth=2)
    ax1.set_aspect('equal')
    ax1.grid(True)
    ax1.set_title('Original Space')

    # Transformed space: the image of the circle under A
    transformed = A @ unit_circle
    ax2.plot(transformed[0], transformed[1], 'r-', linewidth=2)
    ax2.set_aspect('equal')
    ax2.grid(True)
    ax2.set_title('Transformed Space')
    plt.savefig('transformation_visualization.png', dpi=150, bbox_inches='tight')

# 2D transformation example
A_2d = np.array([[2, 1], [0, 1]])
visualize_transformation(A_2d)
```
Applications in Machine Learning
Feature spaces: Every ML model operates on a feature space. Linear regression finds the best hyperplane in your feature space. Neural networks learn non-linear transformations between vector spaces.
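A minimal sketch of the hyperplane view of linear regression, using ordinary least squares on synthetic data (the coefficients and noise level below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))          # 100 samples in R^2
true_w = np.array([1.5, -2.0])
y = X @ true_w + 0.1 * rng.standard_normal(100)

# Append a column of ones for the intercept, then solve least squares:
X_aug = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w)  # close to [1.5, -2.0, 0.0]
```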
PCA: Principal Component Analysis finds a new orthonormal basis for your data’s vector space, ordered by variance. The first k principal components span a k-dimensional subspace that best approximates your data.
Word embeddings: Word2Vec and similar models represent words as vectors in ℝᵈ, where semantic relationships become geometric. The famous example king − man + woman ≈ queen is just vector arithmetic in this space.
```python
def simple_pca(X, n_components):
    """Implement PCA from scratch"""
    # Center the data
    X_centered = X - np.mean(X, axis=0)

    # Compute covariance matrix
    cov_matrix = np.cov(X_centered.T)

    # Covariance matrices are symmetric, so use eigh, which
    # guarantees real eigenvalues and orthonormal eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort by eigenvalue, largest first
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Select top k eigenvectors (new basis)
    principal_components = eigenvectors[:, :n_components]

    # Project data onto new basis
    X_pca = X_centered @ principal_components

    # Explained variance ratio
    explained_variance = eigenvalues[:n_components] / np.sum(eigenvalues)
    return X_pca, principal_components, explained_variance

# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 5)  # 100 samples, 5 features

# Apply PCA
X_pca, components, var_explained = simple_pca(X, n_components=2)
print(f"Original shape: {X.shape}")
print(f"Reduced shape: {X_pca.shape}")
print(f"Explained variance: {var_explained}")
print(f"Total variance explained: {np.sum(var_explained):.2%}")
```
```python
# Word embedding vector arithmetic simulation
def word_vector_arithmetic():
    """Demonstrate word embedding arithmetic"""
    # Simplified word vectors (in practice, these are learned)
    embeddings = {
        'king': np.array([0.5, 0.9, 0.1]),
        'queen': np.array([0.5, 0.9, -0.1]),
        'man': np.array([0.3, 0.2, 0.8]),
        'woman': np.array([0.3, 0.2, -0.8])
    }

    # Vector arithmetic: king - man + woman ≈ queen
    result = embeddings['king'] - embeddings['man'] + embeddings['woman']
    print("\nWord embedding arithmetic:")
    print(f"king - man + woman = {result}")
    print(f"queen = {embeddings['queen']}")
    similarity = np.dot(result, embeddings['queen']) / (
        np.linalg.norm(result) * np.linalg.norm(embeddings['queen']))
    print(f"Cosine similarity: {similarity:.3f}")

word_vector_arithmetic()
```
Conclusion and Further Learning
Vector spaces are the mathematical foundation that makes modern machine learning possible. Understanding them deeply changes how you think about data, features, and transformations.
Key takeaways: Vector spaces formalize how we manipulate high-dimensional data. Linear independence and basis vectors explain dimensionality reduction. Every matrix operation is a linear transformation between vector spaces. These aren’t abstract concepts—they’re the machinery behind PCA, embeddings, and neural networks.
For deeper study, explore inner product spaces (which add geometric concepts like angles and lengths), normed spaces (which formalize distance), and Hilbert spaces (infinite-dimensional spaces used in kernel methods). Gilbert Strang’s “Linear Algebra and Its Applications” and “Linear Algebra and Learning from Data” are excellent resources.
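As a small taste of what inner product spaces add, the dot product on ℝⁿ already gives lengths and angles (the vectors below are arbitrary):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

# The inner product induces a norm (length) and an angle:
length_v = np.sqrt(np.dot(v, v))   # ||v|| = sqrt(<v, v>)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos_theta))
print(length_v)  # sqrt(2) ≈ 1.414
print(angle)     # 45 degrees
```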
The bridge from vector spaces to practical ML is short. Once you see your data as living in vector spaces and your models as learning transformations between them, you’ll build better intuition for why algorithms work, when they’ll fail, and how to fix them.