Linear Algebra: Vector Spaces Explained
Key Insights
- Vector spaces provide the mathematical foundation for machine learning algorithms, from linear regression to neural networks, by formalizing how we manipulate high-dimensional data
- Understanding subspaces and linear independence is critical for dimensionality reduction techniques like PCA, which project data onto lower-dimensional subspaces while preserving variance
- Every matrix operation in data science—from feature transformations to embedding layers—represents a linear transformation between vector spaces
Introduction to Vector Spaces
Vector spaces are the backbone of modern data science and machine learning. While the formal definition might seem abstract, every time you work with a dataset, apply a transformation, or train a neural network, you’re operating within vector spaces.
A vector space V over a field F (typically real numbers ℝ) is a set equipped with two operations: vector addition and scalar multiplication. These operations must satisfy ten specific axioms that we’ll explore shortly. Think of a vector space as a playground where vectors can be added together and scaled, with predictable, consistent rules.
Why does this matter? In machine learning, your data lives in vector spaces. A grayscale image is a vector in ℝ^(width×height). Word embeddings like Word2Vec represent words as vectors in ℝ^300 or similar. When you perform PCA, you’re finding a new basis for your vector space that captures maximum variance. Understanding vector spaces isn’t academic—it’s fundamental to reasoning about your data.
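To make the image example concrete, flattening a 2-D pixel array into a vector is a single reshape (the 28×28 size below is an arbitrary choice for illustration):

```python
import numpy as np

# A grayscale "image" is just a 2-D array of pixel intensities.
image = np.random.rand(28, 28)

# Flattening identifies it with a vector in R^(28*28) = R^784.
pixel_vector = image.reshape(-1)
print(pixel_vector.shape)  # (784,)
```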
```python
import numpy as np
import matplotlib.pyplot as plt

# Create simple 2D and 3D vectors
v1_2d = np.array([2, 3])
v2_2d = np.array([1, -1])
v1_3d = np.array([1, 2, 3])
v2_3d = np.array([4, -1, 2])

# Vector addition and scalar multiplication
v_sum = v1_2d + v2_2d
v_scaled = 2.5 * v1_2d

print(f"2D Vector 1: {v1_2d}")
print(f"2D Vector 2: {v2_2d}")
print(f"Sum: {v_sum}")
print(f"Scaled v1 by 2.5: {v_scaled}")
```
Core Properties and Axioms
Vector spaces must satisfy ten axioms. These aren’t arbitrary rules—they ensure vector spaces behave consistently and predictably. Let me break down the key ones:
Closure axioms:
- Adding two vectors yields another vector in the space
- Multiplying a vector by a scalar yields another vector in the space
Associativity and commutativity:
- (u + v) + w = u + (v + w)
- u + v = v + u
Identity and inverse elements:
- There exists a zero vector 0 such that v + 0 = v
- For every vector v, there exists -v such that v + (-v) = 0
Distributivity:
- a(u + v) = au + av
- (a + b)v = av + bv
These axioms might seem obvious for familiar spaces like ℝ^n, but they’re powerful because they apply to many non-obvious spaces too.
```python
def verify_vector_space_properties(vectors, scalars):
    """Demonstrate key vector space axioms"""
    v1, v2, v3 = vectors[:3]
    a, b = scalars[:2]

    # Closure under addition
    print("Closure under addition:")
    print(f"v1 + v2 = {v1 + v2}")

    # Commutativity
    print("\nCommutativity: v1 + v2 == v2 + v1")
    print(np.allclose(v1 + v2, v2 + v1))

    # Associativity
    print("\nAssociativity: (v1 + v2) + v3 == v1 + (v2 + v3)")
    print(np.allclose((v1 + v2) + v3, v1 + (v2 + v3)))

    # Zero vector
    zero = np.zeros_like(v1)
    print("\nZero vector: v1 + 0 == v1")
    print(np.allclose(v1 + zero, v1))

    # Additive inverse
    print("\nAdditive inverse: v1 + (-v1) == 0")
    print(np.allclose(v1 + (-v1), zero))

    # Distributivity
    print("\nDistributivity: a(v1 + v2) == av1 + av2")
    print(np.allclose(a * (v1 + v2), a * v1 + a * v2))

# Test with real vectors
vectors = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
scalars = [2.0, 3.0]
verify_vector_space_properties(vectors, scalars)
```
Subspaces, Span, and Linear Independence
A subspace is a subset of a vector space that is itself a vector space under the same operations. The most important subspaces are those generated by spanning sets.
The span of a set of vectors is all possible linear combinations of those vectors. If vectors v₁, v₂, …, vₙ span a space V, then every vector in V can be written as a₁v₁ + a₂v₂ + … + aₙvₙ for some scalars aᵢ.
Vectors are linearly independent if no vector can be written as a linear combination of the others. A basis is a linearly independent spanning set—the minimal set of vectors needed to span the space. The number of vectors in a basis is the dimension of the space.
This is crucial for machine learning. When you have highly correlated features, they’re linearly dependent—you can drop some without losing information. PCA finds a new basis of linearly independent directions (principal components) ordered by importance.
```python
def check_linear_independence(vectors):
    """Check if vectors are linearly independent using matrix rank"""
    # Stack vectors as columns
    matrix = np.column_stack(vectors)
    rank = np.linalg.matrix_rank(matrix)
    n_vectors = len(vectors)
    print(f"Number of vectors: {n_vectors}")
    print(f"Matrix rank: {rank}")
    print(f"Linearly independent: {rank == n_vectors}")
    return rank == n_vectors

# Example: linearly independent vectors
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
v3 = np.array([0, 0, 1])
print("Testing standard basis vectors:")
check_linear_independence([v1, v2, v3])

# Example: linearly dependent vectors
v1 = np.array([1, 2, 3])
v2 = np.array([2, 4, 6])  # v2 = 2*v1
v3 = np.array([1, 1, 1])
print("\nTesting dependent vectors:")
check_linear_independence([v1, v2, v3])
```
```python
# Visualize span in 2D
def visualize_span_2d(v1, v2):
    """Visualize the span of two 2D vectors"""
    fig, ax = plt.subplots(figsize=(8, 8))

    # Plot original vectors
    ax.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy',
              scale=1, color='r', width=0.006, label='v1')
    ax.quiver(0, 0, v2[0], v2[1], angles='xy', scale_units='xy',
              scale=1, color='b', width=0.006, label='v2')

    # Plot a grid of linear combinations a*v1 + b*v2
    for a in np.linspace(-2, 2, 20):
        for b in np.linspace(-2, 2, 20):
            combo = a * v1 + b * v2
            ax.plot(combo[0], combo[1], 'k.', markersize=1, alpha=0.3)

    ax.set_xlim(-10, 10)
    ax.set_ylim(-10, 10)
    ax.grid(True)
    ax.legend()
    ax.set_title('Span of two vectors')
    plt.savefig('span_visualization.png', dpi=150, bbox_inches='tight')

v1 = np.array([2, 1])
v2 = np.array([1, 3])
visualize_span_2d(v1, v2)
```
Common Vector Spaces in Practice
ℝⁿ (Euclidean spaces): The most familiar vector spaces. ℝ² is the 2D plane, ℝ³ is 3D space, and ℝⁿ extends to n dimensions. Your dataset with n features lives in ℝⁿ.
Function spaces: Functions can be vectors! The space of continuous functions on [0,1] is a vector space where you add functions pointwise: (f + g)(x) = f(x) + g(x). This perspective is fundamental in functional analysis and kernel methods.
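The pointwise operations can be sketched directly with Python callables; `sin` and `cos` here are arbitrary choices, and the comparison on sample points is only a numerical spot-check, not a proof:

```python
import numpy as np

f = np.sin
g = np.cos

# Vector addition in a function space is pointwise: (f + g)(x) = f(x) + g(x)
def add_functions(f, g):
    return lambda x: f(x) + g(x)

# Scalar multiplication is also pointwise: (a*f)(x) = a * f(x)
def scale_function(a, f):
    return lambda x: a * f(x)

h = add_functions(f, g)
x = np.linspace(0, 1, 5)
print(h(x))                      # equals sin(x) + cos(x) at each sample point
print(scale_function(2, f)(x))   # equals 2*sin(x) at each sample point
```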
Matrix spaces: The set of all m×n matrices forms a vector space of dimension m×n. This is relevant when working with convolutional filters or attention matrices.
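A minimal sketch of this structure: addition and scalar multiplication on 2×3 matrices are entrywise, and flattening identifies the space with ℝ⁶, matching the m×n dimension count (the particular matrices are arbitrary):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[0., 1., 0.],
              [1., 0., 1.]])

# Entrywise addition and scalar multiplication make
# m x n matrices a vector space.
C = A + 2 * B

# Flattening identifies the space of 2x3 matrices with R^6.
print(C.reshape(-1))  # a vector with 6 components
```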
Polynomial spaces: The space of polynomials of degree ≤ n forms an (n+1)-dimensional vector space. The polynomials 1, x, x², …, xⁿ form a basis.
```python
from sympy import symbols, expand, Poly

# Polynomial space example
x = symbols('x')

# Define polynomials (vectors in polynomial space)
p1 = 2*x**2 + 3*x + 1
p2 = x**2 - x + 4
p3 = -x**2 + 2*x - 1

# Vector addition in polynomial space
p_sum = expand(p1 + p2)
print(f"p1 + p2 = {p_sum}")

# Scalar multiplication
p_scaled = expand(3 * p1)
print(f"3 * p1 = {p_scaled}")

# Check linear independence of polynomials
def poly_to_vector(poly, degree):
    """Convert polynomial to coefficient vector (highest degree first)"""
    coeffs = [float(c) for c in Poly(poly, x).all_coeffs()]
    # all_coeffs() lists coefficients from highest degree down,
    # so pad missing high-degree terms with zeros at the front
    return np.array([0.0] * (degree + 1 - len(coeffs)) + coeffs)

vectors = [poly_to_vector(p, 2) for p in [p1, p2, p3]]
print("\nPolynomial coefficient vectors:")
for v in vectors:
    print(v)
```
Linear Transformations and Matrices
Every matrix represents a linear transformation between vector spaces. A linear transformation T: V → W satisfies:
- T(u + v) = T(u) + T(v)
- T(αv) = αT(v)
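Both conditions can be spot-checked numerically for any matrix map T(v) = Av; the random matrix and vectors below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # any matrix defines a linear map T(v) = A @ v
u = rng.standard_normal(3)
v = rng.standard_normal(3)
alpha = 2.5

# T(u + v) == T(u) + T(v)
print(np.allclose(A @ (u + v), A @ u + A @ v))        # True
# T(alpha * v) == alpha * T(v)
print(np.allclose(A @ (alpha * v), alpha * (A @ v)))  # True
```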
The kernel (null space) is the set of vectors that map to zero: {v : T(v) = 0}. The image (column space) is the set of all possible outputs: {T(v) : v ∈ V}.
These concepts are everywhere in machine learning. The kernel of your feature matrix tells you about redundant features. The image tells you what outputs are achievable.
```python
from scipy.linalg import null_space

def analyze_linear_transformation(A):
    """Analyze kernel and image of a linear transformation"""
    print("Transformation matrix A:")
    print(A)

    # Compute kernel (null space)
    kernel = null_space(A)
    print(f"\nKernel dimension: {kernel.shape[1]}")
    print(f"Kernel basis:\n{kernel}")

    # Compute image (column space) dimension via rank
    rank = np.linalg.matrix_rank(A)
    print(f"\nImage dimension (rank): {rank}")
    return kernel, rank

# Example: projection matrix onto xy-plane in 3D
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0]
])
kernel, rank = analyze_linear_transformation(A)
```
```python
# Visualize transformation
def visualize_transformation(A):
    """Visualize how a 2D transformation affects the unit circle"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Original space: the unit circle
    unit_circle = np.array([[np.cos(t), np.sin(t)]
                            for t in np.linspace(0, 2*np.pi, 100)]).T
    ax1.plot(unit_circle[0], unit_circle[1], 'b-', linewidth=2)
    ax1.set_aspect('equal')
    ax1.grid(True)
    ax1.set_title('Original Space')

    # Transformed space: the image of the circle under A
    transformed = A @ unit_circle
    ax2.plot(transformed[0], transformed[1], 'r-', linewidth=2)
    ax2.set_aspect('equal')
    ax2.grid(True)
    ax2.set_title('Transformed Space')
    plt.savefig('transformation_visualization.png', dpi=150, bbox_inches='tight')

# 2D transformation example
A_2d = np.array([[2, 1], [0, 1]])
visualize_transformation(A_2d)
```
Applications in Machine Learning
Feature spaces: Every ML model operates on a feature space. Linear regression finds the best hyperplane in your feature space. Neural networks learn non-linear transformations between vector spaces.
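A minimal sketch of the hyperplane view of linear regression, using ordinary least squares on synthetic data (the coefficients and noise level below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))          # 100 samples in R^2
true_w = np.array([1.5, -2.0])
y = X @ true_w + 0.1 * rng.standard_normal(100)

# Append a column of ones for the intercept, then solve least squares:
X_aug = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w)  # close to [1.5, -2.0, 0.0]
```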
PCA: Principal Component Analysis finds a new orthonormal basis for your data’s vector space, ordered by variance. The first k principal components span a k-dimensional subspace that best approximates your data.
Word embeddings: Word2Vec and similar models represent words as vectors in ℝᵈ, where semantic relationships become geometric. The famous example king − man + woman ≈ queen is just vector arithmetic in this space.
```python
def simple_pca(X, n_components):
    """Implement PCA from scratch"""
    # Center the data
    X_centered = X - np.mean(X, axis=0)

    # Compute covariance matrix
    cov_matrix = np.cov(X_centered.T)

    # Covariance matrices are symmetric, so use eigh, which
    # guarantees real eigenvalues and orthonormal eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort by eigenvalue, largest first
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Select top k eigenvectors (new basis)
    principal_components = eigenvectors[:, :n_components]

    # Project data onto new basis
    X_pca = X_centered @ principal_components

    # Explained variance ratio
    explained_variance = eigenvalues[:n_components] / np.sum(eigenvalues)
    return X_pca, principal_components, explained_variance

# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 5)  # 100 samples, 5 features

# Apply PCA
X_pca, components, var_explained = simple_pca(X, n_components=2)
print(f"Original shape: {X.shape}")
print(f"Reduced shape: {X_pca.shape}")
print(f"Explained variance: {var_explained}")
print(f"Total variance explained: {np.sum(var_explained):.2%}")
```
```python
# Word embedding vector arithmetic simulation
def word_vector_arithmetic():
    """Demonstrate word embedding arithmetic"""
    # Simplified word vectors (in practice, these are learned)
    embeddings = {
        'king': np.array([0.5, 0.9, 0.1]),
        'queen': np.array([0.5, 0.9, -0.1]),
        'man': np.array([0.3, 0.2, 0.8]),
        'woman': np.array([0.3, 0.2, -0.8])
    }

    # Vector arithmetic: king - man + woman ≈ queen
    result = embeddings['king'] - embeddings['man'] + embeddings['woman']
    print("\nWord embedding arithmetic:")
    print(f"king - man + woman = {result}")
    print(f"queen = {embeddings['queen']}")
    similarity = np.dot(result, embeddings['queen']) / (
        np.linalg.norm(result) * np.linalg.norm(embeddings['queen']))
    print(f"Cosine similarity: {similarity:.3f}")

word_vector_arithmetic()
```
Conclusion and Further Learning
Vector spaces are the mathematical foundation that makes modern machine learning possible. Understanding them deeply changes how you think about data, features, and transformations.
Key takeaways: Vector spaces formalize how we manipulate high-dimensional data. Linear independence and basis vectors explain dimensionality reduction. Every matrix operation is a linear transformation between vector spaces. These aren’t abstract concepts—they’re the machinery behind PCA, embeddings, and neural networks.
For deeper study, explore inner product spaces (which add geometric concepts like angles and lengths), normed spaces (which formalize distance), and Hilbert spaces (infinite-dimensional spaces used in kernel methods). Gilbert Strang’s “Linear Algebra and Its Applications” and “Linear Algebra and Learning from Data” are excellent resources.
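As a small taste of what inner product spaces add, the dot product on ℝⁿ already gives lengths and angles (the vectors below are arbitrary):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

# The inner product induces a norm (length) and an angle:
length_v = np.sqrt(np.dot(v, v))   # ||v|| = sqrt(<v, v>)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos_theta))
print(length_v)  # sqrt(2) ≈ 1.414
print(angle)     # 45 degrees
```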
The bridge from vector spaces to practical ML is short. Once you see your data as living in vector spaces and your models as learning transformations between them, you’ll build better intuition for why algorithms work, when they’ll fail, and how to fix them.