NumPy - Matrix Multiplication (np.dot, np.matmul, @)
Key Insights
- NumPy offers three distinct approaches for matrix multiplication: np.dot(), np.matmul(), and the @ operator, each with specific behaviors for different array dimensions
- The @ operator (introduced in Python 3.5) is the recommended approach for matrix multiplication, as it provides cleaner syntax and consistent semantics aligned with np.matmul()
- Understanding the dimensional behavior differences between these methods is critical: np.dot() treats 2D arrays as matrices but has inconsistent behavior with higher dimensions, while np.matmul() and @ enforce stricter matrix multiplication rules
Core Differences Between Multiplication Methods
NumPy provides multiple ways to multiply arrays, but they’re not interchangeable. The element-wise multiplication operator * performs element-by-element multiplication, while np.dot(), np.matmul(), and @ perform mathematical matrix multiplication following linear algebra rules.
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Element-wise multiplication
element_wise = a * b
print("Element-wise:\n", element_wise)
# [[5, 12], [21, 32]]
# Matrix multiplication
matrix_mult = a @ b
print("Matrix multiplication:\n", matrix_mult)
# [[19, 22], [43, 50]]
The key distinction: element-wise multiplication requires arrays of the same shape, while matrix multiplication follows the rule that an (m×n) matrix can multiply an (n×p) matrix to produce an (m×p) result.
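The shape rule can be checked directly: with @, mismatched inner dimensions raise a ValueError rather than silently broadcasting. A minimal sketch:

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones((4, 5))

# Inner dimensions (3 vs 4) don't match, so this raises ValueError
try:
    a @ b
except ValueError as e:
    print("Incompatible shapes:", e)

# A compatible pairing: (2, 3) @ (3, 5) -> (2, 5)
c = np.ones((3, 5))
print((a @ c).shape)  # (2, 5)
```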
np.dot() Behavior and Limitations
The np.dot() function has been part of NumPy since its inception, but its behavior varies significantly based on input dimensions.
# 1D arrays: dot product
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.dot(a, b)
print("1D dot product:", result) # 32
# 2D arrays: matrix multiplication
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
result = np.dot(a, b)
print("2D matrix mult:\n", result)
# [[19, 22], [43, 50]]
# Mixed dimensions: varies
a = np.array([[1, 2], [3, 4]]) # 2D
b = np.array([1, 2]) # 1D
result = np.dot(a, b)
print("2D @ 1D:", result) # [5, 11]
# Note: np.dot treats 1D arrays as either row or column vectors
# depending on position
For N-dimensional arrays (N > 2), np.dot() performs a sum product over the last axis of the first array and the second-to-last axis of the second array, which often produces unexpected results:
# 3D arrays with np.dot
a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)
result = np.dot(a, b)
print("Shape with np.dot:", result.shape) # (2, 3, 2, 5)
# Not what you typically want for batch matrix multiplication
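For comparison, np.matmul() on the same inputs performs true batch multiplication, pairing matrices along the leading axis. A short sketch of the relationship between the two results:

```python
import numpy as np

a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)

batch = np.matmul(a, b)
print("Shape with np.matmul:", batch.shape)  # (2, 3, 5)

# np.dot's (2, 3, 2, 5) result contains every cross-pairing of the
# stacked matrices; the matched pairs sit on its "diagonal" over
# axes 0 and 2
crossed = np.dot(a, b)
matched = np.stack([crossed[i, :, i, :] for i in range(2)])
print(np.allclose(matched, batch))  # True
```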
np.matmul() and @ Operator
The np.matmul() function and @ operator (which dispatches to np.matmul() for NumPy arrays) were introduced to provide more consistent and mathematically correct matrix multiplication semantics.
# Basic 2D matrix multiplication
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2x3
b = np.array([[7, 8], [9, 10], [11, 12]]) # 3x2
result = a @ b
print("Result shape:", result.shape) # (2, 2)
print(result)
# [[58, 64], [139, 154]]
# Equivalent to:
result_matmul = np.matmul(a, b)
assert np.array_equal(result, result_matmul)
The critical advantage appears with broadcasting and batch operations:
# Stack of matrices (batch multiplication)
a = np.random.rand(10, 3, 4) # 10 matrices of 3x4
b = np.random.rand(10, 4, 5) # 10 matrices of 4x5
result = a @ b
print("Batch result shape:", result.shape) # (10, 3, 5)
# Each corresponding pair of matrices is multiplied
# Broadcasting with different batch sizes
a = np.random.rand(5, 3, 4) # 5 matrices
b = np.random.rand(4, 2) # Single matrix
result = a @ b
print("Broadcast result:", result.shape) # (5, 3, 2)
# The single matrix b is multiplied with each of the 5 matrices in a
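The broadcast result can be verified against an explicit loop over the batch axis; a quick sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 3, 4))
b = rng.random((4, 2))

result = a @ b  # b is broadcast across the batch axis of a

# Equivalent explicit loop over the batch
looped = np.stack([a[i] @ b for i in range(a.shape[0])])
print(result.shape)                 # (5, 3, 2)
print(np.allclose(result, looped))  # True
```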
Handling 1D Arrays
Another point of comparison is how these methods handle 1D arrays:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# All three produce the same result for 1D arrays
print("np.dot:", np.dot(a, b)) # 32
print("np.matmul:", np.matmul(a, b)) # 32
print("@ operator:", a @ b) # 32
# Behavior with mixed dimensions is also consistent, provided shapes align
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
vector = np.array([1, 2, 3])               # shape (3,)
row_vec = np.array([1, 2])                 # shape (2,)
# np.dot allows a 1D array on either side when shapes are compatible
result1 = np.dot(matrix, vector)
print("matrix @ vector:", result1)  # [14, 32]
result2 = np.dot(row_vec, matrix)
print("vector @ matrix:", result2)  # [9, 12, 15]
# np.matmul and @ handle these cases the same way
result3 = matrix @ vector
print("matrix @ vector:", result3)  # [14, 32]
result4 = row_vec @ matrix
print("vector @ matrix:", result4)  # [9, 12, 15]
The key rule: np.matmul() and @ treat 1D arrays as having shape (n,) and promote them to (1, n) or (n, 1) as needed for valid matrix multiplication.
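That promotion can be made explicit: left-multiplying by a 1D array behaves like prepending a 1 to its shape, multiplying, then dropping the added axis; right-multiplying appends the 1 instead. A sketch of the equivalence:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # (2, 3)
vec = np.array([1, 2])                     # (2,)

implicit = vec @ matrix                              # result shape (3,)
explicit = (vec.reshape(1, 2) @ matrix).reshape(3)   # promote, multiply, squeeze
print(np.array_equal(implicit, explicit))  # True

# Right-multiplication: (3,) is treated as (3, 1)
col = np.array([1, 2, 3])
print(np.array_equal(matrix @ col,
                     (matrix @ col.reshape(3, 1)).reshape(2)))  # True
```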
Scalar Multiplication Restrictions
An important limitation of np.matmul() and @:
a = np.array([[1, 2], [3, 4]])
scalar = 5
# np.dot allows scalar multiplication
result = np.dot(a, scalar)
print("np.dot with scalar:\n", result)
# [[5, 10], [15, 20]]
# np.matmul and @ do NOT allow scalar multiplication
try:
    result = a @ scalar
except ValueError as e:
    print("Error:", e)
# matmul: Input operand does not have enough dimensions
# Use standard multiplication for scalars
result = a * scalar
print("Correct scalar mult:\n", result)
Performance Considerations
For large matrices, all three methods use optimized BLAS libraries underneath, but there are subtle differences:
import time
# Large matrices
size = 1000
a = np.random.rand(size, size)
b = np.random.rand(size, size)
# Benchmark np.dot
start = time.perf_counter()
for _ in range(10):
    result = np.dot(a, b)
dot_time = time.perf_counter() - start
# Benchmark @
start = time.perf_counter()
for _ in range(10):
    result = a @ b
matmul_time = time.perf_counter() - start
print(f"np.dot: {dot_time:.4f}s")
print(f"@ operator: {matmul_time:.4f}s")
# Performance is virtually identical
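When the same multiplication runs repeatedly, one small optimization worth knowing is the out parameter, which np.matmul() accepts: it writes the result into a preallocated buffer instead of allocating a fresh array each iteration (the buffer must match the output's shape and dtype). A sketch:

```python
import numpy as np

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
out = np.empty((500, 500))  # preallocated result buffer

for _ in range(5):
    np.matmul(a, b, out=out)  # writes into out, no new allocation

print(np.allclose(out, a @ b))  # True
```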
Practical Application: Neural Network Layer
Here’s a realistic example showing why the @ operator is preferred:
class DenseLayer:
    def __init__(self, input_size, output_size):
        self.weights = np.random.randn(input_size, output_size) * 0.01
        self.bias = np.zeros((1, output_size))

    def forward(self, X):
        """
        X: batch of inputs (batch_size, input_size)
        Returns: (batch_size, output_size)
        """
        # Clean, readable matrix multiplication
        return X @ self.weights + self.bias
# Usage
layer = DenseLayer(input_size=784, output_size=128)
batch = np.random.randn(32, 784) # 32 samples
output = layer.forward(batch)
print("Output shape:", output.shape) # (32, 128)
# Stacking multiple layers
layer2 = DenseLayer(input_size=128, output_size=10)
final_output = layer2.forward(output)
print("Final shape:", final_output.shape) # (32, 10)
Recommendations
Use the @ operator for all matrix multiplication operations. It provides the clearest syntax, enforces proper mathematical semantics, and handles batch operations correctly. Reserve np.dot() for legacy code or for cases where you specifically need its scalar multiplication behavior.
For element-wise operations, use *. For matrix multiplication, use @. This distinction makes code intent immediately clear to readers and prevents subtle bugs from mismatched operations.