How to Perform Matrix Multiplication in NumPy

Key Insights

  • NumPy offers three ways to perform matrix multiplication that produce identical results for 2D arrays: np.dot(), np.matmul(), and the @ operator; use @ for readability in modern Python code.
  • The most common matrix multiplication error is shape mismatch; remember the rule (m×n) @ (n×p) → (m×p), where inner dimensions must match.
  • Never confuse * (element-wise multiplication) with @ (matrix multiplication)—they produce completely different results and this mistake silently breaks calculations.

Introduction to Matrix Multiplication in NumPy

Matrix multiplication is fundamental to nearly every computationally intensive domain. Machine learning models rely on it for forward propagation, computer graphics use it for transformations, and scientific simulations depend on it for solving systems of equations. If you’re working in any of these fields with Python, you’re using NumPy.

NumPy isn’t just convenient—it’s fast. Under the hood, it delegates matrix operations to optimized BLAS libraries written in C and Fortran. A matrix multiplication that takes seconds in pure Python completes in milliseconds with NumPy. This article covers everything you need to perform matrix multiplication correctly and efficiently.
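To make that speed gap concrete, here's a small benchmark sketch comparing a naive pure-Python triple loop against NumPy. The py_matmul helper is illustrative, and exact timings depend on your machine and BLAS build:

```python
import time
import numpy as np

def py_matmul(A, B):
    """Naive pure-Python matrix multiplication on nested lists."""
    n, m, k = len(A), len(B), len(B[0])
    return [[sum(A[i][j] * B[j][l] for j in range(m)) for l in range(k)]
            for i in range(n)]

size = 150
A = np.random.randn(size, size)
B = np.random.randn(size, size)
A_list, B_list = A.tolist(), B.tolist()

start = time.perf_counter()
py_matmul(A_list, B_list)
py_time = time.perf_counter() - start

start = time.perf_counter()
A @ B
np_time = time.perf_counter() - start

print(f"Pure Python: {py_time:.3f} s")
print(f"NumPy:       {np_time:.5f} s")
```

On typical hardware the pure-Python version takes on the order of a second, while NumPy finishes in around a millisecond.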

Setting Up: Creating Matrices in NumPy

Before multiplying matrices, you need to create them. NumPy provides several factory functions for this purpose.

import numpy as np

# Create a matrix from a nested list
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Create a 3x2 matrix of zeros
zeros = np.zeros((3, 2))

# Create a 2x2 matrix of ones
ones = np.ones((2, 2))

# Create a 3x3 identity matrix
identity = np.eye(3)

# Create a matrix with random values (useful for testing)
random_matrix = np.random.randn(4, 3)

print(f"A shape: {A.shape}")          # (2, 3)
print(f"zeros shape: {zeros.shape}")  # (3, 2)
print(f"random shape: {random_matrix.shape}")  # (4, 3)

The .shape attribute is your best friend for debugging. Always check shapes before multiplication—it saves hours of frustration.

For specific use cases, consider these additional creation methods:

# Create a matrix with a specific range
sequential = np.arange(12).reshape(3, 4)
print(sequential)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Create a matrix with evenly spaced values
linspace_matrix = np.linspace(0, 1, 9).reshape(3, 3)
print(linspace_matrix)
# [[0.    0.125 0.25 ]
#  [0.375 0.5   0.625]
#  [0.75  0.875 1.   ]]

The Three Main Methods for Matrix Multiplication

NumPy gives you three ways to multiply matrices. All three produce identical results for 2D arrays, so the choice comes down to readability and context.

import numpy as np

# Define two matrices
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

# Method 1: np.dot()
result_dot = np.dot(A, B)

# Method 2: np.matmul()
result_matmul = np.matmul(A, B)

# Method 3: @ operator (Python 3.5+)
result_at = A @ B

# Verify all methods produce the same result
print("np.dot():\n", result_dot)
print("\nnp.matmul():\n", result_matmul)
print("\n@ operator:\n", result_at)

# All output:
# [[19 22]
#  [43 50]]

# Confirm equality
print("\nAll equal:", np.array_equal(result_dot, result_matmul) 
                      and np.array_equal(result_matmul, result_at))
# All equal: True

Which should you use? For 2D matrices, use the @ operator. It’s the most readable and was added to Python specifically for this purpose. Reserve np.dot() for when you explicitly want dot product semantics (especially with 1D arrays), and use np.matmul() when working with stacks of matrices (3D+ arrays) where its broadcasting behavior differs from np.dot().

# @ operator reads naturally in mathematical expressions
weights = np.random.randn(10, 5)
inputs = np.random.randn(5, 1)
bias = np.random.randn(10, 1)

# Clean, readable neural network layer computation
output = weights @ inputs + bias
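To see where np.dot() and np.matmul() actually diverge, compare their output shapes on a batch of matrices (a short illustrative sketch):

```python
import numpy as np

# A stack of two 3x4 matrices and a stack of two 4x5 matrices
A = np.random.randn(2, 3, 4)
B = np.random.randn(2, 4, 5)

# matmul broadcasts over the leading (batch) dimension
print(np.matmul(A, B).shape)  # (2, 3, 5)

# dot contracts the last axis of A with the second-to-last axis of B,
# producing every pairwise combination of the batch dimensions
print(np.dot(A, B).shape)     # (2, 3, 2, 5)
```

For batched work, np.matmul (or @) almost always gives the shape you intend.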

Understanding Shape Requirements and Broadcasting

Matrix multiplication has strict dimensional requirements. For A @ B to work, the number of columns in A must equal the number of rows in B.

import numpy as np

# Valid multiplication: (2, 3) @ (3, 4) → (2, 4)
A = np.array([[1, 2, 3],
              [4, 5, 6]])  # Shape: (2, 3)

B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])  # Shape: (3, 4)

C = A @ B
print(f"A shape: {A.shape}")  # (2, 3)
print(f"B shape: {B.shape}")  # (3, 4)
print(f"C shape: {C.shape}")  # (2, 4)
print(C)
# [[ 38  44  50  56]
#  [ 83  98 113 128]]

When shapes don’t align, NumPy raises a clear error:

# Invalid multiplication: (2, 3) @ (2, 4) - inner dimensions don't match
A = np.array([[1, 2, 3],
              [4, 5, 6]])  # Shape: (2, 3)

D = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])  # Shape: (2, 4)

try:
    result = A @ D
except ValueError as e:
    print(f"Error: {e}")
# Error: matmul: Input operand 1 has a mismatch in its core dimension 0,
# with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)

Debugging shape mismatches: When you hit this error, print both shapes and verify the inner dimensions match.

def safe_matmul(A, B):
    """Matrix multiplication with helpful error messages."""
    if A.shape[1] != B.shape[0]:
        raise ValueError(
            f"Cannot multiply matrices with shapes {A.shape} and {B.shape}. "
            f"Inner dimensions {A.shape[1]} and {B.shape[0]} must match."
        )
    return A @ B

# This gives you a clearer error message
# safe_matmul(A, D)

For 1D arrays, NumPy promotes them automatically:

# 1D array behavior
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2], [3, 4], [5, 6]])

# (3,) @ (3, 2) → (2,) - vector treated as row vector
result = vector @ matrix
print(f"Result shape: {result.shape}")  # (2,)
print(result)  # [22 28]
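Symmetrically, a 1D array on the right-hand side is treated as a column vector, and the trailing dimension is dropped from the result:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # Shape: (2, 3)
vector = np.array([1, 2, 3])     # Shape: (3,)

# (2, 3) @ (3,) -> (2,) - vector treated as a column vector
result = matrix @ vector
print(result.shape)  # (2,)
print(result)        # [14 32]
```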

Element-wise vs Matrix Multiplication

This is where beginners get burned. The * operator performs element-wise multiplication, not matrix multiplication. These operations are fundamentally different.

import numpy as np

A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

# Element-wise multiplication (Hadamard product)
elementwise = A * B
print("Element-wise (A * B):")
print(elementwise)
# [[ 5 12]
#  [21 32]]

# Matrix multiplication
matmul = A @ B
print("\nMatrix multiplication (A @ B):")
print(matmul)
# [[19 22]
#  [43 50]]

# Completely different results!
print("\nAre they equal?", np.array_equal(elementwise, matmul))
# Are they equal? False

Element-wise multiplication requires identical shapes (or broadcastable shapes). Matrix multiplication requires compatible inner dimensions. Know which one you need.

# Common mistake in neural networks
weights = np.array([[0.1, 0.2],
                    [0.3, 0.4]])
inputs = np.array([[1],
                   [2]])

# WRONG: element-wise multiplication broadcasts (2, 2) * (2, 1) to (2, 2)
# result = weights * inputs  # Runs without error but computes the wrong thing!

# RIGHT: matrix multiplication
result = weights @ inputs
print(result)
# [[0.5]
#  [1.1]]

Performance Tips and Best Practices

For production code, small optimizations compound into significant gains.

Choose appropriate data types. float32 uses half the memory of float64 and can be twice as fast for large matrices. Use float64 only when you need the precision.

import numpy as np
import time

# Create large matrices
size = 1000

# float64 (default)
A_64 = np.random.randn(size, size)
B_64 = np.random.randn(size, size)

# float32
A_32 = A_64.astype(np.float32)
B_32 = B_64.astype(np.float32)

# Timing comparison
def benchmark(A, B, iterations=10):
    start = time.perf_counter()
    for _ in range(iterations):
        _ = A @ B
    return (time.perf_counter() - start) / iterations

time_64 = benchmark(A_64, B_64)
time_32 = benchmark(A_32, B_32)

print(f"float64: {time_64*1000:.2f} ms")
print(f"float32: {time_32*1000:.2f} ms")
print(f"Speedup: {time_64/time_32:.2f}x")
# Results vary by hardware, but float32 is typically 1.5-2x faster

Use np.einsum() for complex operations. When you need to combine multiple operations or work with higher-dimensional tensors, einsum expresses the computation clearly and often runs faster than chained operations.

import numpy as np

# Batch matrix multiplication: multiply each matrix in a batch
batch_size = 100
A = np.random.randn(batch_size, 4, 3)
B = np.random.randn(batch_size, 3, 5)

# Using einsum for batch matmul
result_einsum = np.einsum('bij,bjk->bik', A, B)
print(f"Batch result shape: {result_einsum.shape}")  # (100, 4, 5)

# Equivalent using matmul (also handles batches)
result_matmul = np.matmul(A, B)
print(f"Results equal: {np.allclose(result_einsum, result_matmul)}")

Avoid unnecessary copies. For repeated operations, pre-allocate an output array and reuse it via np.matmul's out= parameter, and be aware that basic slicing creates views, not copies.

# Pre-allocate output array for repeated operations
output = np.empty((100, 100))
np.matmul(A_64[:100, :100], B_64[:100, :100], out=output)
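You can verify the view-vs-copy distinction directly with np.shares_memory (a quick sketch):

```python
import numpy as np

M = np.arange(16).reshape(4, 4)

# Basic slicing returns a view - no data is copied
view = M[:2, :2]
print(np.shares_memory(M, view))   # True

# Fancy (integer-array) indexing creates a new array
copy = M[[0, 1]]
print(np.shares_memory(M, copy))   # False

# Writing through a view mutates the original
view[0, 0] = 99
print(M[0, 0])  # 99
```

This matters for matrix multiplication because passing views avoids copying large operands, but mutating a view also changes the source array.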

Conclusion

Matrix multiplication in NumPy is straightforward once you understand the fundamentals. Here’s your quick reference:

Method            Use Case
A @ B             Default choice for 2D matrices
np.matmul(A, B)   Explicit function call, handles batches
np.dot(A, B)      Dot products, legacy code
A * B             Element-wise multiplication (not matrix multiplication!)

Remember the shape rule: (m×n) @ (n×p) → (m×p). When in doubt, print .shape.

For further reading, consult the NumPy documentation on matmul and the array broadcasting rules.
