How to Calculate the Mean in NumPy
Key Insights
- NumPy's np.mean() is 10-100x faster than Python's built-in statistics.mean() for large arrays, and the axis parameter lets you compute means across specific dimensions without loops.
- Always use np.nanmean() when your data might contain missing values; standard np.mean() will silently return NaN and corrupt your entire calculation.
- For weighted averages (like calculating GPA or portfolio returns), use np.average() with the weights parameter instead of manually implementing the calculation.
Introduction to NumPy Mean Calculations
Calculating the mean seems trivial until you’re working with millions of data points, multidimensional arrays, or datasets riddled with missing values. Python’s built-in statistics.mean() works fine for small lists, but it falls apart at scale.
NumPy solves this with vectorized operations that run at C speed. You get two ways to calculate means: the np.mean() function and the ndarray.mean() method. They’re functionally identical—use whichever reads better in your code.
import numpy as np
data = np.array([1, 2, 3, 4, 5])
# Both produce the same result
result1 = np.mean(data) # Function syntax
result2 = data.mean() # Method syntax
print(result1, result2) # 3.0 3.0
The function syntax is more flexible when you’re working with array-like inputs that aren’t already NumPy arrays. The method syntax is cleaner when chaining operations.
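The speed claim is easy to check on your own machine. A rough sketch using the standard library's timeit module; the exact speedup depends on array size and hardware, so treat the 10-100x figure as a ballpark:

```python
import timeit
import statistics
import numpy as np

# One million values, once as a Python list and once as a NumPy array
values = list(range(1_000_000))
arr = np.array(values)

# Time each approach a few times and compare totals
t_stdlib = timeit.timeit(lambda: statistics.mean(values), number=3)
t_numpy = timeit.timeit(lambda: np.mean(arr), number=3)
print(f"statistics.mean: {t_stdlib:.4f}s, np.mean: {t_numpy:.4f}s")
```

On typical hardware the NumPy version wins by two orders of magnitude, because the loop and the arithmetic both happen in compiled code.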
Basic Mean Calculation with np.mean()
The basic syntax is straightforward: pass an array, get a mean. NumPy handles the type conversion and returns a scalar.
import numpy as np
# From a Python list
prices = [29.99, 34.50, 22.00, 45.99, 31.25]
avg_price = np.mean(prices)
print(f"Average price: ${avg_price:.2f}") # Average price: $32.75
# From a NumPy array
temperatures = np.array([72.1, 68.5, 75.3, 71.8, 69.2, 73.6])
avg_temp = np.mean(temperatures)
print(f"Average temperature: {avg_temp:.1f}°F") # Average temperature: 71.8°F
# Works with different numeric types
integers = np.array([10, 20, 30, 40, 50], dtype=np.int32)
print(np.mean(integers)) # 30.0 (always returns float by default)
Notice that np.mean() always returns a floating-point result, even when the input is integers. This prevents the truncation errors you’d get with integer division.
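A quick sketch makes the float promotion concrete, using a sum that integer division would truncate:

```python
import numpy as np

# 7 / 3 would truncate to 2 under integer division;
# np.mean() promotes the result to float64 instead
counts = np.array([1, 2, 4], dtype=np.int64)
result = np.mean(counts)
print(result)        # 2.3333333333333335
print(result.dtype)  # float64
```

If you genuinely need a different result type, pass the dtype parameter explicitly rather than casting afterward.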
Calculating Mean Along Axes
Real-world data is rarely one-dimensional. When you’re working with matrices or higher-dimensional arrays, the axis parameter becomes essential.
Think of axis as “the dimension you want to collapse.” For a 2D array:
- axis=0 collapses rows, giving you column means
- axis=1 collapses columns, giving you row means
- axis=None (default) flattens everything and returns a single value
import numpy as np
# Sales data: 4 quarters (rows) x 3 products (columns)
sales = np.array([
[150, 200, 175], # Q1
[180, 220, 190], # Q2
[200, 250, 210], # Q3
[170, 230, 185] # Q4
])
# Mean sales per product (across all quarters)
product_means = np.mean(sales, axis=0)
print(f"Product averages: {product_means}")
# Product averages: [175. 225. 190.]
# Mean sales per quarter (across all products)
quarter_means = np.mean(sales, axis=1)
print(f"Quarterly averages: {quarter_means}")
# Quarterly averages: [175. 196.67 220. 195.] (196.67 shown rounded)
# Overall mean
overall = np.mean(sales)
print(f"Overall average: {overall:.2f}")
# Overall average: 196.67
For 3D arrays, the same logic extends. If you have a shape of (depth, rows, cols), then axis=0 averages across depth, axis=1 across rows, and axis=2 across columns.
# Monthly data for 2 years, 4 quarters, 3 products
# Shape: (2, 4, 3)
yearly_sales = np.array([
[[150, 200, 175], [180, 220, 190], [200, 250, 210], [170, 230, 185]], # Year 1
[[160, 210, 180], [190, 230, 200], [220, 270, 230], [180, 240, 195]] # Year 2
])
# Average across years (for each quarter/product combination)
print(np.mean(yearly_sales, axis=0).shape) # (4, 3)
# Average across quarters (for each year/product combination)
print(np.mean(yearly_sales, axis=1).shape) # (2, 3)
# Average across products (for each year/quarter combination)
print(np.mean(yearly_sales, axis=2).shape) # (2, 4)
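The axis parameter also accepts a tuple, which collapses several dimensions in one call. A sketch reusing the same yearly_sales array to get one mean per product across both years and all quarters:

```python
import numpy as np

# Shape: (2 years, 4 quarters, 3 products)
yearly_sales = np.array([
    [[150, 200, 175], [180, 220, 190], [200, 250, 210], [170, 230, 185]],  # Year 1
    [[160, 210, 180], [190, 230, 200], [220, 270, 230], [180, 240, 195]]   # Year 2
])

# Collapse years AND quarters at once, leaving one mean per product
per_product = np.mean(yearly_sales, axis=(0, 1))
print(per_product.shape)  # (3,)
print(per_product)        # means per product: 181.25, 231.25, 195.625
```

This is equivalent to chaining two single-axis means, but it's clearer and avoids an intermediate array.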
Handling Missing Data (NaN Values)
Here’s where many developers get burned. If your array contains even a single NaN value, np.mean() returns NaN for the entire calculation:
import numpy as np
# Sensor readings with a missing value
readings = np.array([23.5, 24.1, np.nan, 23.8, 24.3])
# This silently fails
bad_mean = np.mean(readings)
print(f"Standard mean: {bad_mean}") # Standard mean: nan
This behavior is technically correct (NaN propagation), but it’s rarely what you want. Use np.nanmean() to ignore NaN values:
import numpy as np
readings = np.array([23.5, 24.1, np.nan, 23.8, 24.3])
# Ignores NaN values
good_mean = np.nanmean(readings)
print(f"Mean ignoring NaN: {good_mean:.2f}") # Mean ignoring NaN: 23.93
# Works with axis parameter too
data_with_gaps = np.array([
[1.0, 2.0, np.nan],
[4.0, np.nan, 6.0],
[7.0, 8.0, 9.0]
])
print("Column means (ignoring NaN):")
print(np.nanmean(data_with_gaps, axis=0))
# [4. 5. 7.5]
print("Row means (ignoring NaN):")
print(np.nanmean(data_with_gaps, axis=1))
# [1.5 5. 8. ]
Pro tip: Always check for NaN values before deciding which function to use. A quick np.isnan(data).any() tells you if you need nanmean().
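That check can be wrapped into a small helper. The safe_mean function below is a hypothetical convenience, not a NumPy API; it simply applies the pro tip automatically:

```python
import numpy as np

def safe_mean(data):
    """Hypothetical helper: fall back to nanmean only when NaNs exist."""
    arr = np.asarray(data, dtype=float)
    if np.isnan(arr).any():
        return np.nanmean(arr)
    return np.mean(arr)

print(safe_mean([1.0, np.nan, 3.0]))  # 2.0
print(safe_mean([1.0, 2.0, 3.0]))     # 2.0
```

One caveat: np.nanmean() of an all-NaN slice still returns NaN (with a RuntimeWarning), so a helper like this doesn't remove the need to sanity-check truly empty data.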
Data Types and Precision
The dtype parameter controls the precision used for intermediate accumulation. For integer inputs, np.mean() already accumulates in float64 by default, so large integer arrays are safe. Where it really matters is float32 arrays, which accumulate in float32 by default and can lose precision over millions of elements.
import numpy as np
# Integer inputs: np.mean() accumulates in float64 by default,
# so even values near the int32 limit don't overflow
large_ints = np.array([2_000_000_000, 2_000_000_000, 2_000_000_000], dtype=np.int32)
print(f"Integer mean: {np.mean(large_ints)}") # Integer mean: 2000000000.0
# float32 inputs accumulate in float32 by default; force float64
# accumulation when precision matters
data_f32 = np.random.randn(1_000_000).astype(np.float32)
print(f"Safe mean: {np.mean(data_f32, dtype=np.float64)}")
For financial calculations where precision matters, explicitly specify the dtype:
import numpy as np
# Currency values that need precision
transactions = np.array([0.1, 0.2, 0.3, 0.1, 0.1, 0.2])
# Standard float64 is usually sufficient
mean_transaction = np.mean(transactions, dtype=np.float64)
print(f"Mean transaction: ${mean_transaction:.10f}")
# Mean transaction: $0.1666666667
When working with very large arrays, you can save memory by using float32, but be aware of the precision tradeoff:
import numpy as np
data = np.random.randn(1_000_000)
mean_64 = np.mean(data, dtype=np.float64)
mean_32 = np.mean(data, dtype=np.float32)
print(f"Float64: {mean_64:.10f}")
print(f"Float32: {mean_32:.10f}")
print(f"Difference: {abs(mean_64 - mean_32):.2e}")
Weighted Mean with np.average()
np.mean() treats all values equally. When values have different importance—like calculating GPA, portfolio returns, or weighted survey responses—use np.average():
import numpy as np
# GPA calculation: grades with credit hours as weights
grades = np.array([4.0, 3.7, 3.3, 4.0, 3.0]) # A, A-, B+, A, B
credits = np.array([3, 4, 3, 2, 4]) # Credit hours
# Unweighted mean (wrong for GPA)
simple_mean = np.mean(grades)
print(f"Simple mean: {simple_mean:.2f}") # 3.60
# Weighted mean (correct GPA)
gpa = np.average(grades, weights=credits)
print(f"Weighted GPA: {gpa:.2f}") # 3.54
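Under the hood, np.average() computes sum(weights * values) / sum(weights). A quick sketch verifying the GPA by hand against the built-in:

```python
import numpy as np

grades = np.array([4.0, 3.7, 3.3, 4.0, 3.0])
credits = np.array([3, 4, 3, 2, 4])

# Manual weighted mean: sum(w * x) / sum(w)
manual = np.sum(grades * credits) / np.sum(credits)
builtin = np.average(grades, weights=credits)
print(f"{manual:.5f} {builtin:.5f}")  # 3.54375 3.54375
```

Prefer np.average() in real code anyway: it validates that the weights broadcast against the values and raises if all weights sum to zero, which the manual version silently gets wrong.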
np.average() also works with axes for multidimensional data:
import numpy as np
# Quarterly returns for 3 stocks
returns = np.array([
[0.05, 0.08, 0.03, 0.06], # Stock A
[0.12, -0.02, 0.15, 0.08], # Stock B
[0.03, 0.04, 0.02, 0.05] # Stock C
])
# Portfolio weights
portfolio_weights = np.array([0.5, 0.3, 0.2]) # 50%, 30%, 20%
# Weighted average return per quarter
quarterly_portfolio_returns = np.average(returns, axis=0, weights=portfolio_weights)
print(f"Quarterly portfolio returns: {quarterly_portfolio_returns}")
# [0.067 0.042 0.064 0.064]
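np.average() can also hand back the sum of the weights via returned=True, which is useful when the weights don't already sum to 1 and you want to re-normalize or audit them. A sketch on the same portfolio data:

```python
import numpy as np

# Quarterly returns for 3 stocks (rows) across 4 quarters (columns)
returns = np.array([
    [0.05, 0.08, 0.03, 0.06],   # Stock A
    [0.12, -0.02, 0.15, 0.08],  # Stock B
    [0.03, 0.04, 0.02, 0.05]    # Stock C
])
portfolio_weights = np.array([0.5, 0.3, 0.2])

# returned=True yields (average, sum_of_weights)
avg, weight_sum = np.average(returns, axis=0,
                             weights=portfolio_weights, returned=True)
print(weight_sum)  # [1. 1. 1. 1.] (these weights already sum to 1)
```

The sum-of-weights array has the same shape as the averages, one entry per quarter here.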
Performance Tips and Best Practices
Use keepdims for broadcasting compatibility. When you need to subtract the mean from your data (centering), keepdims=True preserves the array’s dimensionality:
import numpy as np
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Without keepdims, this requires reshaping
mean_no_keepdims = np.mean(data, axis=1)
print(mean_no_keepdims.shape) # (3,)
# With keepdims, broadcasting works directly
mean_keepdims = np.mean(data, axis=1, keepdims=True)
print(mean_keepdims.shape) # (3, 1)
# Center the data (subtract row means)
centered = data - mean_keepdims
print(centered)
# [[-1. 0. 1.]
# [-1. 0. 1.]
# [-1. 0. 1.]]
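The same keepdims pattern extends naturally to full standardization (z-scores), where both the mean and the standard deviation need to broadcast back against the data. A sketch:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# keepdims=True keeps both statistics as (3, 1) columns,
# so broadcasting subtracts/divides row-wise without reshaping
row_means = np.mean(data, axis=1, keepdims=True)
row_stds = np.std(data, axis=1, keepdims=True)
z_scores = (data - row_means) / row_stds

# Each row now has mean ~0 and standard deviation ~1
print(np.mean(z_scores, axis=1))
print(np.std(z_scores, axis=1))
```

Note that np.std() defaults to the population standard deviation (ddof=0); pass ddof=1 if you want the sample version.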
Choose function vs method syntax based on context. Use np.mean(arr) when the input might be a list or when you want explicit clarity. Use arr.mean() when chaining operations or when the array is already clearly defined.
Pre-allocate output arrays for repeated calculations. Rather than calling np.mean() once per chunk in a Python loop, stack the chunks into a single array and reduce along an axis, writing the results into a pre-allocated array with the out parameter:
import numpy as np
chunks = np.random.randn(10, 1000) # 10 chunks of 1000 samples each
result = np.empty(10)
np.mean(chunks, axis=1, out=result) # results written in place, no new allocation
The bottom line: NumPy’s mean functions are fast, flexible, and battle-tested. Use np.mean() for simple cases, np.nanmean() when missing data is possible, and np.average() when weights matter. Master the axis parameter, and you’ll never write a loop to calculate means again.