How to Calculate the Cumulative Sum in NumPy

Key Insights

  • NumPy’s np.cumsum() computes running totals efficiently, with the axis parameter controlling whether you sum across rows, columns, or the entire flattened array.
  • Always specify dtype when working with large integers or many elements to prevent silent overflow errors that corrupt your results.
  • The out parameter enables memory-efficient in-place computation, which matters when processing large datasets or running cumulative sums repeatedly.

Introduction

Cumulative sum—also called a running total or prefix sum—is one of those operations that appears everywhere once you start looking for it. You’re calculating the cumulative sum when you track a bank account balance over time, compute a cumulative distribution function, or measure total distance traveled from a series of incremental movements.

The concept is straightforward: given a sequence [a, b, c, d], the cumulative sum produces [a, a+b, a+b+c, a+b+c+d]. Each element in the output represents the sum of all elements up to and including that position.
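As a minimal sketch of that definition, the loop below builds the running total by hand and checks it against np.cumsum(). The function name running_total and the sample sequence are illustrative choices, not part of NumPy.

```python
import numpy as np

def running_total(values):
    """Plain-Python cumulative sum, written out for illustration."""
    totals = []
    current = 0
    for value in values:
        current += value        # add the next element to the running sum
        totals.append(current)  # record the sum up to and including it
    return totals

sequence = [3, 1, 4, 1, 5]
manual = running_total(sequence)
print(manual)
# Output: [3, 4, 8, 9, 14]

# np.cumsum() produces the same values
assert manual == np.cumsum(sequence).tolist()
```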

NumPy provides np.cumsum() for this operation, and while the basic usage is simple, there are nuances around multi-dimensional arrays, data types, and performance that trip up even experienced developers. This article covers what you need to use cumulative sums effectively in production code.

np.cumsum() Basics

The np.cumsum() function takes an array-like input and returns an array of the same shape containing cumulative sums. For one-dimensional arrays, the behavior is exactly what you’d expect:

import numpy as np

# Basic cumulative sum on integers
values = np.array([1, 2, 3, 4, 5])
running_total = np.cumsum(values)

print(running_total)
# Output: [ 1  3  6 10 15]

Each output element is the sum of all input elements from the start up to that index. The first element stays the same (it’s the sum of just itself), the second is 1 + 2 = 3, the third is 1 + 2 + 3 = 6, and so on.

This works with any numeric type:

# Works with floats
measurements = np.array([0.5, 1.2, 0.8, 2.1])
cumulative = np.cumsum(measurements)

print(cumulative)
# Output: [0.5 1.7 2.5 4.6]

# Works with negative numbers
changes = np.array([10, -3, 5, -8, 2])
balance = np.cumsum(changes)

print(balance)
# Output: [10  7 12  4  6]

The function signature is np.cumsum(a, axis=None, dtype=None, out=None). We’ll cover each parameter in the following sections.

Cumulative Sum on Multi-Dimensional Arrays

When working with 2D arrays (or higher dimensions), the axis parameter determines how the cumulative sum is computed. This is where developers often get confused, so let’s be explicit.

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print("Original array:")
print(data)

axis=None (default): Flatten and sum

When you don’t specify an axis, NumPy flattens the array and computes the cumulative sum over all elements:

flat_cumsum = np.cumsum(data)
print("axis=None (flattened):")
print(flat_cumsum)
# Output: [ 1  3  6 10 15 21 28 36 45]

The output is 1D with length equal to the total number of elements.
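A short sketch of what flattening means in practice: the default call should match an explicit ravel() in row-major order, and the result can be reshaped if you want the running total laid out like the input. The name as_grid is only for illustration.

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

flat_cumsum = np.cumsum(data)

# Same result as flattening explicitly in row-major (C) order
assert np.array_equal(flat_cumsum, np.cumsum(data.ravel()))

# Reshape to recover the 2D layout of the running total
as_grid = flat_cumsum.reshape(data.shape)
print(as_grid)
# Output:
# [[ 1  3  6]
#  [10 15 21]
#  [28 36 45]]
```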

axis=0: Sum down columns

With axis=0, the cumulative sum runs along the first axis (rows), meaning you’re accumulating down each column:

col_cumsum = np.cumsum(data, axis=0)
print("axis=0 (down columns):")
print(col_cumsum)
# Output:
# [[ 1  2  3]
#  [ 5  7  9]
#  [12 15 18]]

The first row stays unchanged. The second row becomes [1+4, 2+5, 3+6]. The third row becomes [1+4+7, 2+5+8, 3+6+9].

axis=1: Sum across rows

With axis=1, the cumulative sum runs along the second axis (columns), accumulating across each row:

row_cumsum = np.cumsum(data, axis=1)
print("axis=1 (across rows):")
print(row_cumsum)
# Output:
# [[ 1  3  6]
#  [ 4  9 15]
#  [ 7 15 24]]

Each row is processed independently. The first column stays unchanged, and subsequent columns accumulate from left to right.

Here’s a practical example—calculating cumulative sales across multiple products over several months:

# Rows = months, Columns = products
monthly_sales = np.array([
    [100, 200, 150],  # January
    [120, 180, 160],  # February
    [140, 220, 140],  # March
])

# Year-to-date sales per product (cumsum down months)
ytd_by_product = np.cumsum(monthly_sales, axis=0)
print("YTD by product:")
print(ytd_by_product)
# Output:
# [[100 200 150]
#  [220 380 310]
#  [360 600 450]]

# Cumulative sales within each month (cumsum across products)
cumulative_within_month = np.cumsum(monthly_sales, axis=1)
print("Cumulative within month:")
print(cumulative_within_month)
# Output:
# [[100 300 450]
#  [120 300 460]
#  [140 360 500]]
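The same axis rule extends beyond two dimensions. The sketch below assumes a hypothetical 3D layout (regions x months x products) filled with placeholder numbers, and accumulates along the month axis only.

```python
import numpy as np

# Hypothetical layout: 2 regions x 3 months x 2 products (placeholder values)
sales = np.arange(1, 13).reshape(2, 3, 2)

# Accumulate along the month axis (axis=1) within each region and product
ytd = np.cumsum(sales, axis=1)

print(ytd.shape)
# Output: (2, 3, 2)

# The last slice along the accumulated axis equals the plain sum over that axis
assert np.array_equal(ytd[:, -1, :], sales.sum(axis=1))

# Negative axes count from the end, so axis=-1 is the product axis here
assert np.array_equal(np.cumsum(sales, axis=-1), np.cumsum(sales, axis=2))
```

The output always keeps the input's shape when an axis is given; only the values along that axis change.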

Handling Data Types

Here’s a bug that has bitten countless developers: integer overflow. NumPy arrays have fixed data types, and when cumulative sums exceed the type’s maximum value, they wrap around silently.

# Each value fits comfortably in int64 (max is about 9.22e18)
large_values = np.array([4 * 10**18, 4 * 10**18, 4 * 10**18], dtype=np.int64)

# The running total passes the int64 maximum at the third element
result = np.cumsum(large_values)
print(result)
# Output: [ 4000000000000000000  8000000000000000000 -6446744073709551616]

The third value is garbage: the true total (1.2e19) exceeds the int64 maximum, so it wrapped around. NumPy doesn’t raise an error; it just gives you wrong data.

int64 is already the widest signed integer NumPy offers, so the fix here is to switch to floats or Python objects:

# Fix 1: Use float64 (can represent larger values, with some precision loss)
result_float = np.cumsum(large_values, dtype=np.float64)
print(result_float)
# Output: [4.0e+18 8.0e+18 1.2e+19]

# Fix 2: Use Python objects (unlimited precision, but slower)
result_object = np.cumsum(large_values, dtype=object)
print(result_object)
# Output: [4000000000000000000 8000000000000000000 12000000000000000000]

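Even with smaller integers, overflow can sneak up on you when summing many elements. When dtype is omitted, NumPy accumulates narrow integer types in the platform's default integer, which is 64-bit on most current systems. The problem returns as soon as the accumulator is 32-bit, either because you requested it or because the platform default is 32-bit (for example, Windows with NumPy versions before 2.0):

# Each element is small, but there are a million of them
small_values = np.full(1_000_000, 10_000, dtype=np.int32)

# int32 max is 2,147,483,647; the true total is 10,000,000,000
result = np.cumsum(small_values, dtype=np.int32)
print(f"Final value: {result[-1]}")  # Wrong: 1410065408 (wrapped around)

# Safe version
result_safe = np.cumsum(small_values, dtype=np.int64)
print(f"Final value (safe): {result_safe[-1]}")  # Correct: 10000000000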

My rule of thumb: always specify dtype=np.int64 or dtype=np.float64 for cumulative sums unless you’re certain about your data range.
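If you would rather fail loudly than get wrapped values, one option is a small guard around np.cumsum(). The helper below is a hypothetical sketch, not a NumPy function. It assumes 1D integer input and uses a deliberately conservative bound, so it can reject inputs whose running total would in fact stay in range. The Python-level loop also makes it slow on very large arrays.

```python
import numpy as np

def checked_cumsum(values, dtype=np.int64):
    """Hypothetical helper: integer cumsum that raises instead of wrapping.

    Assumes 1D integer input. The bound is conservative: it may reject
    inputs whose running total would actually stay in range.
    """
    values = np.asarray(values)
    info = np.iinfo(dtype)
    # Python ints have arbitrary precision, so these sums cannot overflow.
    # No running total can exceed the sum of the positive entries or drop
    # below the sum of the negative entries.
    upper = sum(int(v) for v in values if v > 0)
    lower = sum(int(v) for v in values if v < 0)
    if upper > info.max or lower < info.min:
        raise OverflowError(f"running total may not fit in {np.dtype(dtype).name}")
    return np.cumsum(values, dtype=dtype)

print(checked_cumsum([100, -30, 50], dtype=np.int32))
# Output: [100  70 120]

try:
    checked_cumsum([2_000_000_000, 2_000_000_000], dtype=np.int32)
except OverflowError as exc:
    print("Rejected:", exc)
# Output: Rejected: running total may not fit in int32
```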

Practical Applications

Let’s look at two real-world applications where cumulative sums shine.

Running Balance Calculation

# Daily transactions: positive = deposit, negative = withdrawal
transactions = np.array([1000, -50, -120, 500, -200, -80, 300])
starting_balance = 500

# Calculate running balance
running_balance = starting_balance + np.cumsum(transactions)
print("Daily balance:", running_balance)
# Output: Daily balance: [1500 1450 1330 1830 1630 1550 1850]

# Find minimum balance (for overdraft protection analysis)
min_balance = np.min(running_balance)
min_balance_day = np.argmin(running_balance) + 1
print(f"Minimum balance: ${min_balance} on day {min_balance_day}")
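For the overdraft analysis itself, a boolean mask over the running balance is usually enough. The sketch below uses a separate, hypothetical set of transactions that does dip below zero, since the account above never does.

```python
import numpy as np

# Hypothetical account that starts low and dips below zero
transactions = np.array([200, -150, -120, 500, -600, 100])
starting_balance = 50

running_balance = starting_balance + np.cumsum(transactions)
print(running_balance)
# Output: [ 250  100  -20  480 -120  -20]

overdrawn = running_balance < 0
if overdrawn.any():
    # argmax on a boolean array returns the index of the first True
    first_day = int(np.argmax(overdrawn)) + 1
    print(f"First overdraft on day {first_day}")
# Output: First overdraft on day 3

print(f"Days overdrawn: {int(overdrawn.sum())}")
# Output: Days overdrawn: 3
```

The overdrawn.any() check matters: on an all-False mask, np.argmax() returns 0, which would wrongly point at day 1.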

Cumulative Distribution Function (CDF)

# Sample data
np.random.seed(42)
samples = np.random.normal(loc=50, scale=10, size=1000)

# Create histogram
counts, bin_edges = np.histogram(samples, bins=50)

# Convert counts to per-bin probabilities (these sum to 1)
probabilities = counts / counts.sum()

# Cumulative distribution
cdf = np.cumsum(probabilities)

# Estimate percentiles from the binned CDF
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
p50_idx = np.searchsorted(cdf, 0.5)
p90_idx = np.searchsorted(cdf, 0.9)

print(f"50th percentile (median): {bin_centers[p50_idx]:.1f}")
print(f"90th percentile: {bin_centers[p90_idx]:.1f}")
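Because the CDF is built from a histogram, these percentiles are only accurate to roughly one bin width. One way to sanity-check them is to compare against np.percentile() on the raw samples. The sketch below repeats the setup so it runs on its own; the results dictionary is just an illustrative container.

```python
import numpy as np

np.random.seed(42)
samples = np.random.normal(loc=50, scale=10, size=1000)

counts, bin_edges = np.histogram(samples, bins=50)
cdf = np.cumsum(counts / counts.sum())
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_width = bin_edges[1] - bin_edges[0]

results = {}
for q in (50, 90):
    # Binned estimate: center of the first bin where the CDF reaches q%
    binned = bin_centers[np.searchsorted(cdf, q / 100)]
    # Reference value computed directly from the unbinned samples
    exact = np.percentile(samples, q)
    results[q] = (binned, exact)
    print(f"p{q}: binned={binned:.1f}, exact={exact:.1f}, bin width={bin_width:.2f}")
```

If the two values differ by much more than a bin width, the bin count is probably too coarse for the precision you need.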

Performance Considerations

For most use cases, np.cumsum() is fast enough that you won’t think about performance. But when processing large arrays repeatedly, the out parameter lets you reuse memory:

# Pre-allocate output array
data = np.random.rand(10_000_000)
output = np.empty_like(data)

# Compute cumsum into existing array (no new allocation)
np.cumsum(data, out=output)

# Useful in loops where you'd otherwise allocate repeatedly
def process_batches(batches, result_buffer):
    """Yield each batch's cumulative sum as a view into result_buffer.

    The view is overwritten on the next iteration, so consume it (or copy it)
    before advancing the generator.
    """
    for batch in batches:
        view = result_buffer[:len(batch)]
        np.cumsum(batch, out=view)
        yield view

The out array must have the same shape as the expected result and a compatible dtype. With an explicit axis, that shape matches the input; with axis=None, the result is flattened, so out must be 1D with one element per input element.
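A small sketch of both shape cases, using a hypothetical 2x3 array. The buffer names out_2d and out_flat are illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]], dtype=np.int64)

# With an axis, the output keeps the input's shape
out_2d = np.empty_like(data)
np.cumsum(data, axis=1, out=out_2d)
print(out_2d)
# Output:
# [[ 1  3  6]
#  [ 4  9 15]]

# With axis=None, the result is flattened, so the buffer must be 1D
out_flat = np.empty(data.size, dtype=data.dtype)
np.cumsum(data, out=out_flat)
print(out_flat)
# Output: [ 1  3  6 10 15 21]
```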

One more performance note: if you only need the final sum (not the running total), use np.sum() instead—it’s faster because it doesn’t store intermediate results.
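To make the difference concrete, this sketch compares the two calls on a million integers. The last running total equals the overall sum, but np.cumsum() has to allocate a full output array to get there.

```python
import numpy as np

values = np.arange(1, 1_000_001, dtype=np.int64)

total = np.sum(values)       # a single scalar, no intermediate array
running = np.cumsum(values)  # allocates one output element per input value

# The last running total is the overall sum
assert running[-1] == total

print(total, running.nbytes)
# Output: 500000500000 8000000
```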

Conclusion

NumPy’s np.cumsum() handles cumulative sums efficiently across arrays of any dimension. The key points to remember:

  • Use axis to control the direction of accumulation in multi-dimensional arrays: axis=0 sums down columns, axis=1 sums across rows, and axis=None flattens first.
  • Always consider dtype to prevent overflow, especially with integers or large arrays. When in doubt, use dtype=np.int64 or dtype=np.float64.
  • The out parameter enables memory-efficient computation when you’re processing many arrays or working in memory-constrained environments.

Cumulative sums are a building block for many algorithms—running totals, CDFs, prefix sums for range queries, and more. Master np.cumsum() and you’ll find yourself reaching for it regularly.
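As one last illustration of the range-query use mentioned above, here is a minimal sketch. Prepending a zero to the cumulative sum lets any slice total be computed with a single subtraction. The names prefix and range_sum are illustrative, not NumPy APIs.

```python
import numpy as np

values = np.array([5, 2, 7, 1, 8, 3])

# Prepend 0 so that prefix[k] is the sum of the first k elements
prefix = np.concatenate(([0], np.cumsum(values)))

def range_sum(start, stop):
    """Sum of values[start:stop] in O(1), using the precomputed prefix sums."""
    return prefix[stop] - prefix[start]

print(range_sum(1, 4))  # 2 + 7 + 1
# Output: 10

assert range_sum(1, 4) == values[1:4].sum()
assert range_sum(0, len(values)) == values.sum()
```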
