How to Calculate Cumulative Sum in Pandas
Key Insights
- The cumsum() method calculates running totals instantly on Series or DataFrames, with the axis parameter controlling whether sums accumulate down rows or across columns.
- Combining groupby() with cumsum() lets you compute running totals within categories—essential for tracking metrics like year-to-date sales by region or cumulative inventory by product.
- Missing values are skipped by default (skipna=True) but still appear as NaN in the output; understanding when to use skipna=False or pre-fill NaN values prevents subtle bugs in financial calculations.
Introduction
Cumulative sum—also called a running total—is one of those operations you’ll reach for constantly once you know it exists. It answers questions like “What’s my account balance after each transaction?” or “How much revenue have we accumulated this quarter?” Instead of showing isolated values, cumulative sums reveal the progressive buildup of data over time or sequence.
Financial analysts use running totals to track portfolio value. Operations teams monitor cumulative inventory movements. Data scientists calculate cumulative distributions for statistical analysis. The pattern appears everywhere, and Pandas makes it trivially easy with the cumsum() method.
This article covers everything from basic usage to grouped calculations and real-world applications. By the end, you’ll handle cumulative sums confidently in any scenario.
Basic Cumulative Sum with cumsum()
The cumsum() method works on both Series and DataFrame objects. At its simplest, you call it on a column and get back a new Series where each value represents the sum of all previous values plus the current one.
import pandas as pd
# Daily sales data
sales_data = {
'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
'daily_sales': [150, 200, 175, 225, 190]
}
df = pd.DataFrame(sales_data)
df['cumulative_sales'] = df['daily_sales'].cumsum()
print(df)
Output:
date daily_sales cumulative_sales
0 2024-01-01 150 150
1 2024-01-02 200 350
2 2024-01-03 175 525
3 2024-01-04 225 750
4 2024-01-05 190 940
The first row’s cumulative value equals its daily value (150). The second row adds 200 to the previous cumulative total (150 + 200 = 350). This continues down the column, giving you a running total at each point.
You can also call cumsum() directly on a Series:
sales_series = pd.Series([150, 200, 175, 225, 190])
running_total = sales_series.cumsum()
print(running_total.values)
# [150 350 525 750 940]
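Beyond a single column, cumsum() can be called on an entire DataFrame, in which case every numeric column accumulates independently. A minimal sketch (the column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Each column gets its own running total down the rows
totals = df.cumsum()
print(totals)
#    a   b
# 0  1  10
# 1  3  30
# 2  6  60
```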
Cumulative Sum Across DataFrame Axes
When working with multi-column DataFrames, the axis parameter determines the direction of accumulation. By default, axis=0 calculates cumulative sums down each column. Setting axis=1 accumulates across each row.
# Quarterly revenue by department
quarterly_data = {
'Q1': [100, 150, 200],
'Q2': [120, 160, 180],
'Q3': [140, 170, 220],
'Q4': [160, 180, 240]
}
df = pd.DataFrame(quarterly_data, index=['Marketing', 'Sales', 'Engineering'])
# Cumulative sum down columns (axis=0)
# Stacks department contributions on top of each other within each quarter
print("Column-wise (axis=0):")
print(df.cumsum(axis=0))
Output:
Column-wise (axis=0):
Q1 Q2 Q3 Q4
Marketing 100 120 140 160
Sales 250 280 310 340
Engineering 450 460 530 580
This shows cumulative totals across departments for each quarter. Marketing’s Q1 is 100, then Sales adds 150 to make 250, then Engineering adds 200 to make 450.
# Cumulative sum across rows (axis=1)
# Shows year-to-date totals for each department
print("\nRow-wise (axis=1):")
print(df.cumsum(axis=1))
Output:
Row-wise (axis=1):
Q1 Q2 Q3 Q4
Marketing 100 220 360 520
Sales 150 310 480 660
Engineering 200 380 600 840
Now each row shows that department’s year-to-date revenue. Marketing earned 100 in Q1, 220 through Q2, 360 through Q3, and 520 for the full year.
Choose axis=0 when you want values to accumulate down the rows (typically categories or entities). Choose axis=1 when you want running totals across the columns (typically time periods).
Grouped Cumulative Sum
Real data rarely exists in isolation. You usually need cumulative sums within categories—sales by region, inventory by warehouse, transactions by account. Combining groupby() with cumsum() handles this elegantly.
# Sales transactions by region
transactions = {
'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
'2024-01-03', '2024-01-03', '2024-01-04', '2024-01-04'],
'region': ['North', 'South', 'North', 'South',
'North', 'South', 'North', 'South'],
'sales': [100, 150, 120, 80, 90, 200, 110, 170]
}
df = pd.DataFrame(transactions)
# Cumulative sales within each region
df['cumulative_by_region'] = df.groupby('region')['sales'].cumsum()
print(df)
Output:
date region sales cumulative_by_region
0 2024-01-01 North 100 100
1 2024-01-01 South 150 150
2 2024-01-02 North 120 220
3 2024-01-02 South 80 230
4 2024-01-03 North 90 310
5 2024-01-03 South 200 430
6 2024-01-04 North 110 420
7 2024-01-04 South 170 600
Each region maintains its own running total. North accumulates 100 → 220 → 310 → 420, while South tracks 150 → 230 → 430 → 600. The cumulative sum resets conceptually for each group.
You can group by multiple columns too:
# Multi-level grouping: region and product
df['product'] = ['A', 'A', 'B', 'A', 'A', 'B', 'B', 'A']
df['cumsum_region_product'] = df.groupby(['region', 'product'])['sales'].cumsum()
This calculates separate running totals for each unique combination of region and product.
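To see that effect concretely, here is a self-contained sketch of the same idea on a smaller frame (the values are illustrative): each (region, product) pair accumulates on its own, so the two North/A rows chain together while North/B and South/A start fresh.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'North', 'North', 'South'],
    'product': ['A', 'B', 'A', 'A'],
    'sales': [100, 120, 90, 150],
})

# Each unique (region, product) combination keeps its own running total
df['cumsum_region_product'] = df.groupby(['region', 'product'])['sales'].cumsum()
print(df['cumsum_region_product'].tolist())
# [100, 120, 190, 150]
```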
Handling Missing Values
Missing values require careful consideration in cumulative calculations. By default, cumsum() uses skipna=True, which skips NaN values during accumulation (so they contribute nothing to the total) but preserves them in the output.
# Data with missing values
data_with_nan = {
'day': [1, 2, 3, 4, 5],
'value': [10, 20, None, 40, 50]
}
df = pd.DataFrame(data_with_nan)
# Default behavior: skipna=True
df['cumsum_skipna_true'] = df['value'].cumsum(skipna=True)
# Alternative: skipna=False (NaN propagates)
df['cumsum_skipna_false'] = df['value'].cumsum(skipna=False)
print(df)
Output:
day value cumsum_skipna_true cumsum_skipna_false
0 1 10.0 10.0 10.0
1 2 20.0 30.0 30.0
2 3 NaN NaN NaN
3 4 40.0 70.0 NaN
4 5 50.0 120.0 NaN
With skipna=True, the cumulative sum continues past the NaN (30 + 40 = 70), though the NaN position itself remains NaN. With skipna=False, once a NaN appears, all subsequent values become NaN.
For financial calculations where missing data indicates a real problem, skipna=False forces you to address gaps. For time series where occasional missing readings are acceptable, skipna=True keeps the analysis moving.
You can also pre-fill missing values before calculating:
# Fill NaN with zero before cumsum
df['value_filled'] = df['value'].fillna(0)
df['cumsum_filled'] = df['value_filled'].cumsum()
# Or forward-fill to use the last known value
df['value_ffill'] = df['value'].ffill()
df['cumsum_ffill'] = df['value_ffill'].cumsum()
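The two fill strategies produce different totals, so the choice matters. A quick comparison using the same value column as above ([10, 20, NaN, 40, 50]):

```python
import pandas as pd

values = pd.Series([10, 20, None, 40, 50])

# Treat the gap as zero: the final total is unaffected by the missing day
zero_filled = values.fillna(0).cumsum()
print(zero_filled.tolist())   # [10.0, 30.0, 30.0, 70.0, 120.0]

# Carry the last known reading forward: the gap contributes 20 again
ffilled = values.ffill().cumsum()
print(ffilled.tolist())       # [10.0, 30.0, 50.0, 90.0, 140.0]
```

Forward-filling makes sense for level-like readings (a sensor that probably held its last value); zero-filling makes sense when a missing entry genuinely means nothing happened.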
Practical Applications
Let’s look at three real-world scenarios where cumulative sums shine.
Bank Account Running Balance
# Account transactions
account = {
'date': ['2024-01-01', '2024-01-03', '2024-01-05', '2024-01-07', '2024-01-10'],
'description': ['Opening Balance', 'Deposit', 'Grocery Store', 'Paycheck', 'Electric Bill'],
'amount': [1000, 500, -85, 2500, -120]
}
df = pd.DataFrame(account)
df['balance'] = df['amount'].cumsum()
print(df)
Output:
date description amount balance
0 2024-01-01 Opening Balance 1000 1000
1 2024-01-03 Deposit 500 1500
2 2024-01-05 Grocery Store -85 1415
3 2024-01-07 Paycheck 2500 3915
4 2024-01-10 Electric Bill -120 3795
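Because the balance column is just a Series, downstream checks become one-liners. For instance, a sketch that flags whether the running balance ever dips below zero (the zero threshold is illustrative):

```python
import pandas as pd

amounts = pd.Series([1000, 500, -85, 2500, -120])
balance = amounts.cumsum()

# True if the running balance falls below zero at any point
overdrawn = bool((balance < 0).any())
print(overdrawn)      # False

# Lowest balance reached across the whole period
print(balance.min())  # 1000
```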
Year-to-Date Sales with Cumulative Percentage
# Monthly sales with cumulative percentage
monthly_sales = {
'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
'sales': [45000, 52000, 48000, 61000, 55000, 67000]
}
df = pd.DataFrame(monthly_sales)
total_sales = df['sales'].sum()
df['ytd_sales'] = df['sales'].cumsum()
df['ytd_percentage'] = (df['ytd_sales'] / total_sales * 100).round(1)
print(df)
Output:
month sales ytd_sales ytd_percentage
0 Jan 45000 45000 13.7
1 Feb 52000 97000 29.6
2 Mar 48000 145000 44.2
3 Apr 61000 206000 62.8
4 May 55000 261000 79.6
5 Jun 67000 328000 100.0
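The cumulative percentage column also answers threshold questions, such as which month the year crossed the halfway mark. A sketch using the same numbers (idxmax() on a boolean Series returns the first position where the condition holds):

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'sales': [45000, 52000, 48000, 61000, 55000, 67000],
})
ytd_pct = df['sales'].cumsum() / df['sales'].sum() * 100

# First month whose year-to-date share reaches 50%
halfway = df.loc[(ytd_pct >= 50).idxmax(), 'month']
print(halfway)  # Apr
```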
Inventory Tracking with Grouped Cumsum
# Warehouse inventory movements
inventory = {
'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
'2024-01-03', '2024-01-03'],
'warehouse': ['East', 'West', 'East', 'West', 'East', 'West'],
'product': ['Widget', 'Widget', 'Widget', 'Widget', 'Widget', 'Widget'],
'movement': [100, 150, -20, 50, -30, -40] # positive=in, negative=out
}
df = pd.DataFrame(inventory)
df['running_stock'] = df.groupby('warehouse')['movement'].cumsum()
print(df)
Output:
date warehouse product movement running_stock
0 2024-01-01 East Widget 100 100
1 2024-01-01 West Widget 150 150
2 2024-01-02 East Widget -20 80
3 2024-01-02 West Widget 50 200
4 2024-01-03 East Widget -30 50
5 2024-01-03 West Widget -40 160
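As a sanity check, the last running_stock value in each warehouse should equal that warehouse's total net movement, since a cumulative sum ends at the plain sum. One way to confirm (a sketch on the same data):

```python
import pandas as pd

df = pd.DataFrame({
    'warehouse': ['East', 'West', 'East', 'West', 'East', 'West'],
    'movement': [100, 150, -20, 50, -30, -40],
})
df['running_stock'] = df.groupby('warehouse')['movement'].cumsum()

# Final cumulative value per group matches a plain grouped sum
final_stock = df.groupby('warehouse')['running_stock'].last()
print(final_stock.to_dict())  # {'East': 50, 'West': 160}
```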
Conclusion
Cumulative sums transform raw data into actionable insights about trends and totals over time. The cumsum() method handles the heavy lifting, while axis and groupby() give you precise control over how values accumulate.
Use axis=0 for column-wise accumulation (the default) and axis=1 for row-wise running totals. Combine groupby() with cumsum() when you need separate running totals per category. Pay attention to skipna behavior when your data contains missing values—the default skips NaN but preserves it in output, while skipna=False propagates NaN through all subsequent calculations.
These techniques apply directly to financial tracking, inventory management, time series analysis, and any scenario where understanding cumulative progress matters more than isolated snapshots.