How to Calculate Cumulative Sum in Pandas
Key Insights
- The cumsum() method calculates running totals instantly on Series or DataFrames, with the axis parameter controlling whether sums accumulate down rows or across columns.
- Combining groupby() with cumsum() lets you compute running totals within categories—essential for tracking metrics like year-to-date sales by region or cumulative inventory by product.
- Missing values are skipped by default (skipna=True) but still appear as NaN in the output; understanding when to use skipna=False or pre-fill NaN values prevents subtle bugs in financial calculations.
Introduction
Cumulative sum—also called a running total—is one of those operations you’ll reach for constantly once you know it exists. It answers questions like “What’s my account balance after each transaction?” or “How much revenue have we accumulated this quarter?” Instead of showing isolated values, cumulative sums reveal the progressive buildup of data over time or sequence.
Financial analysts use running totals to track portfolio value. Operations teams monitor cumulative inventory movements. Data scientists calculate cumulative distributions for statistical analysis. The pattern appears everywhere, and Pandas makes it trivially easy with the cumsum() method.
This article covers everything from basic usage to grouped calculations and real-world applications. By the end, you’ll handle cumulative sums confidently in any scenario.
Basic Cumulative Sum with cumsum()
The cumsum() method works on both Series and DataFrame objects. At its simplest, you call it on a column and get back a new Series where each value represents the sum of all previous values plus the current one.
import pandas as pd
# Daily sales data
sales_data = {
'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
'daily_sales': [150, 200, 175, 225, 190]
}
df = pd.DataFrame(sales_data)
df['cumulative_sales'] = df['daily_sales'].cumsum()
print(df)
Output:
date daily_sales cumulative_sales
0 2024-01-01 150 150
1 2024-01-02 200 350
2 2024-01-03 175 525
3 2024-01-04 225 750
4 2024-01-05 190 940
The first row’s cumulative value equals its daily value (150). The second row adds 200 to the previous cumulative total (150 + 200 = 350). This continues down the column, giving you a running total at each point.
You can also call cumsum() directly on a Series:
sales_series = pd.Series([150, 200, 175, 225, 190])
running_total = sales_series.cumsum()
print(running_total.values)
# [150 350 525 750 940]
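Beyond a single column, cumsum() can be called on an entire DataFrame, in which case every numeric column accumulates independently. A minimal sketch (the column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Each column gets its own running total down the rows
totals = df.cumsum()
print(totals)
#    a   b
# 0  1  10
# 1  3  30
# 2  6  60
```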
Cumulative Sum Across DataFrame Axes
When working with multi-column DataFrames, the axis parameter determines the direction of accumulation. By default, axis=0 calculates cumulative sums down each column. Setting axis=1 accumulates across each row.
# Quarterly revenue by department
quarterly_data = {
'Q1': [100, 150, 200],
'Q2': [120, 160, 180],
'Q3': [140, 170, 220],
'Q4': [160, 180, 240]
}
df = pd.DataFrame(quarterly_data, index=['Marketing', 'Sales', 'Engineering'])
# Cumulative sum down columns (axis=0)
# Stacks department contributions on top of each other within each quarter
print("Column-wise (axis=0):")
print(df.cumsum(axis=0))
Output:
Column-wise (axis=0):
Q1 Q2 Q3 Q4
Marketing 100 120 140 160
Sales 250 280 310 340
Engineering 450 460 530 580
This shows cumulative totals across departments for each quarter. Marketing’s Q1 is 100, then Sales adds 150 to make 250, then Engineering adds 200 to make 450.
# Cumulative sum across rows (axis=1)
# Shows year-to-date totals for each department
print("\nRow-wise (axis=1):")
print(df.cumsum(axis=1))
Output:
Row-wise (axis=1):
Q1 Q2 Q3 Q4
Marketing 100 220 360 520
Sales 150 310 480 660
Engineering 200 380 600 840
Now each row shows that department’s year-to-date revenue. Marketing earned 100 in Q1, 220 through Q2, 360 through Q3, and 520 for the full year.
Choose axis=0 when you want values to accumulate down the rows (typically categories or entities). Choose axis=1 when you want running totals across the columns (typically time periods).
Grouped Cumulative Sum
Real data rarely exists in isolation. You usually need cumulative sums within categories—sales by region, inventory by warehouse, transactions by account. Combining groupby() with cumsum() handles this elegantly.
# Sales transactions by region
transactions = {
'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
'2024-01-03', '2024-01-03', '2024-01-04', '2024-01-04'],
'region': ['North', 'South', 'North', 'South',
'North', 'South', 'North', 'South'],
'sales': [100, 150, 120, 80, 90, 200, 110, 170]
}
df = pd.DataFrame(transactions)
# Cumulative sales within each region
df['cumulative_by_region'] = df.groupby('region')['sales'].cumsum()
print(df)
Output:
date region sales cumulative_by_region
0 2024-01-01 North 100 100
1 2024-01-01 South 150 150
2 2024-01-02 North 120 220
3 2024-01-02 South 80 230
4 2024-01-03 North 90 310
5 2024-01-03 South 200 430
6 2024-01-04 North 110 420
7 2024-01-04 South 170 600
Each region maintains its own running total. North accumulates 100 → 220 → 310 → 420, while South tracks 150 → 230 → 430 → 600. The cumulative sum resets conceptually for each group.
You can group by multiple columns too:
# Multi-level grouping: region and product
df['product'] = ['A', 'A', 'B', 'A', 'A', 'B', 'B', 'A']
df['cumsum_region_product'] = df.groupby(['region', 'product'])['sales'].cumsum()
This calculates separate running totals for each unique combination of region and product.
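To see that effect concretely, here is a self-contained sketch of the same idea on a smaller frame (the values are illustrative): each (region, product) pair accumulates on its own, so the two North/A rows chain together while North/B and South/A start fresh.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'North', 'North', 'South'],
    'product': ['A', 'B', 'A', 'A'],
    'sales': [100, 120, 90, 150],
})

# Each unique (region, product) combination keeps its own running total
df['cumsum_region_product'] = df.groupby(['region', 'product'])['sales'].cumsum()
print(df['cumsum_region_product'].tolist())
# [100, 120, 190, 150]
```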
Handling Missing Values
Missing values require careful consideration in cumulative calculations. By default, cumsum() uses skipna=True, which skips NaN values during accumulation (so they contribute nothing to the total) but preserves them in the output.
# Data with missing values
data_with_nan = {
'day': [1, 2, 3, 4, 5],
'value': [10, 20, None, 40, 50]
}
df = pd.DataFrame(data_with_nan)
# Default behavior: skipna=True
df['cumsum_skipna_true'] = df['value'].cumsum(skipna=True)
# Alternative: skipna=False (NaN propagates)
df['cumsum_skipna_false'] = df['value'].cumsum(skipna=False)
print(df)
Output:
day value cumsum_skipna_true cumsum_skipna_false
0 1 10.0 10.0 10.0
1 2 20.0 30.0 30.0
2 3 NaN NaN NaN
3 4 40.0 70.0 NaN
4 5 50.0 120.0 NaN
With skipna=True, the cumulative sum continues past the NaN (30 + 40 = 70), though the NaN position itself remains NaN. With skipna=False, once a NaN appears, all subsequent values become NaN.
For financial calculations where missing data indicates a real problem, skipna=False forces you to address gaps. For time series where occasional missing readings are acceptable, skipna=True keeps the analysis moving.
You can also pre-fill missing values before calculating:
# Fill NaN with zero before cumsum
df['value_filled'] = df['value'].fillna(0)
df['cumsum_filled'] = df['value_filled'].cumsum()
# Or forward-fill to use the last known value
df['value_ffill'] = df['value'].ffill()
df['cumsum_ffill'] = df['value_ffill'].cumsum()
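The two fill strategies produce different totals, so the choice matters. A quick comparison using the same value column as above ([10, 20, NaN, 40, 50]):

```python
import pandas as pd

values = pd.Series([10, 20, None, 40, 50])

# Treat the gap as zero: the final total is unaffected by the missing day
zero_filled = values.fillna(0).cumsum()
print(zero_filled.tolist())   # [10.0, 30.0, 30.0, 70.0, 120.0]

# Carry the last known reading forward: the gap contributes 20 again
ffilled = values.ffill().cumsum()
print(ffilled.tolist())       # [10.0, 30.0, 50.0, 90.0, 140.0]
```

Forward-filling makes sense for level-like readings (a sensor that probably held its last value); zero-filling makes sense when a missing entry genuinely means nothing happened.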
Practical Applications
Let’s look at three real-world scenarios where cumulative sums shine.
Bank Account Running Balance
# Account transactions
account = {
'date': ['2024-01-01', '2024-01-03', '2024-01-05', '2024-01-07', '2024-01-10'],
'description': ['Opening Balance', 'Deposit', 'Grocery Store', 'Paycheck', 'Electric Bill'],
'amount': [1000, 500, -85, 2500, -120]
}
df = pd.DataFrame(account)
df['balance'] = df['amount'].cumsum()
print(df)
Output:
date description amount balance
0 2024-01-01 Opening Balance 1000 1000
1 2024-01-03 Deposit 500 1500
2 2024-01-05 Grocery Store -85 1415
3 2024-01-07 Paycheck 2500 3915
4 2024-01-10 Electric Bill -120 3795
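Because the balance column is just a Series, downstream checks become one-liners. For instance, a sketch that flags whether the running balance ever dips below zero (the zero threshold is illustrative):

```python
import pandas as pd

amounts = pd.Series([1000, 500, -85, 2500, -120])
balance = amounts.cumsum()

# True if the running balance falls below zero at any point
overdrawn = bool((balance < 0).any())
print(overdrawn)      # False

# Lowest balance reached across the whole period
print(balance.min())  # 1000
```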
Year-to-Date Sales with Cumulative Percentage
# Monthly sales with cumulative percentage
monthly_sales = {
'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
'sales': [45000, 52000, 48000, 61000, 55000, 67000]
}
df = pd.DataFrame(monthly_sales)
total_sales = df['sales'].sum()
df['ytd_sales'] = df['sales'].cumsum()
df['ytd_percentage'] = (df['ytd_sales'] / total_sales * 100).round(1)
print(df)
Output:
month sales ytd_sales ytd_percentage
0 Jan 45000 45000 13.7
1 Feb 52000 97000 29.6
2 Mar 48000 145000 44.2
3 Apr 61000 206000 62.8
4 May 55000 261000 79.6
5 Jun 67000 328000 100.0
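The cumulative percentage column also answers threshold questions, such as which month the year crossed the halfway mark. A sketch using the same numbers (idxmax() on a boolean Series returns the first position where the condition holds):

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'sales': [45000, 52000, 48000, 61000, 55000, 67000],
})
ytd_pct = df['sales'].cumsum() / df['sales'].sum() * 100

# First month whose year-to-date share reaches 50%
halfway = df.loc[(ytd_pct >= 50).idxmax(), 'month']
print(halfway)  # Apr
```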
Inventory Tracking with Grouped Cumsum
# Warehouse inventory movements
inventory = {
'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
'2024-01-03', '2024-01-03'],
'warehouse': ['East', 'West', 'East', 'West', 'East', 'West'],
'product': ['Widget', 'Widget', 'Widget', 'Widget', 'Widget', 'Widget'],
'movement': [100, 150, -20, 50, -30, -40] # positive=in, negative=out
}
df = pd.DataFrame(inventory)
df['running_stock'] = df.groupby('warehouse')['movement'].cumsum()
print(df)
Output:
date warehouse product movement running_stock
0 2024-01-01 East Widget 100 100
1 2024-01-01 West Widget 150 150
2 2024-01-02 East Widget -20 80
3 2024-01-02 West Widget 50 200
4 2024-01-03 East Widget -30 50
5 2024-01-03 West Widget -40 160
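As a sanity check, the last running_stock value in each warehouse should equal that warehouse's total net movement, since a cumulative sum ends at the plain sum. One way to confirm (a sketch on the same data):

```python
import pandas as pd

df = pd.DataFrame({
    'warehouse': ['East', 'West', 'East', 'West', 'East', 'West'],
    'movement': [100, 150, -20, 50, -30, -40],
})
df['running_stock'] = df.groupby('warehouse')['movement'].cumsum()

# Final cumulative value per group matches a plain grouped sum
final_stock = df.groupby('warehouse')['running_stock'].last()
print(final_stock.to_dict())  # {'East': 50, 'West': 160}
```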
Conclusion
Cumulative sums transform raw data into actionable insights about trends and totals over time. The cumsum() method handles the heavy lifting, while axis and groupby() give you precise control over how values accumulate.
Use axis=0 for column-wise accumulation (the default) and axis=1 for row-wise running totals. Combine groupby() with cumsum() when you need separate running totals per category. Pay attention to skipna behavior when your data contains missing values—the default skips NaN but preserves it in output, while skipna=False propagates NaN through all subsequent calculations.
These techniques apply directly to financial tracking, inventory management, time series analysis, and any scenario where understanding cumulative progress matters more than isolated snapshots.