How to Calculate Percent Change in Pandas
Key Insights
- Pandas’ pct_change() method calculates percent change using the formula (current - previous) / previous, returning decimal values (0.05 = 5% increase)
- The periods parameter lets you calculate change over any interval: use periods=7 for week-over-week analysis on daily data or periods=12 for year-over-year on monthly data
- Always handle the resulting NaN values deliberately; the first row(s) will always be NaN since there’s no previous value to compare against
Introduction
Percent change is one of the most fundamental calculations in data analysis. Whether you’re tracking stock returns, measuring revenue growth, analyzing user engagement metrics, or monitoring inventory levels, you need to understand how values change relative to their previous state.
The formula is straightforward: (current_value - previous_value) / previous_value. But implementing this manually across thousands of rows, handling edge cases, and applying it to grouped data gets tedious fast. That’s where Pandas’ pct_change() method comes in—it handles the heavy lifting while giving you control over the calculation details.
This article covers everything you need to calculate percent change effectively in Pandas, from basic usage to grouped operations and manual alternatives for edge cases.
Basic Usage of pct_change()
The pct_change() method works on both Series and DataFrames. By default, it compares each value to the immediately preceding value.
```python
import pandas as pd

# Sample stock prices
prices = pd.Series([100, 105, 103, 110, 108],
                   index=pd.date_range('2024-01-01', periods=5, freq='D'),
                   name='price')

print("Original prices:")
print(prices)
print("\nPercent change:")
print(prices.pct_change())
```
Output:

```
Original prices:
2024-01-01    100
2024-01-02    105
2024-01-03    103
2024-01-04    110
2024-01-05    108
Freq: D, Name: price, dtype: int64

Percent change:
2024-01-01         NaN
2024-01-02    0.050000
2024-01-03   -0.019048
2024-01-04    0.067961
2024-01-05   -0.018182
Freq: D, Name: price, dtype: float64
```
Notice two things. First, the result is in decimal form—0.05 means a 5% increase, -0.019048 means approximately a 1.9% decrease. Multiply by 100 if you need percentage format. Second, the first value is NaN because there’s no previous value to compare against.
The underlying calculation for each row is:
- Row 2: (105 - 100) / 100 = 0.05
- Row 3: (103 - 105) / 105 = -0.019048
- Row 4: (110 - 103) / 103 = 0.067961
- Row 5: (108 - 110) / 110 = -0.018182
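If you want the result displayed as percentages rather than decimal fractions, chain a multiplication by 100. A minimal sketch using the same prices:

```python
import pandas as pd

prices = pd.Series([100, 105, 103, 110, 108], name='price')

# Multiply by 100 to express change in percentage points rather than fractions
pct = prices.pct_change().mul(100).round(2)
print(pct.tolist())  # [nan, 5.0, -1.9, 6.8, -1.82]
```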
Key Parameters
The pct_change() method accepts several parameters that modify its behavior:
- periods: Controls how many rows back to compare. Default is 1 (compare to the immediately previous row).
- fill_method: Historically used to fill missing values before calculation. Deprecated in recent Pandas versions; handle missing data explicitly instead.
- limit: When using fill_method, limits how many consecutive NaNs to fill.
The periods parameter is particularly useful for time series analysis:
```python
import pandas as pd

# Daily sales data for two weeks
dates = pd.date_range('2024-01-01', periods=14, freq='D')
daily_sales = pd.Series([1000, 1050, 980, 1100, 1150, 1080, 1200,
                         1100, 1180, 1050, 1220, 1280, 1150, 1350],
                        index=dates, name='sales')

# Day-over-day change (default)
daily_change = daily_sales.pct_change()

# Week-over-week change (compare to same day last week)
weekly_change = daily_sales.pct_change(periods=7)

comparison = pd.DataFrame({
    'sales': daily_sales,
    'daily_pct': daily_change,
    'weekly_pct': weekly_change
})
print(comparison.tail(7))
```
Output:

```
            sales  daily_pct  weekly_pct
2024-01-08   1100  -0.083333    0.100000
2024-01-09   1180   0.072727    0.123810
2024-01-10   1050  -0.110169    0.071429
2024-01-11   1220   0.161905    0.109091
2024-01-12   1280   0.049180    0.113043
2024-01-13   1150  -0.101562    0.064815
2024-01-14   1350   0.173913    0.125000
```
Week-over-week comparisons often reveal trends that daily fluctuations obscure. For monthly data, use periods=12 to get year-over-year change.
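The same pattern applied to monthly data gives year-over-year growth. A minimal sketch with made-up revenue figures, constructed so year 2 runs exactly 20% above year 1:

```python
import pandas as pd

# Two years of hypothetical monthly revenue: year 2 is exactly 20% above year 1
year1 = [100, 110, 120, 115, 130, 125, 140, 135, 150, 145, 160, 155]
year2 = [v * 1.2 for v in year1]
revenue = pd.Series(year1 + year2,
                    index=pd.date_range('2022-01-01', periods=24, freq='MS'))

# Compare each month to the same month one year earlier
yoy = revenue.pct_change(periods=12)
print(round(yoy.iloc[12], 6))  # 0.2: January of year 2 vs January of year 1
```

The first twelve rows come back NaN, as expected: there is no prior year to compare them against.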
Handling Missing Values
Missing values require careful handling. The pct_change() method produces NaN when either the current or previous value is missing:
```python
import pandas as pd
import numpy as np

# Data with missing values
data = pd.Series([100, 105, np.nan, 110, 115, np.nan, np.nan, 125])

print("Original data:")
print(data)
print("\nPercent change (no preprocessing):")
print(data.pct_change())
```
Output:

```
Original data:
0    100.0
1    105.0
2      NaN
3    110.0
4    115.0
5      NaN
6      NaN
7    125.0
dtype: float64

Percent change (no preprocessing):
0         NaN
1    0.050000
2         NaN
3         NaN
4    0.045455
5         NaN
6         NaN
7         NaN
dtype: float64
```
The old fill_method parameter is deprecated. Instead, handle missing values explicitly before calling pct_change():
```python
import pandas as pd
import numpy as np

data = pd.Series([100, 105, np.nan, 110, 115, np.nan, np.nan, 125])

# Option 1: Forward fill before calculating
filled_data = data.ffill()
pct_with_ffill = filled_data.pct_change()

# Option 2: Interpolate missing values
interpolated_data = data.interpolate()
pct_with_interp = interpolated_data.pct_change()

comparison = pd.DataFrame({
    'original': data,
    'ffill': filled_data,
    'pct_ffill': pct_with_ffill,
    'interpolated': interpolated_data,
    'pct_interp': pct_with_interp
})
print(comparison)
```
Choose your fill strategy based on your data’s characteristics. Forward fill works well for prices (last known value persists). Interpolation suits metrics that change gradually. Sometimes dropping NaN rows entirely is the right call.
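If dropping is the right call, do it before calling pct_change(), so each change compares consecutive observed values instead of producing NaN around every gap. A minimal sketch on the same series:

```python
import pandas as pd
import numpy as np

data = pd.Series([100, 105, np.nan, 110, 115, np.nan, np.nan, 125])

# Drop the gaps first: each change now compares consecutive *observed* values
pct_observed = data.dropna().pct_change()
print(pct_observed.round(6))
```

Note that the original index labels are preserved, so you can still tell which observation each change belongs to.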
Applying to DataFrames and GroupBy
When applied to a DataFrame, pct_change() calculates percent change for each column independently:
```python
import pandas as pd

# Multi-product sales data
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=6, freq='ME'),  # month-end frequency
    'product_a': [1000, 1100, 1050, 1200, 1180, 1300],
    'product_b': [500, 520, 540, 530, 560, 590],
    'product_c': [2000, 2100, 2200, 2150, 2300, 2400]
})
df.set_index('date', inplace=True)

# Calculate percent change for all products at once
pct_changes = df.pct_change()
print(pct_changes)
```
For grouped calculations—like calculating growth per product category or per region—combine groupby() with pct_change():
```python
import pandas as pd

# Sales data by product and month
sales_data = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'] * 3,
    'product': ['Widget'] * 4 + ['Gadget'] * 4 + ['Gizmo'] * 4,
    'revenue': [1000, 1100, 1050, 1200,   # Widget
                500, 550, 600, 580,       # Gadget
                2000, 2200, 2100, 2400]   # Gizmo
})

# Make month an ordered categorical so sorting is chronological, not alphabetical
sales_data['month'] = pd.Categorical(sales_data['month'],
                                     categories=['Jan', 'Feb', 'Mar', 'Apr'],
                                     ordered=True)

# Sort to ensure correct order within groups
sales_data = sales_data.sort_values(['product', 'month'])

# Calculate month-over-month growth within each product
sales_data['mom_growth'] = sales_data.groupby('product')['revenue'].pct_change()
print(sales_data)
```

Output:

```
   month product  revenue  mom_growth
4    Jan  Gadget      500         NaN
5    Feb  Gadget      550    0.100000
6    Mar  Gadget      600    0.090909
7    Apr  Gadget      580   -0.033333
8    Jan   Gizmo     2000         NaN
9    Feb   Gizmo     2200    0.100000
10   Mar   Gizmo     2100   -0.045455
11   Apr   Gizmo     2400    0.142857
0    Jan  Widget     1000         NaN
1    Feb  Widget     1100    0.100000
2    Mar  Widget     1050   -0.045455
3    Apr  Widget     1200    0.142857
```
Each product’s percent change is calculated independently, with the first month of each product showing NaN.
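Since pct_change() on a DataFrame works column by column, an equivalent approach for this kind of data is to pivot products into columns first. A sketch reusing the same figures; the reindex is needed because pivot() sorts string month labels alphabetically:

```python
import pandas as pd

sales_data = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'] * 3,
    'product': ['Widget'] * 4 + ['Gadget'] * 4 + ['Gizmo'] * 4,
    'revenue': [1000, 1100, 1050, 1200,
                500, 550, 600, 580,
                2000, 2200, 2100, 2400]
})

# Wide format: one column per product; restore chronological month order
wide = (sales_data.pivot(index='month', columns='product', values='revenue')
                  .reindex(['Jan', 'Feb', 'Mar', 'Apr']))

# One pct_change call now covers every product at once
mom = wide.pct_change()
print(mom.round(6))
```

The wide layout is often handier for plotting or side-by-side comparison, while the long groupby form keeps the result attached to the original rows.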
Manual Calculation Alternative
Sometimes you need more control than pct_change() provides. Use shift() to implement the calculation manually:
```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=5, freq='D'),
    'value': [100, 105, 103, 110, 108]
})

# Manual percent change calculation
df['pct_change_manual'] = (df['value'] - df['value'].shift(1)) / df['value'].shift(1)

# Verify it matches the built-in method
df['pct_change_builtin'] = df['value'].pct_change()
print(df)
```
Manual calculation is useful when you need:
- Custom handling of edge cases (e.g., division by zero)
- Absolute change alongside percent change
- Different comparison logic (e.g., compare to a fixed baseline)
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'value': [100, 105, 0, 110, 108]  # Note the zero
})

# Built-in produces inf when dividing by zero
df['builtin'] = df['value'].pct_change()

# Manual with zero handling: replace zero denominators with NaN
previous = df['value'].shift(1)
df['manual_safe'] = (df['value'] - previous) / previous.replace(0, np.nan)
print(df)
```
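The fixed-baseline case from the list above follows the same pattern without any shift: divide by a chosen reference value instead of the previous row. A minimal sketch using the first observation as the baseline:

```python
import pandas as pd

values = pd.Series([100, 105, 103, 110, 108])

# Change relative to a fixed baseline (the first observation), not the previous row
baseline = values.iloc[0]
vs_baseline = (values - baseline) / baseline
print(vs_baseline.tolist())  # [0.0, 0.05, 0.03, 0.1, 0.08]
```

This is the "growth since launch" or "return since purchase" view, as opposed to the period-over-period view pct_change() gives you.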
Practical Use Case: Financial Analysis
Let’s put it all together with a realistic financial analysis example:
```python
import pandas as pd
import numpy as np

# Simulated stock data for multiple tickers
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=60, freq='B')  # Business days
stock_data = pd.DataFrame({
    'date': dates.tolist() * 3,
    'ticker': ['AAPL'] * 60 + ['GOOGL'] * 60 + ['MSFT'] * 60,
    'close': np.concatenate([
        150 + np.cumsum(np.random.randn(60) * 2),    # AAPL
        140 + np.cumsum(np.random.randn(60) * 2.5),  # GOOGL
        380 + np.cumsum(np.random.randn(60) * 3),    # MSFT
    ])
})

# Calculate daily returns per stock
stock_data = stock_data.sort_values(['ticker', 'date'])
stock_data['daily_return'] = stock_data.groupby('ticker')['close'].pct_change()

# Calculate 5-day (weekly) returns
stock_data['weekly_return'] = stock_data.groupby('ticker')['close'].pct_change(periods=5)

# Summary statistics per ticker
summary = stock_data.groupby('ticker')['daily_return'].agg([
    ('mean_daily_return', 'mean'),
    ('std_daily_return', 'std'),
    ('min_daily_return', 'min'),
    ('max_daily_return', 'max')
])

# Annualized metrics (assuming 252 trading days)
summary['annualized_return'] = summary['mean_daily_return'] * 252
summary['annualized_volatility'] = summary['std_daily_return'] * np.sqrt(252)
summary['sharpe_ratio'] = summary['annualized_return'] / summary['annualized_volatility']

print("Daily Return Statistics by Ticker:")
print(summary.round(4))
```
This example demonstrates a complete workflow: loading data, calculating returns at multiple intervals, grouping by category, and deriving meaningful summary statistics.
For visualization, plot the cumulative returns:
```python
# Calculate cumulative returns per ticker by compounding daily returns
stock_data['cumulative_return'] = stock_data.groupby('ticker')['daily_return'].transform(
    lambda x: (1 + x).cumprod() - 1
)

# Plot with:
# stock_data.pivot(index='date', columns='ticker', values='cumulative_return').plot()
```
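The compounding inside that transform can be sanity-checked on a toy series: a +10% day followed by a -5% day compounds to 1.10 × 0.95 − 1 = 4.5%, not the naive 5% you would get by adding the returns.

```python
import pandas as pd

daily = pd.Series([0.10, -0.05])  # two daily returns

# Compound the returns, then subtract 1 to get cumulative change
cumulative = (1 + daily).cumprod() - 1
print(round(cumulative.iloc[-1], 6))  # 0.045
```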
The pct_change() method handles the tedious calculation work, letting you focus on analysis and interpretation. Master its parameters and combine it with groupby operations, and you’ll handle most percent change scenarios efficiently.