Pandas - Percentage Change (pct_change)
Key Insights
• The pct_change() method calculates percentage change between consecutive elements, essential for analyzing trends in time series data, financial metrics, and growth rates
• Understanding the periods parameter allows you to calculate percentage changes over custom intervals, while fill_method controls how missing values affect calculations
• Combining pct_change() with groupby operations enables powerful comparative analysis across different categories or time periods in your datasets
Basic Percentage Change Calculation
The pct_change() method computes the change between each element and the one before it in a Series or DataFrame, using the formula (current - previous) / previous. The result is a decimal fraction rather than a percentage, so multiply by 100 to display it as percent.
import pandas as pd
import numpy as np
# Simple Series example
prices = pd.Series([100, 105, 102, 110, 115])
# Use a name that does not mirror the method name
price_changes = prices.pct_change()
print("Original Prices:")
print(prices)
print("\nPercentage Change:")
print(price_changes)
Output:
Original Prices:
0 100
1 105
2 102
3 110
4 115
Percentage Change:
0 NaN
1 0.050000
2 -0.028571
3 0.078431
4 0.045455
The first value is always NaN because there’s no previous value to compare against. The second value (0.05) represents a 5% increase from 100 to 105.
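To make the formula concrete, the sketch below reproduces the result by hand with shift() and then scales it to percent. The names manual, built_in, and as_percent are illustrative only and are not part of pandas.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100, 105, 102, 110, 115])

# (current - previous) / previous, with shift(1) supplying the previous value
previous = prices.shift(1)
manual = (prices - previous) / previous

# Built-in equivalent
built_in = prices.pct_change()

# Compare with a tolerance: the two routes can differ by floating-point rounding
print(np.allclose(manual, built_in, equal_nan=True))

# Scale the fractions to percent for display
as_percent = (built_in * 100).round(2)
print(as_percent)
```

The tolerance-based comparison is deliberate: the two expressions are algebraically identical but are not guaranteed to agree to the last bit.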
Working with DataFrames
When applied to DataFrames, pct_change() operates column-wise by default, calculating percentage changes for each column independently.
# Stock price data
data = {
'AAPL': [150, 155, 152, 158, 162],
'GOOGL': [2800, 2750, 2790, 2820, 2850],
'MSFT': [300, 305, 310, 308, 315]
}
df = pd.DataFrame(data)
pct_df = df.pct_change()
print("Original DataFrame:")
print(df)
print("\nPercentage Changes:")
print(pct_df)
Output:
Original DataFrame:
AAPL GOOGL MSFT
0 150 2800 300
1 155 2750 305
2 152 2790 310
3 158 2820 308
4 162 2850 315
Percentage Changes:
AAPL GOOGL MSFT
0 NaN NaN NaN
1 0.033333 -0.017857 0.016667
2 -0.019355 0.014545 0.016393
3 0.039474 0.010753 -0.006452
4 0.025316 0.010638 0.022727
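Because each column is processed independently, the result can be summarized per column. The following is a minimal sketch, reusing the same sample prices, that drops the leading NaN row and finds the row with the largest absolute move for each ticker. The names returns and largest_move_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'AAPL': [150, 155, 152, 158, 162],
    'GOOGL': [2800, 2750, 2790, 2820, 2850],
    'MSFT': [300, 305, 310, 308, 315],
})

# The first row is all NaN (no prior row), so drop it before summarizing
returns = df.pct_change().dropna()

# Average change per column
print(returns.mean())

# Index label of the largest absolute change in each column
largest_move_row = returns.abs().idxmax()
print(largest_move_row)
```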
Custom Period Intervals
The periods parameter sets how many rows back the comparison value sits: periods=3 compares each value with the one three rows earlier. This makes it easy to compare short-term and longer-term trends.
# Monthly revenue data
revenue = pd.Series([1000, 1050, 1100, 1080, 1150, 1200, 1250])
# Compare each month with the previous month (the default) and with three months earlier
one_period_change = revenue.pct_change(periods=1)
three_period_change = revenue.pct_change(periods=3)
print("Revenue:", revenue.values)
print("\n1-Period Change:", one_period_change.values)
print("3-Period Change:", three_period_change.values)
Output:
Revenue: [1000 1050 1100 1080 1150 1200 1250]
1-Period Change: [ nan 0.05 0.04761905 -0.01818182 0.06481481 0.04347826
0.04166667]
3-Period Change: [nan nan nan 0.08 0.0952381 0.09090909 0.15740741]
The 3-period change at index 3 shows an 8% increase from index 0 (1080 vs 1000), providing a broader view of the trend.
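A common use of periods is a year-over-year comparison on monthly data. The sketch below uses made-up revenue figures in which the second year is exactly 10% above the first, so the expected result is easy to check. The series name monthly and the figures themselves are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 24 months, second year is 10% above the first
year_one = [100 + 5 * i for i in range(12)]
year_two = [value * 1.10 for value in year_one]
months = pd.date_range('2023-01-01', periods=24, freq='MS')
monthly = pd.Series(year_one + year_two, index=months)

# periods=12 compares each month with the same month one year earlier
yoy = monthly.pct_change(periods=12)

print(yoy.iloc[:12].isna().all())        # first 12 months have no prior-year value
print(np.allclose(yoy.iloc[12:], 0.10))  # second year is 10% up throughout
```

Note that periods counts rows, so this only works as a year-over-year comparison when the series has exactly one row per month with no gaps.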
Handling Missing Values
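The fill_method parameter controls whether pct_change() fills NaN values before computing the change. Older pandas releases forward-filled by default (fill_method='pad'). Since pandas 2.1, every value other than None is deprecated, and the recommended pattern is to fill explicitly, for example with ffill(), before calling pct_change().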
# Data with missing values
sales = pd.Series([100, np.nan, 120, 125, np.nan, 140])
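# Different fill strategies
# Pass fill_method=None explicitly so the result does not depend on the pandas version
no_fill = sales.pct_change(fill_method=None)
# Forward-fill explicitly, then compute the change
forward_fill = sales.ffill().pct_change(fill_method=None)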
print("Original Sales:")
print(sales)
print("\nNo Fill Method:")
print(no_fill)
print("\nForward Fill Method:")
print(forward_fill)
Output:
Original Sales:
0 100.0
1 NaN
2 120.0
3 125.0
4 NaN
5 140.0
No Fill Method:
0 NaN
1 NaN
2 NaN
3 0.041667
4 NaN
5 NaN
Forward Fill Method:
0 NaN
1 0.000000
2 0.200000
3 0.041667
4 0.000000
5 0.120000
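Forward-filling records a 0% change at every gap. If a gap should instead be skipped, so that each observed value is compared with the previous observed value, one option is to drop the missing rows first. This is a sketch of that alternative, not a pandas default. The name observed_change is illustrative.

```python
import numpy as np
import pandas as pd

sales = pd.Series([100, np.nan, 120, 125, np.nan, 140])

# Drop missing rows, then compare each remaining value with the one before it
observed_change = sales.dropna().pct_change()
print(observed_change)

# The original index labels are kept, so the gaps are visible as missing labels 1 and 4
print(observed_change.index.tolist())
```

Which strategy fits depends on what a missing value means in the data: "no change" suggests forward-filling, while "no observation" suggests dropping.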
Time Series Analysis with DateTime Index
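With a DatetimeIndex, every percentage change is labeled with its date. Note that periods still counts rows rather than calendar days, so periods=3 only means "three days" when the index has one row per day.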
# Create time series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
stock_data = pd.DataFrame({
'price': [100, 102, 105, 103, 107, 110, 108, 112, 115, 118],
'volume': [1000, 1200, 1100, 1300, 1250, 1400, 1350, 1500, 1450, 1600]
}, index=dates)
# Calculate daily and 3-day percentage changes
stock_data['daily_pct'] = stock_data['price'].pct_change()
stock_data['3day_pct'] = stock_data['price'].pct_change(periods=3)
print(stock_data.head(6))
Output:
price volume daily_pct 3day_pct
2024-01-01 100 1000 NaN NaN
2024-01-02 102 1200 0.020000 NaN
2024-01-03 105 1100 0.029412 NaN
2024-01-04 103 1300 -0.019048 0.030000
2024-01-05 107 1250 0.038835 0.049020
2024-01-06 110 1400 0.028037 0.047619
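A datetime index also makes it possible to change the sampling frequency before computing the change. The sketch below, using the same sample prices, takes the last price of each week with resample() and then computes the week-over-week change. The names weekly_close and weekly_pct are illustrative, and the choice of the last observation as the weekly value is an assumption.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
price = pd.Series([100, 102, 105, 103, 107, 110, 108, 112, 115, 118], index=dates)

# 'W' groups into weeks ending on Sunday; keep the last price seen in each week
weekly_close = price.resample('W').last()

# Week-over-week change of the weekly closing price
weekly_pct = weekly_close.pct_change()

print(weekly_close)
print(weekly_pct)
```

With only ten days of data, the second week is partial, so its value is the latest available price rather than a true end-of-week close.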
GroupBy Operations with Percentage Change
When analyzing categorical data, combining groupby() with pct_change() enables within-group trend analysis.
# Sales data for multiple products
sales_data = pd.DataFrame({
'product': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
'month': [1, 2, 3, 1, 2, 3, 1, 2, 3],
'revenue': [1000, 1100, 1150, 800, 850, 820, 1200, 1300, 1400]
})
# Calculate percentage change within each product group
sales_data['pct_change'] = sales_data.groupby('product')['revenue'].pct_change()
print(sales_data)
Output:
product month revenue pct_change
0 A 1 1000 NaN
1 A 2 1100 0.100000
2 A 3 1150 0.045455
3 B 1 800 NaN
4 B 2 850 0.062500
5 B 3 820 -0.035294
6 C 1 1200 NaN
7 C 2 1300 0.083333
8 C 3 1400 0.076923
Each product’s percentage changes are calculated independently, making it easy to compare growth patterns across categories.
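The grouped calculation follows row order within each group, so unsorted data gives misleading results. A minimal sketch, assuming the rows arrive shuffled, sorts by product and month before grouping. The shuffled ordering and the name ordered are made up for illustration.

```python
import pandas as pd

# Hypothetical input: same kind of data, but rows are out of order
sales_data = pd.DataFrame({
    'product': ['B', 'A', 'A', 'B', 'A', 'B'],
    'month': [2, 3, 1, 1, 2, 3],
    'revenue': [850, 1150, 1000, 800, 1100, 820],
})

# Sort so that each product's rows run in chronological order
ordered = sales_data.sort_values(['product', 'month']).reset_index(drop=True)

# Change relative to the previous month of the same product
ordered['pct_change'] = ordered.groupby('product')['revenue'].pct_change()
print(ordered)
```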
Cumulative Returns and Compounding
For financial analysis, you often need to calculate cumulative returns from percentage changes.
# Daily returns
returns = pd.Series([0.02, -0.01, 0.03, 0.015, -0.005])
# Calculate cumulative return using compounding
cumulative_return = (1 + returns).cumprod() - 1
print("Daily Returns:")
print(returns.values)
print("\nCumulative Return:")
print(cumulative_return.values)
print(f"\nTotal Return: {cumulative_return.iloc[-1]:.2%}")
Output:
Daily Returns:
[ 0.02 -0.01 0.03 0.015 -0.005]
Cumulative Return:
[0.02 0.0098 0.040094 0.05569541 0.05041693]
Total Return: 5.04%
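Compounding is the inverse of pct_change(): applying the compounded growth factors to the starting price rebuilds the original series. The sketch below checks that round trip on the sample prices from the first example. The names growth and rebuilt are illustrative.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100, 105, 102, 110, 115])
returns = prices.pct_change()

# Treat the leading NaN as a 0% change so the growth factor starts at 1.0
growth = (1 + returns.fillna(0)).cumprod()

# Scale the growth factors by the starting price to rebuild the series
rebuilt = prices.iloc[0] * growth
print(np.allclose(rebuilt, prices))

# Total return over the whole series: last price relative to the first
total_return = growth.iloc[-1] - 1
print(f"Total Return: {total_return:.2%}")
```

Summing the individual returns instead of compounding them would not survive this round trip, which is why cumprod() is used rather than cumsum().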
Axis Control for Row-wise Calculations
The axis parameter allows you to calculate percentage changes across rows instead of down columns.
# Quarterly revenue by region
quarterly_data = pd.DataFrame({
'Q1': [100, 200, 150],
'Q2': [110, 210, 155],
'Q3': [120, 205, 160],
'Q4': [125, 220, 165]
}, index=['North', 'South', 'West'])
# Calculate quarter-over-quarter change for each region
qoq_change = quarterly_data.pct_change(axis=1)
print("Original Data:")
print(quarterly_data)
print("\nQuarter-over-Quarter Change:")
print(qoq_change)
Output:
Original Data:
Q1 Q2 Q3 Q4
North 100 110 120 125
South 200 210 205 220
West 150 155 160 165
Quarter-over-Quarter Change:
Q1 Q2 Q3 Q4
North NaN 0.100000 0.090909 0.041667
South NaN 0.050000 -0.023810 0.073171
West NaN 0.033333 0.032258 0.031250
This approach is useful when the time series runs horizontally, with one column per period, rather than vertically down the rows of the DataFrame.
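An equivalent route, shown here as a sketch, is to transpose the frame so that quarters become rows, compute the change down the columns, and transpose back. The names via_axis and via_transpose are illustrative.

```python
import numpy as np
import pandas as pd

quarterly_data = pd.DataFrame({
    'Q1': [100, 200, 150],
    'Q2': [110, 210, 155],
    'Q3': [120, 205, 160],
    'Q4': [125, 220, 165],
}, index=['North', 'South', 'West'])

# Row-wise change using the axis argument
via_axis = quarterly_data.pct_change(axis=1)

# Same calculation by transposing, computing column-wise, and transposing back
via_transpose = quarterly_data.T.pct_change().T

print(np.allclose(via_axis.to_numpy(), via_transpose.to_numpy(), equal_nan=True))
```

The transposed form can be convenient when later steps, such as rolling windows or resampling, expect time to run down the rows.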