Pandas - Percentage Change (pct_change)

Key Insights

• The pct_change() method calculates the percentage change between consecutive elements, which is essential for analyzing trends in time series data, financial metrics, and growth rates
• The periods parameter lets you calculate percentage changes over custom intervals. Missing values are best handled explicitly (for example with ffill() before the call), because the fill_method parameter is deprecated in recent pandas versions
• Combining pct_change() with groupby operations enables comparative analysis across different categories or time periods in your datasets

Basic Percentage Change Calculation

The pct_change() method computes the percentage change between the current and prior element in a Series or DataFrame. The formula is straightforward: (current - previous) / previous.

import pandas as pd
import numpy as np

# Simple Series example
prices = pd.Series([100, 105, 102, 110, 115])
pct_change = prices.pct_change()

print("Original Prices:")
print(prices)
print("\nPercentage Change:")
print(pct_change)

Output:

Original Prices:
0    100
1    105
2    102
3    110
4    115

Percentage Change:
0         NaN
1    0.050000
2   -0.028571
3    0.078431
4    0.045455

The first value is always NaN because there’s no previous value to compare against. The second value (0.05) represents a 5% increase from 100 to 105.
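To see the formula at work, the sketch below recomputes the same result by hand. shift(1) moves every value down one row, so each price lines up with the one before it. The names previous and manual are just illustrative.

```python
import pandas as pd

prices = pd.Series([100, 105, 102, 110, 115])

# shift(1) aligns each price with the price one row earlier
previous = prices.shift(1)

# (current - previous) / previous, exactly as in the formula above
manual = (prices - previous) / previous

# Put both results side by side; the leading NaN appears in both
print(pd.concat({'pct_change': prices.pct_change(), 'manual': manual}, axis=1))
```

Both columns should match. pct_change() is a convenient shortcut for this shift-and-divide pattern.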

Working with DataFrames

When applied to DataFrames, pct_change() operates column-wise by default, calculating percentage changes for each column independently.

# Stock price data
data = {
    'AAPL': [150, 155, 152, 158, 162],
    'GOOGL': [2800, 2750, 2790, 2820, 2850],
    'MSFT': [300, 305, 310, 308, 315]
}

df = pd.DataFrame(data)
pct_df = df.pct_change()

print("Original DataFrame:")
print(df)
print("\nPercentage Changes:")
print(pct_df)

Output:

Original DataFrame:
   AAPL  GOOGL  MSFT
0   150   2800   300
1   155   2750   305
2   152   2790   310
3   158   2820   308
4   162   2850   315

Percentage Changes:
       AAPL     GOOGL      MSFT
0       NaN       NaN       NaN
1  0.033333 -0.017857  0.016667
2 -0.019355  0.014545  0.016393
3  0.039474  0.010753 -0.006452
4  0.025316  0.010638  0.022727
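Note that the values are fractions, not percentages: 0.033333 means a 3.33% rise. For a report-style view, one option is to scale by 100 and round. This sketch rebuilds the same DataFrame so it runs on its own. The name pct_display is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'AAPL': [150, 155, 152, 158, 162],
    'GOOGL': [2800, 2750, 2790, 2820, 2850],
    'MSFT': [300, 305, 310, 308, 315],
})

# Convert fractional changes to percentages, rounded to 2 decimal places
pct_display = df.pct_change().mul(100).round(2)
print(pct_display)
```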

Custom Period Intervals

The periods parameter sets how many rows back each value is compared against. This is useful for contrasting short-term and longer-term trends.

# Monthly revenue data
revenue = pd.Series([1000, 1050, 1100, 1080, 1150, 1200, 1250])

# Compare over different horizons
one_period_change = revenue.pct_change(periods=1)    # default: month over month
three_period_change = revenue.pct_change(periods=3)  # vs. three months earlier

print("Revenue:", revenue.values)
print("\n1-Period Change:", one_period_change.values)
print("3-Period Change:", three_period_change.values)

Output:

Revenue: [1000 1050 1100 1080 1150 1200 1250]

1-Period Change: [        nan  0.05        0.04761905 -0.01818182  0.06481481  0.04347826
  0.04166667]
3-Period Change: [nan nan nan 0.08 0.0952381 0.09090909 0.15740741]

The 3-period change at index 3 shows an 8% increase from index 0 (1080 vs 1000), providing a broader view of the trend.
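A common use of periods is year-over-year growth on monthly data. Here, periods=12 compares each month with the same month one year earlier. The revenue figures below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical monthly revenue: 24 month-start dates, values 100, 101, ..., 123
months = pd.date_range('2023-01-01', periods=24, freq='MS')
monthly_revenue = pd.Series(range(100, 124), index=months)

mom = monthly_revenue.pct_change()            # month over month
yoy = monthly_revenue.pct_change(periods=12)  # year over year

print(pd.DataFrame({'revenue': monthly_revenue, 'mom': mom, 'yoy': yoy}).tail(3))
```

The first 12 year-over-year values are NaN, because no month exists a full year earlier to compare against.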

Handling Missing Values

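The fill_method parameter used to control how pct_change() treats NaN values. Older pandas versions forward-filled gaps by default (fill_method='pad'). Since pandas 2.1, every fill_method option except None is deprecated. For consistent results across versions, pass fill_method=None when you want no filling, and call ffill() yourself before pct_change() when you want forward filling.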

# Data with missing values
sales = pd.Series([100, np.nan, 120, 125, np.nan, 140])

# Two ways to handle the gaps
no_fill = sales.pct_change(fill_method=None)   # NaN stays NaN
forward_fill = sales.ffill().pct_change()      # fill gaps first, then compare

print("Original Sales:")
print(sales)
print("\nNo Fill Method:")
print(no_fill)
print("\nForward Fill Method:")
print(forward_fill)

Output:

Original Sales:
0    100.0
1      NaN
2    120.0
3    125.0
4      NaN
5    140.0

No Fill Method:
0         NaN
1         NaN
2         NaN
3    0.041667
4         NaN
5         NaN

Forward Fill Method:
0         NaN
1    0.000000
2    0.200000
3    0.041667
4    0.000000
5    0.120000
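Forward filling reports a 0% change for each gap. If you would rather skip the gaps, a minimal alternative is to drop missing values first. Each observed value is then compared with the previous observed one. The trade-off is that a change spanning a gap (100 to 120 here) is reported as a single 20% step.

```python
import numpy as np
import pandas as pd

sales = pd.Series([100, np.nan, 120, 125, np.nan, 140])

# Drop missing rows, then compare consecutive observed values
dropped = sales.dropna().pct_change()

# Align back to the original index; the gap rows become NaN
print(dropped.reindex(sales.index))
```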

Time Series Analysis with DateTime Index

Percentage changes are most often computed on time-indexed data. With a DatetimeIndex, every result stays labeled with its date, so you can read changes directly against the calendar.

# Create time series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
stock_data = pd.DataFrame({
    'price': [100, 102, 105, 103, 107, 110, 108, 112, 115, 118],
    'volume': [1000, 1200, 1100, 1300, 1250, 1400, 1350, 1500, 1450, 1600]
}, index=dates)

# Calculate daily and 3-day percentage changes
stock_data['daily_pct'] = stock_data['price'].pct_change()
stock_data['3day_pct'] = stock_data['price'].pct_change(periods=3)

print(stock_data.head(6))

Output:

            price  volume  daily_pct   3day_pct
2024-01-01    100    1000        NaN        NaN
2024-01-02    102    1200   0.020000        NaN
2024-01-03    105    1100   0.029412        NaN
2024-01-04    103    1300  -0.019048   0.030000
2024-01-05    107    1250   0.038835   0.049020
2024-01-06    110    1400   0.028037   0.047619
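Keep in mind that periods counts rows, not days. The 3day_pct column is a true three-day change only because this index has no missing dates. When dates can be missing, the freq parameter shifts by calendar time instead. The sketch below uses one hypothetical missing day (2024-01-03).

```python
import pandas as pd

# Hypothetical prices with 2024-01-03 missing
idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04'])
px = pd.Series([100, 110, 121], index=idx)

by_row = px.pct_change()          # compares with the previous row
by_day = px.pct_change(freq='D')  # compares with exactly one calendar day earlier

print(pd.DataFrame({'price': px, 'by_row': by_row, 'by_day': by_day}))
```

by_row reports a 10% move on 2024-01-04, even though that move spans two days. by_day returns NaN there because no price exists exactly one day earlier.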

GroupBy Operations with Percentage Change

When analyzing categorical data, combining groupby() with pct_change() enables within-group trend analysis.

# Sales data for multiple products
sales_data = pd.DataFrame({
    'product': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'month': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'revenue': [1000, 1100, 1150, 800, 850, 820, 1200, 1300, 1400]
})

# Calculate percentage change within each product group
sales_data['pct_change'] = sales_data.groupby('product')['revenue'].pct_change()

print(sales_data)

Output:

  product  month  revenue  pct_change
0       A      1     1000         NaN
1       A      2     1100    0.100000
2       A      3     1150    0.045455
3       B      1      800         NaN
4       B      2      850    0.062500
5       B      3      820   -0.035294
6       C      1     1200         NaN
7       C      2     1300    0.083333
8       C      3     1400    0.076923

Each product’s percentage changes are calculated independently, making it easy to compare growth patterns across categories.
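pct_change() follows the order of the rows, not the values in the month column. Within-group results are therefore only meaningful when each group is already in time order. If rows may arrive unsorted, sorting first is a simple safeguard. The shuffled rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows arriving out of order
sales_data = pd.DataFrame({
    'product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'month':   [3, 1, 1, 3, 2, 2],
    'revenue': [1150, 800, 1000, 820, 1100, 850],
})

# Sort by group, then by time, so consecutive rows are consecutive months
ordered = sales_data.sort_values(['product', 'month']).reset_index(drop=True)
ordered['pct_change'] = ordered.groupby('product')['revenue'].pct_change()
print(ordered)
```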

Cumulative Returns and Compounding

For financial analysis, you often need to calculate cumulative returns from percentage changes.
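Written out, compounding multiplies the growth factors (1 + r_t) together and subtracts 1 at the end. This is what (1 + returns).cumprod() - 1 does step by step. With the five daily returns from this example:

```latex
R_n = \prod_{t=1}^{n} (1 + r_t) - 1
\qquad
R_5 = (1.02)(0.99)(1.03)(1.015)(0.995) - 1 \approx 0.0504
```

Simply adding the returns would give 0.05, which ignores the effect of compounding.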

# Daily returns
returns = pd.Series([0.02, -0.01, 0.03, 0.015, -0.005])

# Calculate cumulative return using compounding
cumulative_return = (1 + returns).cumprod() - 1

print("Daily Returns:")
print(returns.values)
print("\nCumulative Return:")
print(cumulative_return.values)
print(f"\nTotal Return: {cumulative_return.iloc[-1]:.2%}")

Output:

Daily Returns:
[ 0.02  -0.01   0.03   0.015 -0.005]

Cumulative Return:
[0.02       0.0098     0.040094   0.05569541 0.05041693]

Total Return: 5.04%
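The same idea connects back to pct_change(). Compounding the returns of a price series recovers each price's change relative to the starting price. The sketch below uses the prices from the first example. The names returns, cumulative, and direct are illustrative.

```python
import pandas as pd

prices = pd.Series([100, 105, 102, 110, 115])

# Per-period returns; the leading NaN becomes 0 (no change in the first period)
returns = prices.pct_change().fillna(0)

# Compound the returns...
cumulative = (1 + returns).cumprod() - 1

# ...which should match the change relative to the starting price
direct = prices / prices.iloc[0] - 1

print(pd.DataFrame({'compounded': cumulative, 'direct': direct}))
```

Filling the leading NaN with 0 treats the first period as having no change. As a result, the compounded series starts at 0, just like the direct one.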

Axis Control for Row-wise Calculations

The axis parameter lets you calculate percentage changes along each row (from one column to the next) instead of down each column.

# Quarterly revenue by region
quarterly_data = pd.DataFrame({
    'Q1': [100, 200, 150],
    'Q2': [110, 210, 155],
    'Q3': [120, 205, 160],
    'Q4': [125, 220, 165]
}, index=['North', 'South', 'West'])

# Calculate quarter-over-quarter change for each region
qoq_change = quarterly_data.pct_change(axis=1)

print("Original Data:")
print(quarterly_data)
print("\nQuarter-over-Quarter Change:")
print(qoq_change)

Output:

Original Data:
        Q1   Q2   Q3   Q4
North  100  110  120  125
South  200  210  205  220
West   150  155  160  165

Quarter-over-Quarter Change:
         Q1        Q2        Q3        Q4
North   NaN  0.100000  0.090909  0.041667
South   NaN  0.050000 -0.023810  0.073171
West    NaN  0.033333  0.032258  0.031250

Use axis=1 when time runs horizontally across the columns (a wide layout) rather than down the rows.
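If the axis direction is hard to picture, note that axis=1 should give the same result as transposing, computing the usual column-wise change, and transposing back. Here is a small check with the quarterly table. The names row_wise and via_transpose are illustrative.

```python
import pandas as pd

quarterly_data = pd.DataFrame({
    'Q1': [100, 200, 150],
    'Q2': [110, 210, 155],
    'Q3': [120, 205, 160],
    'Q4': [125, 220, 165],
}, index=['North', 'South', 'West'])

# Row-wise change directly
row_wise = quarterly_data.pct_change(axis=1)

# Same computation: transpose, take the default column-wise change, transpose back
via_transpose = quarterly_data.T.pct_change().T

print(via_transpose)
```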
