How to Shift Values in Pandas
Key Insights
- The shift() method moves data up or down by a specified number of periods, creating lagged or leading values essential for time series analysis and feature engineering.
- Use positive periods to shift data down (creating lag features) and negative periods to shift up (creating lead features), with fill_value to control how gaps are filled.
- Time-based shifting with the freq parameter operates on the index rather than row positions, making it invaluable for irregular time series data.
Introduction to the shift() Method
Shifting values is one of the most fundamental operations in time series analysis and data manipulation. The pandas shift() method moves data up or down along an axis, creating offset versions of your data that enable comparisons between current and previous (or future) values.
You’ll reach for shift() constantly when:
- Calculating period-over-period changes (daily returns, monthly growth)
- Creating lagged features for machine learning models
- Computing differences between consecutive rows
- Building rolling calculations that need access to previous values
The basic syntax is straightforward:
import pandas as pd
import numpy as np
# Create a simple Series
sales = pd.Series([100, 150, 130, 180, 200],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                  name='sales')
print("Original:")
print(sales)
print("\nShifted by 1:")
print(sales.shift(1))
Output:
Original:
Mon 100
Tue 150
Wed 130
Thu 180
Fri 200
Name: sales, dtype: int64
Shifted by 1:
Mon NaN
Tue 100.0
Wed 150.0
Thu 130.0
Fri 180.0
Name: sales, dtype: float64
Notice how each value moved down one position, and NaN filled the gap at the top. This is the core behavior you’ll build upon.
Shifting Rows Up and Down
The periods parameter controls direction and magnitude. Positive values shift data downward, so each row receives the value from an earlier row (a lag). Negative values shift data upward, so each row receives the value from a later row (a lead).
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=5),
    'revenue': [1000, 1200, 1100, 1400, 1300],
    'units': [50, 60, 55, 70, 65]
})
print("Original DataFrame:")
print(df)
# Shift down by 1 (lag)
print("\nShifted down by 1 (previous day's values):")
print(df[['revenue', 'units']].shift(1))
# Shift up by 1 (lead)
print("\nShifted up by 1 (next day's values):")
print(df[['revenue', 'units']].shift(-1))
# Shift by multiple periods
print("\nShifted down by 2 (values from 2 days ago):")
print(df[['revenue', 'units']].shift(2))
Output:
Original DataFrame:
date revenue units
0 2024-01-01 1000 50
1 2024-01-02 1200 60
2 2024-01-03 1100 55
3 2024-01-04 1400 70
4 2024-01-05 1300 65
Shifted down by 1 (previous day's values):
revenue units
0 NaN NaN
1 1000.0 50.0
2 1200.0 60.0
3 1100.0 55.0
4 1400.0 70.0
Shifted up by 1 (next day's values):
revenue units
0 1200.0 60.0
1 1100.0 55.0
2 1400.0 70.0
3 1300.0 65.0
4 NaN NaN
Shifted down by 2 (values from 2 days ago):
revenue units
0 NaN NaN
1 NaN NaN
2 1000.0 50.0
3 1200.0 60.0
4 1100.0 55.0
Think of positive shifts as “give me what happened N periods ago” and negative shifts as “give me what will happen N periods from now.”
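As a small illustration of combining a lag and a lead, the sketch below flags rows that are higher than both neighbours. The names prev_day, next_day, and is_local_peak are made up for this example and reuse the revenue figures from above.

```python
import pandas as pd

revenue = pd.Series([1000, 1200, 1100, 1400, 1300], name="revenue")

prev_day = revenue.shift(1)    # value from the previous row (lag)
next_day = revenue.shift(-1)   # value from the next row (lead)

# A row is a local peak when it beats both neighbours.
# Comparisons against NaN evaluate to False, so the first and last
# rows (which lack one neighbour) are never flagged.
is_local_peak = (revenue > prev_day) & (revenue > next_day)
peak_days = revenue[is_local_peak]

print(peak_days)
```

Because the edge rows compare against NaN, they drop out without any special handling; whether that is the right treatment depends on the analysis.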
Shifting with Different Fill Values
The default NaN fill isn’t always what you want. The fill_value parameter lets you specify a replacement:
inventory = pd.Series([100, 80, 120, 90, 110], name='stock')
# Default NaN fill
print("Shift with NaN (default):")
print(inventory.shift(1))
# Fill with 0
print("\nShift with fill_value=0:")
print(inventory.shift(1, fill_value=0))
# Fill with the mean
print("\nShift with fill_value=mean:")
print(inventory.shift(1, fill_value=inventory.mean()))
Output:
Shift with NaN (default):
0 NaN
1 100.0
2 80.0
3 120.0
4 90.0
Name: stock, dtype: float64
Shift with fill_value=0:
0 0
1 100
2 80
3 120
4 90
Name: stock, dtype: int64
Shift with fill_value=mean:
0 100.0
1 100.0
2 80.0
3 120.0
4 90.0
Name: stock, dtype: float64
Notice that fill_value=0 keeps the original int64 dtype because no NaN is introduced, while filling with the float mean produces a float64 result. This matters for memory efficiency and for downstream operations that expect specific types.
Shifting Along Columns (Axis Parameter)
By default, shift() operates on rows (axis=0). Set axis=1 to shift horizontally across columns:
quarterly_data = pd.DataFrame({
    'Q1': [100, 200, 150],
    'Q2': [120, 180, 160],
    'Q3': [140, 220, 170],
    'Q4': [130, 240, 180]
}, index=['Product A', 'Product B', 'Product C'])
print("Original quarterly data:")
print(quarterly_data)
# Shift columns to the right
print("\nShifted right by 1 (previous quarter's values in each column):")
print(quarterly_data.shift(1, axis=1))
# Shift columns to the left
print("\nShifted left by 1 (next quarter's values in each column):")
print(quarterly_data.shift(-1, axis=1))
Output:
Original quarterly data:
Q1 Q2 Q3 Q4
Product A 100 120 140 130
Product B 200 180 220 240
Product C 150 160 170 180
Shifted right by 1 (previous quarter's values in each column):
Q1 Q2 Q3 Q4
Product A NaN 100.0 120.0 140.0
Product B NaN 200.0 180.0 220.0
Product C NaN 150.0 160.0 170.0
Shifted left by 1 (next quarter's values in each column):
Q1 Q2 Q3 Q4
Product A 120 140.0 130.0 NaN
Product B 180 220.0 240.0 NaN
Product C 160 170.0 180.0 NaN
Column shifting is useful for comparing adjacent time periods when your data is structured with periods as columns rather than rows.
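One way to put that comparison to work is a quarter-over-quarter change table. This is a minimal sketch using the same quarterly figures; the name qoq_change is only illustrative.

```python
import pandas as pd

quarterly_data = pd.DataFrame(
    {
        "Q1": [100, 200, 150],
        "Q2": [120, 180, 160],
        "Q3": [140, 220, 170],
        "Q4": [130, 240, 180],
    },
    index=["Product A", "Product B", "Product C"],
)

# Subtract the previous quarter (one column to the left) from each column.
# Q1 has no previous quarter, so its change is NaN.
qoq_change = quarterly_data - quarterly_data.shift(1, axis=1)

print(qoq_change)
```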
Time-Based Shifting with freq Parameter
When working with DatetimeIndex data, the freq parameter shifts the index itself rather than the data positions. This distinction is critical for irregular time series:
# Create time series with DatetimeIndex
ts = pd.Series(
    [100, 150, 130, 180],
    index=pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-05', '2024-01-08']),
    name='value'
)
print("Original (irregular time series):")
print(ts)
# Row-based shift (default) - shifts values, not dates
print("\nRow-based shift(1) - values move, index stays:")
print(ts.shift(1))
# Time-based shift - shifts the index
print("\nTime-based shift(1, freq='D') - index moves by 1 day:")
print(ts.shift(1, freq='D'))
# Shift by different frequencies
print("\nShift by 2 days:")
print(ts.shift(2, freq='D'))
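# '7D' is a fixed seven-day offset; the anchored alias 'W' would instead roll each date to the next Sunday
print("\nShift by 1 week:")
print(ts.shift(1, freq='7D'))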
Output:
Original (irregular time series):
2024-01-01 100
2024-01-03 150
2024-01-05 130
2024-01-08 180
Name: value, dtype: int64
Row-based shift(1) - values move, index stays:
2024-01-01 NaN
2024-01-03 100.0
2024-01-05 150.0
2024-01-08 130.0
Name: value, dtype: float64
Time-based shift(1, freq='D') - index moves by 1 day:
2024-01-02 100
2024-01-04 150
2024-01-06 130
2024-01-09 180
Name: value, dtype: int64
Shift by 2 days:
2024-01-03 100
2024-01-05 150
2024-01-07 130
2024-01-10 180
Name: value, dtype: int64
Shift by 1 week:
2024-01-08 100
2024-01-10 150
2024-01-12 130
2024-01-15 180
Name: value, dtype: int64
With freq, the data stays intact while timestamps move. No NaN values appear because nothing is being displaced—just relabeled.
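To see why this matters on irregular data, the sketch below contrasts a row-based lag with a calendar-based lag. Chaining reindex() after the shift is one possible way to look up "the value exactly two days earlier"; the variable names are illustrative.

```python
import pandas as pd

ts = pd.Series(
    [100, 150, 130, 180],
    index=pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-05", "2024-01-08"]),
    name="value",
)

# Row-based lag: the previous observation, however long ago it was recorded.
prev_observation = ts.shift(1)

# Calendar-based lag: relabel every value two days later, then look up the
# original dates. Dates with no observation exactly two days earlier get NaN.
two_days_ago = ts.shift(2, freq="D").reindex(ts.index)

print(pd.DataFrame({"prev_observation": prev_observation,
                    "two_days_ago": two_days_ago}))
```

On 2024-01-08 the row-based lag returns 130, which was recorded three days earlier, while the calendar-based lag correctly reports that no value exists for 2024-01-06.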
Practical Applications
Here’s where shift() proves its worth in real analysis work.
Calculating Period-Over-Period Changes
stock_prices = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=6),
    'price': [100.0, 102.5, 101.0, 105.0, 103.5, 108.0]
})
# Daily price change (absolute)
stock_prices['daily_change'] = stock_prices['price'] - stock_prices['price'].shift(1)
# Daily return (percentage)
stock_prices['daily_return'] = (
    (stock_prices['price'] - stock_prices['price'].shift(1)) /
    stock_prices['price'].shift(1) * 100
)
# Equivalent using pct_change() - but shift() gives you more control
stock_prices['pct_change_method'] = stock_prices['price'].pct_change() * 100
print(stock_prices.round(2))
Output:
date price daily_change daily_return pct_change_method
0 2024-01-01 100.0 NaN NaN NaN
1 2024-01-02 102.5 2.5 2.50 2.50
2 2024-01-03 101.0 -1.5 -1.46 -1.46
3 2024-01-04 105.0 4.0 3.96 3.96
4 2024-01-05 103.5 -1.5 -1.43 -1.43
5 2024-01-06 108.0 4.5 4.35 4.35
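The extra control from shift() shows up when the comparison window changes. The sketch below, using the same prices, checks that the one-period difference matches diff() and then computes a two-period return; two_day_return is an illustrative name.

```python
import pandas as pd

price = pd.Series([100.0, 102.5, 101.0, 105.0, 103.5, 108.0], name="price")

# One-period change: subtracting the lag is the same calculation diff() performs.
manual_change = price - price.shift(1)
same_as_diff = manual_change.equals(price.diff())

# Two-period return: compare each price with the one two rows earlier.
two_day_return = (price / price.shift(2) - 1) * 100

print(same_as_diff)
print(two_day_return.round(2))
```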
Creating Lagged Features for Machine Learning
def create_lag_features(df, column, lags):
    """Create multiple lag features for ML models."""
    result = df.copy()
    for lag in lags:
        result[f'{column}_lag_{lag}'] = result[column].shift(lag)
    return result

sensor_data = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=10, freq='h'),
    'temperature': [20, 21, 22, 23, 24, 23, 22, 21, 20, 19]
})
# Create lag features for prediction
sensor_with_lags = create_lag_features(sensor_data, 'temperature', [1, 2, 3])
# Add a lead feature (what we're predicting)
sensor_with_lags['temperature_next'] = sensor_with_lags['temperature'].shift(-1)
print(sensor_with_lags)
Output:
timestamp temperature temperature_lag_1 temperature_lag_2 temperature_lag_3 temperature_next
0 2024-01-01 00:00:00 20 NaN NaN NaN 21.0
1 2024-01-01 01:00:00 21 20.0 NaN NaN 22.0
2 2024-01-01 02:00:00 22 21.0 20.0 NaN 23.0
3 2024-01-01 03:00:00 23 22.0 21.0 20.0 24.0
4 2024-01-01 04:00:00 24 23.0 22.0 21.0 23.0
5 2024-01-01 05:00:00 23 24.0 23.0 22.0 22.0
6 2024-01-01 06:00:00 22 23.0 24.0 23.0 21.0
7 2024-01-01 07:00:00 21 22.0 23.0 24.0 20.0
8 2024-01-01 08:00:00 20 21.0 22.0 23.0 19.0
9 2024-01-01 09:00:00 19 20.0 21.0 22.0 NaN
Drop rows with NaN before training, and you have a clean feature matrix.
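That last step might look like the following sketch, which rebuilds the same lag and lead columns and splits them into features and target. The names model_ready, X, and y are conventions assumed here, not something the example above defines.

```python
import pandas as pd

temperature = pd.Series([20, 21, 22, 23, 24, 23, 22, 21, 20, 19], name="temperature")

features = pd.DataFrame({
    "temperature": temperature,
    "temperature_lag_1": temperature.shift(1),
    "temperature_lag_2": temperature.shift(2),
    "temperature_lag_3": temperature.shift(3),
    "temperature_next": temperature.shift(-1),  # prediction target
})

# Rows 0-2 lack a full lag history and row 9 has no target, so dropna()
# removes all four and keeps rows 3 through 8.
model_ready = features.dropna()

X = model_ready.drop(columns="temperature_next")
y = model_ready["temperature_next"]

print(X.shape, y.shape)
```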
Common Pitfalls and Best Practices
Handle NaN values explicitly. After shifting, decide whether to drop, fill, or interpolate missing values based on your use case:
data = pd.Series([10, 20, 30, 40, 50])
shifted = data.shift(2)
# Option 1: Drop NaN rows
clean_dropped = shifted.dropna()
# Option 2: Forward fill
clean_ffill = shifted.ffill()
# Option 3: Fill with a specific value
clean_filled = shifted.fillna(0)
Watch for dtype changes. Shifting integer columns introduces NaN, which forces conversion to float. Use fill_value to preserve types when appropriate, or convert back after filling:
integers = pd.Series([1, 2, 3, 4, 5], dtype='int64')
shifted = integers.shift(1, fill_value=0) # Stays int64
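When no sensible fill value exists, another option is pandas' nullable integer dtype, which can hold missing values without converting to float. A brief sketch (note the capital "I" in "Int64"):

```python
import pandas as pd

# Default int64: the NaN introduced by the shift forces a float64 result.
plain = pd.Series([1, 2, 3, 4, 5], dtype="int64").shift(1)

# Nullable Int64: the gap is stored as <NA> and the integer dtype is kept.
nullable = pd.Series([1, 2, 3, 4, 5], dtype="Int64").shift(1)

print(plain.dtype, nullable.dtype)
```

Whether downstream code accepts the nullable dtype depends on the libraries involved, so treat this as an option to test rather than a drop-in replacement.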
Be mindful of memory with large datasets. Each shifted column is a full copy. For DataFrames with millions of rows and many lag features, memory can balloon quickly. Consider computing shifts on-demand or using more memory-efficient approaches for very large datasets.
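One possible way to trim the footprint is to downcast once before shifting, so every lag column inherits the smaller dtype. This sketch assumes float32 precision is acceptable for the data, which is not true for every dataset; the column names are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1_000, dtype="int64"), name="reading")

# Downcast once; each shifted copy then uses 4 bytes per value instead of 8.
# Assumption: float32 precision is sufficient for these readings.
compact = values.astype("float32")

# Build all lag columns in a single concat rather than inserting one at a time.
lags = pd.concat(
    {f"reading_lag_{k}": compact.shift(k) for k in (1, 2, 3)},
    axis=1,
)

print(lags.dtypes)
print(lags.memory_usage(index=False).sum(), "bytes")
```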
Don’t confuse row shifts with time shifts. When your index is a DatetimeIndex, remember that shift(1) moves values by one row position, not one time unit. Use freq when you need calendar-aware shifting.
The shift() method is deceptively simple but forms the backbone of time series feature engineering. Master it, and you’ll find yourself reaching for it constantly in data analysis workflows.