How to Shift Values in Pandas

Key Insights

  • The shift() method moves data up or down by a specified number of periods, creating lagged or leading values essential for time series analysis and feature engineering.
  • Use positive periods to shift data down (creating lag features) and negative periods to shift up (creating lead features), with fill_value to control how gaps are filled.
  • Time-based shifting with the freq parameter operates on the index rather than row positions, making it invaluable for irregular time series data.

Introduction to the shift() Method

Shifting values is one of the most fundamental operations in time series analysis and data manipulation. The pandas shift() method moves data up or down along an axis, creating offset versions of your data that enable comparisons between current and previous (or future) values.

You’ll reach for shift() constantly when:

  • Calculating period-over-period changes (daily returns, monthly growth)
  • Creating lagged features for machine learning models
  • Computing differences between consecutive rows
  • Building rolling calculations that need access to previous values

The basic syntax is straightforward:

import pandas as pd
import numpy as np

# Create a simple Series
sales = pd.Series([100, 150, 130, 180, 200], 
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                  name='sales')

print("Original:")
print(sales)

print("\nShifted by 1:")
print(sales.shift(1))

Output:

Original:
Mon    100
Tue    150
Wed    130
Thu    180
Fri    200
Name: sales, dtype: int64

Shifted by 1:
Mon      NaN
Tue    100.0
Wed    150.0
Thu    130.0
Fri    180.0
Name: sales, dtype: float64

Notice how each value moved down one position, and NaN filled the gap at the top. The dtype also changed from int64 to float64, because NaN is a floating-point value. This is the core behavior you’ll build upon.
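
As a minimal sketch of the "differences between consecutive rows" use case, the snippet below subtracts the shifted Series from the original and checks the result against Series.diff(). The variable name change is only illustrative.

```python
import pandas as pd

sales = pd.Series([100, 150, 130, 180, 200],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                  name='sales')

# Day-over-day change: current value minus the previous row's value
change = sales - sales.shift(1)

# Series.diff() computes the same quantity in one call
assert change.equals(sales.diff())

print(change)
```

The first entry is NaN because Monday has no previous row to compare against.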

Shifting Rows Up and Down

The periods parameter controls direction and magnitude. Positive values shift data downward, so each row receives the value from an earlier row (a lag). Negative values shift data upward, so each row receives the value from a later row (a lead).

df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=5),
    'revenue': [1000, 1200, 1100, 1400, 1300],
    'units': [50, 60, 55, 70, 65]
})

print("Original DataFrame:")
print(df)

# Shift down by 1 (lag)
print("\nShifted down by 1 (previous day's values):")
print(df[['revenue', 'units']].shift(1))

# Shift up by 1 (lead)
print("\nShifted up by 1 (next day's values):")
print(df[['revenue', 'units']].shift(-1))

# Shift by multiple periods
print("\nShifted down by 2 (values from 2 days ago):")
print(df[['revenue', 'units']].shift(2))

Output:

Original DataFrame:
        date  revenue  units
0 2024-01-01     1000     50
1 2024-01-02     1200     60
2 2024-01-03     1100     55
3 2024-01-04     1400     70
4 2024-01-05     1300     65

Shifted down by 1 (previous day's values):
   revenue  units
0      NaN    NaN
1   1000.0   50.0
2   1200.0   60.0
3   1100.0   55.0
4   1400.0   70.0

Shifted up by 1 (next day's values):
   revenue  units
0   1200.0   60.0
1   1100.0   55.0
2   1400.0   70.0
3   1300.0   65.0
4      NaN    NaN

Shifted down by 2 (values from 2 days ago):
   revenue  units
0      NaN    NaN
1      NaN    NaN
2   1000.0   50.0
3   1200.0   60.0
4   1100.0   55.0

Think of positive shifts as “give me what happened N periods ago” and negative shifts as “give me what will happen N periods from now.”
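
In practice, lag and lead values are often stored as new columns next to the original. The sketch below assumes the rows are already sorted by date, since shift() works on row position; the column names prev_revenue and next_revenue are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=5),
    'revenue': [1000, 1200, 1100, 1400, 1300],
})

# Hypothetical column names, chosen for illustration
df['prev_revenue'] = df['revenue'].shift(1)    # lag: value from the row above
df['next_revenue'] = df['revenue'].shift(-1)   # lead: value from the row below

print(df)
```

If the rows are not in chronological order, sorting with sort_values('date') before shifting keeps "previous" and "next" meaningful.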

Shifting with Different Fill Values

The default NaN fill isn’t always what you want. The fill_value parameter lets you specify a replacement:

inventory = pd.Series([100, 80, 120, 90, 110], name='stock')

# Default NaN fill
print("Shift with NaN (default):")
print(inventory.shift(1))

# Fill with 0
print("\nShift with fill_value=0:")
print(inventory.shift(1, fill_value=0))

# Fill with the mean
print("\nShift with fill_value=mean:")
print(inventory.shift(1, fill_value=inventory.mean()))

Output:

Shift with NaN (default):
0      NaN
1    100.0
2     80.0
3    120.0
4     90.0
Name: stock, dtype: float64

Shift with fill_value=0:
0      0
1    100
2     80
3    120
4     90
Name: stock, dtype: int64

Shift with fill_value=mean:
0    100.0
1    100.0
2     80.0
3    120.0
4     90.0
Name: stock, dtype: float64

Notice that using fill_value preserves the original integer dtype when filling with an integer. This matters for memory efficiency and downstream operations that expect specific types.
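
A quick way to confirm this on your own data is to inspect the dtype directly. This is a small sketch using the same inventory Series; the variable names are illustrative.

```python
import pandas as pd

inventory = pd.Series([100, 80, 120, 90, 110], name='stock')

default_shift = inventory.shift(1)               # NaN fill forces float64
zero_filled = inventory.shift(1, fill_value=0)   # integer fill keeps int64

print(default_shift.dtype)
print(zero_filled.dtype)
```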

Shifting Along Columns (Axis Parameter)

By default, shift() operates on rows (axis=0). Set axis=1 to shift horizontally across columns:

quarterly_data = pd.DataFrame({
    'Q1': [100, 200, 150],
    'Q2': [120, 180, 160],
    'Q3': [140, 220, 170],
    'Q4': [130, 240, 180]
}, index=['Product A', 'Product B', 'Product C'])

print("Original quarterly data:")
print(quarterly_data)

# Shift columns to the right
print("\nShifted right by 1 (previous quarter's values in each column):")
print(quarterly_data.shift(1, axis=1))

# Shift columns to the left
print("\nShifted left by 1 (next quarter's values in each column):")
print(quarterly_data.shift(-1, axis=1))

Output:

Original quarterly data:
            Q1   Q2   Q3   Q4
Product A  100  120  140  130
Product B  200  180  220  240
Product C  150  160  170  180

Shifted right by 1 (previous quarter's values in each column):
            Q1     Q2     Q3     Q4
Product A  NaN  100.0  120.0  140.0
Product B  NaN  200.0  180.0  220.0
Product C  NaN  150.0  160.0  170.0

Shifted left by 1 (next quarter's values in each column):
              Q1     Q2     Q3   Q4
Product A  120.0  140.0  130.0  NaN
Product B  180.0  220.0  240.0  NaN
Product C  160.0  170.0  180.0  NaN

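Column shifting is useful for comparing adjacent time periods when your data is structured with periods as columns rather than rows. Depending on your pandas version, the columns that keep their values may display as integers rather than floats.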
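
One way to put column shifting to work is a quarter-over-quarter change table. The following is a sketch under the same wide layout; qoq_change is a hypothetical name.

```python
import pandas as pd

quarterly_data = pd.DataFrame({
    'Q1': [100, 200, 150],
    'Q2': [120, 180, 160],
    'Q3': [140, 220, 170],
    'Q4': [130, 240, 180]
}, index=['Product A', 'Product B', 'Product C'])

# Each quarter minus the quarter to its left; Q1 has no predecessor, so it is NaN
qoq_change = quarterly_data - quarterly_data.shift(1, axis=1)

print(qoq_change)
```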

Time-Based Shifting with freq Parameter

When working with DatetimeIndex data, the freq parameter shifts the index itself rather than the data positions. This distinction is critical for irregular time series:

# Create time series with DatetimeIndex
ts = pd.Series(
    [100, 150, 130, 180],
    index=pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-05', '2024-01-08']),
    name='value'
)

print("Original (irregular time series):")
print(ts)

# Row-based shift (default) - shifts values, not dates
print("\nRow-based shift(1) - values move, index stays:")
print(ts.shift(1))

# Time-based shift - shifts the index
print("\nTime-based shift(1, freq='D') - index moves by 1 day:")
print(ts.shift(1, freq='D'))

# Shift by different frequencies
print("\nShift by 2 days:")
print(ts.shift(2, freq='D'))

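# 'W' is anchored to Sundays and would roll each date forward to the next Sunday,
# so use a fixed 7-day offset to move every timestamp by exactly one week
print("\nShift by 1 week:")
print(ts.shift(1, freq='7D'))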

Output:

Original (irregular time series):
2024-01-01    100
2024-01-03    150
2024-01-05    130
2024-01-08    180
Name: value, dtype: int64

Row-based shift(1) - values move, index stays:
2024-01-01      NaN
2024-01-03    100.0
2024-01-05    150.0
2024-01-08    130.0
Name: value, dtype: float64

Time-based shift(1, freq='D') - index moves by 1 day:
2024-01-02    100
2024-01-04    150
2024-01-06    130
2024-01-09    180
Name: value, dtype: int64

Shift by 2 days:
2024-01-03    100
2024-01-05    150
2024-01-07    130
2024-01-10    180
Name: value, dtype: int64

Shift by 1 week:
2024-01-08    100
2024-01-10    150
2024-01-12    130
2024-01-15    180
Name: value, dtype: int64

With freq, the data stays intact while the timestamps move. No NaN values appear because nothing is displaced; the values are simply relabeled.
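
To see why this matters for irregular data, the sketch below looks up the reading from exactly two calendar days earlier. It shifts the index by two days and then reindexes back onto the original dates, which is one possible approach rather than the only one; the names two_days_earlier and change are illustrative.

```python
import pandas as pd

ts = pd.Series(
    [100, 150, 130, 180],
    index=pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-05', '2024-01-08']),
    name='value'
)

# Relabel each observation two days later, then look up the original dates.
# A date gets a value only if an observation exists exactly two days before it.
two_days_earlier = ts.shift(2, freq='D').reindex(ts.index)

# Change versus the reading from exactly two calendar days earlier
change = ts - two_days_earlier

print(two_days_earlier)
print(change)
```

Here 2024-01-08 ends up with NaN because there is no observation on 2024-01-06, whereas a row-based shift(1) would have paired it with 2024-01-05.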

Practical Applications

Here’s where shift() proves its worth in real analysis work.

Calculating Period-Over-Period Changes

stock_prices = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=6),
    'price': [100.0, 102.5, 101.0, 105.0, 103.5, 108.0]
})

# Daily price change (absolute)
stock_prices['daily_change'] = stock_prices['price'] - stock_prices['price'].shift(1)

# Daily return (percentage)
stock_prices['daily_return'] = (
    (stock_prices['price'] - stock_prices['price'].shift(1)) / 
    stock_prices['price'].shift(1) * 100
)

# Equivalent using pct_change() - but shift() gives you more control
stock_prices['pct_change_method'] = stock_prices['price'].pct_change() * 100

print(stock_prices.round(2))

Output:

        date  price  daily_change  daily_return  pct_change_method
0 2024-01-01  100.0           NaN           NaN                NaN
1 2024-01-02  102.5           2.5          2.50               2.50
2 2024-01-03  101.0          -1.5         -1.46              -1.46
3 2024-01-04  105.0           4.0          3.96               3.96
4 2024-01-05  103.5          -1.5         -1.43              -1.43
5 2024-01-06  108.0           4.5          4.35               4.35
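
The same pattern extends to longer horizons by changing the number of periods. As a sketch, the snippet below computes a two-period return with shift(2) and checks it against pct_change(periods=2); two_day_return is a hypothetical name.

```python
import pandas as pd

prices = pd.Series([100.0, 102.5, 101.0, 105.0, 103.5, 108.0],
                   index=pd.date_range('2024-01-01', periods=6),
                   name='price')

# Percentage return versus the price two rows earlier
two_day_return = (prices / prices.shift(2) - 1) * 100

# pct_change(periods=2) should agree up to floating-point tolerance
pd.testing.assert_series_equal(two_day_return,
                               prices.pct_change(periods=2) * 100)

print(two_day_return.round(2))
```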

Creating Lagged Features for Machine Learning

def create_lag_features(df, column, lags):
    """Create multiple lag features for ML models."""
    result = df.copy()
    for lag in lags:
        result[f'{column}_lag_{lag}'] = result[column].shift(lag)
    return result

sensor_data = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=10, freq='h'),
    'temperature': [20, 21, 22, 23, 24, 23, 22, 21, 20, 19]
})

# Create lag features for prediction
sensor_with_lags = create_lag_features(sensor_data, 'temperature', [1, 2, 3])

# Add a lead feature (what we're predicting)
sensor_with_lags['temperature_next'] = sensor_with_lags['temperature'].shift(-1)

print(sensor_with_lags)

Output:

            timestamp  temperature  temperature_lag_1  temperature_lag_2  temperature_lag_3  temperature_next
0 2024-01-01 00:00:00           20                NaN                NaN                NaN              21.0
1 2024-01-01 01:00:00           21               20.0                NaN                NaN              22.0
2 2024-01-01 02:00:00           22               21.0               20.0                NaN              23.0
3 2024-01-01 03:00:00           23               22.0               21.0               20.0              24.0
4 2024-01-01 04:00:00           24               23.0               22.0               21.0              23.0
5 2024-01-01 05:00:00           23               24.0               23.0               22.0              22.0
6 2024-01-01 06:00:00           22               23.0               24.0               23.0              21.0
7 2024-01-01 07:00:00           21               22.0               23.0               24.0              20.0
8 2024-01-01 08:00:00           20               21.0               22.0               23.0              19.0
9 2024-01-01 09:00:00           19               20.0               21.0               22.0               NaN

Drop rows with NaN before training, and you have a clean feature matrix.
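
A minimal sketch of that last step follows. It rebuilds the lag and lead columns so the snippet runs on its own; X, y, and feature_cols are hypothetical names for the feature matrix, target vector, and selected columns.

```python
import pandas as pd

sensor_data = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=10, freq='h'),
    'temperature': [20, 21, 22, 23, 24, 23, 22, 21, 20, 19]
})

# Rebuild the lag and lead columns from the example above
features = sensor_data.copy()
for lag in [1, 2, 3]:
    features[f'temperature_lag_{lag}'] = features['temperature'].shift(lag)
features['temperature_next'] = features['temperature'].shift(-1)

# Drops the first 3 rows (incomplete lags) and the last row (no target)
model_data = features.dropna()

# Hypothetical names for the feature matrix and target vector
feature_cols = ['temperature', 'temperature_lag_1',
                'temperature_lag_2', 'temperature_lag_3']
X = model_data[feature_cols]
y = model_data['temperature_next']

print(X.shape, y.shape)
```

With three lags and one lead on ten rows, six complete rows remain for training.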

Common Pitfalls and Best Practices

Handle NaN values explicitly. After shifting, decide whether to drop, fill, or interpolate missing values based on your use case:

data = pd.Series([10, 20, 30, 40, 50])
shifted = data.shift(2)

# Option 1: Drop NaN rows
clean_dropped = shifted.dropna()

# Option 2: Backward fill (ffill() cannot fill leading NaNs, since no earlier value exists)
clean_bfill = shifted.bfill()

# Option 3: Fill with a specific value
clean_filled = shifted.fillna(0)

Watch for dtype changes. Shifting integer columns introduces NaN, which forces conversion to float. Use fill_value to preserve types when appropriate, or convert back after filling:

integers = pd.Series([1, 2, 3, 4, 5], dtype='int64')
shifted = integers.shift(1, fill_value=0)  # Stays int64
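
If a placeholder such as 0 would be misleading, another option is pandas' nullable integer dtype, which can hold a missing value without converting to float. The sketch below assumes your downstream code can handle the Int64 extension dtype.

```python
import pandas as pd

# Capital "I": the nullable integer extension dtype
integers = pd.Series([1, 2, 3, 4, 5], dtype='Int64')
shifted = integers.shift(1)

print(shifted.dtype)
print(shifted)
```

The gap is filled with pd.NA instead of NaN, and the remaining values stay integers.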

Be mindful of memory with large datasets. Each shifted column is a full copy. For DataFrames with millions of rows and many lag features, memory can balloon quickly. Consider computing shifts on-demand or using more memory-efficient approaches for very large datasets.
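
To make the cost concrete, the sketch below measures one lag column on synthetic data and compares it with a float32 version. Downcasting is only one possible memory-saving approach, shown here as an assumption about what may suit your data, and it trades away precision.

```python
import numpy as np
import pandas as pd

# Synthetic data, for illustration only
n_rows = 100_000
df = pd.DataFrame({'value': np.arange(n_rows, dtype='float64')})

# A float64 lag column costs 8 bytes per row
lag64 = df['value'].shift(1)

# Downcasting to float32 halves that, at the cost of precision
lag32 = df['value'].shift(1).astype('float32')

print(lag64.memory_usage(index=False))  # bytes used by the float64 lag
print(lag32.memory_usage(index=False))  # bytes used by the float32 lag
```

Multiply the per-column figure by the number of lag features to estimate the total overhead before creating them.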

Don’t confuse row shifts with time shifts. When your index is a DatetimeIndex, remember that shift(1) moves values by one row position, not one time unit. Use freq when you need calendar-aware shifting.

The shift() method is deceptively simple but forms the backbone of time series feature engineering. Master it, and you’ll find yourself reaching for it constantly in data analysis workflows.
