How to Use Shift in Polars
Key Insights
- Polars' shift() method moves data up or down by n positions, creating null values at the boundaries; use fill_value to replace these nulls with meaningful defaults
- Combining shift() with group_by() enables powerful partitioned lag calculations, essential for time series analysis across categories like products, customers, or regions
- Polars executes shift operations significantly faster than pandas due to its Rust backend and lazy evaluation, making it the better choice for large-scale data transformations
Introduction to Shift Operations
Shift operations move data vertically within a column by a specified number of positions. Shift down (positive values), and you get lagged data—what the value was n periods ago. Shift up (negative values), and you get lead data—what the value will be n periods ahead.
This sounds simple, but it unlocks critical analytical patterns. Time series analysis relies heavily on shift operations to compare current values against historical ones. Feature engineering for machine learning models uses lagged variables constantly. Financial analysis calculates returns, moving averages, and momentum indicators through shift-based computations.
Polars implements shift operations with excellent performance characteristics. Its Rust backend executes columnar operations in parallel and manages memory efficiently, and for datasets with millions of rows that difference over pandas becomes substantial.
Basic Shift Syntax
The shift() method accepts an integer argument specifying how many positions to move the data. Positive values shift data downward (creating lag), while negative values shift upward (creating lead).
import polars as pl
# Create a simple Series
s = pl.Series("values", [10, 20, 30, 40, 50])
# Shift down by 1 (lag)
lagged = s.shift(1)
print("Original:", s.to_list())
print("Shift(1):", lagged.to_list())
# Shift up by 1 (lead)
lead = s.shift(-1)
print("Shift(-1):", lead.to_list())
Output:
Original: [10, 20, 30, 40, 50]
Shift(1): [None, 10, 20, 30, 40]
Shift(-1): [20, 30, 40, 50, None]
Notice what happens at the boundaries. When you shift down by 1, the first position has no previous value to pull from—it becomes null. When you shift up by 1, the last position has no next value—also null. The number of nulls always equals the absolute value of your shift amount.
You can shift by any integer amount:
# Shift by multiple positions
s.shift(3) # [None, None, None, 10, 20]
s.shift(-2) # [30, 40, 50, None, None]
Handling Null Values with fill_value
Those boundary nulls often cause problems downstream. Aggregations may behave unexpectedly, and some algorithms choke on missing data. The fill_value parameter lets you specify a replacement value.
import polars as pl
s = pl.Series("values", [100, 200, 300, 400, 500])
# Fill nulls with zero
shifted_zero = s.shift(2, fill_value=0)
print("Shift(2, fill_value=0):", shifted_zero.to_list())
# Fill with the series' own first value (hardcoded here as 100)
shifted_first = s.shift(1, fill_value=100)
print("Shift(1, fill_value=100):", shifted_first.to_list())
# Negative shift with fill
lead_filled = s.shift(-1, fill_value=500)
print("Shift(-1, fill_value=500):", lead_filled.to_list())
Output:
Shift(2, fill_value=0): [0, 0, 100, 200, 300]
Shift(1, fill_value=100): [100, 100, 200, 300, 400]
Shift(-1, fill_value=500): [200, 300, 400, 500, 500]
Choose your fill value based on context. For financial calculations, zero might introduce bias—consider using the first available value instead. For boolean flags, False often makes sense as a default. For counts, zero is usually appropriate.
Shift in DataFrame Context
Real-world analysis happens in DataFrames, not isolated Series. Use with_columns() to add shifted columns alongside your original data.
import polars as pl
df = pl.DataFrame({
"date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
"sales": [150, 200, 180, 220, 195]
})
# Add lagged column
df_with_lag = df.with_columns(
pl.col("sales").shift(1).alias("previous_day_sales")
)
print(df_with_lag)
Output:
shape: (5, 3)
┌────────────┬───────┬────────────────────┐
│ date ┆ sales ┆ previous_day_sales │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════════╪═══════╪════════════════════╡
│ 2024-01-01 ┆ 150 ┆ null │
│ 2024-01-02 ┆ 200 ┆ 150 │
│ 2024-01-03 ┆ 180 ┆ 200 │
│ 2024-01-04 ┆ 220 ┆ 180 │
│ 2024-01-05 ┆ 195 ┆ 220 │
└────────────┴───────┴────────────────────┘
You can create multiple shifted columns in a single with_columns() call:
df_multi_lag = df.with_columns(
pl.col("sales").shift(1).alias("lag_1"),
pl.col("sales").shift(2).alias("lag_2"),
pl.col("sales").shift(-1).alias("lead_1")
)
Practical Use Case: Calculating Period-over-Period Changes
The most common shift application calculates how values change between periods. This pattern appears everywhere: daily stock returns, month-over-month revenue growth, week-over-week user engagement.
import polars as pl
# Stock price data
stock_df = pl.DataFrame({
"date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
"ticker": ["AAPL"] * 5,
"close_price": [185.50, 187.25, 186.00, 189.75, 191.20]
})
# Calculate daily change and percent change
analysis_df = stock_df.with_columns(
pl.col("close_price").shift(1).alias("previous_close")
).with_columns(
(pl.col("close_price") - pl.col("previous_close")).alias("daily_change"),
((pl.col("close_price") - pl.col("previous_close")) / pl.col("previous_close") * 100)
.round(2)
.alias("pct_change")
)
print(analysis_df)
Output:
shape: (5, 6)
┌────────────┬────────┬─────────────┬────────────────┬──────────────┬────────────┐
│ date ┆ ticker ┆ close_price ┆ previous_close ┆ daily_change ┆ pct_change │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞════════════╪════════╪═════════════╪════════════════╪══════════════╪════════════╡
│ 2024-01-01 ┆ AAPL ┆ 185.5 ┆ null ┆ null ┆ null │
│ 2024-01-02 ┆ AAPL ┆ 187.25 ┆ 185.5 ┆ 1.75 ┆ 0.94 │
│ 2024-01-03 ┆ AAPL ┆ 186.0 ┆ 187.25 ┆ -1.25 ┆ -0.67 │
│ 2024-01-04 ┆ AAPL ┆ 189.75 ┆ 186.0 ┆ 3.75 ┆ 2.02 │
│ 2024-01-05 ┆ AAPL ┆ 191.2 ┆ 189.75 ┆ 1.45 ┆ 0.76 │
└────────────┴────────┴─────────────┴────────────────┴──────────────┴────────────┘
Polars also provides a built-in pct_change() method that handles this calculation directly, but understanding the shift-based approach gives you flexibility for custom calculations.
Shift with Group By Operations
When your data contains multiple entities—products, customers, regions—you need shift operations that respect group boundaries. Shifting globally would incorrectly pull values from one group into another.
import polars as pl
# Sales data for multiple products
sales_df = pl.DataFrame({
"date": ["2024-01-01", "2024-01-02", "2024-01-03"] * 3,
"product": ["Widget"] * 3 + ["Gadget"] * 3 + ["Gizmo"] * 3,
"units_sold": [100, 120, 115, 50, 55, 48, 200, 210, 225]
})
# Shift within each product group
grouped_shift = sales_df.with_columns(
pl.col("units_sold")
.shift(1)
.over("product")
.alias("previous_day_units")
)
print(grouped_shift.sort(["product", "date"]))
Output:
shape: (9, 4)
┌────────────┬─────────┬────────────┬────────────────────┐
│ date ┆ product ┆ units_sold ┆ previous_day_units │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞════════════╪═════════╪════════════╪════════════════════╡
│ 2024-01-01 ┆ Gadget ┆ 50 ┆ null │
│ 2024-01-02 ┆ Gadget ┆ 55 ┆ 50 │
│ 2024-01-03 ┆ Gadget ┆ 48 ┆ 55 │
│ 2024-01-01 ┆ Gizmo ┆ 200 ┆ null │
│ 2024-01-02 ┆ Gizmo ┆ 210 ┆ 200 │
│ 2024-01-03 ┆ Gizmo ┆ 225 ┆ 210 │
│ 2024-01-01 ┆ Widget ┆ 100 ┆ null │
│ 2024-01-02 ┆ Widget ┆ 120 ┆ 100 │
│ 2024-01-03 ┆ Widget ┆ 115 ┆ 120 │
└────────────┴─────────┴────────────┴────────────────────┘
The over() method partitions the shift operation by the specified column(s). Each product’s first date correctly shows null for the previous day, rather than pulling from a different product.
You can combine this with calculations for grouped period-over-period analysis:
growth_analysis = sales_df.with_columns(
    pl.col("units_sold").shift(1).over("product").alias("prev_units")
).with_columns(
    # prev_units is already partitioned, so the arithmetic needs no further over()
    ((pl.col("units_sold") - pl.col("prev_units")) / pl.col("prev_units") * 100)
    .round(1)
    .alias("growth_pct")
)
Performance Tips and Comparison with Pandas
Polars typically outperforms pandas on shift operations, especially at scale. The Rust backend processes data in parallel and manages memory more efficiently, so for DataFrames with millions of rows the speedup is often severalfold.
Here’s how the syntax compares:
import polars as pl
import pandas as pd
# Pandas approach
pdf = pd.DataFrame({
"date": ["2024-01-01", "2024-01-02", "2024-01-03"],
"value": [100, 200, 300]
})
# Basic shift
pdf["lagged"] = pdf["value"].shift(1)
# Grouped shift
pdf["group"] = ["A", "A", "B"]
pdf["grouped_lag"] = pdf.groupby("group")["value"].shift(1)
# Polars approach
pldf = pl.DataFrame({
"date": ["2024-01-01", "2024-01-02", "2024-01-03"],
"value": [100, 200, 300],
"group": ["A", "A", "B"]
})
# Basic shift
pldf = pldf.with_columns(pl.col("value").shift(1).alias("lagged"))
# Grouped shift
pldf = pldf.with_columns(pl.col("value").shift(1).over("group").alias("grouped_lag"))
Key syntax differences: Polars uses with_columns() and alias() rather than direct column assignment. Grouped operations use over() instead of groupby().transform(). The Polars approach chains more naturally and expresses intent more clearly.
For maximum performance with large datasets, use lazy evaluation:
# Lazy evaluation for large datasets
result = (
pl.scan_parquet("large_dataset.parquet")
.with_columns(
pl.col("value").shift(1).over("category").alias("lagged_value")
)
.filter(pl.col("lagged_value").is_not_null())
.collect()
)
The lazy API builds a query plan and optimizes execution before processing any data. This eliminates intermediate allocations and enables predicate pushdown, making complex shift-based analyses dramatically faster.
Shift operations form the backbone of temporal analysis in Polars. Master them, and you unlock efficient time series processing, feature engineering, and period-over-period comparisons at any scale.