How to Pivot a DataFrame in Polars

Pivoting transforms your data from long format to wide format—rows become columns. It's one of those operations you'll reach for constantly when preparing data for reports, visualizations, or...

Key Insights

  • Polars’ pivot() function transforms long-format data into wide-format by turning unique values from one column into separate columns, requiring you to specify index, columns, and values parameters
  • When duplicate combinations exist in your data, you must provide an aggregate_function parameter (sum, mean, first, etc.) or Polars will raise an error—this explicit behavior prevents silent data loss
  • The unpivot() function reverses pivot operations, converting wide-format data back to long-format, which is essential for data pipeline flexibility and analysis workflows

Introduction

Pivoting transforms your data from long format to wide format—rows become columns. It’s one of those operations you’ll reach for constantly when preparing data for reports, visualizations, or specific analytical workflows.

Polars handles pivots with impressive speed, often 5-10x faster than pandas on larger datasets. The syntax is also more explicit, which means fewer surprises when your data has quirks. This article covers everything you need to pivot effectively in Polars, from basic operations to handling edge cases that trip up most developers.

Understanding the Pivot Operation

A pivot operation requires three components:

  1. Index columns: The columns that stay as rows (your row identifiers)
  2. Pivot column: The column whose unique values become new column headers
  3. Values column: The column whose values fill the new columns

Think of it as rotating your data. If you have sales records with one row per transaction, pivoting on the product column creates a new column for each product, with values spread horizontally.

Let’s create a sample dataset to work with throughout this article:

import polars as pl

# Sales data in long format
df = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", 
             "2024-01-02", "2024-01-02", "2024-01-02"],
    "product": ["Widget", "Gadget", "Gizmo", 
                "Widget", "Gadget", "Gizmo"],
    "region": ["North", "North", "North", 
               "South", "South", "South"],
    "revenue": [100, 150, 200, 120, 180, 220],
    "quantity": [10, 15, 20, 12, 18, 22]
})

print(df)

Output:

shape: (6, 5)
┌────────────┬─────────┬────────┬─────────┬──────────┐
│ date       ┆ product ┆ region ┆ revenue ┆ quantity │
│ ---        ┆ ---     ┆ ---    ┆ ---     ┆ ---      │
│ str        ┆ str     ┆ str    ┆ i64     ┆ i64      │
╞════════════╪═════════╪════════╪═════════╪══════════╡
│ 2024-01-01 ┆ Widget  ┆ North  ┆ 100     ┆ 10       │
│ 2024-01-01 ┆ Gadget  ┆ North  ┆ 150     ┆ 15       │
│ 2024-01-01 ┆ Gizmo   ┆ North  ┆ 200     ┆ 20       │
│ 2024-01-02 ┆ Widget  ┆ South  ┆ 120     ┆ 12       │
│ 2024-01-02 ┆ Gadget  ┆ South  ┆ 180     ┆ 18       │
│ 2024-01-02 ┆ Gizmo   ┆ South  ┆ 220     ┆ 22       │
└────────────┴─────────┴────────┴─────────┴──────────┘

Basic Pivot with df.pivot()

The pivot() method takes three required-ish parameters: on (the column to pivot), index (row identifiers), and values (what fills the cells).

# Pivot products into columns, with revenue as values
pivoted = df.pivot(
    on="product",
    index="date",
    values="revenue"
)

print(pivoted)

Output:

shape: (2, 4)
┌────────────┬────────┬────────┬───────┐
│ date       ┆ Widget ┆ Gadget ┆ Gizmo │
│ ---        ┆ ---    ┆ ---    ┆ ---   │
│ str        ┆ i64    ┆ i64    ┆ i64   │
╞════════════╪════════╪════════╪═══════╡
│ 2024-01-01 ┆ 100    ┆ 150    ┆ 200   │
│ 2024-01-02 ┆ 120    ┆ 180    ┆ 220   │
└────────────┴────────┴────────┴───────┘

Each unique value in the product column became its own column. The date column serves as the index, and revenue values populate the grid.

You can also use multiple index columns:

# Pivot with multiple index columns
pivoted_multi = df.pivot(
    on="product",
    index=["date", "region"],
    values="revenue"
)

print(pivoted_multi)

Output:

shape: (2, 5)
┌────────────┬────────┬────────┬────────┬───────┐
│ date       ┆ region ┆ Widget ┆ Gadget ┆ Gizmo │
│ ---        ┆ ---    ┆ ---    ┆ ---    ┆ ---   │
│ str        ┆ str    ┆ i64    ┆ i64    ┆ i64   │
╞════════════╪════════╪════════╪════════╪═══════╡
│ 2024-01-01 ┆ North  ┆ 100    ┆ 150    ┆ 200   │
│ 2024-01-02 ┆ South  ┆ 120    ┆ 180    ┆ 220   │
└────────────┴────────┴────────┴────────┴───────┘

Handling Aggregations in Pivots

Real-world data often has multiple values for the same index/column combination. Let’s create a dataset with duplicates:

# Data with multiple sales per product/date
df_duplicates = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", 
             "2024-01-01", "2024-01-02", "2024-01-02"],
    "product": ["Widget", "Widget", "Gadget", 
                "Gadget", "Widget", "Gadget"],
    "revenue": [100, 150, 200, 50, 120, 180]
})

print(df_duplicates)

If you try to pivot this without an aggregation function, Polars raises an error:

# This will raise an error!
# df_duplicates.pivot(on="product", index="date", values="revenue")
# ComputeError: found multiple elements in the same group...

The fix is to specify an aggregate_function:

# Pivot with sum aggregation
pivoted_sum = df_duplicates.pivot(
    on="product",
    index="date",
    values="revenue",
    aggregate_function="sum"
)

print(pivoted_sum)

Output:

shape: (2, 3)
┌────────────┬────────┬────────┐
│ date       ┆ Widget ┆ Gadget │
│ ---        ┆ ---    ┆ ---    │
│ str        ┆ i64    ┆ i64    │
╞════════════╪════════╪════════╡
│ 2024-01-01 ┆ 250    ┆ 250    │
│ 2024-01-02 ┆ 120    ┆ 180    │
└────────────┴────────┴────────┘

Available aggregation functions include:

  • "sum", "mean", "median" for numeric aggregations
  • "min", "max" for extremes
  • "first", "last" for taking single values
  • "count" for counting occurrences
  • "len" for counting including nulls
# Different aggregations
pivoted_mean = df_duplicates.pivot(
    on="product",
    index="date", 
    values="revenue",
    aggregate_function="mean"
)

pivoted_count = df_duplicates.pivot(
    on="product",
    index="date",
    values="revenue", 
    aggregate_function="count"
)

print("Mean:\n", pivoted_mean)
print("\nCount:\n", pivoted_count)

Pivoting Multiple Value Columns

Sometimes you need to pivot multiple metrics simultaneously. Polars handles this by passing a list to the values parameter:

# Original data with multiple metrics
df_metrics = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "product": ["Widget", "Gadget", "Widget", "Gadget"],
    "revenue": [100, 150, 120, 180],
    "quantity": [10, 15, 12, 18],
    "returns": [1, 2, 0, 3]
})

# Pivot multiple value columns
pivoted_multi_values = df_metrics.pivot(
    on="product",
    index="date",
    values=["revenue", "quantity"]
)

print(pivoted_multi_values)

Output:

shape: (2, 5)
┌────────────┬────────────────┬────────────────┬─────────────────┬─────────────────┐
│ date       ┆ revenue_Widget ┆ revenue_Gadget ┆ quantity_Widget ┆ quantity_Gadget │
│ ---        ┆ ---            ┆ ---            ┆ ---             ┆ ---             │
│ str        ┆ i64            ┆ i64            ┆ i64             ┆ i64             │
╞════════════╪════════════════╪════════════════╪═════════════════╪═════════════════╡
│ 2024-01-01 ┆ 100            ┆ 150            ┆ 10              ┆ 15              │
│ 2024-01-02 ┆ 120            ┆ 180            ┆ 12              ┆ 18              │
└────────────┴────────────────┴────────────────┴─────────────────┴─────────────────┘

Polars automatically creates column names by combining the value column name with the pivot column value. This naming convention keeps things unambiguous when you’re working with the resulting DataFrame.

Unpivoting (Melt) - The Reverse Operation

The inverse of pivoting is unpivoting, also known as melting. This transforms wide-format data back to long-format. Polars uses the unpivot() method for this:

# Start with wide data
wide_df = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "Widget": [100, 120],
    "Gadget": [150, 180],
    "Gizmo": [200, 220]
})

print("Wide format:\n", wide_df)

# Unpivot to long format
long_df = wide_df.unpivot(
    index="date",
    on=["Widget", "Gadget", "Gizmo"],
    variable_name="product",
    value_name="revenue"
)

print("\nLong format:\n", long_df)

Output:

Wide format:
shape: (2, 4)
┌────────────┬────────┬────────┬───────┐
│ date       ┆ Widget ┆ Gadget ┆ Gizmo │
...

Long format:
shape: (6, 3)
┌────────────┬─────────┬─────────┐
│ date       ┆ product ┆ revenue │
│ ---        ┆ ---     ┆ ---     │
│ str        ┆ str     ┆ i64     │
╞════════════╪═════════╪═════════╡
│ 2024-01-01 ┆ Widget  ┆ 100     │
│ 2024-01-02 ┆ Widget  ┆ 120     │
│ 2024-01-01 ┆ Gadget  ┆ 150     │
│ 2024-01-02 ┆ Gadget  ┆ 180     │
│ 2024-01-01 ┆ Gizmo   ┆ 200     │
│ 2024-01-02 ┆ Gizmo   ┆ 220     │
└────────────┴─────────┴─────────┘

The unpivot() parameters mirror pivot’s logic:

  • index: Columns to keep as identifiers
  • on: Columns to unpivot (if omitted, all non-index columns are unpivoted)
  • variable_name: Name for the new column containing original column names
  • value_name: Name for the new column containing values

Performance Tips and Common Pitfalls

Sort Before Pivoting for Consistent Column Order

Polars doesn’t guarantee column order in pivot results. If you need consistent ordering (for reports or downstream processing), sort first:

# Ensure consistent column order
df_sorted = df.sort("product")
pivoted_ordered = df_sorted.pivot(
    on="product",
    index="date",
    values="revenue"
)

# Or reorder columns after pivoting
pivoted = df.pivot(on="product", index="date", values="revenue")
ordered_cols = ["date"] + sorted([c for c in pivoted.columns if c != "date"])
pivoted_final = pivoted.select(ordered_cols)

Watch Memory with High-Cardinality Pivot Columns

Pivoting on a column with thousands of unique values creates thousands of columns. This explodes memory usage and usually indicates a design problem:

# Don't do this - creates 10,000 columns!
# df.pivot(on="user_id", index="date", values="revenue")

# Instead, aggregate first or reconsider your approach

Handle Null Values Explicitly

Pivots create null values where combinations don’t exist. Handle them based on your use case:

pivoted = df.pivot(on="product", index="date", values="revenue")

# Fill nulls with zero (common for numeric data)
pivoted_filled = pivoted.fill_null(0)

# Or fill with a specific strategy
pivoted_forward = pivoted.fill_null(strategy="forward")

Polars vs Pandas Performance

In benchmarks with datasets over 1 million rows, Polars pivot typically runs 3-10x faster than pandas pivot_table(). The gap widens with larger datasets due to Polars’ lazy evaluation and parallel execution. For small datasets under 10,000 rows, the difference is negligible—use whichever API you prefer.

# Polars lazy mode for large datasets
result = (
    pl.scan_csv("large_sales.csv")
    .collect()
    .pivot(on="product", index="date", values="revenue", aggregate_function="sum")
)

Pivoting is fundamental to data reshaping. Polars makes it fast and explicit—you always know what’s happening to your data. Start with simple pivots, add aggregations when needed, and use unpivot when you need to reverse the transformation.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.