How to Melt a DataFrame in Polars
Melting transforms your data from wide format to long format. If you have columns like `jan_sales`, `feb_sales`, `mar_sales`, melting pivots those column names into row values under a single 'month'...
Key Insights
- Polars uses
unpivot()instead ofmelt()(thoughmelt()still works as an alias), offering cleaner semantics and better performance than pandas for wide-to-long transformations. - The
indexparameter specifies columns to keep as identifiers, whileonspecifies columns to melt—if you omiton, Polars melts all columns not inindex. - Melting mixed-type columns requires explicit casting since Polars enforces strict typing, but this catches data issues early rather than silently corrupting your analysis.
Introduction
Melting transforms your data from wide format to long format. If you have columns like jan_sales, feb_sales, mar_sales, melting pivots those column names into row values under a single “month” column, with their corresponding values in another column.
This transformation is essential for visualization libraries that expect tidy data, time series analysis where you need dates as values rather than column headers, and any aggregation that needs to group by what were previously separate columns.
Polars handles this operation through unpivot(), and it does so significantly faster than pandas for large datasets. Let’s dig into how it works.
Understanding Wide vs. Long Data Formats
Wide data spreads observations across columns. You’ll see this format in spreadsheets, financial reports, and data exports where humans need to scan across rows. Long data stacks observations vertically, with categorical variables indicating what each value represents.
Here’s a typical wide DataFrame you might encounter:
import polars as pl
# Wide format: each month is a separate column
wide_df = pl.DataFrame({
"product": ["Widget A", "Widget B", "Widget C"],
"region": ["North", "South", "East"],
"jan_sales": [1500, 2300, 1800],
"feb_sales": [1650, 2100, 1950],
"mar_sales": [1400, 2500, 2100],
"apr_sales": [1800, 2200, 1750]
})
print(wide_df)
shape: (3, 6)
┌──────────┬────────┬───────────┬───────────┬───────────┬───────────┐
│ product ┆ region ┆ jan_sales ┆ feb_sales ┆ mar_sales ┆ apr_sales │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════════╪════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Widget A ┆ North ┆ 1500 ┆ 1650 ┆ 1400 ┆ 1800 │
│ Widget B ┆ South ┆ 2300 ┆ 2100 ┆ 2500 ┆ 2200 │
│ Widget C ┆ East ┆ 1800 ┆ 1950 ┆ 2100 ┆ 1750 │
└──────────┴────────┴───────────┴───────────┴───────────┴───────────┘
Wide format works well for human readability and when you need quick comparisons across a row. But try to calculate average monthly sales across all products, or plot sales trends over time—suddenly you’re wrestling with column names instead of working with data values.
Long format solves this by making month a variable you can filter, group, and aggregate:
shape: (12, 4)
┌──────────┬────────┬───────────┬───────┐
│ product ┆ region ┆ month ┆ sales │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ i64 │
╞══════════╪════════╪═══════════╪═══════╡
│ Widget A ┆ North ┆ jan_sales ┆ 1500 │
│ Widget A ┆ North ┆ feb_sales ┆ 1650 │
│ ... ┆ ... ┆ ... ┆ ... │
└──────────┴────────┴───────────┴───────┘
Basic Melt Syntax in Polars
Polars provides unpivot() as the primary method for melting. The older melt() method still works but unpivot() is the preferred, more descriptive name.
The four key parameters are:
index: Columns to keep as row identifiers (not melted)on: Columns to melt into rows (if omitted, melts everything not inindex)variable_name: Name for the new column containing former column headersvalue_name: Name for the new column containing the values
# Basic melt: transform monthly columns into rows
long_df = wide_df.unpivot(
index=["product", "region"],
on=["jan_sales", "feb_sales", "mar_sales", "apr_sales"],
variable_name="month",
value_name="sales"
)
print(long_df)
shape: (12, 4)
┌──────────┬────────┬───────────┬───────┐
│ product ┆ region ┆ month ┆ sales │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ i64 │
╞══════════╪════════╪═══════════╪═══════╡
│ Widget A ┆ North ┆ jan_sales ┆ 1500 │
│ Widget B ┆ South ┆ jan_sales ┆ 2300 │
│ Widget C ┆ East ┆ jan_sales ┆ 1800 │
│ Widget A ┆ North ┆ feb_sales ┆ 1650 │
│ Widget B ┆ South ┆ feb_sales ┆ 2100 │
│ Widget C ┆ East ┆ feb_sales ┆ 1950 │
│ Widget A ┆ North ┆ mar_sales ┆ 1400 │
│ Widget B ┆ South ┆ mar_sales ┆ 2500 │
│ Widget C ┆ East ┆ mar_sales ┆ 2100 │
│ Widget A ┆ North ┆ apr_sales ┆ 1800 │
│ Widget B ┆ South ┆ apr_sales ┆ 2200 │
│ Widget C ┆ East ┆ apr_sales ┆ 1750 │
└──────────┴────────┴───────────┴───────┘
Now you can easily compute aggregations like average sales per month or filter to specific time periods.
Selective Melting with on and index
You don’t always want to melt every non-identifier column. The on parameter gives you precise control over which columns transform into rows.
Consider a DataFrame with both sales and inventory data:
mixed_df = pl.DataFrame({
"product": ["Widget A", "Widget B", "Widget C"],
"region": ["North", "South", "East"],
"jan_sales": [1500, 2300, 1800],
"feb_sales": [1650, 2100, 1950],
"jan_inventory": [500, 800, 600],
"feb_inventory": [450, 750, 550]
})
# Melt only sales columns, keeping inventory as-is
sales_long = mixed_df.select(
["product", "region", "jan_sales", "feb_sales"]
).unpivot(
index=["product", "region"],
variable_name="month",
value_name="sales"
)
print(sales_long)
When you omit the on parameter, Polars melts all columns not specified in index. This shorthand works well when you’ve already selected the columns you need:
# Equivalent: omit 'on' to melt everything except index columns
sales_subset = mixed_df.select(["product", "region", "jan_sales", "feb_sales"])
sales_long = sales_subset.unpivot(
index=["product", "region"],
variable_name="month",
value_name="sales"
)
For complex transformations, you might melt sales and inventory separately, then join the results:
# Melt sales
sales_long = mixed_df.unpivot(
index=["product", "region"],
on=["jan_sales", "feb_sales"],
variable_name="month",
value_name="sales"
).with_columns(
pl.col("month").str.replace("_sales", "")
)
# Melt inventory
inventory_long = mixed_df.unpivot(
index=["product", "region"],
on=["jan_inventory", "feb_inventory"],
variable_name="month",
value_name="inventory"
).with_columns(
pl.col("month").str.replace("_inventory", "")
)
# Join on common keys
combined = sales_long.join(
inventory_long,
on=["product", "region", "month"],
how="inner"
)
print(combined)
Handling Multiple Value Types
Polars enforces strict typing, which means melting columns of different types requires attention. If you try to melt an integer column alongside a string column, Polars will raise an error.
# This DataFrame has mixed types in columns we want to melt
problematic_df = pl.DataFrame({
"product": ["Widget A", "Widget B"],
"jan_sales": [1500, 2300],
"jan_notes": ["Good month", "Record sales"]
})
# This will fail: can't melt i64 and str into one column
# problematic_df.unpivot(index=["product"]) # SchemaError!
The solution is explicit casting before melting:
# Cast numeric columns to string for melting
melted_mixed = problematic_df.with_columns(
pl.col("jan_sales").cast(pl.Utf8)
).unpivot(
index=["product"],
variable_name="attribute",
value_name="value"
)
print(melted_mixed)
Alternatively, handle different types separately and stack the results:
# Separate melts for different types, then concatenate
numeric_cols = ["jan_sales"]
text_cols = ["jan_notes"]
numeric_melted = problematic_df.select(["product"] + numeric_cols).unpivot(
index=["product"],
variable_name="attribute",
value_name="numeric_value"
)
text_melted = problematic_df.select(["product"] + text_cols).unpivot(
index=["product"],
variable_name="attribute",
value_name="text_value"
)
This explicit handling is a feature, not a bug. Silent type coercion causes subtle data corruption that surfaces much later in your pipeline.
Polars vs. Pandas Melt Comparison
If you’re migrating from pandas, the syntax differences are minimal but the performance gains are substantial.
import pandas as pd
import polars as pl
# Pandas melt
pandas_df = pd.DataFrame({
"product": ["Widget A", "Widget B", "Widget C"],
"region": ["North", "South", "East"],
"jan_sales": [1500, 2300, 1800],
"feb_sales": [1650, 2100, 1950],
})
pandas_long = pandas_df.melt(
id_vars=["product", "region"],
value_vars=["jan_sales", "feb_sales"],
var_name="month",
value_name="sales"
)
# Polars unpivot (equivalent operation)
polars_df = pl.DataFrame({
"product": ["Widget A", "Widget B", "Widget C"],
"region": ["North", "South", "East"],
"jan_sales": [1500, 2300, 1800],
"feb_sales": [1650, 2100, 1950],
})
polars_long = polars_df.unpivot(
index=["product", "region"],
on=["jan_sales", "feb_sales"],
variable_name="month",
value_name="sales"
)
Key differences:
- Pandas uses
id_varsandvalue_vars; Polars usesindexandon - Pandas uses
var_name; Polars usesvariable_name - Polars runs parallel and uses Apache Arrow memory format, making it 5-10x faster on datasets with millions of rows
Practical Use Case: Preparing Data for Visualization
Visualization libraries like Plotly and Altair expect long-format data for grouped charts. Here’s a complete workflow:
import polars as pl
import plotly.express as px
# Start with wide quarterly sales data
quarterly_sales = pl.DataFrame({
"product": ["Widget A", "Widget B", "Widget C", "Widget D"],
"Q1_2024": [45000, 62000, 38000, 51000],
"Q2_2024": [48000, 58000, 42000, 55000],
"Q3_2024": [52000, 61000, 45000, 49000],
"Q4_2024": [58000, 67000, 48000, 62000]
})
# Melt to long format
sales_long = quarterly_sales.unpivot(
index=["product"],
variable_name="quarter",
value_name="revenue"
).with_columns(
# Clean up quarter names for display
pl.col("quarter").str.replace("_", " ")
)
# Convert to pandas for Plotly (Plotly doesn't natively support Polars yet)
sales_pandas = sales_long.to_pandas()
# Create grouped bar chart
fig = px.bar(
sales_pandas,
x="quarter",
y="revenue",
color="product",
barmode="group",
title="Quarterly Revenue by Product",
labels={"revenue": "Revenue ($)", "quarter": "Quarter"}
)
fig.show()
The melted format lets Plotly automatically group bars by product and position them along the quarter axis. Try doing that with wide data—you’d need manual subplot configuration and lose the automatic legend generation.
For Altair, the pattern is identical:
import altair as alt
chart = alt.Chart(sales_long.to_pandas()).mark_bar().encode(
x="quarter:N",
y="revenue:Q",
color="product:N",
xOffset="product:N"
).properties(width=600, height=400)
Melting isn’t just a data transformation—it’s the bridge between how data is stored and how visualization tools expect to consume it. Master unpivot() and you’ll spend less time wrestling with data shapes and more time building insights.