How to Unpivot a DataFrame with Melt in Pandas

Data rarely arrives in the format you need. Wide-format data—where each column represents a different observation—is common in spreadsheets and exports, but most analysis tools expect long-format...

Key Insights

  • The pd.melt() function transforms wide-format DataFrames into long-format by unpivoting columns into rows, making data suitable for visualization libraries and machine learning pipelines.
  • Master the four key parameters—id_vars, value_vars, var_name, and value_name—to control exactly which columns get melted and how the output is labeled.
  • Choose melt() over stack() when working with non-indexed data or when you need explicit control over column selection; use stack() for hierarchical index manipulation.

Introduction to Data Reshaping

Data rarely arrives in the format you need. Wide-format data—where each column represents a different observation—is common in spreadsheets and exports, but most analysis tools expect long-format data where each row represents a single observation.

Consider a sales report with columns for each month: Jan, Feb, Mar. This wide format is readable for humans but problematic for plotting time series in seaborn or training models that expect feature-value pairs. You need to unpivot this data, transforming those month columns into rows with a “Month” column and a “Sales” column.

Pivoting converts long data to wide format. Unpivoting (or melting) does the reverse. Pandas provides pd.melt() specifically for this wide-to-long transformation. Understanding when and how to use it will save you hours of manual data wrangling.

Understanding the Melt Function

The pd.melt() function takes a wide DataFrame and unpivots it into a longer format. Here’s the signature with its key parameters:

pd.melt(
    frame,
    id_vars=None,      # Columns to keep as identifiers
    value_vars=None,   # Columns to unpivot (default: all non-id columns)
    var_name=None,     # Name for the variable column
    value_name='value' # Name for the value column
)

Let’s break down each parameter:

  • id_vars: Columns that stay fixed. These become repeated for each unpivoted row.
  • value_vars: Columns to melt into rows. If omitted, all columns not in id_vars get melted.
  • var_name: The name of the new column containing the original column headers.
  • value_name: The name of the new column containing the original cell values.

Here’s the basic pattern:

import pandas as pd

# Wide format
df_wide = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'Q1': [100, 150],
    'Q2': [120, 180],
    'Q3': [90, 200]
})

# Melt to long format
df_long = pd.melt(df_wide, id_vars=['product'])
print(df_long)

Output:

  product variable  value
0  Widget       Q1    100
1  Gadget       Q1    150
2  Widget       Q2    120
3  Gadget       Q2    180
4  Widget       Q3     90
5  Gadget       Q3    200

The product column stays intact while Q1, Q2, and Q3 become values in a new variable column.

Basic Melt Example

Let’s work through a realistic example. You have monthly sales data for multiple products:

import pandas as pd

# Sales data in wide format
sales_wide = pd.DataFrame({
    'product': ['Laptop', 'Phone', 'Tablet'],
    'Jan': [45000, 62000, 28000],
    'Feb': [48000, 58000, 31000],
    'Mar': [52000, 65000, 29000],
    'Apr': [49000, 71000, 33000]
})

print("Wide Format:")
print(sales_wide)

Output:

Wide Format:
  product    Jan    Feb    Mar    Apr
0  Laptop  45000  48000  52000  49000
1   Phone  62000  58000  65000  71000
2  Tablet  28000  31000  29000  33000

Now melt it:

sales_long = pd.melt(
    sales_wide,
    id_vars=['product'],
    var_name='month',
    value_name='sales'
)

print("\nLong Format:")
print(sales_long)

Output:

Long Format:
   product month  sales
0   Laptop   Jan  45000
1    Phone   Jan  62000
2   Tablet   Jan  28000
3   Laptop   Feb  48000
4    Phone   Feb  58000
5   Tablet   Feb  31000
6   Laptop   Mar  52000
7    Phone   Mar  65000
8   Tablet   Mar  29000
9   Laptop   Apr  49000
10   Phone   Apr  71000
11  Tablet   Apr  33000

Each row now represents a single observation: one product’s sales for one month. This format works directly with visualization libraries and statistical functions.

Customizing Column Names and Selecting Variables

You often don’t want to melt every column. The value_vars parameter lets you specify exactly which columns to unpivot:

# Only melt Q1 and Q2
quarterly_data = pd.DataFrame({
    'region': ['North', 'South', 'East', 'West'],
    'Q1': [100, 150, 120, 90],
    'Q2': [110, 160, 125, 95],
    'Q3': [105, 155, 130, 100],
    'Q4': [120, 170, 140, 110]
})

# Melt only first half of year
first_half = pd.melt(
    quarterly_data,
    id_vars=['region'],
    value_vars=['Q1', 'Q2'],
    var_name='quarter',
    value_name='revenue'
)

print(first_half)

Output:

  region quarter  revenue
0  North      Q1      100
1  South      Q1      150
2   East      Q1      120
3   West      Q1       90
4  North      Q2      110
5  South      Q2      160
6   East      Q2      125
7   West      Q2       95

The Q3 and Q4 columns are excluded entirely. This is useful when you only need a subset of your data or when some columns contain metadata you don’t want to melt.

Always specify var_name and value_name explicitly. The defaults (variable and value) are generic and make your code harder to understand later.

Handling Multi-Index and Complex DataFrames

Real datasets often require multiple identifier columns. Pass a list to id_vars:

# Sales by region and product
complex_data = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    '2022': [1000, 1500, 1200, 1100],
    '2023': [1100, 1600, 1250, 1200],
    '2024': [1200, 1700, 1300, 1350]
})

melted = pd.melt(
    complex_data,
    id_vars=['region', 'product'],
    var_name='year',
    value_name='units_sold'
)

print(melted)

Output:

   region product  year  units_sold
0   North  Widget  2022        1000
1   North  Gadget  2022        1500
2   South  Widget  2022        1200
3   South  Gadget  2022        1100
4   North  Widget  2023        1100
5   North  Gadget  2023        1600
6   South  Widget  2023        1250
7   South  Gadget  2023        1200
8   North  Widget  2024        1200
9   North  Gadget  2024        1700
10  South  Widget  2024        1300
11  South  Gadget  2024        1350

Both region and product are preserved and repeated for each year value. This pattern handles any number of identifier columns.

For DataFrames with hierarchical column indices, you may need to flatten them first:

# Flatten multi-level columns before melting
df.columns = ['_'.join(col).strip() for col in df.columns.values]

Common Use Cases and Patterns

Preparing Data for Visualization

Seaborn and Plotly expect long-format data for most plots. Here’s a complete workflow:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Survey responses in wide format
survey = pd.DataFrame({
    'respondent': ['A', 'B', 'C', 'D'],
    'satisfaction': [4, 5, 3, 4],
    'likelihood_recommend': [5, 5, 2, 4],
    'ease_of_use': [3, 4, 4, 5]
})

# Melt for plotting
survey_long = pd.melt(
    survey,
    id_vars=['respondent'],
    var_name='metric',
    value_name='score'
)

# Create grouped bar chart
plt.figure(figsize=(10, 6))
sns.barplot(data=survey_long, x='respondent', y='score', hue='metric')
plt.title('Survey Responses by Metric')
plt.tight_layout()
plt.savefig('survey_results.png')

Time Series Reformatting

Convert date columns to a proper time series:

# Stock prices with date columns
stocks = pd.DataFrame({
    'ticker': ['AAPL', 'GOOGL', 'MSFT'],
    '2024-01-01': [185.5, 140.2, 375.0],
    '2024-01-02': [186.2, 141.5, 376.5],
    '2024-01-03': [184.8, 139.8, 374.2]
})

stocks_long = pd.melt(
    stocks,
    id_vars=['ticker'],
    var_name='date',
    value_name='price'
)

# Convert to proper datetime
stocks_long['date'] = pd.to_datetime(stocks_long['date'])
stocks_long = stocks_long.sort_values(['ticker', 'date'])

Melt vs Stack: When to Use Each

Pandas offers two ways to unpivot data: melt() and stack(). They serve different purposes.

Use melt() when:

  • Working with regular DataFrames without meaningful indices
  • You need explicit control over which columns to unpivot
  • Column names should become data values

Use stack() when:

  • Your DataFrame has a meaningful index you want to preserve
  • You’re working with hierarchical columns
  • You want the result as a Series or MultiIndex DataFrame

Here’s a comparison:

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'math': [90, 85],
    'science': [88, 92]
})

# Using melt
melted = pd.melt(df, id_vars=['name'], var_name='subject', value_name='score')
print("Melt result:")
print(melted)

# Using stack (requires setting index first)
stacked = df.set_index('name').stack().reset_index()
stacked.columns = ['name', 'subject', 'score']
print("\nStack result:")
print(stacked)

Both produce identical output, but melt() is more explicit and readable. The stack() approach requires index manipulation and column renaming.

Performance is similar for most datasets. For very large DataFrames (millions of rows), stack() can be marginally faster since it operates on the index, but the difference rarely matters in practice.

Choose melt() as your default. It’s more intuitive, requires less boilerplate, and makes your intent clear. Reserve stack() for cases where you’re already working with hierarchical indices or need to preserve complex index structures.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.