How to Unpivot a DataFrame with Melt in Pandas
Data rarely arrives in the format you need. Wide-format data—where each column represents a different observation—is common in spreadsheets and exports, but most analysis tools expect long-format...
Key Insights
- The
pd.melt()function transforms wide-format DataFrames into long-format by unpivoting columns into rows, making data suitable for visualization libraries and machine learning pipelines. - Master the four key parameters—
id_vars,value_vars,var_name, andvalue_name—to control exactly which columns get melted and how the output is labeled. - Choose
melt()overstack()when working with non-indexed data or when you need explicit control over column selection; usestack()for hierarchical index manipulation.
Introduction to Data Reshaping
Data rarely arrives in the format you need. Wide-format data—where each column represents a different observation—is common in spreadsheets and exports, but most analysis tools expect long-format data where each row represents a single observation.
Consider a sales report with columns for each month: Jan, Feb, Mar. This wide format is readable for humans but problematic for plotting time series in seaborn or training models that expect feature-value pairs. You need to unpivot this data, transforming those month columns into rows with a “Month” column and a “Sales” column.
Pivoting converts long data to wide format. Unpivoting (or melting) does the reverse. Pandas provides pd.melt() specifically for this wide-to-long transformation. Understanding when and how to use it will save you hours of manual data wrangling.
Understanding the Melt Function
The pd.melt() function takes a wide DataFrame and unpivots it into a longer format. Here’s the signature with its key parameters:
pd.melt(
frame,
id_vars=None, # Columns to keep as identifiers
value_vars=None, # Columns to unpivot (default: all non-id columns)
var_name=None, # Name for the variable column
value_name='value' # Name for the value column
)
Let’s break down each parameter:
id_vars: Columns that stay fixed. These become repeated for each unpivoted row.value_vars: Columns to melt into rows. If omitted, all columns not inid_varsget melted.var_name: The name of the new column containing the original column headers.value_name: The name of the new column containing the original cell values.
Here’s the basic pattern:
import pandas as pd
# Wide format
df_wide = pd.DataFrame({
'product': ['Widget', 'Gadget'],
'Q1': [100, 150],
'Q2': [120, 180],
'Q3': [90, 200]
})
# Melt to long format
df_long = pd.melt(df_wide, id_vars=['product'])
print(df_long)
Output:
product variable value
0 Widget Q1 100
1 Gadget Q1 150
2 Widget Q2 120
3 Gadget Q2 180
4 Widget Q3 90
5 Gadget Q3 200
The product column stays intact while Q1, Q2, and Q3 become values in a new variable column.
Basic Melt Example
Let’s work through a realistic example. You have monthly sales data for multiple products:
import pandas as pd
# Sales data in wide format
sales_wide = pd.DataFrame({
'product': ['Laptop', 'Phone', 'Tablet'],
'Jan': [45000, 62000, 28000],
'Feb': [48000, 58000, 31000],
'Mar': [52000, 65000, 29000],
'Apr': [49000, 71000, 33000]
})
print("Wide Format:")
print(sales_wide)
Output:
Wide Format:
product Jan Feb Mar Apr
0 Laptop 45000 48000 52000 49000
1 Phone 62000 58000 65000 71000
2 Tablet 28000 31000 29000 33000
Now melt it:
sales_long = pd.melt(
sales_wide,
id_vars=['product'],
var_name='month',
value_name='sales'
)
print("\nLong Format:")
print(sales_long)
Output:
Long Format:
product month sales
0 Laptop Jan 45000
1 Phone Jan 62000
2 Tablet Jan 28000
3 Laptop Feb 48000
4 Phone Feb 58000
5 Tablet Feb 31000
6 Laptop Mar 52000
7 Phone Mar 65000
8 Tablet Mar 29000
9 Laptop Apr 49000
10 Phone Apr 71000
11 Tablet Apr 33000
Each row now represents a single observation: one product’s sales for one month. This format works directly with visualization libraries and statistical functions.
Customizing Column Names and Selecting Variables
You often don’t want to melt every column. The value_vars parameter lets you specify exactly which columns to unpivot:
# Only melt Q1 and Q2
quarterly_data = pd.DataFrame({
'region': ['North', 'South', 'East', 'West'],
'Q1': [100, 150, 120, 90],
'Q2': [110, 160, 125, 95],
'Q3': [105, 155, 130, 100],
'Q4': [120, 170, 140, 110]
})
# Melt only first half of year
first_half = pd.melt(
quarterly_data,
id_vars=['region'],
value_vars=['Q1', 'Q2'],
var_name='quarter',
value_name='revenue'
)
print(first_half)
Output:
region quarter revenue
0 North Q1 100
1 South Q1 150
2 East Q1 120
3 West Q1 90
4 North Q2 110
5 South Q2 160
6 East Q2 125
7 West Q2 95
The Q3 and Q4 columns are excluded entirely. This is useful when you only need a subset of your data or when some columns contain metadata you don’t want to melt.
Always specify var_name and value_name explicitly. The defaults (variable and value) are generic and make your code harder to understand later.
Handling Multi-Index and Complex DataFrames
Real datasets often require multiple identifier columns. Pass a list to id_vars:
# Sales by region and product
complex_data = pd.DataFrame({
'region': ['North', 'North', 'South', 'South'],
'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
'2022': [1000, 1500, 1200, 1100],
'2023': [1100, 1600, 1250, 1200],
'2024': [1200, 1700, 1300, 1350]
})
melted = pd.melt(
complex_data,
id_vars=['region', 'product'],
var_name='year',
value_name='units_sold'
)
print(melted)
Output:
region product year units_sold
0 North Widget 2022 1000
1 North Gadget 2022 1500
2 South Widget 2022 1200
3 South Gadget 2022 1100
4 North Widget 2023 1100
5 North Gadget 2023 1600
6 South Widget 2023 1250
7 South Gadget 2023 1200
8 North Widget 2024 1200
9 North Gadget 2024 1700
10 South Widget 2024 1300
11 South Gadget 2024 1350
Both region and product are preserved and repeated for each year value. This pattern handles any number of identifier columns.
For DataFrames with hierarchical column indices, you may need to flatten them first:
# Flatten multi-level columns before melting
df.columns = ['_'.join(col).strip() for col in df.columns.values]
Common Use Cases and Patterns
Preparing Data for Visualization
Seaborn and Plotly expect long-format data for most plots. Here’s a complete workflow:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Survey responses in wide format
survey = pd.DataFrame({
'respondent': ['A', 'B', 'C', 'D'],
'satisfaction': [4, 5, 3, 4],
'likelihood_recommend': [5, 5, 2, 4],
'ease_of_use': [3, 4, 4, 5]
})
# Melt for plotting
survey_long = pd.melt(
survey,
id_vars=['respondent'],
var_name='metric',
value_name='score'
)
# Create grouped bar chart
plt.figure(figsize=(10, 6))
sns.barplot(data=survey_long, x='respondent', y='score', hue='metric')
plt.title('Survey Responses by Metric')
plt.tight_layout()
plt.savefig('survey_results.png')
Time Series Reformatting
Convert date columns to a proper time series:
# Stock prices with date columns
stocks = pd.DataFrame({
'ticker': ['AAPL', 'GOOGL', 'MSFT'],
'2024-01-01': [185.5, 140.2, 375.0],
'2024-01-02': [186.2, 141.5, 376.5],
'2024-01-03': [184.8, 139.8, 374.2]
})
stocks_long = pd.melt(
stocks,
id_vars=['ticker'],
var_name='date',
value_name='price'
)
# Convert to proper datetime
stocks_long['date'] = pd.to_datetime(stocks_long['date'])
stocks_long = stocks_long.sort_values(['ticker', 'date'])
Melt vs Stack: When to Use Each
Pandas offers two ways to unpivot data: melt() and stack(). They serve different purposes.
Use melt() when:
- Working with regular DataFrames without meaningful indices
- You need explicit control over which columns to unpivot
- Column names should become data values
Use stack() when:
- Your DataFrame has a meaningful index you want to preserve
- You’re working with hierarchical columns
- You want the result as a Series or MultiIndex DataFrame
Here’s a comparison:
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob'],
'math': [90, 85],
'science': [88, 92]
})
# Using melt
melted = pd.melt(df, id_vars=['name'], var_name='subject', value_name='score')
print("Melt result:")
print(melted)
# Using stack (requires setting index first)
stacked = df.set_index('name').stack().reset_index()
stacked.columns = ['name', 'subject', 'score']
print("\nStack result:")
print(stacked)
Both produce identical output, but melt() is more explicit and readable. The stack() approach requires index manipulation and column renaming.
Performance is similar for most datasets. For very large DataFrames (millions of rows), stack() can be marginally faster since it operates on the index, but the difference rarely matters in practice.
Choose melt() as your default. It’s more intuitive, requires less boilerplate, and makes your intent clear. Reserve stack() for cases where you’re already working with hierarchical indices or need to preserve complex index structures.