How to Use Melt in Pandas
Key Insights
- The melt() function transforms wide-format DataFrames into long-format by unpivoting columns into rows, making data suitable for visualization libraries and statistical analysis
- Master the four core parameters—id_vars, value_vars, var_name, and value_name—to control exactly which columns become identifiers, which get unpivoted, and how the resulting columns are named
- Melt and pivot are inverse operations; understanding both lets you reshape data in any direction, which is essential for real-world data wrangling workflows
Introduction to Data Reshaping
Data rarely arrives in the format you need. You’ll encounter “wide” datasets where each variable gets its own column, and “long” datasets where observations stack vertically with categorical identifiers. Neither format is inherently better—the right choice depends on what you’re doing with the data.
Wide format works well for human readability and spreadsheet-style analysis. Long format is what most visualization libraries expect, and it’s required for many statistical operations. Pandas gives you melt() to convert wide to long, and pivot() to go the other direction.
If you’ve ever tried to plot multiple columns as separate lines in seaborn and hit a wall, or needed to reshape survey data for regression analysis, melt() is the function you need.
Understanding the Melt Function
The pd.melt() function unpivots a DataFrame from wide format to long format. Think of it as taking column headers and turning them into data values in a new column.
Here are the key parameters:
- id_vars: Columns to keep as identifiers (they stay as columns)
- value_vars: Columns to unpivot (their headers become values)
- var_name: Name for the new column containing the old column headers
- value_name: Name for the new column containing the values
Let’s see this in action with a sales dataset:
import pandas as pd
# Wide format: each quarter is a separate column
sales_wide = pd.DataFrame({
    'Product': ['Widget', 'Gadget', 'Gizmo'],
    'Q1': [150, 200, 175],
    'Q2': [180, 220, 160],
    'Q3': [200, 190, 210],
    'Q4': [170, 250, 195]
})
print("Wide format:")
print(sales_wide)
# Melt to long format
sales_long = pd.melt(
    sales_wide,
    id_vars=['Product'],
    value_vars=['Q1', 'Q2', 'Q3', 'Q4'],
    var_name='Quarter',
    value_name='Sales'
)
print("\nLong format:")
print(sales_long)
Output:
Wide format:
Product Q1 Q2 Q3 Q4
0 Widget 150 180 200 170
1 Gadget 200 220 190 250
2 Gizmo 175 160 210 195
Long format:
Product Quarter Sales
0 Widget Q1 150
1 Gadget Q1 200
2 Gizmo Q1 175
3 Widget Q2 180
4 Gadget Q2 220
5 Gizmo Q2 160
...
The three rows became twelve rows. Each product-quarter combination is now its own observation.
Basic Melt Operations
When you omit parameters, melt() uses sensible defaults. Without value_vars, it melts all columns except those in id_vars. Without custom names, you get variable and value.
# Student grades dataset
grades = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [92, 78, 85],
    'Science': [88, 82, 90],
    'English': [95, 75, 88]
})
# Minimal melt - just specify what to keep
grades_melted = pd.melt(grades, id_vars=['Student'])
print(grades_melted)
Output:
Student variable value
0 Alice Math 92
1 Bob Math 78
2 Charlie Math 85
3 Alice Science 88
4 Bob Science 82
5 Charlie Science 90
6 Alice English 95
7 Bob English 75
8 Charlie English 88
The default column names are generic. Always specify var_name and value_name for clarity:
grades_clean = pd.melt(
    grades,
    id_vars=['Student'],
    var_name='Subject',
    value_name='Grade'
)
print(grades_clean)
This produces self-documenting output that anyone can understand without context.
Advanced Melt Techniques
Real datasets often have multiple identifier columns, and you may want to melt only specific value columns.
Selective Melting with value_vars
When your DataFrame has columns you want to exclude entirely—neither melted nor kept as identifiers—use value_vars to be explicit:
# Dataset with extra columns we want to ignore
company_data = pd.DataFrame({
    'Region': ['North', 'South', 'East'],
    'Year': [2023, 2023, 2023],
    'Revenue': [500000, 450000, 380000],
    'Expenses': [320000, 290000, 250000],
    'Headcount': [45, 38, 32],  # Don't want to melt this
    'Notes': ['Expanded', 'Stable', 'New office']  # Or this
})
# Only melt Revenue and Expenses
financials = pd.melt(
    company_data,
    id_vars=['Region', 'Year'],
    value_vars=['Revenue', 'Expenses'],
    var_name='Metric',
    value_name='Amount'
)
print(financials)
Output:
Region Year Metric Amount
0 North 2023 Revenue 500000
1 South 2023 Revenue 450000
2 East 2023 Revenue 380000
3 North 2023 Expenses 320000
4 South 2023 Expenses 290000
5 East 2023 Expenses 250000
Combining Melt with GroupBy
Melted data integrates naturally with aggregation operations:
# After melting, calculate average by metric
avg_by_metric = financials.groupby('Metric')['Amount'].mean()
print(avg_by_metric)
# Or get totals by region
totals_by_region = financials.groupby('Region')['Amount'].sum()
print(totals_by_region)
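To make the numbers concrete, here is a self-contained sketch that rebuilds the financials frame from the selective-melt example above and runs both aggregations:

```python
import pandas as pd

# Rebuild the melted financials frame from the example above
company_data = pd.DataFrame({
    'Region': ['North', 'South', 'East'],
    'Year': [2023, 2023, 2023],
    'Revenue': [500000, 450000, 380000],
    'Expenses': [320000, 290000, 250000],
})

financials = pd.melt(
    company_data,
    id_vars=['Region', 'Year'],
    value_vars=['Revenue', 'Expenses'],
    var_name='Metric',
    value_name='Amount'
)

# Average per metric across regions
avg_by_metric = financials.groupby('Metric')['Amount'].mean()
print(avg_by_metric['Revenue'])   # (500000 + 450000 + 380000) / 3 ≈ 443333.33

# Revenue + Expenses combined, per region
totals_by_region = financials.groupby('Region')['Amount'].sum()
print(totals_by_region['North'])  # 500000 + 320000 = 820000
```

Because each metric is now a row label rather than a column, a single groupby covers both Revenue and Expenses without any column gymnastics.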
Multi-Level Reshaping
For complex datasets with hierarchical column structures, you can chain operations:
# Performance data across multiple dimensions
performance = pd.DataFrame({
    'Department': ['Engineering', 'Sales', 'Marketing'],
    'Manager': ['Alice', 'Bob', 'Charlie'],
    '2022_Q1': [85, 92, 78],
    '2022_Q2': [88, 89, 82],
    '2023_Q1': [90, 95, 85],
    '2023_Q2': [92, 91, 88]
})
# Melt, then split the variable column
melted = pd.melt(
    performance,
    id_vars=['Department', 'Manager'],
    var_name='Period',
    value_name='Score'
)
# Extract year and quarter from the Period column
melted[['Year', 'Quarter']] = melted['Period'].str.split('_', expand=True)
melted = melted.drop('Period', axis=1)
print(melted)
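A quick way to sanity-check a chained reshape like this is to verify the resulting shape and the split columns. The sketch below repeats the steps above in self-contained form and checks them:

```python
import pandas as pd

# Same performance data as above
performance = pd.DataFrame({
    'Department': ['Engineering', 'Sales', 'Marketing'],
    'Manager': ['Alice', 'Bob', 'Charlie'],
    '2022_Q1': [85, 92, 78],
    '2022_Q2': [88, 89, 82],
    '2023_Q1': [90, 95, 85],
    '2023_Q2': [92, 91, 88],
})

melted = pd.melt(performance, id_vars=['Department', 'Manager'],
                 var_name='Period', value_name='Score')
melted[['Year', 'Quarter']] = melted['Period'].str.split('_', expand=True)
melted = melted.drop('Period', axis=1)

# 3 departments x 4 periods = 12 rows; Period split into two new columns
print(melted.shape)                        # (12, 5)
print(sorted(melted['Quarter'].unique()))  # ['Q1', 'Q2']
print(sorted(melted['Year'].unique()))     # ['2022', '2023']
```

Checks like these catch the most common reshaping mistakes—wrong id_vars or a separator that doesn't match—before they propagate downstream.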
Melt vs. Pivot: When to Use Each
These functions are inverses. Melt goes wide to long; pivot goes long to wide.
Use melt() when:
- You need to prepare data for seaborn, plotly, or ggplot-style visualization
- Column headers contain data values (like dates, categories, or measurements)
- You’re normalizing data for database storage
Use pivot() when:
- You need a summary table for reporting
- You want to compare values side-by-side
- You’re reversing a previous melt operation
Here’s a round-trip demonstration:
# Start with wide format
original = pd.DataFrame({
    'City': ['NYC', 'LA', 'Chicago'],
    'Jan': [32, 58, 25],
    'Feb': [35, 60, 28],
    'Mar': [45, 63, 40]
})
# Melt to long
long_format = pd.melt(
    original,
    id_vars=['City'],
    var_name='Month',
    value_name='Temperature'
)
# Pivot back to wide
restored = long_format.pivot(
    index='City',
    columns='Month',
    values='Temperature'
).reset_index()
restored.columns.name = None # Clean up the column name
print(restored)
The restored DataFrame contains the same data as the original; note that pivot sorts row and column labels alphabetically, so the ordering may differ from the original structure.
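The round trip can be verified programmatically. Since pivot sorts labels, this sketch realigns rows and columns before comparing with pd.testing:

```python
import pandas as pd

original = pd.DataFrame({
    'City': ['NYC', 'LA', 'Chicago'],
    'Jan': [32, 58, 25],
    'Feb': [35, 60, 28],
    'Mar': [45, 63, 40],
})

long_format = pd.melt(original, id_vars=['City'],
                      var_name='Month', value_name='Temperature')
restored = long_format.pivot(index='City', columns='Month',
                             values='Temperature').reset_index()
restored.columns.name = None

# pivot sorts both row and column labels, so realign to the original order
aligned = (restored[original.columns]
           .set_index('City')
           .loc[original['City']]
           .reset_index())

# Raises if any value, dtype, or label fails to round-trip
pd.testing.assert_frame_equal(aligned, original)
```

pd.testing.assert_frame_equal is stricter than an eyeball comparison: it also checks dtypes, which can silently change during reshaping (for example, when pivot introduces NaN into an integer column).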
Real-World Application: Preparing Data for Visualization
Visualization libraries like seaborn expect long-format data for multi-series plots. Here’s a practical example with stock prices:
import pandas as pd
import matplotlib.pyplot as plt
# Stock prices in wide format (common from financial APIs)
stocks = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=5),
    'AAPL': [185.2, 187.5, 186.8, 189.3, 191.0],
    'GOOGL': [140.1, 141.8, 139.5, 142.3, 143.7],
    'MSFT': [375.4, 378.2, 376.9, 380.1, 382.5]
})
# Melt for plotting
stocks_long = pd.melt(
    stocks,
    id_vars=['Date'],
    var_name='Ticker',
    value_name='Price'
)
# Now seaborn can easily create a multi-line plot
import seaborn as sns
plt.figure(figsize=(10, 6))
sns.lineplot(data=stocks_long, x='Date', y='Price', hue='Ticker')
plt.title('Stock Prices Over Time')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Without melting, you’d need to plot each column separately and manually handle the legend. With melted data, seaborn does the work for you.
Common Pitfalls and Best Practices
Handling NaN Values
Melt preserves NaN values by default. If you want to drop them, chain with dropna():
melted_clean = pd.melt(df, id_vars=['ID']).dropna()
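A minimal sketch of the default behavior, using a hypothetical two-row frame with missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in each value column
df = pd.DataFrame({
    'ID': [1, 2],
    'A': [10.0, np.nan],
    'B': [np.nan, 20.0],
})

melted = pd.melt(df, id_vars=['ID'])                  # NaNs kept: 4 rows
melted_clean = pd.melt(df, id_vars=['ID']).dropna()   # NaN rows removed: 2 rows

print(len(melted), len(melted_clean))  # 4 2
```

Whether you want to drop NaNs depends on the downstream step: most plotting libraries skip them gracefully, but statistical functions may not.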
Memory Considerations
Melting increases row count significantly. A DataFrame with 1,000 rows and 100 value columns becomes 100,000 rows after melting. For large datasets:
# Check memory before melting
print(f"Original: {df.memory_usage(deep=True).sum() / 1e6:.2f} MB")
# Consider chunking for very large datasets
chunks = []
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    melted_chunk = pd.melt(chunk, id_vars=['ID'])
    chunks.append(melted_chunk)
result = pd.concat(chunks, ignore_index=True)
Preserving Data Types
Melting combines values from different columns into one, which can cause type coercion. If you’re mixing integers and floats, expect floats. If any column is a string, the entire value column becomes object type.
# Check types after melting
print(melted.dtypes)
# Convert back if needed
melted['value'] = pd.to_numeric(melted['value'], errors='coerce')
Use ignore_index Wisely
The ignore_index parameter (default True) resets the index. Set it to False if you need to preserve the original index for later joins:
melted = pd.melt(df, id_vars=['ID'], ignore_index=False)
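For example, with a hypothetical frame carrying a custom index, each original label repeats once per melted column, which gives you a key for joining back to the source frame:

```python
import pandas as pd

# Hypothetical frame with a custom index
df = pd.DataFrame({'ID': ['a', 'b'], 'x': [1, 2], 'y': [3, 4]},
                  index=[10, 20])

melted = pd.melt(df, id_vars=['ID'], ignore_index=False)

# Labels 10 and 20 repeat once per melted column (x first, then y)
print(melted.index.tolist())  # [10, 20, 10, 20]
```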
The melt() function is fundamental to data wrangling in pandas. Master it alongside pivot(), and you’ll handle any reshaping task that comes your way.