How to Use Melt in Pandas

Data rarely arrives in the format you need. You'll encounter 'wide' datasets where each variable gets its own column, and 'long' datasets where observations stack vertically with categorical...

Key Insights

  • The melt() function transforms wide-format DataFrames into long-format by unpivoting columns into rows, making data suitable for visualization libraries and statistical analysis
  • Master the four core parameters—id_vars, value_vars, var_name, and value_name—to control exactly which columns become identifiers, which get unpivoted, and how the resulting columns are named
  • Melt and pivot are inverse operations; understanding both lets you reshape data in any direction, which is essential for real-world data wrangling workflows

Introduction to Data Reshaping

Data rarely arrives in the format you need. You’ll encounter “wide” datasets where each variable gets its own column, and “long” datasets where observations stack vertically with categorical identifiers. Neither format is inherently better—the right choice depends on what you’re doing with the data.

Wide format works well for human readability and spreadsheet-style analysis. Long format is what most visualization libraries expect, and it’s required for many statistical operations. Pandas gives you melt() to convert wide to long, and pivot() to go the other direction.

If you’ve ever tried to plot multiple columns as separate lines in seaborn and hit a wall, or needed to reshape survey data for regression analysis, melt() is the function you need.

Understanding the Melt Function

The pd.melt() function unpivots a DataFrame from wide format to long format. Think of it as taking column headers and turning them into data values in a new column.

Here are the key parameters:

  • id_vars: Columns to keep as identifiers (they stay as columns)
  • value_vars: Columns to unpivot (their headers become values)
  • var_name: Name for the new column containing the old column headers
  • value_name: Name for the new column containing the values

Let’s see this in action with a sales dataset:

import pandas as pd

# Wide format: each quarter is a separate column
sales_wide = pd.DataFrame({
    'Product': ['Widget', 'Gadget', 'Gizmo'],
    'Q1': [150, 200, 175],
    'Q2': [180, 220, 160],
    'Q3': [200, 190, 210],
    'Q4': [170, 250, 195]
})

print("Wide format:")
print(sales_wide)

# Melt to long format
sales_long = pd.melt(
    sales_wide,
    id_vars=['Product'],
    value_vars=['Q1', 'Q2', 'Q3', 'Q4'],
    var_name='Quarter',
    value_name='Sales'
)

print("\nLong format:")
print(sales_long)

Output:

Wide format:
   Product   Q1   Q2   Q3   Q4
0   Widget  150  180  200  170
1   Gadget  200  220  190  250
2    Gizmo  175  160  210  195

Long format:
   Product Quarter  Sales
0   Widget      Q1    150
1   Gadget      Q1    200
2    Gizmo      Q1    175
3   Widget      Q2    180
4   Gadget      Q2    220
5    Gizmo      Q2    160
...

The three rows became twelve rows. Each product-quarter combination is now its own observation.

Basic Melt Operations

When you omit parameters, melt() uses sensible defaults. Without value_vars, it melts all columns except those in id_vars. Without custom names, you get variable and value.

# Student grades dataset
grades = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [92, 78, 85],
    'Science': [88, 82, 90],
    'English': [95, 75, 88]
})

# Minimal melt - just specify what to keep
grades_melted = pd.melt(grades, id_vars=['Student'])
print(grades_melted)

Output:

   Student variable  value
0    Alice     Math     92
1      Bob     Math     78
2  Charlie     Math     85
3    Alice  Science     88
4      Bob  Science     82
5  Charlie  Science     90
6    Alice  English     95
7      Bob  English     75
8  Charlie  English     88

The default column names are generic. Always specify var_name and value_name for clarity:

grades_clean = pd.melt(
    grades,
    id_vars=['Student'],
    var_name='Subject',
    value_name='Grade'
)
print(grades_clean)

This produces self-documenting output that anyone can understand without context.

Advanced Melt Techniques

Real datasets often have multiple identifier columns and you may only want to melt specific value columns.

Selective Melting with value_vars

When your DataFrame has columns you don’t want to melt or keep, use value_vars to be explicit:

# Dataset with extra columns we want to ignore
company_data = pd.DataFrame({
    'Region': ['North', 'South', 'East'],
    'Year': [2023, 2023, 2023],
    'Revenue': [500000, 450000, 380000],
    'Expenses': [320000, 290000, 250000],
    'Headcount': [45, 38, 32],  # Don't want to melt this
    'Notes': ['Expanded', 'Stable', 'New office']  # Or this
})

# Only melt Revenue and Expenses
financials = pd.melt(
    company_data,
    id_vars=['Region', 'Year'],
    value_vars=['Revenue', 'Expenses'],
    var_name='Metric',
    value_name='Amount'
)
print(financials)

Output:

   Region  Year    Metric  Amount
0   North  2023   Revenue  500000
1   South  2023   Revenue  450000
2    East  2023   Revenue  380000
3   North  2023  Expenses  320000
4   South  2023  Expenses  290000
5    East  2023  Expenses  250000

Combining Melt with GroupBy

Melted data integrates naturally with aggregation operations:

# After melting, calculate average by metric
avg_by_metric = financials.groupby('Metric')['Amount'].mean()
print(avg_by_metric)

# Or get totals by region
totals_by_region = financials.groupby('Region')['Amount'].sum()
print(totals_by_region)

Multi-Level Reshaping

For complex datasets with hierarchical column structures, you can chain operations:

# Performance data across multiple dimensions
performance = pd.DataFrame({
    'Department': ['Engineering', 'Sales', 'Marketing'],
    'Manager': ['Alice', 'Bob', 'Charlie'],
    '2022_Q1': [85, 92, 78],
    '2022_Q2': [88, 89, 82],
    '2023_Q1': [90, 95, 85],
    '2023_Q2': [92, 91, 88]
})

# Melt, then split the variable column
melted = pd.melt(
    performance,
    id_vars=['Department', 'Manager'],
    var_name='Period',
    value_name='Score'
)

# Extract year and quarter from the Period column
melted[['Year', 'Quarter']] = melted['Period'].str.split('_', expand=True)
melted = melted.drop('Period', axis=1)
print(melted)

Melt vs. Pivot: When to Use Each

These functions are inverses. Melt goes wide to long; pivot goes long to wide.

Use melt() when:

  • You need to prepare data for seaborn, plotly, or ggplot-style visualization
  • Column headers contain data values (like dates, categories, or measurements)
  • You’re normalizing data for database storage

Use pivot() when:

  • You need a summary table for reporting
  • You want to compare values side-by-side
  • You’re reversing a previous melt operation

Here’s a round-trip demonstration:

# Start with wide format
original = pd.DataFrame({
    'City': ['NYC', 'LA', 'Chicago'],
    'Jan': [32, 58, 25],
    'Feb': [35, 60, 28],
    'Mar': [45, 63, 40]
})

# Melt to long
long_format = pd.melt(
    original,
    id_vars=['City'],
    var_name='Month',
    value_name='Temperature'
)

# Pivot back to wide
restored = long_format.pivot(
    index='City',
    columns='Month',
    values='Temperature'
).reset_index()

restored.columns.name = None  # Clean up the column name
print(restored)

The restored DataFrame matches the original structure.

Real-World Application: Preparing Data for Visualization

Visualization libraries like seaborn expect long-format data for multi-series plots. Here’s a practical example with stock prices:

import pandas as pd
import matplotlib.pyplot as plt

# Stock prices in wide format (common from financial APIs)
stocks = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=5),
    'AAPL': [185.2, 187.5, 186.8, 189.3, 191.0],
    'GOOGL': [140.1, 141.8, 139.5, 142.3, 143.7],
    'MSFT': [375.4, 378.2, 376.9, 380.1, 382.5]
})

# Melt for plotting
stocks_long = pd.melt(
    stocks,
    id_vars=['Date'],
    var_name='Ticker',
    value_name='Price'
)

# Now seaborn can easily create a multi-line plot
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.lineplot(data=stocks_long, x='Date', y='Price', hue='Ticker')
plt.title('Stock Prices Over Time')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Without melting, you’d need to plot each column separately and manually handle the legend. With melted data, seaborn does the work for you.

Common Pitfalls and Best Practices

Handling NaN Values

Melt preserves NaN values by default. If you want to drop them, chain with dropna():

melted_clean = pd.melt(df, id_vars=['ID']).dropna()

Memory Considerations

Melting increases row count significantly. A DataFrame with 1,000 rows and 100 value columns becomes 100,000 rows after melting. For large datasets:

# Check memory before melting
print(f"Original: {df.memory_usage(deep=True).sum() / 1e6:.2f} MB")

# Consider chunking for very large datasets
chunks = []
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    melted_chunk = pd.melt(chunk, id_vars=['ID'])
    chunks.append(melted_chunk)

result = pd.concat(chunks, ignore_index=True)

Preserving Data Types

Melting combines values from different columns into one, which can cause type coercion. If you’re mixing integers and floats, expect floats. If any column is a string, the entire value column becomes object type.

# Check types after melting
print(melted.dtypes)

# Convert back if needed
melted['value'] = pd.to_numeric(melted['value'], errors='coerce')

Use ignore_index Wisely

The ignore_index parameter (default True) resets the index. Set it to False if you need to preserve the original index for later joins:

melted = pd.melt(df, id_vars=['ID'], ignore_index=False)

The melt() function is fundamental to data wrangling in pandas. Master it alongside pivot(), and you’ll handle any reshaping task that comes your way.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.