How to Stack and Unstack in Pandas
Pandas provides two complementary methods for reshaping data: `stack()` and `unstack()`. These operations pivot data between 'long' and 'wide' formats by moving index levels between the row and...
Key Insights
- Stack and unstack are inverse operations that reshape DataFrames by moving index levels between rows and columns, with stack converting columns to rows (longer format) and unstack converting rows to columns (wider format)
- Both operations require understanding MultiIndex structures—stack creates a MultiIndex in the row axis, while unstack creates a MultiIndex in the column axis
- These methods excel at reshaping hierarchical data but differ from pivot/melt in that they operate on index levels rather than column values, making them ideal for already-indexed data
Introduction to Reshaping with Stack/Unstack
Pandas provides two complementary methods for reshaping data: stack() and unstack(). These operations pivot data between “long” and “wide” formats by moving index levels between the row and column axes.
Think of it this way: stack() takes columns and “stacks” them into rows, making your DataFrame taller and narrower. unstack() does the opposite—it takes row index levels and spreads them out as columns, making your DataFrame shorter and wider.
You’ll reach for these methods when working with hierarchical data, preparing data for specific visualizations, or transforming aggregated results into presentation-ready formats. They’re particularly useful after groupby operations that produce MultiIndex results.
Understanding MultiIndex Prerequisites
Stack and unstack operations fundamentally work with hierarchical indices. When you stack a DataFrame, you’re converting column labels into an additional level of the row index. When you unstack, you’re promoting a row index level to become column labels.
Let’s start by creating a DataFrame with MultiIndex columns to see how this works:
import pandas as pd
import numpy as np
# Create a DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples([
('Revenue', 'Q1'), ('Revenue', 'Q2'),
('Expenses', 'Q1'), ('Expenses', 'Q2')
])
df = pd.DataFrame(
[[100, 120, 80, 85],
[200, 210, 150, 160],
[150, 180, 100, 110]],
index=['Product A', 'Product B', 'Product C'],
columns=columns
)
print(df)
Output:
Revenue Expenses
Q1 Q2 Q1 Q2
Product A 100 120 80 85
Product B 200 210 150 160
Product C 150 180 100 110
This hierarchical column structure has two levels: the metric type (Revenue/Expenses) and the quarter (Q1/Q2). Understanding this structure is essential because stack and unstack let you move these levels around.
The Stack Operation: Columns to Rows
The stack() method takes the innermost column level and pivots it to become the innermost row index level. This transforms your data from wide to long format.
# Basic stack on our MultiIndex DataFrame
stacked = df.stack()
print(stacked)
Output:
Expenses Revenue
Product A Q1 80 100
Q2 85 120
Product B Q1 150 200
Q2 160 210
Product C Q1 100 150
Q2 110 180
Notice what happened: the quarter level (Q1, Q2) moved from the columns to become part of the row index. The DataFrame now has a MultiIndex with two levels (Product, Quarter), and the columns are just the metric types.
You can control which level gets stacked using the level parameter:
# Stack the outer level (metric type) instead
stacked_outer = df.stack(level=0)
print(stacked_outer)
Output:
Q1 Q2
Product A Expenses 80 85
Revenue 100 120
Product B Expenses 150 160
Revenue 200 210
Product C Expenses 100 110
Revenue 150 180
Now the metric type (Revenue/Expenses) became part of the row index, and the quarters remain as columns. The level parameter accepts integers (0 for outermost, -1 for innermost) or level names if you’ve named your index levels.
For a simpler example with a regular DataFrame:
# Simple DataFrame without MultiIndex
simple_df = pd.DataFrame({
'Math': [90, 85, 92],
'Science': [88, 90, 87],
'English': [85, 88, 90]
}, index=['Alice', 'Bob', 'Carol'])
print("Original:")
print(simple_df)
print("\nStacked:")
print(simple_df.stack())
Output:
Original:
Math Science English
Alice 90 88 85
Bob 85 90 88
Carol 92 87 90
Stacked:
Alice Math 90
Science 88
English 85
Bob Math 85
Science 90
English 88
Carol Math 92
Science 87
English 90
dtype: int64
The result is a Series with a MultiIndex. Each student-subject combination becomes a single row.
The Unstack Operation: Rows to Columns
The unstack() method is the inverse of stack(). It takes a row index level and pivots it to become a column level.
# Start with a MultiIndex DataFrame
arrays = [
['North', 'North', 'South', 'South'],
['Store 1', 'Store 2', 'Store 1', 'Store 2']
]
index = pd.MultiIndex.from_arrays(arrays, names=['Region', 'Store'])
sales = pd.DataFrame({
'Sales': [100, 150, 200, 175],
'Returns': [10, 12, 18, 15]
}, index=index)
print("Original:")
print(sales)
Output:
Original:
Sales Returns
Region Store
North Store 1 100 10
Store 2 150 12
South Store 1 200 18
Store 2 175 15
Now let’s unstack the innermost level (Store):
# Unstack the Store level
unstacked = sales.unstack()
print(unstacked)
Output:
Sales Returns
Store 1 Store 2 Store 1 Store 2
Region
North 100 150 10 12
South 200 175 18 15
The Store level moved from the row index to the columns, creating a MultiIndex column structure. Each region now occupies a single row.
You can unstack different levels:
# Unstack the Region level instead
unstacked_region = sales.unstack(level=0)
print(unstacked_region)
Output:
Sales Returns
Region North South North South
Store
Store 1 100 200 10 18
Store 2 150 175 12 15
Now stores are rows and regions are spread across columns.
Handling Missing Values
When unstacking data, you may encounter situations where not all combinations exist. Pandas fills these gaps with NaN by default.
# Create data with uneven combinations
data = pd.DataFrame({
'City': ['NYC', 'NYC', 'LA', 'LA', 'Chicago'],
'Year': [2022, 2023, 2022, 2023, 2023],
'Population': [8.3, 8.4, 3.9, 4.0, 2.7]
})
indexed = data.set_index(['City', 'Year'])
print("Indexed data:")
print(indexed)
print("\nUnstacked (with NaN):")
print(indexed.unstack())
Output:
Indexed data:
Population
City Year
NYC 2022 8.3
2023 8.4
LA 2022 3.9
2023 4.0
Chicago 2023 2.7
Unstacked (with NaN):
Population
Year 2022 2023
City
Chicago NaN 2.7
LA 3.9 4.0
NYC 8.3 8.4
Chicago has no 2022 data, so it appears as NaN. Use fill_value to handle this:
# Fill missing values with 0
filled = indexed.unstack(fill_value=0)
print(filled)
Output:
Population
Year 2022 2023
City
Chicago 0.0 2.7
LA 3.9 4.0
NYC 8.3 8.4
The stack() method has a dropna parameter (True by default) that removes rows with NaN values:
# Stack with dropna=False to preserve NaN entries
restacked = filled.stack(dropna=False)
print(restacked)
Practical Use Case: Reshaping Time Series Data
Let’s work through a realistic scenario: analyzing monthly sales data across multiple products and regions.
# Create sample sales data in wide format
months = ['Jan', 'Feb', 'Mar']
products = ['Widget', 'Gadget']
columns = pd.MultiIndex.from_product([products, months],
names=['Product', 'Month'])
sales_data = pd.DataFrame(
[[1000, 1200, 1100, 500, 600, 550],
[800, 900, 950, 400, 450, 480],
[1500, 1600, 1400, 700, 750, 800]],
index=['East', 'West', 'Central'],
columns=columns
)
sales_data.index.name = 'Region'
print("Original wide format:")
print(sales_data)
# Stack to get long format for analysis
long_format = sales_data.stack(level=['Product', 'Month'])
long_format.name = 'Sales'
print("\nLong format (stacked):")
print(long_format.head(10))
# Calculate average by product across all regions and months
avg_by_product = long_format.groupby('Product').mean()
print("\nAverage sales by product:")
print(avg_by_product)
# Reshape for a region vs product summary
summary = long_format.groupby(['Region', 'Product']).sum().unstack()
print("\nTotal sales by region and product:")
print(summary)
This workflow demonstrates the power of stack/unstack: start with wide data, stack it for groupby operations, then unstack the results into a presentation-ready format.
Stack/Unstack vs. Pivot/Melt
Pandas offers multiple reshaping tools. Here’s when to use each:
Use stack/unstack when:
- Your data already has meaningful indices
- You’re working with MultiIndex structures
- You need to move index levels between axes
- You’re reshaping groupby results
Use pivot/melt when:
- You’re working with column values (not index levels)
- Your data is in a flat, database-style format
- You need to specify value and variable columns explicitly
# pivot_table works on column values
flat_data = pd.DataFrame({
'Region': ['East', 'East', 'West', 'West'],
'Product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
'Sales': [1000, 500, 800, 400]
})
# This requires pivot_table, not unstack
pivoted = flat_data.pivot_table(index='Region', columns='Product', values='Sales')
Stack and unstack are index-level operations. Pivot and melt are column-value operations. Choose based on where your categorical information lives.
The key takeaway: master both pairs of methods. Stack/unstack gives you precise control over hierarchical data structures, while pivot/melt handles the common case of reshaping flat data. Together, they cover virtually any reshaping task you’ll encounter.