How to Stack and Unstack in Pandas

Pandas provides two complementary methods for reshaping data: `stack()` and `unstack()`. These operations pivot data between 'long' and 'wide' formats by moving index levels between the row and...

Key Insights

  • Stack and unstack are inverse operations that reshape DataFrames by moving index levels between rows and columns, with stack converting columns to rows (longer format) and unstack converting rows to columns (wider format)
  • Both operations require understanding MultiIndex structures—stack creates a MultiIndex in the row axis, while unstack creates a MultiIndex in the column axis
  • These methods excel at reshaping hierarchical data but differ from pivot/melt in that they operate on index levels rather than column values, making them ideal for already-indexed data

Introduction to Reshaping with Stack/Unstack

Pandas provides two complementary methods for reshaping data: stack() and unstack(). These operations pivot data between “long” and “wide” formats by moving index levels between the row and column axes.

Think of it this way: stack() takes columns and “stacks” them into rows, making your DataFrame taller and narrower. unstack() does the opposite—it takes row index levels and spreads them out as columns, making your DataFrame shorter and wider.

You’ll reach for these methods when working with hierarchical data, preparing data for specific visualizations, or transforming aggregated results into presentation-ready formats. They’re particularly useful after groupby operations that produce MultiIndex results.

Understanding MultiIndex Prerequisites

Stack and unstack operations fundamentally work with hierarchical indices. When you stack a DataFrame, you’re converting column labels into an additional level of the row index. When you unstack, you’re promoting a row index level to become column labels.

Let’s start by creating a DataFrame with MultiIndex columns to see how this works:

import pandas as pd
import numpy as np

# Create a DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples([
    ('Revenue', 'Q1'), ('Revenue', 'Q2'),
    ('Expenses', 'Q1'), ('Expenses', 'Q2')
])

df = pd.DataFrame(
    [[100, 120, 80, 85],
     [200, 210, 150, 160],
     [150, 180, 100, 110]],
    index=['Product A', 'Product B', 'Product C'],
    columns=columns
)

print(df)

Output:

          Revenue      Expenses     
               Q1   Q2       Q1   Q2
Product A     100  120       80   85
Product B     200  210      150  160
Product C     150  180      100  110

This hierarchical column structure has two levels: the metric type (Revenue/Expenses) and the quarter (Q1/Q2). Understanding this structure is essential because stack and unstack let you move these levels around.

The Stack Operation: Columns to Rows

The stack() method takes the innermost column level and pivots it to become the innermost row index level. This transforms your data from wide to long format.

# Basic stack on our MultiIndex DataFrame
stacked = df.stack()
print(stacked)

Output:

           Expenses  Revenue
Product A Q1      80      100
          Q2      85      120
Product B Q1     150      200
          Q2     160      210
Product C Q1     100      150
          Q2     110      180

Notice what happened: the quarter level (Q1, Q2) moved from the columns to become part of the row index. The DataFrame now has a MultiIndex with two levels (Product, Quarter), and the columns are just the metric types.

You can control which level gets stacked using the level parameter:

# Stack the outer level (metric type) instead
stacked_outer = df.stack(level=0)
print(stacked_outer)

Output:

                    Q1   Q2
Product A Expenses  80   85
          Revenue  100  120
Product B Expenses 150  160
          Revenue  200  210
Product C Expenses 100  110
          Revenue  150  180

Now the metric type (Revenue/Expenses) became part of the row index, and the quarters remain as columns. The level parameter accepts integers (0 for outermost, -1 for innermost) or level names if you’ve named your index levels.

For a simpler example with a regular DataFrame:

# Simple DataFrame without MultiIndex
simple_df = pd.DataFrame({
    'Math': [90, 85, 92],
    'Science': [88, 90, 87],
    'English': [85, 88, 90]
}, index=['Alice', 'Bob', 'Carol'])

print("Original:")
print(simple_df)

print("\nStacked:")
print(simple_df.stack())

Output:

Original:
       Math  Science  English
Alice    90       88       85
Bob      85       90       88
Carol    92       87       90

Stacked:
Alice  Math       90
       Science    88
       English    85
Bob    Math       85
       Science    90
       English    88
Carol  Math       92
       Science    87
       English    90
dtype: int64

The result is a Series with a MultiIndex. Each student-subject combination becomes a single row.

The Unstack Operation: Rows to Columns

The unstack() method is the inverse of stack(). It takes a row index level and pivots it to become a column level.

# Start with a MultiIndex DataFrame
arrays = [
    ['North', 'North', 'South', 'South'],
    ['Store 1', 'Store 2', 'Store 1', 'Store 2']
]
index = pd.MultiIndex.from_arrays(arrays, names=['Region', 'Store'])

sales = pd.DataFrame({
    'Sales': [100, 150, 200, 175],
    'Returns': [10, 12, 18, 15]
}, index=index)

print("Original:")
print(sales)

Output:

Original:
                Sales  Returns
Region Store                  
North  Store 1    100       10
       Store 2    150       12
South  Store 1    200       18
       Store 2    175       15

Now let’s unstack the innermost level (Store):

# Unstack the Store level
unstacked = sales.unstack()
print(unstacked)

Output:

        Sales           Returns         
        Store 1 Store 2 Store 1 Store 2
Region                                  
North       100     150      10      12
South       200     175      18      15

The Store level moved from the row index to the columns, creating a MultiIndex column structure. Each region now occupies a single row.

You can unstack different levels:

# Unstack the Region level instead
unstacked_region = sales.unstack(level=0)
print(unstacked_region)

Output:

         Sales        Returns       
Region   North South   North South
Store                              
Store 1    100   200      10    18
Store 2    150   175      12    15

Now stores are rows and regions are spread across columns.

Handling Missing Values

When unstacking data, you may encounter situations where not all combinations exist. Pandas fills these gaps with NaN by default.

# Create data with uneven combinations
data = pd.DataFrame({
    'City': ['NYC', 'NYC', 'LA', 'LA', 'Chicago'],
    'Year': [2022, 2023, 2022, 2023, 2023],
    'Population': [8.3, 8.4, 3.9, 4.0, 2.7]
})

indexed = data.set_index(['City', 'Year'])
print("Indexed data:")
print(indexed)

print("\nUnstacked (with NaN):")
print(indexed.unstack())

Output:

Indexed data:
              Population
City    Year            
NYC     2022         8.3
        2023         8.4
LA      2022         3.9
        2023         4.0
Chicago 2023         2.7

Unstacked (with NaN):
        Population     
Year          2022 2023
City                   
Chicago        NaN  2.7
LA             3.9  4.0
NYC            8.3  8.4

Chicago has no 2022 data, so it appears as NaN. Use fill_value to handle this:

# Fill missing values with 0
filled = indexed.unstack(fill_value=0)
print(filled)

Output:

        Population     
Year          2022 2023
City                   
Chicago        0.0  2.7
LA             3.9  4.0
NYC            8.3  8.4

The stack() method has a dropna parameter (True by default) that removes rows with NaN values:

# Stack with dropna=False to preserve NaN entries
restacked = filled.stack(dropna=False)
print(restacked)

Practical Use Case: Reshaping Time Series Data

Let’s work through a realistic scenario: analyzing monthly sales data across multiple products and regions.

# Create sample sales data in wide format
months = ['Jan', 'Feb', 'Mar']
products = ['Widget', 'Gadget']

columns = pd.MultiIndex.from_product([products, months], 
                                      names=['Product', 'Month'])

sales_data = pd.DataFrame(
    [[1000, 1200, 1100, 500, 600, 550],
     [800, 900, 950, 400, 450, 480],
     [1500, 1600, 1400, 700, 750, 800]],
    index=['East', 'West', 'Central'],
    columns=columns
)
sales_data.index.name = 'Region'

print("Original wide format:")
print(sales_data)

# Stack to get long format for analysis
long_format = sales_data.stack(level=['Product', 'Month'])
long_format.name = 'Sales'
print("\nLong format (stacked):")
print(long_format.head(10))

# Calculate average by product across all regions and months
avg_by_product = long_format.groupby('Product').mean()
print("\nAverage sales by product:")
print(avg_by_product)

# Reshape for a region vs product summary
summary = long_format.groupby(['Region', 'Product']).sum().unstack()
print("\nTotal sales by region and product:")
print(summary)

This workflow demonstrates the power of stack/unstack: start with wide data, stack it for groupby operations, then unstack the results into a presentation-ready format.

Stack/Unstack vs. Pivot/Melt

Pandas offers multiple reshaping tools. Here’s when to use each:

Use stack/unstack when:

  • Your data already has meaningful indices
  • You’re working with MultiIndex structures
  • You need to move index levels between axes
  • You’re reshaping groupby results

Use pivot/melt when:

  • You’re working with column values (not index levels)
  • Your data is in a flat, database-style format
  • You need to specify value and variable columns explicitly
# pivot_table works on column values
flat_data = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    'Sales': [1000, 500, 800, 400]
})

# This requires pivot_table, not unstack
pivoted = flat_data.pivot_table(index='Region', columns='Product', values='Sales')

Stack and unstack are index-level operations. Pivot and melt are column-value operations. Choose based on where your categorical information lives.

The key takeaway: master both pairs of methods. Stack/unstack gives you precise control over hierarchical data structures, while pivot/melt handles the common case of reshaping flat data. Together, they cover virtually any reshaping task you’ll encounter.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.