How to Pivot a DataFrame in Pandas

Pivoting transforms data from a 'long' format (many rows, few columns) to a 'wide' format (fewer rows, more columns). If you've ever received transactional data where each row represents a single...

Key Insights

  • Use pivot() for simple reshaping when your index/column combinations are unique; use pivot_table() when you need to aggregate duplicate entries
  • The pivot_table() function is essentially a combination of groupby() and pivot(), making it the more versatile choice for real-world data
  • Always pair your pivoting knowledge with melt() to reverse the operation—data pipelines often require converting between long and wide formats multiple times

Introduction

Pivoting transforms data from a “long” format (many rows, few columns) to a “wide” format (fewer rows, more columns). If you’ve ever received transactional data where each row represents a single observation and needed to create a summary table with categories as columns, you’ve needed a pivot operation.

This transformation is fundamental to data analysis. Reporting tools expect wide format. Visualization libraries often work better with pivoted data. And sometimes you simply need to compare values side-by-side rather than scanning through hundreds of rows.

Pandas provides two primary methods for pivoting: pivot() for straightforward reshaping and pivot_table() for reshaping with aggregation. Understanding when to use each will save you hours of debugging and produce cleaner analysis code.

Understanding the Source Data

Before pivoting, you need data in long format. This is the typical structure you get from databases, CSV exports, and APIs—one observation per row with categorical columns that could become your new column headers.

Let’s create a realistic dataset representing quarterly sales data:

import pandas as pd
import numpy as np

# Create sample sales data in long format
data = {
    'date': ['2024-Q1', '2024-Q1', '2024-Q1', '2024-Q1', 
             '2024-Q2', '2024-Q2', '2024-Q2', '2024-Q2',
             '2024-Q3', '2024-Q3', '2024-Q3', '2024-Q3'],
    'product': ['Widget', 'Widget', 'Gadget', 'Gadget',
                'Widget', 'Widget', 'Gadget', 'Gadget',
                'Widget', 'Widget', 'Gadget', 'Gadget'],
    'region': ['North', 'South', 'North', 'South',
               'North', 'South', 'North', 'South',
               'North', 'South', 'North', 'South'],
    'revenue': [15000, 12000, 8000, 9500,
                18000, 14000, 11000, 10000,
                21000, 16000, 13000, 12500],
    'units_sold': [150, 120, 40, 47,
                   180, 140, 55, 50,
                   210, 160, 65, 62]
}

df = pd.DataFrame(data)
print(df)
      date product region  revenue  units_sold
0   2024-Q1  Widget  North    15000         150
1   2024-Q1  Widget  South    12000         120
2   2024-Q1  Gadget  North     8000          40
3   2024-Q1  Gadget  South     9500          47
4   2024-Q2  Widget  North    18000         180
5   2024-Q2  Widget  South    14000         140
6   2024-Q2  Gadget  North    11000          55
7   2024-Q2  Gadget  South    10000          50
8   2024-Q3  Widget  North    21000         210
9   2024-Q3  Widget  South    16000         160
10  2024-Q3  Gadget  North    13000          65
11  2024-Q3  Gadget  South    12500          62

This long format works well for storage but makes comparison difficult. Try quickly answering: “How did Widget revenue in the North change over time?” You’d need to scan multiple rows. A pivoted table makes this trivial.

Basic Pivoting with pivot()

The pivot() method performs a pure reshape operation. It takes three arguments: index (what becomes row labels), columns (what becomes column headers), and values (what fills the cells).

# Pivot to show revenue by date and product
pivoted = df.pivot(index='date', columns='product', values='revenue')
print(pivoted)

Wait—this will fail. Our data has duplicate combinations of date and product (North and South regions for each). The pivot() method cannot handle this:

# This raises ValueError: Index contains duplicate entries, cannot reshape
try:
    pivoted = df.pivot(index='date', columns='product', values='revenue')
except ValueError as e:
    print(f"Error: {e}")

For pivot() to work, each index-column combination must be unique. Let’s create a subset where this holds:

# Filter to North region only—now combinations are unique
df_north = df[df['region'] == 'North']

pivoted_north = df_north.pivot(index='date', columns='product', values='revenue')
print(pivoted_north)
product  Gadget  Widget
date                   
2024-Q1    8000   15000
2024-Q2   11000   18000
2024-Q3   13000   21000

Now you can instantly see the revenue trend for each product. The pivot() method is fast and simple, but its uniqueness requirement limits its usefulness with real-world data that typically contains duplicates.

Aggregating with pivot_table()

The pivot_table() function solves the duplicate problem by aggregating values. When multiple rows match the same index-column combination, it applies an aggregation function to combine them.

# Pivot with sum aggregation—combines North and South revenue
pivot_sum = df.pivot_table(
    index='date', 
    columns='product', 
    values='revenue', 
    aggfunc='sum'
)
print(pivot_sum)
product  Gadget  Widget
date                   
2024-Q1   17500   27000
2024-Q2   21000   32000
2024-Q3   25500   37000

The revenue for each product now represents the total across all regions. You can use any aggregation function—'mean', 'count', 'min', 'max', 'std', or even custom functions:

# Average revenue per product per quarter
pivot_mean = df.pivot_table(
    index='date', 
    columns='product', 
    values='revenue', 
    aggfunc='mean'
)
print(pivot_mean)
product   Gadget   Widget
date                     
2024-Q1   8750.0  13500.0
2024-Q2  10500.0  16000.0
2024-Q3  12750.0  18500.0

Need multiple aggregations? Pass a list:

# Multiple aggregation functions
pivot_multi = df.pivot_table(
    index='date', 
    columns='product', 
    values='revenue', 
    aggfunc=['sum', 'mean', 'count']
)
print(pivot_multi)
          sum            mean           count       
product Gadget Widget   Gadget   Widget Gadget Widget
date                                                 
2024-Q1  17500  27000   8750.0  13500.0      2      2
2024-Q2  21000  32000  10500.0  16000.0      2      2
2024-Q3  25500  37000  12750.0  18500.0      2      2

This creates a hierarchical column index, which you can flatten if needed:

# Flatten multi-level columns
pivot_multi.columns = ['_'.join(col).strip() for col in pivot_multi.columns.values]
print(pivot_multi.columns.tolist())
# ['sum_Gadget', 'sum_Widget', 'mean_Gadget', 'mean_Widget', 'count_Gadget', 'count_Widget']

Advanced Options

Real-world pivot tables often need totals, handling for missing data, and multiple value columns. The pivot_table() function handles all of these.

Adding Margins for Totals

The margins parameter adds row and column totals:

# Add totals row and column
pivot_with_totals = df.pivot_table(
    index='date', 
    columns='product', 
    values='revenue', 
    aggfunc='sum',
    margins=True,
    margins_name='Total'
)
print(pivot_with_totals)
product   Gadget  Widget   Total
date                            
2024-Q1    17500   27000   44500
2024-Q2    21000   32000   53000
2024-Q3    25500   37000   62500
Total      64000   96000  160000

Handling Missing Values

When certain index-column combinations don’t exist in your data, the result contains NaN. Use fill_value to replace these:

# Create data with a missing combination
df_sparse = df[~((df['date'] == '2024-Q2') & (df['product'] == 'Gadget'))]

pivot_sparse = df_sparse.pivot_table(
    index='date', 
    columns='product', 
    values='revenue', 
    aggfunc='sum',
    fill_value=0  # Replace NaN with 0
)
print(pivot_sparse)
product  Gadget  Widget
date                   
2024-Q1   17500   27000
2024-Q2       0   32000
2024-Q3   25500   37000

Multiple Value Columns

You can pivot multiple value columns simultaneously:

# Pivot both revenue and units_sold
pivot_multiple_values = df.pivot_table(
    index='date', 
    columns='product', 
    values=['revenue', 'units_sold'], 
    aggfunc='sum'
)
print(pivot_multiple_values)
        revenue         units_sold       
product  Gadget Widget     Gadget Widget
date                                     
2024-Q1   17500  27000         87    270
2024-Q2   21000  32000        105    320
2024-Q3   25500  37000        127    370

Reversing the Pivot with melt()

Data pipelines often require converting between formats. The melt() function reverses a pivot operation, converting wide data back to long format:

# Start with our pivoted data
pivoted = df.pivot_table(
    index='date', 
    columns='product', 
    values='revenue', 
    aggfunc='sum'
).reset_index()

print("Wide format:")
print(pivoted)

# Melt back to long format
melted = pivoted.melt(
    id_vars=['date'],           # Columns to keep as-is
    value_vars=['Gadget', 'Widget'],  # Columns to unpivot
    var_name='product',         # Name for the new category column
    value_name='total_revenue'  # Name for the new value column
)

print("\nLong format:")
print(melted)
Wide format:
product     date  Gadget  Widget
0        2024-Q1   17500   27000
1        2024-Q2   21000   32000
2        2024-Q3   25500   37000

Long format:
      date product  total_revenue
0  2024-Q1  Gadget          17500
1  2024-Q2  Gadget          21000
2  2024-Q3  Gadget          25500
3  2024-Q1  Widget          27000
4  2024-Q2  Widget          32000
5  2024-Q3  Widget          37000

Note that melting aggregated data doesn’t recover the original granularity—we now have total revenue per product, not the regional breakdown we started with.

Conclusion

Choose pivot() when your data has unique index-column combinations and you need a simple reshape without aggregation. It’s faster and the code is more explicit about your intent.

Choose pivot_table() for everything else. It handles duplicates through aggregation, supports margins for totals, fills missing values, and works with multiple value columns. The slight overhead is negligible for most datasets.

For related operations, explore stack() and unstack() which pivot between row and column indices in hierarchical DataFrames, and crosstab() for computing frequency tables. These tools, combined with melt() for the reverse operation, give you complete control over DataFrame shape transformations.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.