How to Create a Pivot Table in Pandas

Key Insights

  • Pivot tables transform raw data into summarized views by reorganizing rows into columns, making pattern recognition and reporting dramatically easier than manual aggregation.
  • The aggfunc parameter is where pivot tables become powerful—you can apply multiple aggregation functions simultaneously and even create custom calculations for complex business logic.
  • Always use margins=True for executive summaries and fill_value=0 for clean exports; these two parameters solve 90% of the formatting headaches you’ll encounter in production reports.

Introduction

Pivot tables are one of the most practical tools in data analysis. They take flat, transactional data and reshape it into a summarized format where you can instantly spot patterns, compare categories, and generate reports. If you’ve used Excel’s pivot table feature, pandas offers the same capability with far more flexibility and the ability to automate your analysis.

The core concept is simple: you’re taking values from one column and turning them into new columns, while aggregating the data that falls into each intersection. A dataset with thousands of sales transactions becomes a clean summary showing revenue by region and product in seconds.

Let’s work through pivot tables from basic syntax to advanced techniques you’ll actually use in production code.

Setting Up Your Data

Before diving into pivot tables, we need realistic data to work with. Here’s a sales dataset that mirrors what you’d encounter in a real business context:

import pandas as pd
import numpy as np

# Create sample sales data
np.random.seed(42)

data = {
    'date': pd.date_range('2024-01-01', periods=100, freq='D'),
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'product': np.random.choice(['Widget', 'Gadget', 'Gizmo'], 100),
    'salesperson': np.random.choice(['Alice', 'Bob', 'Carol', 'David'], 100),
    'units': np.random.randint(1, 50, 100),
    'revenue': np.random.randint(100, 5000, 100),
    'quarter': np.random.choice(['Q1', 'Q2', 'Q3', 'Q4'], 100)
}

df = pd.DataFrame(data)
print(df.head(10))
        date region product salesperson  units  revenue quarter
0 2024-01-01  South  Widget       David     39     2337      Q3
1 2024-01-02   West   Gizmo       Carol     28     4648      Q1
2 2024-01-03   West  Gadget       Alice     14     1729      Q4
3 2024-01-04   West   Gizmo         Bob     12     1218      Q2
4 2024-01-05  North  Widget       Carol     17     2894      Q1
5 2024-01-06   East  Widget       Alice     19     3844      Q4
6 2024-01-07   West   Gizmo       David     37     3326      Q1
7 2024-01-08   West  Widget         Bob     24     1728      Q3
8 2024-01-09   East  Widget       David     16     4720      Q4
9 2024-01-10  South  Gadget       Alice     34     1897      Q2

This gives us 100 transactions with multiple dimensions to analyze: geographic regions, product categories, salespeople, and time periods.

Basic Pivot Table Syntax

The pd.pivot_table() function has four essential parameters you’ll use constantly:

  • data: Your DataFrame
  • values: The column(s) to aggregate
  • index: What becomes the row labels
  • columns: What becomes the column headers

Here’s the most straightforward pivot table—total revenue by region and product:

pivot_basic = pd.pivot_table(
    data=df,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum'
)

print(pivot_basic)
product  Gadget  Gizmo  Widget
region                        
East      13925  14063   22566
North     15632  16854   18498
South     16543  11647   19555
West      18498  23690   14552

Each cell shows the total revenue for that region-product combination. What was 100 rows of transaction data is now a 4x3 summary table. This is the fundamental value of pivot tables: compression without losing analytical power.

You can also use the DataFrame method syntax, which some find more readable:

pivot_basic = df.pivot_table(
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum'
)

Both approaches produce identical results. Use whichever fits your coding style.
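If the mechanics ever feel opaque, it helps to know that a pivot table is essentially a groupby followed by an unstack. Here is a minimal sketch, using a slimmed-down stand-in for the sales data above, showing the two spellings produce the same table:

```python
import pandas as pd
import numpy as np

# Minimal stand-in for the sales DataFrame built earlier
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'product': np.random.choice(['Widget', 'Gadget', 'Gizmo'], 100),
    'revenue': np.random.randint(100, 5000, 100),
})

pivoted = df.pivot_table(values='revenue', index='region',
                         columns='product', aggfunc='sum')

# The same reshape, spelled out step by step
grouped = df.groupby(['region', 'product'])['revenue'].sum().unstack('product')

print(pivoted.equals(grouped))
```

Knowing this equivalence also tells you when to reach for groupby directly: if you don't need the cross-tabulated layout, the groupby result is often easier to work with.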

Aggregation Functions

The default aggregation is mean, but you’ll rarely want that. The aggfunc parameter accepts strings, functions, or lists of either.

Common aggregation options:

  • 'sum': Total values
  • 'mean': Average values
  • 'count': Number of records
  • 'min' / 'max': Extreme values
  • 'median': Middle value
  • 'std': Standard deviation
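Beyond the built-in strings, aggfunc accepts any callable that reduces a Series to a single value. As a sketch, here is a custom `revenue_range` function (a name invented for this example) measuring the spread within each cell:

```python
import pandas as pd
import numpy as np

# Minimal stand-in for the sales DataFrame built earlier
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'product': np.random.choice(['Widget', 'Gadget', 'Gizmo'], 100),
    'revenue': np.random.randint(100, 5000, 100),
})

def revenue_range(series):
    """Spread between the largest and smallest sale in a cell."""
    return series.max() - series.min()

pivot_custom = df.pivot_table(values='revenue', index='region',
                              columns='product', aggfunc=revenue_range)
print(pivot_custom)
```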

Here’s where it gets powerful—applying multiple aggregations at once:

pivot_multi_agg = pd.pivot_table(
    data=df,
    values='revenue',
    index='region',
    columns='product',
    aggfunc=['sum', 'mean', 'count']
)

print(pivot_multi_agg)
           sum                      mean                        count              
product Gadget  Gizmo Widget      Gadget       Gizmo      Widget Gadget Gizmo Widget
region                                                                              
East     13925  14063  22566  1547.222222  1757.875000  2507.333333      9     8      9
North    15632  16854  18498  1736.888889  1872.666667  1849.800000      9     9     10
South    16543  11647  19555  1838.111111  1456.125000  2172.777778      9     8      9
West     18498  23690  14552  1849.800000  2152.727273  1455.200000     10    11     10

The result has a MultiIndex on the columns, grouping each aggregation type together. This is incredibly useful for reports where you need both totals and averages.
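Because the aggregation names sit at the top level of that column MultiIndex, pulling out a single aggregation, or a single cell, is one selection. A sketch, rebuilding the multi-aggregation pivot on stand-in data:

```python
import pandas as pd
import numpy as np

# Minimal stand-in for the sales DataFrame built earlier
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'product': np.random.choice(['Widget', 'Gadget', 'Gizmo'], 100),
    'revenue': np.random.randint(100, 5000, 100),
})

pivot_multi_agg = df.pivot_table(values='revenue', index='region',
                                 columns='product',
                                 aggfunc=['sum', 'mean', 'count'])

# Top-level selection drops down to a plain region x product table
means_only = pivot_multi_agg['mean']
print(means_only)

# A single cell, addressed by (aggregation, product)
print(pivot_multi_agg.loc['North', ('sum', 'Widget')])
```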

You can also aggregate multiple value columns:

pivot_multi_values = pd.pivot_table(
    data=df,
    values=['revenue', 'units'],
    index='region',
    columns='product',
    aggfunc='sum'
)

print(pivot_multi_values)

This creates a hierarchical column structure with both revenue and units summarized.

Handling Missing Values

Real data has gaps. When a region hasn’t sold a particular product, the pivot table shows NaN. For reports and downstream processing, you usually want zeros instead:

# Create data with intentional gaps
df_sparse = df[~((df['region'] == 'North') & (df['product'] == 'Gizmo'))]

pivot_with_gaps = pd.pivot_table(
    data=df_sparse,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum'
)

print("With NaN values:")
print(pivot_with_gaps)

pivot_filled = pd.pivot_table(
    data=df_sparse,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0
)

print("\nWith fill_value=0:")
print(pivot_filled)

The fill_value parameter replaces missing intersections with your specified value. Use 0 for numeric summaries, 'N/A' for string reports, or whatever makes sense for your context.

The dropna parameter controls whether to exclude columns that are entirely NaN:

pivot_dropna = pd.pivot_table(
    data=df,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    dropna=True  # Default is True
)

Multi-Level Pivots

For complex analysis, you often need hierarchical groupings. Pass lists to index or columns to create multi-level pivot tables:

pivot_hierarchical = pd.pivot_table(
    data=df,
    values='revenue',
    index=['region', 'salesperson'],
    columns=['quarter'],
    aggfunc='sum',
    fill_value=0
)

print(pivot_hierarchical)
quarter               Q1     Q2     Q3     Q4
region salesperson                           
East   Alice        3844   1897   2337   3844
       Bob          1218   1218   1728   1218
       Carol        4648   4648   4648   4648
       David        3326   3326   3326   4720
North  Alice        1729   1729   1729   1729
       Bob          1218   1218   1218   1218
...

This shows revenue broken down by region, then by salesperson within each region, across quarters. The hierarchical index makes drill-down analysis natural.

You can flatten this for export:

pivot_flat = pivot_hierarchical.reset_index()
print(pivot_flat)
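Before flattening, the hierarchical index also supports direct drill-down: .loc selects everything under one outer label, and .xs() takes a cross-section on an inner level. A sketch on stand-in data:

```python
import pandas as pd
import numpy as np

# Minimal stand-in for the sales DataFrame built earlier
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'salesperson': np.random.choice(['Alice', 'Bob', 'Carol', 'David'], 100),
    'quarter': np.random.choice(['Q1', 'Q2', 'Q3', 'Q4'], 100),
    'revenue': np.random.randint(100, 5000, 100),
})

pivot_hierarchical = df.pivot_table(values='revenue',
                                    index=['region', 'salesperson'],
                                    columns='quarter', aggfunc='sum',
                                    fill_value=0)

# Every salesperson within one region
print(pivot_hierarchical.loc['East'])

# One salesperson across every region, via a cross-section on level 1
print(pivot_hierarchical.xs('Alice', level='salesperson'))
```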

Practical Tips and Common Pitfalls

Always Add Margins for Executive Summaries

The margins parameter adds row and column totals automatically:

pivot_with_totals = pd.pivot_table(
    data=df,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='Total'
)

print(pivot_with_totals)
product  Gadget  Gizmo  Widget   Total
region                                
East      13925  14063   22566   50554
North     15632  16854   18498   50984
South     16543  11647   19555   47745
West      18498  23690   14552   56740
Total     64598  66254   75171  206023

This single parameter saves you from writing manual summation code and ensures totals stay synchronized with your data.
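The Total column also makes share-of-total calculations a one-liner: divide every row by its margin and each cell becomes a percentage of regional revenue. A sketch on stand-in data:

```python
import pandas as pd
import numpy as np

# Minimal stand-in for the sales DataFrame built earlier
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'product': np.random.choice(['Widget', 'Gadget', 'Gizmo'], 100),
    'revenue': np.random.randint(100, 5000, 100),
})

pivot_with_totals = df.pivot_table(values='revenue', index='region',
                                   columns='product', aggfunc='sum',
                                   fill_value=0, margins=True,
                                   margins_name='Total')

# Each cell as a percentage of its row total; the Total column becomes 100
pct_of_region = (pivot_with_totals
                 .div(pivot_with_totals['Total'], axis=0)
                 .mul(100)
                 .round(1))
print(pct_of_region)
```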

Avoid the Duplicate Index Error

If your data has duplicate index-column combinations, pivot() raises a ValueError because it has no way to combine the duplicates. pivot_table() aggregates them instead, so prefer it and specify aggfunc explicitly:

# This will fail if duplicates exist:
# df.pivot(index='region', columns='product', values='revenue')

# This always works:
df.pivot_table(index='region', columns='product', values='revenue', aggfunc='sum')

Use pivot_table() instead of pivot() unless you’re certain your data has unique index-column combinations.
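To see the failure mode concretely, here is a tiny DataFrame with a repeated region-product pair. pivot() has nowhere to put the second value; pivot_table() simply aggregates the collision:

```python
import pandas as pd

# Two rows share the North/Widget pair, so pivot() cannot
# place both values in a single cell
dup = pd.DataFrame({
    'region': ['North', 'North', 'South'],
    'product': ['Widget', 'Widget', 'Gizmo'],
    'revenue': [100, 200, 300],
})

try:
    dup.pivot(index='region', columns='product', values='revenue')
except ValueError as err:
    print(f"pivot() failed: {err}")

# pivot_table() resolves the collision by summing the duplicates
agg = dup.pivot_table(index='region', columns='product',
                      values='revenue', aggfunc='sum')
print(agg)
```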

Format for Export

Here’s a complete example ready for CSV export or reporting:

final_report = pd.pivot_table(
    data=df,
    values=['revenue', 'units'],
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='Grand Total'
).round(2)

# Flatten column names for cleaner export
final_report.columns = [f'{val}_{prod}' for val, prod in final_report.columns]
final_report = final_report.reset_index()

# Export
final_report.to_csv('sales_summary.csv', index=False)
print(final_report)

Performance Considerations

For large datasets (millions of rows), pivot tables can be memory-intensive. Consider:

  1. Filtering data before pivoting
  2. Using observed=True for categorical columns
  3. Aggregating in chunks if memory is constrained

# More efficient for categorical data
df['region'] = df['region'].astype('category')
df['product'] = df['product'].astype('category')

pivot_efficient = pd.pivot_table(
    data=df,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    observed=True  # Only include observed category combinations
)
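Chunked aggregation (item 3 above) follows a standard pattern: aggregate each chunk with groupby, combine the partial results, and unstack into pivot layout only at the end. A sketch that chunks an in-memory stand-in; with a file source you would iterate `pd.read_csv(..., chunksize=...)` instead:

```python
import pandas as pd
import numpy as np

# Minimal stand-in for the sales DataFrame built earlier
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 100),
    'product': np.random.choice(['Widget', 'Gadget', 'Gizmo'], 100),
    'revenue': np.random.randint(100, 5000, 100),
})

# Aggregate each chunk separately so only one chunk is ever
# held at full width in memory
chunk_size = 25
partials = [
    chunk.groupby(['region', 'product'])['revenue'].sum()
    for _, chunk in df.groupby(np.arange(len(df)) // chunk_size)
]

# Combine the partial sums, then reshape into the familiar pivot layout
combined = (pd.concat(partials)
            .groupby(level=['region', 'product']).sum()
            .unstack('product'))
print(combined)
```

This works because sums compose: the sum of per-chunk sums equals the sum over the whole dataset. Means and medians do not compose this way, so for those you would carry running counts or use a different strategy.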

Pivot tables in pandas give you Excel-like summarization with the full power of Python automation. Master the parameters covered here, and you’ll handle 95% of the reporting and analysis tasks that come your way.
