How to Create a Crosstab in Pandas

Key Insights

Crosstabs are the fastest way to analyze relationships between categorical variables, revealing patterns that summary statistics miss
The normalize parameter transforms raw counts into percentages, making comparisons across groups with different sizes meaningful
Combining values and aggfunc parameters turns crosstabs from simple frequency tables into powerful aggregation tools for numeric data

Introduction to Crosstabs

A crosstab—short for cross-tabulation—is a table that displays the frequency distribution of variables. Think of it as a pivot table specifically designed for categorical data. When you need to answer questions like “How many employees in each department are full-time versus part-time?” or “What’s the breakdown of customer segments by region?”, a crosstab gives you the answer in seconds.

Pandas provides pd.crosstab() as a dedicated function for this task. While you can achieve similar results with pivot_table() or groupby(), crosstabs are more concise for frequency analysis and offer built-in normalization that other methods lack.

The key distinction: use crosstabs when you’re analyzing the relationship between two or more categorical variables. If you’re aggregating numeric values, pivot_table() might be more appropriate—though crosstabs can handle that too, as we’ll see.

Basic Crosstab Syntax

The pd.crosstab() function requires two arguments: index (row labels) and columns (column labels). Both should be array-like objects, typically DataFrame columns.

import pandas as pd
import numpy as np

# Sample employee data
data = {
    'department': ['Engineering', 'Sales', 'Engineering', 'HR', 'Sales', 
                   'Engineering', 'HR', 'Sales', 'Engineering', 'HR'],
    'status': ['Full-time', 'Full-time', 'Part-time', 'Full-time', 'Part-time',
               'Full-time', 'Part-time', 'Full-time', 'Full-time', 'Full-time'],
    'level': ['Senior', 'Junior', 'Junior', 'Senior', 'Junior',
              'Senior', 'Senior', 'Senior', 'Junior', 'Junior'],
    'salary': [95000, 55000, 45000, 72000, 48000, 
               105000, 38000, 78000, 75000, 52000]
}

df = pd.DataFrame(data)

# Basic crosstab: department vs employment status
ct = pd.crosstab(df['department'], df['status'])
print(ct)

Output:

status       Full-time  Part-time
department                       
Engineering          3          1
HR                   2          1
Sales                2          1

This immediately tells you that Engineering has the most full-time employees and that part-time workers are evenly distributed across departments. No loops, no manual counting—just one function call.

Adding Margins and Totals

Raw counts are useful, but you often need totals for context. The margins parameter adds row and column sums, while margins_name lets you customize the label.

# Crosstab with margins (totals)
ct_margins = pd.crosstab(
    df['department'], 
    df['status'],
    margins=True,
    margins_name='Total'
)
print(ct_margins)

Output:

status       Full-time  Part-time  Total
department                              
Engineering          3          1      4
HR                   2          1      3
Sales                2          1      3
Total                7          3     10

Now you can see that 70% of employees are full-time (7 out of 10) and that Engineering is the largest department. The margins provide essential context that raw crosstabs lack.

A practical tip: always include margins when presenting crosstabs to stakeholders. The totals anchor the data and prevent misinterpretation.

Normalizing Values (Percentages)

Counts can be misleading when group sizes differ significantly. If Engineering has 100 employees and HR has 10, comparing raw counts is meaningless. The normalize parameter converts counts to proportions.

Three normalization options exist:

'all': percentages of the grand total
'index': percentages within each row
'columns': percentages within each column

# Normalize across all values (percentage of grand total)
ct_all = pd.crosstab(
    df['department'], 
    df['status'],
    normalize='all'
)
print("Normalized by all:")
print(ct_all.round(2))

# Normalize by index (row percentages)
ct_index = pd.crosstab(
    df['department'], 
    df['status'],
    normalize='index'
)
print("\nNormalized by index (row):")
print(ct_index.round(2))

# Normalize by columns (column percentages)
ct_columns = pd.crosstab(
    df['department'], 
    df['status'],
    normalize='columns'
)
print("\nNormalized by columns:")
print(ct_columns.round(2))

Output:

Normalized by all:
status       Full-time  Part-time
department                       
Engineering        0.3        0.1
HR                 0.2        0.1
Sales              0.2        0.1

Normalized by index (row):
status       Full-time  Part-time
department                       
Engineering       0.75       0.25
HR                0.67       0.33
Sales             0.67       0.33

Normalized by columns:
status       Full-time  Part-time
department                       
Engineering       0.43       0.33
HR                0.29       0.33
Sales             0.29       0.33

Row normalization (normalize='index') is typically most useful—it answers “Within each department, what percentage are full-time?” This lets you compare departments fairly regardless of size.

Applying Aggregation Functions

Crosstabs aren’t limited to counting. The values and aggfunc parameters let you aggregate numeric data, transforming the crosstab into something closer to a pivot table.

# Average salary by department and status
ct_salary = pd.crosstab(
    df['department'],
    df['status'],
    values=df['salary'],
    aggfunc='mean'
)
print("Average salary:")
print(ct_salary.round(0))

# Multiple aggregations using a custom function
ct_multi = pd.crosstab(
    df['department'],
    df['status'],
    values=df['salary'],
    aggfunc=['mean', 'count', 'sum']
)
print("\nMultiple aggregations:")
print(ct_multi.round(0))

Output:

Average salary:
status       Full-time  Part-time
department                       
Engineering    91667.0    45000.0
HR             62000.0    38000.0
Sales          66500.0    48000.0

Multiple aggregations:
              mean                count            sum            
status   Full-time Part-time Full-time Part-time Full-time Part-time
department                                                          
Engineering  91667.0   45000.0       3.0       1.0  275000.0   45000.0
HR           62000.0   38000.0       2.0       1.0  124000.0   38000.0
Sales        66500.0   48000.0       2.0       1.0  133000.0   48000.0

This reveals that full-time Engineering employees earn significantly more on average than other groups. You can use any aggregation function that works with groupby(): 'sum', 'min', 'max', 'std', or even custom functions.

Multi-Level Crosstabs

Real analysis often requires more than two dimensions. Pass lists to index or columns to create hierarchical crosstabs.

# Multi-level index: department and level vs status
ct_multi_index = pd.crosstab(
    [df['department'], df['level']],
    df['status'],
    margins=True
)
print("Multi-level index:")
print(ct_multi_index)

# Multi-level columns: status vs department and level
ct_multi_cols = pd.crosstab(
    df['status'],
    [df['department'], df['level']]
)
print("\nMulti-level columns:")
print(ct_multi_cols)

Output:

Multi-level index:
status               Full-time  Part-time  All
department  level                              
Engineering Junior           1          1    2
            Senior           2          0    2
HR          Junior           1          1    2
            Senior           1          0    1
Sales       Junior           0          1    1
            Senior           2          0    2
All                          7          3   10

Multi-level columns:
department  Engineering        HR        Sales       
level            Junior Senior Junior Senior Junior Senior
status                                                    
Full-time             1      2      1      1      0      2
Part-time             1      0      1      0      1      0

Multi-level crosstabs answer complex questions: “Among full-time senior employees, which department has the most?” The hierarchical structure keeps related data together while showing all combinations.

Styling and Exporting

Raw numbers work for analysis, but presentations require polish. Pandas styling transforms crosstabs into visual tools.

# Create a crosstab for styling
ct_style = pd.crosstab(
    df['department'],
    df['status'],
    values=df['salary'],
    aggfunc='mean'
).round(0)

# Apply heatmap styling
styled = ct_style.style.background_gradient(
    cmap='YlOrRd',
    axis=None
).format('${:,.0f}')

# Display in Jupyter or export
styled

For export, you have several options:

# Export to Excel with formatting preserved
ct_margins.to_excel('crosstab_report.xlsx', sheet_name='Department Analysis')

# Export to HTML for web embedding
html_output = ct_style.style.background_gradient(cmap='Blues').to_html()
with open('crosstab.html', 'w') as f:
    f.write(html_output)

# Export to clipboard for pasting into other applications
ct_margins.to_clipboard()

The background_gradient() method creates heatmaps that make patterns immediately visible. High values get darker colors, drawing attention to outliers and trends that numbers alone might hide.

Practical Recommendations

After years of using crosstabs in production analysis, here’s what I’ve learned:

Start with counts, then normalize. Raw frequencies establish baseline understanding. Normalization comes second to answer comparative questions.

Always include margins for stakeholder presentations. Without totals, viewers lack context and may draw incorrect conclusions.

Use normalize='index' for most comparisons. Row percentages answer the most common business questions: “What percentage of X falls into category Y?”

Combine with .fillna(0) for sparse data. Missing combinations appear as NaN by default. Fill them explicitly when zeros are meaningful.

# Handle missing combinations
ct_filled = pd.crosstab(
    df['department'],
    df['status']
).fillna(0).astype(int)

Crosstabs are deceptively simple. Master them, and you’ll answer categorical questions faster than colleagues reaching for pivot tables or writing complex groupby chains.

Introduction to Crosstabs

Basic Crosstab Syntax

Adding Margins and Totals

Normalizing Values (Percentages)

Applying Aggregation Functions

Multi-Level Crosstabs

Styling and Exporting

Practical Recommendations

Liked this? There's more.

Similar Articles