How to Create a Cat Plot in Seaborn
Key Insights
- Catplot is Seaborn’s figure-level interface that creates complete figures with multiple subplots, unlike axis-level functions that draw on existing axes—use catplot when you need faceting or want Seaborn to handle figure creation automatically.
- The kind parameter lets you switch between eight plot types (strip, swarm, box, violin, boxen, point, bar, count) while keeping the same syntax, making it easy to experiment with different visualizations of your categorical data.
- Combining the hue, col, and row parameters transforms simple plots into multi-dimensional visualizations that reveal patterns across three or four categorical variables simultaneously.
Introduction to Categorical Plots
Seaborn’s catplot() function is your Swiss Army knife for categorical data visualization. It’s a figure-level interface, meaning it creates an entire figure and handles subplot layout automatically—this distinguishes it from axis-level functions like boxplot() or violinplot() that draw on existing matplotlib axes.
Use catplot() when you need faceting (creating multiple subplots based on categorical variables) or want a quick, publication-ready figure without manually setting up matplotlib figures and axes. Use axis-level functions when you’re building complex custom layouts or need fine-grained control over subplot positioning.
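The difference between the two interfaces can be seen in what each call returns. The sketch below uses a small made-up DataFrame (the column names group and value are placeholders, not part of any dataset) and assumes seaborn 0.11.2 or later, where the grid exposes its figure as .figure (older releases call it .fig).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, only for illustration
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Axes-level: you create the figure and axes, then seaborn draws onto one of them
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
returned_ax = sns.boxplot(data=df, x="group", y="value", ax=left)
right.hist(df["value"])  # other matplotlib content can share the same figure

# Figure-level: seaborn creates its own figure and returns a FacetGrid
grid = sns.catplot(data=df, x="group", y="value", kind="box", height=3)

print(returned_ax is left)       # boxplot hands back the axes it was given
print(type(grid).__name__)       # the grid object that wraps the new figure
print(grid.figure is fig)        # catplot did not reuse the existing figure
```

Because catplot() always builds a new figure, it cannot be dropped into one panel of an existing plt.subplots() layout; that is the case for the axis-level functions.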
Here’s the basic syntax:
import seaborn as sns
import matplotlib.pyplot as plt
# Simple categorical plot (df is any long-form DataFrame with 'category' and 'value' columns)
sns.catplot(data=df, x='category', y='value', kind='box')
plt.show()
The beauty of catplot() is its consistency—change the kind parameter and you get a completely different visualization with the same underlying data structure.
Setting Up Your Environment
You’ll need Seaborn, which installs matplotlib, pandas, and NumPy as dependencies. Seaborn also provides several example datasets that are well suited to learning; load_dataset() downloads them on first use, so it needs an internet connection.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Load the tips dataset
tips = sns.load_dataset('tips')
# Examine the structure
print(tips.head())
tips.info()  # info() prints its own summary and returns None, so no print() is needed
The tips dataset contains restaurant billing data with categorical variables like day of the week, meal time, sex, and smoker status—ideal for categorical plotting. For production work, you’d load your own data:
# Loading custom data
df = pd.read_csv('your_data.csv')
# Ensure categorical variables are properly typed
df['category'] = df['category'].astype('category')
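One reason to type categorical columns explicitly is that seaborn follows the category order when laying out the axis. The sketch below assumes a made-up table with size and price columns and reads the tick labels back to show the resulting order.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data whose natural order is small < medium < large
df = pd.DataFrame({
    "size": ["large", "small", "medium", "small", "large", "medium"],
    "price": [30.0, 10.0, 20.0, 12.0, 33.0, 19.0],
})

# Plain strings are plotted in order of first appearance;
# an explicit category order overrides that
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)

g = sns.catplot(data=df, x="size", y="price", kind="bar", height=3)
g.figure.canvas.draw()  # make sure tick labels are populated before reading them
labels = [tick.get_text() for tick in g.ax.get_xticklabels()]
print(labels)
```

If you would rather not change the column type, catplot() also accepts an order argument that lists the categories in the desired sequence.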
Basic Catplot Types
Seaborn offers eight plot types through catplot, each suited for different analytical needs. Let’s explore the most useful ones.
Strip plots show individual data points, useful for small to medium datasets where you want to see every observation:
# Strip plot: see all individual tips by day
sns.catplot(data=tips, x='day', y='tip', kind='strip', height=5, aspect=1.5)
plt.title('Tip Distribution by Day of Week')
plt.show()
Strip plots can suffer from overplotting. Swarm plots solve this by adjusting point positions to avoid overlap:
# Swarm plot: better for seeing density
sns.catplot(data=tips, x='day', y='tip', kind='swarm', height=5, aspect=1.5)
plt.show()
Box plots provide statistical summaries—quartiles, median, and outliers:
# Box plot: statistical summary of bills by meal time
sns.catplot(data=tips, x='time', y='total_bill', kind='box', height=5, aspect=1.2)
plt.ylabel('Total Bill ($)')
plt.xlabel('Meal Time')
plt.show()
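To make the box plot's summary concrete, the numbers behind each box can be computed directly with pandas. This is a sketch on made-up bills, assuming the default whisker rule of 1.5 times the interquartile range; box_stats is a helper written for this example, not a seaborn function.

```python
import pandas as pd

# Hypothetical bills; the 40.0 at lunch is a deliberate outlier
df = pd.DataFrame({
    "time": ["Lunch"] * 6 + ["Dinner"] * 6,
    "total_bill": [8.0, 10.0, 12.0, 14.0, 16.0, 40.0,
                   15.0, 18.0, 20.0, 22.0, 25.0, 28.0],
})

def box_stats(values: pd.Series) -> pd.Series:
    """Quartiles plus the count of points beyond the 1.5 * IQR fences."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = ((values < low_fence) | (values > high_fence)).sum()
    return pd.Series({"q1": q1, "median": median, "q3": q3, "n_outliers": n_outliers})

# One row of statistics per meal time
summary = pd.DataFrame(
    {name: box_stats(group) for name, group in df.groupby("time")["total_bill"]}
).T
print(summary)
```

The box spans q1 to q3, the line inside it marks the median, and points counted in n_outliers are the ones drawn individually beyond the whiskers.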
Violin plots combine box plots with kernel density estimation, showing the full distribution shape:
# Violin plot: distribution shape and statistics
sns.catplot(data=tips, x='day', y='total_bill', kind='violin', height=5, aspect=1.5)
plt.show()
Bar plots show point estimates with confidence intervals, perfect for comparing means across categories:
# Bar plot: average tip by day with confidence intervals
sns.catplot(data=tips, x='day', y='tip', kind='bar', height=5, aspect=1.5)
plt.ylabel('Average Tip ($)')
plt.show()
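By default the bar height is the mean of each group and the error bar is a bootstrapped 95% confidence interval. The sketch below reproduces that idea by hand with NumPy on made-up tips; bootstrap_ci is a helper written for this example, and seaborn's own resampling may differ in detail. Recent seaborn releases control the interval with the errorbar parameter, while older ones use ci.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical tips for two days, 30 observations each
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri"], 30),
    "tip": np.concatenate([rng.normal(2.8, 1.0, 30), rng.normal(3.2, 1.0, 30)]),
})

def bootstrap_ci(values, n_boot=1000, level=95):
    """Mean plus a percentile bootstrap interval for the mean."""
    values = np.asarray(values)
    # Resample with replacement: one row of indices per bootstrap replicate
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    lo, hi = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), lo, hi

results = {}
for day, group in df.groupby("day", sort=False):
    results[day] = bootstrap_ci(group["tip"])
    mean, lo, hi = results[day]
    print(f"{day}: mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

When the intervals of two bars overlap heavily, the difference in means is probably within sampling noise, which is the main thing the error bars are there to show.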
Count plots are bar plots of frequencies, so you pass only the categorical variable and no y variable:
# Count plot: frequency of visits by day
sns.catplot(data=tips, x='day', kind='count', height=5, aspect=1.5)
plt.ylabel('Number of Visits')
plt.show()
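A count plot draws the same numbers that value_counts() returns. The sketch below checks this on a tiny made-up column by reading the bar heights back from the axes.

```python
import pandas as pd
import seaborn as sns

# Hypothetical visits: three on Sat, two on Sun, one on Fri
df = pd.DataFrame({"day": ["Sat", "Sun", "Sat", "Fri", "Sat", "Sun"]})

counts = df["day"].value_counts()

g = sns.catplot(data=df, x="day", kind="count", height=3)
# Each bar is a rectangle patch whose height is the row count for that category
bar_heights = sorted(patch.get_height() for patch in g.ax.patches)

print(counts.to_dict())
print(bar_heights)
```

If you need the counts for further analysis rather than a picture, value_counts() or a groupby is the more direct route.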
Customizing Catplots
The real power emerges when you add dimensions through hue, col, and row parameters.
Adding a third dimension with hue:
# Hue adds color encoding for another categorical variable
sns.catplot(
data=tips,
x='day',
y='total_bill',
hue='sex',
kind='box',
height=5,
aspect=1.5
)
plt.title('Total Bill by Day and Gender')
plt.show()
Creating faceted plots with col and row:
# Faceting creates separate subplots
sns.catplot(
data=tips,
x='day',
y='tip',
col='time',
hue='smoker',
kind='violin',
height=4,
aspect=1.2
)
plt.show()
This creates two subplots (Lunch and Dinner), each showing tip distributions by day with smoker status color-coded. You can add row for even more dimensions:
# Four-dimensional visualization
sns.catplot(
data=tips,
x='day',
y='total_bill',
col='time',
row='sex',
hue='smoker',
kind='box',
height=3,
aspect=1.5
)
plt.show()
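The returned grid keeps track of which subplot belongs to which combination of row and col values. The sketch below uses randomly generated data with the same column names as the tips example (the values themselves are made up) and labels each facet with its group size.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data mimicking the structure of the tips columns used above
rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "day": rng.choice(["Sat", "Sun"], n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "sex": rng.choice(["Female", "Male"], n),
    "total_bill": rng.gamma(4.0, 5.0, n),
})

g = sns.catplot(
    data=df, x="day", y="total_bill",
    row="sex", col="time", kind="box", height=2.5,
)

print(g.axes.shape)  # (number of row levels, number of col levels)

# With both row and col set, axes_dict is keyed by (row value, col value)
for (row_value, col_value), ax in g.axes_dict.items():
    n_obs = len(df[(df["sex"] == row_value) & (df["time"] == col_value)])
    ax.set_title(f"{row_value} / {col_value} (n={n_obs})")
```

Showing the group size in each title is a useful habit with faceted plots, because every added dimension splits the data into smaller and noisier subsets.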
Custom color palettes:
# Using built-in palettes
sns.catplot(
data=tips,
x='day',
y='tip',
hue='time',
kind='bar',
palette='Set2',
height=5,
aspect=1.5
)
# Custom color mapping
custom_colors = {'Lunch': '#FF6B6B', 'Dinner': '#4ECDC4'}
sns.catplot(
data=tips,
x='day',
y='tip',
hue='time',
kind='bar',
palette=custom_colors,
height=5,
aspect=1.5
)
plt.show()
Control figure dimensions with height (height in inches of each facet) and aspect (ratio of width to height).
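Since height and aspect describe a single facet, the overall figure size follows from the grid shape. The sketch below works this out on a small made-up table without a hue variable; as far as I know, a legend placed outside the plots widens the figure beyond this estimate.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data with two levels of the faceting variable
df = pd.DataFrame({
    "day": ["Sat", "Sun"] * 4,
    "time": ["Lunch"] * 4 + ["Dinner"] * 4,
    "tip": [1.0, 2.0, 1.5, 2.5, 3.0, 4.0, 3.5, 4.5],
})

height, aspect = 3, 1.5
g = sns.catplot(
    data=df, x="day", y="tip", col="time",
    kind="strip", height=height, aspect=aspect,
)

n_rows, n_cols = g.axes.shape
# Each facet is (height * aspect) wide and height tall, in inches
expected = (n_cols * height * aspect, n_rows * height)
actual = tuple(g.figure.get_size_inches())

print(expected)
print(actual)
```

For a target overall width, divide it by the number of columns and by aspect to get the height to pass.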
Styling and Formatting
Polish your visualizations with proper labels, titles, and styling:
# Comprehensive styling example
sns.set_style('whitegrid')
sns.set_context('talk') # Larger fonts for presentations
g = sns.catplot(
data=tips,
x='day',
y='total_bill',
hue='time',
kind='violin',
palette='muted',
height=6,
aspect=1.5,
)
# Customize labels and title
g.set_axis_labels('Day of Week', 'Total Bill ($)', fontsize=12)
g.fig.suptitle('Restaurant Bills by Day and Meal Time', fontsize=14, y=1.02)
# Reposition and retitle the legend that catplot created (requires seaborn 0.11.2+)
sns.move_legend(g, 'upper right', title='Meal Time', frameon=True)
# Rotate x-axis labels if needed
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Access the underlying matplotlib figure and axes through the returned object:
g = sns.catplot(data=tips, x='day', y='tip', kind='box')
# Access the figure
g.fig.set_size_inches(10, 6)
# Access axes: g.axes is a 2D array indexed [row, col] (it is 1D when col_wrap is set)
g.axes[0, 0].set_ylim(0, 12)
plt.show()
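Functions such as plt.xticks() and plt.title() act only on the current axes, so with several facets it is safer to loop over all of them. The sketch below uses made-up data to add a shared reference line and rotated tick labels to every facet.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 40 tips split across two days and two meal times
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "day": np.tile(["Sat", "Sun"], 20),
    "time": np.repeat(["Lunch", "Dinner"], 20),
    "tip": rng.gamma(3.0, 1.0, 40),
})

g = sns.catplot(data=df, x="day", y="tip", col="time", kind="box", height=3)

overall_median = df["tip"].median()
for ax in g.axes.flat:  # .flat works for both 1D and 2D axes arrays
    # Same dashed reference line on every facet for easy comparison
    ax.axhline(overall_median, color="gray", linestyle="--", linewidth=1)
    ax.tick_params(axis="x", labelrotation=45)
```

A shared reference line like this makes it easier to compare facets against one baseline instead of against each other.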
Practical Use Cases
Use Case 1: A/B Test Results Analysis
Suppose you’re analyzing conversion rates across different user segments:
# Simulated A/B test data
np.random.seed(42)
n_samples = 200
ab_data = pd.DataFrame({
'variant': np.random.choice(['Control', 'Treatment'], n_samples),
'device': np.random.choice(['Mobile', 'Desktop', 'Tablet'], n_samples),
'conversion_rate': np.random.beta(2, 5, n_samples) * 100,
'user_type': np.random.choice(['New', 'Returning'], n_samples)
})
# Comprehensive visualization
g = sns.catplot(
data=ab_data,
x='variant',
y='conversion_rate',
col='device',
hue='user_type',
kind='box',
height=4,
aspect=1.2,
palette='Set1'
)
g.set_axis_labels('Variant', 'Conversion Rate (%)')
g.fig.suptitle('A/B Test Results by Device and User Type', y=1.02)
plt.show()
Use Case 2: Sales Performance Dashboard
Analyzing quarterly sales across regions and product categories:
# Sales data
sales_data = pd.DataFrame({
'quarter': ['Q1', 'Q2', 'Q3', 'Q4'] * 30,
'region': np.random.choice(['North', 'South', 'East', 'West'], 120),
'category': np.random.choice(['Electronics', 'Clothing', 'Food'], 120),
'revenue': np.random.gamma(2, 50000, 120)
})
# Multi-faceted analysis
sns.set_style('darkgrid')
g = sns.catplot(
data=sales_data,
x='quarter',
y='revenue',
col='region',
hue='category',
kind='bar',
height=4,
aspect=1.0,
palette='viridis',
col_wrap=2 # Wrap columns for better layout
)
g.set_axis_labels('Quarter', 'Revenue ($)')
g.set_titles('{col_name} Region')
g.fig.suptitle('Quarterly Revenue by Region and Category', y=1.01)
plt.tight_layout()
plt.show()
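One side effect of col_wrap is easy to trip over: the grid's axes array becomes one-dimensional, so indexing such as g.axes[0, 0] no longer applies. The sketch below shows this on made-up revenue data with the same column names as above.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical revenue: four regions, eight records each
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "quarter": np.tile(["Q1", "Q2", "Q3", "Q4"], 8),
    "region": np.repeat(["North", "South", "East", "West"], 8),
    "revenue": rng.gamma(2.0, 50_000, 32),
})

g = sns.catplot(
    data=df, x="quarter", y="revenue",
    col="region", col_wrap=2, kind="bar", height=2.5,
)

print(g.axes.shape)  # one entry per facet, not a (rows, columns) pair

# With only col set, axes_dict is keyed by the col value itself
for region, ax in g.axes_dict.items():
    ax.set_title(f"{region} Region")
```

Looking facets up through axes_dict, or iterating over g.axes.flat, works whether or not the grid is wrapped.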
Use Case 3: Quality Control Monitoring
Tracking defect rates across manufacturing shifts and production lines:
# Manufacturing data
qc_data = pd.DataFrame({
'shift': np.random.choice(['Morning', 'Afternoon', 'Night'], 150),
'line': np.random.choice(['Line A', 'Line B', 'Line C'], 150),
'defect_rate': np.random.exponential(2.5, 150)
})
# Violin plot for distribution visualization
g = sns.catplot(
data=qc_data,
x='shift',
y='defect_rate',
col='line',
kind='violin',
height=5,
aspect=1.0,
palette='coolwarm',  # seaborn 0.13+ warns when palette is set without hue; there, also pass hue='shift', legend=False
inner='box' # Show box plot inside violin
)
g.set_axis_labels('Shift', 'Defect Rate (%)')
g.set_titles('Production {col_name}')
plt.show()
Catplot excels at exploratory data analysis. Start with simple visualizations, then layer in additional dimensions through hue, col, and row parameters. The consistent API means you can quickly iterate through different plot types by changing only the kind parameter—invaluable when you’re not sure which visualization best reveals your data’s story.