How to Create a Box Plot in Seaborn

Key Insights

  • Box plots visualize data distribution through quartiles, making them ideal for comparing multiple groups and identifying outliers—Seaborn’s high-level API creates publication-ready plots with minimal code compared to matplotlib
  • The hue parameter transforms basic box plots into powerful multi-dimensional visualizations, allowing you to compare distributions across two categorical variables simultaneously
  • Adjusting the whis parameter (default 1.5) controls outlier sensitivity—set it to (0, 100) to extend whiskers to actual data extremes rather than using the IQR-based calculation

Introduction to Box Plots

Box plots (also called box-and-whisker plots) are a compact way to visualize a data distribution. They summarize it with five statistics: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The “box” spans from Q1 to Q3 (the interquartile range, or IQR), with a line marking the median. By default, Seaborn’s “whiskers” extend to the most extreme data points within 1.5 × IQR of the box edges, so they reach the true minimum and maximum only when no points lie beyond that limit. Points beyond the whiskers are drawn individually and are usually treated as outliers.
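
To make these statistics concrete, here is a minimal sketch that computes them with NumPy for a small made-up sample (the values are invented for illustration, not taken from any dataset in this article). It assumes linear-interpolation percentiles, which is NumPy’s default, and the common 1.5 × IQR rule for whisker limits.

```python
import numpy as np

# Made-up sample; 40 is deliberately far from the rest
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 40], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme points still inside the 1.5 * IQR fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences would be drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

In this sample the upper whisker stops at 10 rather than at the maximum, because 40 lies beyond the upper fence.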

Use box plots when you need to compare distributions across multiple categories, identify outliers, or quickly assess data spread and skewness. They’re particularly valuable when working with multiple groups because they’re more compact than histograms or density plots.

Seaborn simplifies box plot creation with sensible defaults, automatic color palettes, and direct support for pandas DataFrames. Matplotlib’s own boxplot function computes the same statistics, but grouping by a categorical column, labeling, and styling usually take extra code; Seaborn handles those steps from column names.

Setting Up Your Environment

Start by importing the necessary libraries. You’ll need Seaborn for plotting, matplotlib for display control, and pandas for data manipulation.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Set the default style
sns.set_theme(style="whitegrid")

# Load a sample dataset (downloaded from seaborn's online data repository on first use, then cached)
tips = sns.load_dataset('tips')
print(tips.head())

The tips dataset contains restaurant billing information with columns for total bill, tip amount, sex, smoker status, day, time, and party size. It’s perfect for demonstrating box plots because it has both numerical and categorical variables.

Creating a Basic Box Plot

The simplest box plot visualizes a single numerical variable’s distribution:

# Single variable box plot
plt.figure(figsize=(8, 6))
sns.boxplot(y=tips['total_bill'])
plt.title('Distribution of Total Bill')
plt.show()

This creates a vertical box plot showing the total bill distribution. However, box plots shine when comparing distributions across categories:

# Box plot with categorical grouping
plt.figure(figsize=(10, 6))
sns.boxplot(x='day', y='total_bill', data=tips)
plt.title('Total Bill Distribution by Day')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day of Week')
plt.show()

Here, x specifies the categorical variable (day) and y specifies the numerical variable (total_bill). Seaborn automatically creates separate box plots for each day, making it trivial to compare distributions.
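
Each box corresponds to one group of rows. A rough way to see the numbers behind the boxes is to group the data the same way with pandas. The sketch below uses a small invented DataFrame (the column names mirror the tips dataset, but the values are made up) so it runs without downloading anything.

```python
import pandas as pd

# Invented rows with the same column names as the tips dataset
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 14.0, 18.0, 12.0, 20.0, 22.0, 15.0, 25.0, 35.0],
})

# One row per day: the extremes and quartiles that a grouped box plot summarizes
summary = df.groupby("day")["total_bill"].describe()[
    ["min", "25%", "50%", "75%", "max"]
]
print(summary)
```

The 25%, 50%, and 75% columns correspond to the bottom edge, median line, and top edge of each day’s box.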

Customizing Box Plot Appearance

Seaborn provides extensive customization options. Start with color palettes:

# Using a named color palette
# (seaborn 0.13+ deprecates palette without hue, so assign the x variable to hue
#  and pass legend=False to suppress the redundant legend)
plt.figure(figsize=(10, 6))
sns.boxplot(x='day', y='total_bill', hue='day', data=tips,
            palette='Set2', legend=False)
plt.title('Total Bill by Day - Custom Palette')
plt.show()

# Custom colors, one per day
custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']
plt.figure(figsize=(10, 6))
sns.boxplot(x='day', y='total_bill', hue='day', data=tips,
            palette=custom_colors, legend=False)
plt.title('Total Bill by Day - Custom Colors')
plt.show()

Change orientation for better readability with many categories or long labels:

# Horizontal box plot
plt.figure(figsize=(10, 6))
sns.boxplot(y='day', x='total_bill', hue='day', data=tips,
            palette='muted', legend=False)
plt.title('Total Bill by Day - Horizontal')
plt.xlabel('Total Bill ($)')
plt.show()

Adjust visual properties like width and line styles:

# Customizing box width and line properties
plt.figure(figsize=(10, 6))
sns.boxplot(x='day', y='total_bill', hue='day', data=tips,
            width=0.5,      # Narrower boxes
            linewidth=2.5,  # Thicker lines
            fliersize=8,    # Larger outlier markers
            palette='pastel', legend=False)
plt.title('Total Bill by Day - Custom Width and Lines')
plt.show()

Advanced Box Plot Variations

The hue parameter adds a third dimension by splitting each category into subcategories:

# Box plot with hue for additional grouping
plt.figure(figsize=(12, 6))
sns.boxplot(x='day', y='total_bill', hue='sex', data=tips, palette='Set1')
plt.title('Total Bill by Day and Gender')
plt.legend(title='Gender', loc='upper right')
plt.show()

This creates side-by-side box plots for male and female customers within each day, revealing patterns that might be hidden in aggregated data.
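
The split that hue performs can be mirrored in pandas by grouping on both columns at once. This is only a sketch with invented data (the column names follow the tips dataset, but the values are made up); it shows one median per day and sex combination, which is one median line per box in the hue-split plot.

```python
import pandas as pd

# Invented rows: two days, two sexes, two bills per combination
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "sex": ["Male", "Male", "Female", "Female",
            "Male", "Male", "Female", "Female"],
    "total_bill": [20.0, 30.0, 15.0, 17.0, 22.0, 26.0, 18.0, 28.0],
})

# Rows are days, columns are sexes, cells are the median bill for that pair
medians = df.groupby(["day", "sex"])["total_bill"].median().unstack("sex")
print(medians)
```

A table like this is a quick cross-check when a box in the plot looks surprising.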

For more complex comparisons, combine multiple categorical variables:

# Multiple categorical variables
plt.figure(figsize=(14, 6))
sns.boxplot(x='day', y='total_bill', hue='time', data=tips, palette='coolwarm')
plt.title('Total Bill by Day and Meal Time')
plt.legend(title='Meal Time')
plt.show()

Layer box plots with other plot types for richer visualizations:

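# Combining box plot with strip plot
plt.figure(figsize=(12, 6))
# showfliers=False avoids drawing outliers twice: the strip plot already shows every point
sns.boxplot(x='day', y='total_bill', hue='day', data=tips,
            palette='Set3', width=0.5, showfliers=False, legend=False)
sns.stripplot(x='day', y='total_bill', data=tips,
              color='black', alpha=0.3, size=3)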
plt.title('Total Bill by Day with Individual Data Points')
plt.show()

The strip plot overlay shows individual data points, providing context for the distribution summary.

Handling Outliers and Statistical Annotations

Control outlier detection by adjusting the whis parameter, which determines whisker length as a multiple of the IQR:

# Compare three whisker settings side by side (shared y-axis for a fair comparison)
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)

# Standard outlier detection
sns.boxplot(x='day', y='total_bill', data=tips, ax=axes[0])
axes[0].set_title('Default (whis=1.5)')

# More conservative outlier detection
sns.boxplot(x='day', y='total_bill', data=tips, whis=2.0, ax=axes[1])
axes[1].set_title('Conservative (whis=2.0)')

# Whiskers to actual min/max
sns.boxplot(x='day', y='total_bill', data=tips, whis=(0, 100), ax=axes[2])
axes[2].set_title('Full Range (whis=(0,100))')

plt.tight_layout()
plt.show()
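
To see how whis changes what counts as an outlier, the following sketch counts the points that would fall beyond the whiskers for each setting. The helper name count_outliers and the sample values are invented for illustration; the logic assumes the same convention as the plots above, where a single number is a multiple of the IQR and a pair is a (low, high) percentile range.

```python
import numpy as np

def count_outliers(values, whis=1.5):
    """Count points beyond the whiskers for a given whis setting.

    A single number is treated as a multiple of the IQR; a (low, high)
    pair is treated as percentiles of the data.
    """
    values = np.asarray(values, dtype=float)
    if np.isscalar(whis):
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        low, high = q1 - whis * iqr, q3 + whis * iqr
    else:
        low, high = np.percentile(values, whis)
    return int(np.sum((values < low) | (values > high)))

# Made-up sample with two large values
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 18, 40]
for setting in (1.5, 2.0, (0, 100)):
    print(setting, count_outliers(sample, whis=setting))
```

For this sample, the count drops from two points at whis=1.5 to one at whis=2.0 and to none with whis=(0, 100).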

Add mean markers to complement the median line:

# Box plot with mean markers
plt.figure(figsize=(10, 6))
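ax = sns.boxplot(x='day', y='total_bill', hue='day', data=tips,
                 palette='pastel', legend=False)

# Calculate and plot means
# (observed=False keeps every day category, in the same order as the plot)
means = tips.groupby('day', observed=False)['total_bill'].mean()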
positions = range(len(means))
ax.plot(positions, means, marker='D', color='red', 
        linestyle='none', markersize=8, label='Mean')

plt.title('Total Bill by Day with Mean Markers')
plt.legend()
plt.show()
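
As an alternative to computing the means by hand, matplotlib’s showmeans and meanprops box plot options can be passed through sns.boxplot, which forwards extra keyword arguments to matplotlib. The sketch below uses invented data so it runs without downloading the tips dataset; the marker styling is just one possible choice.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Invented data with the same column names as the tips dataset
df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4 + ["Sat"] * 4,
    "total_bill": [10, 12, 14, 30, 11, 15, 16, 18, 20, 22, 25, 60],
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(
    x="day", y="total_bill", data=df, ax=ax,
    showmeans=True,  # matplotlib option: draw a marker at each group's mean
    meanprops={"marker": "D", "markerfacecolor": "red",
               "markeredgecolor": "red", "markersize": 8},
)
ax.set_title("Total Bill by Day with Mean Markers")
```

Call plt.show() afterwards to display the figure. This approach keeps the markers aligned with the boxes automatically, at the cost of not adding a legend entry for the mean.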

Hide outliers when they clutter the visualization:

# Box plot without outliers
plt.figure(figsize=(10, 6))
sns.boxplot(x='day', y='total_bill', hue='day', data=tips,
            showfliers=False, palette='muted', legend=False)
plt.title('Total Bill by Day (Outliers Hidden)')
plt.show()

Best Practices and Common Use Cases

Use box plots when you need to compare distributions across groups, especially with more than three categories. They’re more space-efficient than violin plots and provide clearer statistical summaries than strip plots alone.

Avoid box plots for small sample sizes (n < 20 per group) where individual points matter more than distribution statistics. In these cases, use strip plots or swarm plots instead.
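
One way to apply this rule of thumb is to check group sizes before plotting. The helper below is a hypothetical sketch (the name small_groups and the invented data are assumptions for illustration); its default threshold of 20 simply follows the guideline above.

```python
import pandas as pd

def small_groups(df, group_col, min_n=20):
    """Return the groups in group_col that have fewer than min_n rows."""
    counts = df[group_col].value_counts()
    return counts[counts < min_n].sort_index()

# Invented data: group "A" is large, group "B" is small
df = pd.DataFrame({
    "group": ["A"] * 25 + ["B"] * 5,
    "value": range(30),
})

print(small_groups(df, "group"))
```

Any group this returns is a candidate for a strip or swarm plot instead of, or on top of, a box.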

Here’s a complete real-world example analyzing tip percentages across different scenarios:

# Calculate tip percentage
tips['tip_percent'] = (tips['tip'] / tips['total_bill']) * 100

# Create comprehensive comparison
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# By day
sns.boxplot(x='day', y='tip_percent', hue='day', data=tips,
            palette='Set2', legend=False, ax=axes[0, 0])
axes[0, 0].set_title('Tip Percentage by Day', fontsize=14, fontweight='bold')
axes[0, 0].set_ylabel('Tip Percentage (%)')

# By time with gender
sns.boxplot(x='time', y='tip_percent', hue='sex', data=tips, 
            palette='Set1', ax=axes[0, 1])
axes[0, 1].set_title('Tip Percentage by Meal Time and Gender', 
                      fontsize=14, fontweight='bold')
axes[0, 1].set_ylabel('Tip Percentage (%)')

# By smoker status
sns.boxplot(x='smoker', y='tip_percent', hue='smoker', data=tips,
            palette='coolwarm', legend=False, ax=axes[1, 0])
axes[1, 0].set_title('Tip Percentage by Smoker Status', 
                      fontsize=14, fontweight='bold')
axes[1, 0].set_ylabel('Tip Percentage (%)')
axes[1, 0].set_xlabel('Smoker')

# By party size
sns.boxplot(x='size', y='tip_percent', hue='size', data=tips,
            palette='viridis', legend=False, ax=axes[1, 1])
axes[1, 1].set_title('Tip Percentage by Party Size', 
                      fontsize=14, fontweight='bold')
axes[1, 1].set_ylabel('Tip Percentage (%)')
axes[1, 1].set_xlabel('Party Size')

plt.tight_layout()
plt.show()

This grid shows tipping patterns along several dimensions at once. In the tips data, median tip percentages look broadly similar across days, while the spread differs more between party sizes. Read the party-size panel with care, though: the smallest and largest parties contain only a handful of bills each, so those boxes rest on very little data.

When presenting box plots, always label axes clearly and include units. If your audience isn’t familiar with box plots, add a brief explanation of the components. Consider adding mean markers when the mean differs significantly from the median, as this indicates skewness.
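
A quick way to decide whether mean markers are worth adding is to compare the mean and median per group. The sketch below uses invented data; what counts as a large enough gap is a judgment call that depends on your data’s scale.

```python
import pandas as pd

# Invented data: the "right_skewed" group has one very large value
df = pd.DataFrame({
    "group": ["symmetric"] * 5 + ["right_skewed"] * 5,
    "value": [8, 9, 10, 11, 12, 8, 9, 10, 11, 62],
})

# A positive gap means the mean is pulled above the median (right skew)
stats = df.groupby("group")["value"].agg(["mean", "median"])
stats["gap"] = stats["mean"] - stats["median"]
print(stats)
```

Here both groups share a median of 10, but the single large value pulls the mean of the skewed group up to 20, a difference a box plot without mean markers would not show directly.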

Box plots are your go-to tool for distribution comparison. Master the hue parameter for multi-dimensional analysis, adjust whis for domain-appropriate outlier detection, and combine with strip plots when individual data points add value. With these techniques, you’ll create informative visualizations that reveal patterns hidden in summary statistics alone.
