How to Create a Box Plot in Matplotlib

Key Insights

  • Box plots efficiently display data distribution through five key statistics (minimum, Q1, median, Q3, maximum) and automatically highlight outliers, making them superior to bar charts for comparing distributions across groups
  • Matplotlib’s boxplot() function offers extensive customization through parameters like patch_artist, notch, and showfliers, while Pandas integration enables direct plotting from DataFrames with minimal code
  • Horizontal box plots work better than vertical ones when comparing many categories or with long labels, and adding notches provides visual confidence intervals for median comparisons

Introduction to Box Plots

Box plots, also known as box-and-whisker plots, are one of the most information-dense visualizations in data analysis. They summarize a distribution with five key statistics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. Under the common convention that Matplotlib uses by default, values more than 1.5 times the interquartile range (IQR) below Q1 or above Q3 are drawn as individual outlier points, and the whiskers stop at the most extreme values that are not outliers.
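As a rough illustration of these definitions, the NumPy sketch below computes the five-number summary and the 1.5 × IQR outlier limits by hand. The sample data and variable names are made up for illustration, and the quartiles use NumPy's default linear interpolation, which is only one of several conventions.

```python
import numpy as np

# Hypothetical sample: 200 draws from a normal distribution
rng = np.random.default_rng(0)
values = rng.normal(100, 20, 200)

# Five-number summary
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])

# Interquartile range and the 1.5 x IQR limits ("fences")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the ones a box plot draws as individual points
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"min={minimum:.1f}  Q1={q1:.1f}  median={median:.1f}  Q3={q3:.1f}  max={maximum:.1f}")
print(f"IQR={iqr:.1f}  fences=({lower_fence:.1f}, {upper_fence:.1f})  outliers={outliers.size}")
```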

Use box plots when you need to understand data distribution, identify outliers, or compare distributions across multiple groups. They’re particularly valuable in exploratory data analysis, A/B testing, quality control, and anywhere you need to see data spread at a glance. Matplotlib provides robust box plot functionality through pyplot.boxplot(), with extensive customization options that we’ll explore in depth.

Basic Box Plot Creation

Creating a basic box plot in Matplotlib requires just a few lines of code. The plt.boxplot() function accepts a single list or 1D array for one box, or a sequence of arrays (or a 2D array, one box per column) for several.

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(100, 20, 200)

# Create basic box plot
plt.figure(figsize=(8, 6))
plt.boxplot(data)
plt.ylabel('Values')
plt.title('Basic Box Plot')
plt.grid(axis='y', alpha=0.3)
plt.show()

This creates a standard box plot with default styling. The box spans the interquartile range (IQR) from Q1 to Q3, and the orange line marks the median. Each whisker extends to the most extreme data point that lies within 1.5 × IQR of the box edge, and any points beyond the whiskers appear individually as outliers. This default configuration works well for quick exploratory analysis, but you’ll often want more control over appearance.
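The dictionary returned by boxplot() holds the drawn artists, so the positions of the median, whiskers, and outliers can be read back and compared with values computed directly. The sketch below does this for the same sample data; the variable names are illustrative, and it assumes the default whisker setting.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.normal(100, 20, 200)

fig, ax = plt.subplots()
bp = ax.boxplot(data)

# Each whisker is a line from the box edge to the whisker end
lower_whisker_end = bp['whiskers'][0].get_ydata()[1]
upper_whisker_end = bp['whiskers'][1].get_ydata()[1]
median_drawn = bp['medians'][0].get_ydata()[0]
fliers = bp['fliers'][0].get_ydata()

# The same quantities computed by hand
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"median drawn: {median_drawn:.2f}  (np.median: {np.median(data):.2f})")
print(f"whisker ends: {lower_whisker_end:.2f} to {upper_whisker_end:.2f}")
print(f"1.5 x IQR limits: {q1 - 1.5 * iqr:.2f} to {q3 + 1.5 * iqr:.2f}")
print(f"points drawn as outliers: {len(fliers)}")

# plt.show()  # uncomment to display the figure
```

The whisker ends are actual data points, so they normally fall slightly inside the 1.5 × IQR limits rather than exactly on them.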

Customizing Box Plot Appearance

Matplotlib offers extensive customization options to make your box plots more informative and visually appealing. The most important parameters include patch_artist (enables color filling), notch (adds confidence intervals), widths (controls box width), and per-component property dictionaries such as boxprops, whiskerprops, capprops, medianprops, and flierprops.

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.normal(100, 20, 200)

fig, ax = plt.subplots(figsize=(8, 6))

# Create customized box plot
bp = ax.boxplot(data, 
                patch_artist=True,  # Enable color filling
                notch=True,         # Add notch for median CI
                widths=0.6,
                boxprops=dict(facecolor='lightblue', color='darkblue', linewidth=2),
                whiskerprops=dict(color='darkblue', linewidth=1.5),
                capprops=dict(color='darkblue', linewidth=1.5),
                medianprops=dict(color='red', linewidth=2),
                flierprops=dict(marker='o', markerfacecolor='red', 
                               markersize=8, alpha=0.5))

ax.set_ylabel('Values', fontsize=12)
ax.set_title('Customized Box Plot with Notch', fontsize=14)
ax.grid(axis='y', alpha=0.3)
plt.show()

The patch_artist=True parameter is essential for filling boxes with colors. Notches give an approximate 95% confidence interval around the median. If the notches of two boxes don’t overlap, that is reasonable evidence that their medians differ, although it is a visual guide rather than a formal test. Each component (box, whiskers, caps, median, outliers) can be styled independently through dedicated property dictionaries.
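For reference, the default notch is based on an asymptotic approximation, median ± 1.57 × IQR / √n, unless the bootstrap parameter is used. The sketch below computes that interval by hand and compares it with the values reported by matplotlib.cbook.boxplot_stats (the cilo and cihi entries); treat it as an illustration of the formula, not a substitute for a proper statistical test.

```python
import numpy as np
from matplotlib import cbook

np.random.seed(42)
data = np.random.normal(100, 20, 200)

# Notch limits computed by hand
q1, median, q3 = np.percentile(data, [25, 50, 75])
half_width = 1.57 * (q3 - q1) / np.sqrt(data.size)
notch_low, notch_high = median - half_width, median + half_width

# The statistics Matplotlib computes for a box plot of the same data
stats = cbook.boxplot_stats(data)[0]

print(f"by hand:    {notch_low:.2f} to {notch_high:.2f}")
print(f"Matplotlib: {stats['cilo']:.2f} to {stats['cihi']:.2f}")
```

Because the half-width shrinks with √n, notches on small samples can be wider than the box itself, which produces the folded-back shape sometimes seen in notched plots.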

Creating Multiple Box Plots

Comparing distributions across groups is where box plots truly shine. Pass a list of datasets to boxplot() to create side-by-side comparisons.

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)

# Generate data for different groups
class_a = np.random.normal(75, 10, 100)
class_b = np.random.normal(82, 12, 100)
class_c = np.random.normal(78, 8, 100)
class_d = np.random.normal(85, 15, 100)

data = [class_a, class_b, class_c, class_d]
labels = ['Class A', 'Class B', 'Class C', 'Class D']

fig, ax = plt.subplots(figsize=(10, 6))

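# Tick labels are set separately so the code runs unchanged across Matplotlib
# versions (the labels= argument was renamed tick_labels= in Matplotlib 3.9)
bp = ax.boxplot(data, patch_artist=True, notch=True)
ax.set_xticklabels(labels)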

# Color each box differently
colors = ['lightcoral', 'lightblue', 'lightgreen', 'lightyellow']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

ax.set_xlabel('Class', fontsize=12)
ax.set_ylabel('Test Scores', fontsize=12)
ax.set_title('Test Score Distribution by Class', fontsize=14)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

Because Class D is generated with the highest mean and the largest standard deviation, its box should sit highest and be the tallest, while Class C, with the smallest standard deviation, should be the most compact. Reading off both location and spread for four groups at once would be difficult with bar charts, and four overlaid histograms quickly become cluttered.
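A visual reading like this can be checked numerically. The sketch below regenerates the same four groups with the same seed and prints each group's median and IQR; the summary dictionary is just an illustrative helper.

```python
import numpy as np

np.random.seed(42)

# Same parameters and draw order as the plotting example
scores = {
    'Class A': np.random.normal(75, 10, 100),
    'Class B': np.random.normal(82, 12, 100),
    'Class C': np.random.normal(78, 8, 100),
    'Class D': np.random.normal(85, 15, 100),
}

# Median and interquartile range for each class
summary = {}
for name, values in scores.items():
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    summary[name] = (median, q3 - q1)

for name, (median, iqr) in summary.items():
    print(f"{name}: median={median:.1f}  IQR={iqr:.1f}")
```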

Working with Pandas DataFrames

In real-world scenarios, you’ll typically work with Pandas DataFrames. Pandas provides convenient methods for creating box plots directly from DataFrame structures.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(42)

# Create sample DataFrame
df = pd.DataFrame({
    'Product_A': np.random.normal(100, 15, 100),
    'Product_B': np.random.normal(120, 20, 100),
    'Product_C': np.random.normal(95, 10, 100),
    'Category': np.random.choice(['North', 'South', 'East', 'West'], 100)
})

# Method 1: Direct DataFrame boxplot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

df[['Product_A', 'Product_B', 'Product_C']].boxplot(ax=ax1, patch_artist=True)
ax1.set_title('Sales by Product')
ax1.set_ylabel('Revenue ($)')
ax1.grid(axis='y', alpha=0.3)

# Method 2: Boxplot with groupby
df.boxplot(column='Product_A', by='Category', ax=ax2, patch_artist=True)
ax2.set_title('Product A Sales by Region')
ax2.set_xlabel('Region')
ax2.set_ylabel('Revenue ($)')
plt.suptitle('')  # Remove default title
ax2.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

The DataFrame boxplot() method handles data organization automatically. The by parameter enables grouping, which is invaluable for segmented analysis. For more complex scenarios, combine groupby() with Matplotlib’s boxplot() for complete control over styling and layout.
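One way to combine groupby() with Matplotlib's boxplot() is sketched below: split the column into one array per group, then pass the list of arrays to ax.boxplot(). The DataFrame mirrors the example above in reduced form, and the colors are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    'Product_A': np.random.normal(100, 15, 100),
    'Category': np.random.choice(['North', 'South', 'East', 'West'], 100),
})

# One array of Product_A values per region (groupby sorts the keys)
grouped = {region: group['Product_A'].to_numpy()
           for region, group in df.groupby('Category')}

fig, ax = plt.subplots(figsize=(8, 6))
bp = ax.boxplot(list(grouped.values()), patch_artist=True)
ax.set_xticklabels(list(grouped.keys()))

# Style each box individually, which DataFrame.boxplot() does not expose as directly
colors = ['lightcoral', 'lightblue', 'lightgreen', 'lightyellow']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

ax.set_xlabel('Region')
ax.set_ylabel('Revenue ($)')
ax.set_title('Product A Sales by Region')
ax.grid(axis='y', alpha=0.3)

# plt.show()  # uncomment to display the figure
```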

Advanced Customization

Advanced box plot techniques include horizontal orientation, custom outlier handling, and adding statistical annotations.

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)

# Generate sample data
departments = ['Engineering', 'Sales', 'Marketing', 'Operations', 'Finance']
salaries = [np.random.normal(90000, 15000, 50),
            np.random.normal(75000, 12000, 50),
            np.random.normal(70000, 10000, 50),
            np.random.normal(65000, 11000, 50),
            np.random.normal(80000, 14000, 50)]

fig, ax = plt.subplots(figsize=(10, 8))

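# Create horizontal box plot
bp = ax.boxplot(salaries,
                vert=False,  # Horizontal orientation
                patch_artist=True,
                notch=True,
                showfliers=True,  # Show outliers (the default)
                flierprops=dict(marker='D', markerfacecolor='red',
                               markersize=6, alpha=0.6))

# Label the boxes via the axis so the code runs unchanged across Matplotlib
# versions (the labels= argument was renamed tick_labels= in Matplotlib 3.9)
ax.set_yticklabels(departments)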

# Customize colors
colors = ['#FF9999', '#66B2FF', '#99FF99', '#FFCC99', '#FF99CC']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

# Add median value annotations
medians = [np.median(d) for d in salaries]
for i, median in enumerate(medians, 1):
    ax.text(median, i, f'${median:,.0f}', 
            verticalalignment='center', fontsize=10, 
            bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

ax.set_xlabel('Annual Salary ($)', fontsize=12)
ax.set_title('Salary Distribution by Department', fontsize=14, fontweight='bold')
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

Horizontal box plots (vert=False) work better when comparing many categories or when labels are long. Matplotlib 3.10 and later also accept orientation='horizontal', which is intended to replace vert. Adding median value annotations provides precise numbers without cluttering the visualization. The showfliers parameter controls outlier display; set it to False when outliers distract from the main distribution pattern.
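Custom outlier handling can also be done with the whis parameter, which changes where the whiskers end instead of hiding points. The sketch below compares the default 1.5 × IQR rule with whis=(0, 100), which places the whiskers at the 0th and 100th percentiles so that no points are drawn as outliers. The data is a made-up salary sample.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.normal(80000, 14000, 50)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5), sharey=True)

# Default: whiskers limited to 1.5 x IQR from the box, remaining points drawn as outliers
bp_default = ax1.boxplot(data)
ax1.set_title('Default (whis=1.5)')

# Percentile whiskers: (0, 100) spans the full range of the data
bp_full = ax2.boxplot(data, whis=(0, 100))
ax2.set_title('whis=(0, 100)')

print("outliers with default whiskers:", len(bp_default['fliers'][0].get_ydata()))
print("outliers with whis=(0, 100):", len(bp_full['fliers'][0].get_ydata()))

# plt.show()  # uncomment to display the figure
```

Other percentile pairs such as whis=(5, 95) work the same way, with points outside that range drawn individually.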

Best Practices and Common Use Cases

Choose box plots over histograms when comparing multiple distributions simultaneously. They’re more compact and make median and quartile comparisons trivial. Use them instead of bar charts when showing distributions rather than single summary statistics.

For readability, limit comparisons to 10-12 groups maximum in a single plot. Beyond that, consider faceting into multiple subplots or using violin plots. Always include axis labels and units. Add notches when statistical comparison of medians matters.

Common use cases include: comparing performance metrics across teams, analyzing experiment results across treatment groups, quality control monitoring across production batches, comparing response times across different system configurations, and financial analysis of returns across investment portfolios.

Avoid box plots when your audience isn’t familiar with quartile interpretation—they require some statistical literacy. For general audiences, consider supplementing with simpler visualizations or adding explanatory annotations. When distribution shape matters more than quartiles, violin plots or kernel density plots provide better insights.
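To see why distribution shape can matter, the sketch below draws a box plot and a violin plot of the same bimodal sample side by side. The data is synthetic (two normal clusters chosen for illustration); the box plot shows a single wide box, while the violin plot shows the two separate peaks.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)

# Synthetic bimodal sample: two clusters centered at 60 and 90
bimodal = np.concatenate([np.random.normal(60, 5, 150),
                          np.random.normal(90, 5, 150)])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5), sharey=True)

ax1.boxplot(bimodal)
ax1.set_title('Box plot')
ax1.set_ylabel('Values')

# violinplot() draws a kernel density estimate mirrored around the center line
parts = ax2.violinplot(bimodal, showmedians=True)
ax2.set_title('Violin plot')

# plt.show()  # uncomment to display the figure
```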

Box plots excel at revealing data characteristics that single summary statistics hide: skewness, outliers, and spread all become visible at a glance. They do not show multi-modal shapes, which is where violin or density plots take over. Master them, and you’ll have a powerful tool for any data analysis workflow.
