How to Create a Box Plot in Plotly

Key Insights

• Box plots excel at revealing data distribution, outliers, and comparative statistics across categories—Plotly makes them interactive with hover details and zoom capabilities that static plots can’t match.

• Use plotly.express for quick exploratory analysis and plotly.graph_objects when you need granular control over whisker calculations, outlier styling, or complex multi-trace compositions.

• Group your box plots by categorical variables to instantly spot distribution differences across segments—this is where box plots outperform histograms and density plots for comparative analysis.

Introduction to Box Plots

Box plots (also called box-and-whisker plots) are one of the most information-dense visualizations in data analysis. They compress five key statistics into a single graphic: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The whisker ends mark the minimum and maximum of the values that are not flagged as outliers. More importantly, box plots make outliers immediately visible and allow instant comparison across multiple groups.

Unlike histograms, which show frequency distributions, box plots emphasize the spread and skewness of your data. The “box” represents the interquartile range (IQR = Q3 - Q1), which contains the middle 50% of values. The whiskers extend to the most extreme values that lie within 1.5×IQR of the box edges. Points beyond the whiskers are flagged as potential outliers.
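Written out, this 1.5×IQR convention (Tukey's rule, which Plotly follows by default) defines two fences that bound the whiskers:

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{lower fence} &= Q_1 - 1.5\,\mathrm{IQR} \\
\text{upper fence} &= Q_3 + 1.5\,\mathrm{IQR}
\end{aligned}
```

Each whisker stops at the most extreme observation still inside its fence. Observations outside the fences are drawn as individual points.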

Plotly brings box plots into the modern era with interactivity. Users can hover over elements to see exact values, zoom into specific ranges, and toggle between groups. This interactivity is invaluable when presenting to stakeholders who want to explore the data themselves rather than passively consume static images.

Setting Up Your Environment

Before creating box plots, you need Plotly installed in your Python environment. Plotly works seamlessly with pandas DataFrames, making it ideal for real-world data workflows.

pip install plotly pandas

Import the necessary libraries for this tutorial:

import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

Plotly offers two primary interfaces: plotly.express for rapid prototyping and plotly.graph_objects for fine-grained control. We’ll explore both approaches so you can choose the right tool for each situation.

Creating a Basic Box Plot

Let’s start with a simple dataset representing test scores across different subjects. This example demonstrates the core functionality without unnecessary complexity.

# Create sample data
data = {
    'Subject': ['Math', 'Math', 'Math', 'Math', 'Math', 'Math',
                'Science', 'Science', 'Science', 'Science', 'Science', 'Science',
                'English', 'English', 'English', 'English', 'English', 'English'],
    'Score': [78, 85, 92, 88, 76, 95, 
              82, 79, 88, 91, 85, 77,
              88, 92, 85, 89, 94, 87]
}
df = pd.DataFrame(data)

# Create box plot with plotly.express
fig = px.box(df, x='Subject', y='Score', 
             title='Test Scores by Subject')
fig.show()

This code generates an interactive box plot with minimal effort. Plotly Express handles the statistical calculations automatically, computing quartiles and identifying outliers using the standard 1.5×IQR rule.
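The sketch below reproduces those calculations in NumPy for the six Math scores. It relies on NumPy's default linear percentile interpolation. Plotly's own quartilemethod setting can produce slightly different quartiles on small samples, so treat the numbers as illustrative. `box_stats` is a hypothetical helper written for this example and is not part of Plotly.

```python
import numpy as np


def box_stats(values, k=1.5):
    """Hypothetical helper: quartiles, Tukey fences, whisker ends, outliers.

    Uses NumPy's default (linear) percentile interpolation, which may
    differ slightly from Plotly's quartilemethod on small samples.
    """
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = arr[(arr >= lower_fence) & (arr <= upper_fence)]
    return {
        'q1': q1, 'median': median, 'q3': q3, 'iqr': iqr,
        'lower_fence': lower_fence, 'upper_fence': upper_fence,
        # Whiskers stop at the most extreme observations inside the fences
        'whisker_low': inside.min(), 'whisker_high': inside.max(),
        # Anything outside the fences is reported as an outlier
        'outliers': arr[(arr < lower_fence) | (arr > upper_fence)].tolist(),
    }


math_scores = [78, 85, 92, 88, 76, 95]
print(box_stats(math_scores))

# A hypothetical extra score of 40 shows how an outlier gets flagged
print(box_stats(math_scores + [40])['outliers'])  # [40.0]
```

For the six original scores, the fences sit at 62.875 and 107.875, so nothing is flagged. With the extra score added, the lower fence moves to 57.5 and 40 falls below it.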

For scenarios requiring more control, use plotly.graph_objects:

# Separate data by subject
math_scores = df[df['Subject'] == 'Math']['Score']
science_scores = df[df['Subject'] == 'Science']['Score']
english_scores = df[df['Subject'] == 'English']['Score']

# Create figure with graph_objects
fig = go.Figure()
fig.add_trace(go.Box(y=math_scores, name='Math'))
fig.add_trace(go.Box(y=science_scores, name='Science'))
fig.add_trace(go.Box(y=english_scores, name='English'))

fig.update_layout(title='Test Scores by Subject',
                  yaxis_title='Score')
fig.show()

The graph_objects approach requires more code but provides complete control over each trace. This becomes essential when you need to customize individual boxes differently or combine box plots with other chart types.

Customizing Box Plot Appearance

Visual customization transforms functional plots into publication-ready graphics. Plotly offers extensive styling options for colors, labels, and layout elements.

# Customized box plot with colors and styling
fig = px.box(df, x='Subject', y='Score',
             title='Test Score Distribution Analysis',
             color='Subject',
             color_discrete_map={
                 'Math': '#FF6B6B',
                 'Science': '#4ECDC4',
                 'English': '#45B7D1'
             })

fig.update_traces(marker=dict(size=8, line=dict(width=2, color='DarkSlateGrey')),
                  boxmean='sd')  # Show mean and standard deviation

fig.update_layout(
    xaxis_title='Subject Area',
    yaxis_title='Test Score (0-100)',
    font=dict(size=14),
    showlegend=False,
    height=500
)

fig.show()

The boxmean='sd' parameter draws the mean as a dashed line inside each box. It also adds a dashed diamond that spans one standard deviation on either side of the mean. This is useful for audiences familiar with these statistics.

For finer control over how the whiskers and outlier points are drawn, style a single trace with graph_objects:

fig = go.Figure()

fig.add_trace(go.Box(
    y=df.loc[df['Subject'] == 'Math', 'Score'],
    name='Math',
    boxpoints='outliers',  # Only show outlier points
    whiskerwidth=0.2,      # Whisker caps span 20% of the box width
    marker=dict(           # Styling for the individual outlier points
        size=10,
        color='rgba(255, 107, 107, 0.5)',
        line=dict(color='rgb(255, 107, 107)', width=2)
    ),
    line=dict(width=2, color='rgb(255, 107, 107)')  # Box and whisker outline
))

fig.update_layout(yaxis=dict(range=[70, 100]))
fig.show()

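The boxpoints parameter controls which sample points are drawn:

• 'outliers' shows only points beyond the whiskers.
• 'suspectedoutliers' shows those points and highlights the most extreme of them.
• 'all' displays every data point.
• False hides points entirely.

The six Math scores contain no outliers, so this particular figure draws no individual points. In plotly.express, you make the same choice with the points argument of px.box.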

Advanced Box Plot Techniques

Real analysis often requires comparing distributions across multiple dimensions. Grouped box plots handle this elegantly.

# Create dataset with multiple grouping variables
extended_data = {
    'Subject': ['Math', 'Math', 'Science', 'Science', 'English', 'English'] * 6,
    'Semester': ['Fall', 'Spring'] * 18,
    'Score': [78, 85, 82, 79, 88, 92, 92, 88, 88, 91, 85, 89,
              76, 95, 85, 77, 94, 87, 85, 90, 80, 88, 91, 86,
              89, 84, 87, 83, 90, 93, 81, 86, 84, 89, 88, 91]
}
extended_df = pd.DataFrame(extended_data)

# Grouped box plot
fig = px.box(extended_df, x='Subject', y='Score', color='Semester',
             title='Test Scores by Subject and Semester',
             boxmode='group')  # Place boxes side-by-side

fig.update_layout(yaxis_title='Score', height=500)
fig.show()

The boxmode='group' parameter places boxes side-by-side for easy comparison. Use boxmode='overlay' to superimpose boxes when comparing just two groups.

Horizontal box plots work better for categorical variables with long names:

# Horizontal box plot with custom hover information
fig = px.box(extended_df, y='Subject', x='Score',
             orientation='h')

fig.update_traces(
    # %{x:.1f} formats the hovered value to one decimal place;
    # <extra></extra> hides the secondary label showing the trace name
    hovertemplate='<b>%{y}</b><br>Score: %{x:.1f}<extra></extra>'
)

fig.show()

Horizontal orientation is particularly effective in dashboards with limited vertical space or when you have many categories to compare.

Real-World Application

Let’s apply these techniques to a realistic scenario: analyzing regional sales performance across quarters.

# Simulate sales data
np.random.seed(42)
regions = ['North', 'South', 'East', 'West']
quarters = ['Q1', 'Q2', 'Q3', 'Q4']

# Regional baselines and seasonal multipliers (defined once, outside the loops)
base_sales = {'North': 50000, 'South': 45000, 'East': 55000, 'West': 48000}
seasonality = {'Q1': 0.9, 'Q2': 1.0, 'Q3': 0.95, 'Q4': 1.15}

sales_data = []
for region in regions:
    for quarter in quarters:
        for _ in range(15):  # 15 sales per region-quarter
            # Multiplicative noise: mean 1, standard deviation 0.15
            sales = base_sales[region] * seasonality[quarter] * np.random.normal(1, 0.15)
            sales_data.append({
                'Region': region,
                'Quarter': quarter,
                'Sales': max(0, sales)  # Ensure non-negative
            })

sales_df = pd.DataFrame(sales_data)

# Create comprehensive box plot
fig = px.box(sales_df, x='Region', y='Sales', color='Quarter',
             title='Regional Sales Distribution by Quarter',
             labels={'Sales': 'Sales Revenue ($)'},
             boxmode='group')

fig.update_traces(quartilemethod='linear')  # Use linear interpolation for quartiles

fig.update_layout(
    yaxis=dict(tickformat='$,.0f'),
    hovermode='x unified',
    height=600,
    legend=dict(title='Quarter', orientation='h', y=1.1, x=0.5, xanchor='center')
)

fig.show()

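This example demonstrates a complete workflow: data generation, multi-group comparison, and professional formatting. The quartilemethod='linear' parameter is already Plotly's default, so setting it explicitly mainly documents which quartile algorithm the figure uses. The alternatives, 'inclusive' and 'exclusive', can shift Q1 and Q3 noticeably on small samples, so it is worth knowing which one produced the boxes you are comparing.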

Conclusion and Best Practices

Box plots are indispensable for understanding data distributions and spotting outliers. Plotly elevates them from static graphics to interactive exploration tools that engage your audience.

Choose plotly.express when you need quick insights during exploratory analysis. It handles most use cases with minimal code. Switch to plotly.graph_objects when building dashboards, combining multiple chart types, or implementing custom statistical calculations.

Remember these key practices: Always label your axes clearly with units. Use color strategically to differentiate groups, not just for decoration. When presenting to non-technical audiences, consider adding reference lines for targets or thresholds. Enable boxmean to show both median and mean when your audience understands the distinction.

Box plots work best when comparing distributions across categories (regions, products, time periods) or identifying outliers in quality control scenarios. They struggle with bimodal distributions—use violin plots or histograms when you suspect multiple peaks in your data.

The interactivity Plotly provides isn’t just a nice feature—it’s transformative. Stakeholders can hover to see exact values, zoom into interesting ranges, and toggle groups on and off. This self-service exploration builds trust in your analysis and often surfaces questions that lead to deeper insights.
