How to Create a Histogram in Plotly
Key Insights
- Plotly Express provides the fastest way to create histograms with px.histogram(), while Graph Objects offers more granular control for complex visualizations
- Bin count strongly affects how a histogram reads: estimate a starting value with a rule such as Freedman-Diaconis and pass it via nbins (Plotly Express) or nbinsx (Graph Objects), or experiment with 10-50 bins for most datasets
- Overlaying multiple distributions with reduced opacity (0.6-0.7) and normalization (histnorm='percent') makes comparisons between groups much easier to read, especially when the groups differ in size
Introduction to Histograms and Plotly
Histograms visualize the distribution of continuous data by grouping values into bins and displaying their frequencies. Unlike bar charts that show categorical data, histograms reveal patterns like skewness, modality, and outliers in numerical datasets. You’ll use them when analyzing everything from response times in web applications to customer age distributions in marketing data.
Plotly stands out for histogram creation because it generates interactive visualizations that work seamlessly in Jupyter notebooks, web dashboards, and standalone HTML files. Users can hover over bins for exact values, zoom into specific ranges, and export charts as static images. The library offers two approaches: Plotly Express for rapid prototyping and Graph Objects for fine-tuned control.
Basic Histogram Creation
Start with Plotly Express—it requires minimal code and handles most common scenarios. Install Plotly first if you haven’t already:
pip install plotly pandas numpy
Here’s a basic histogram using randomly generated data:
import plotly.express as px
import numpy as np
# Generate sample data - 1000 values from normal distribution
data = np.random.randn(1000)
# Create basic histogram
fig = px.histogram(x=data, title='Distribution of Sample Data')
fig.show()
For more control, use Graph Objects:
import plotly.graph_objects as go
import numpy as np
data = np.random.randn(1000)
fig = go.Figure(data=[go.Histogram(x=data)])
fig.update_layout(
title='Distribution Using Graph Objects',
xaxis_title='Value',
yaxis_title='Count'
)
fig.show()
Both approaches create interactive histograms, but Graph Objects lets you build complex multi-trace figures more easily.
Customizing Histogram Appearance
The default histogram often needs refinement. Control the bin count with nbins (nbinsx in Graph Objects), adjust colors for clarity, and add descriptive labels:
import plotly.express as px
import numpy as np
data = np.random.randn(1000) * 15 + 100 # Mean=100, SD=15
fig = px.histogram(
    x=data,
    nbins=30,  # Maximum bin count; Plotly picks "nice" bin edges up to this limit
    title='Test Score Distribution',
    labels={'x': 'Score'},  # Renames the x-axis title
    color_discrete_sequence=['#2E86AB']  # Custom color
)
fig.update_traces(
    opacity=0.75,
    marker_line_color='white',
    marker_line_width=1.5
)
fig.update_layout(
    yaxis_title='Number of Students',  # px titles the y-axis 'count' by default
    bargap=0.1,  # Gap between bars
    showlegend=False,
    font=dict(size=14)
)
fig.show()
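Bin count matters enormously. Too few bins hide distribution details; too many create noise. For most datasets, start with 20-30 bins and adjust for your sample size. Use the square root rule (√n bins for n observations) as a baseline, or let Plotly’s automatic binning handle it initially. For skewed data or data with outliers, the Freedman-Diaconis rule is more robust. It sets the bin width to 2 × IQR × n^(-1/3), where IQR is the interquartile range, and then derives the bin count from the data range.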
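import numpy as np
import plotly.express as px

# Estimate a bin count with the Freedman-Diaconis rule (width = 2 * IQR * n^(-1/3))
def freedman_diaconis_bins(data):
    data = np.asarray(data)
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25
    bin_width = 2 * iqr * len(data) ** (-1/3)
    if bin_width == 0:  # IQR of zero: fall back to the square root rule
        return int(np.ceil(np.sqrt(len(data))))
    return int(np.ceil((data.max() - data.min()) / bin_width))

optimal_bins = freedman_diaconis_bins(data)  # data from the previous example
fig = px.histogram(x=data, nbins=optimal_bins)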
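The square root rule takes one line. NumPy also ships both estimators through np.histogram_bin_edges (bins='sqrt' and bins='fd'), which gives you an independent way to cross-check a hand-written helper like the one above. Here is a sketch on simulated scores:

```python
import numpy as np

data = np.random.randn(1000) * 15 + 100

# Square root rule: about sqrt(n) bins for n observations
sqrt_bins = int(np.ceil(np.sqrt(len(data))))  # 32 for n = 1000

# NumPy's estimators return bin edges; the bin count is len(edges) - 1
sqrt_bins_np = len(np.histogram_bin_edges(data, bins='sqrt')) - 1
fd_bins = len(np.histogram_bin_edges(data, bins='fd')) - 1

print(sqrt_bins, sqrt_bins_np, fd_bins)
```

Either count can then be passed to nbins as a starting point. Keep in mind that Plotly may round it down to reach nicer edges.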
Advanced Histogram Techniques
Comparing distributions side-by-side reveals insights that single histograms miss. Overlay multiple datasets with transparency:
import plotly.graph_objects as go
import numpy as np
# Two different distributions
group_a = np.random.randn(1000) * 10 + 100
group_b = np.random.randn(1000) * 15 + 105
fig = go.Figure()
fig.add_trace(go.Histogram(
x=group_a,
name='Control Group',
opacity=0.65,
marker_color='#2E86AB',
nbinsx=30
))
fig.add_trace(go.Histogram(
x=group_b,
name='Treatment Group',
opacity=0.65,
marker_color='#A23B72',
nbinsx=30
))
fig.update_layout(
barmode='overlay', # Critical for overlaying
title='A/B Test Results Comparison',
xaxis_title='Metric Value',
yaxis_title='Frequency'
)
fig.show()
Normalize histograms when comparing groups with different sample sizes:
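import numpy as np
import pandas as pd
import plotly.express as px
# Long-format data with unequal group sizes (1000 vs. 300 rows)
df = pd.DataFrame({
    'value': np.concatenate([np.random.randn(1000) * 10 + 100, np.random.randn(300) * 15 + 105]),
    'group': ['Control'] * 1000 + ['Treatment'] * 300,
})
fig = px.histogram(
    df,
    x='value',
    color='group',
    histnorm='percent',  # or 'probability', 'density', 'probability density'
    barmode='overlay',
    opacity=0.7
)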
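With histnorm='percent', each bar shows what share of its own trace's observations fall in that bin. Each group's bars therefore sum to 100, whatever the group's size. The NumPy sketch below reproduces that arithmetic for two simulated groups of 1000 and 300 values, using shared bin edges. The group parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
groups = {
    'Control': rng.normal(100, 10, size=1000),
    'Treatment': rng.normal(105, 15, size=300),
}

# Shared edges spanning both groups, so every value lands in some bin
edges = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=30)

percent_by_group = {}
for name, values in groups.items():
    counts, _ = np.histogram(values, bins=edges)
    # Share of this group's own observations per bin (the quantity histnorm='percent' plots)
    percent_by_group[name] = counts / len(values) * 100
    print(name, len(values), round(float(percent_by_group[name].sum()), 6))
```

Both groups total 100%. That is why the overlaid bars stay comparable even though one group is more than three times larger than the other.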
Cumulative histograms show how many observations fall up to each bin's upper edge, which is useful for percentile analysis:
import plotly.graph_objects as go
import numpy as np
data = np.random.exponential(scale=2.0, size=1000)
fig = go.Figure()
fig.add_trace(go.Histogram(
x=data,
cumulative_enabled=True,
marker_color='#F18F01',
nbinsx=40
))
fig.update_layout(
title='Cumulative Distribution of Response Times',
xaxis_title='Response Time (seconds)',
yaxis_title='Cumulative Count'
)
fig.show()
Adding Interactivity and Statistical Information
Custom hover templates provide context without cluttering the visualization:
import plotly.graph_objects as go
import numpy as np
data = np.random.randn(1000) * 15 + 100
fig = go.Figure()
fig.add_trace(go.Histogram(
x=data,
nbinsx=30,
hovertemplate='<b>Score Range</b>: %{x}<br>' +
'<b>Students</b>: %{y}<br>' +
'<extra></extra>' # Removes trace name
))
# Add mean line
mean_val = np.mean(data)
fig.add_vline(
x=mean_val,
line_dash="dash",
line_color="red",
annotation_text=f"Mean: {mean_val:.1f}",
annotation_position="top"
)
# Add median line
median_val = np.median(data)
fig.add_vline(
x=median_val,
line_dash="dash",
line_color="green",
annotation_text=f"Median: {median_val:.1f}",
annotation_position="bottom"
)
fig.update_layout(title='Score Distribution with Statistical Markers')
fig.show()
Real-World Application
Here’s a complete workflow analyzing e-commerce order values:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# Simulate realistic e-commerce data
np.random.seed(42)
n_orders = 5000
# Log-normal distribution (typical for order values)
order_values = np.random.lognormal(mean=4.2, sigma=0.8, size=n_orders)
order_values = np.clip(order_values, 10, 500) # Realistic range
df = pd.DataFrame({
'order_value': order_values,
'customer_type': np.random.choice(['New', 'Returning'], size=n_orders, p=[0.3, 0.7])
})
# Create comparison histogram
fig = go.Figure()
for customer_type in ['New', 'Returning']:
    data = df[df['customer_type'] == customer_type]['order_value']
    fig.add_trace(go.Histogram(
        x=data,
        name=customer_type,
        opacity=0.7,
        nbinsx=40,
        hovertemplate='<b>Order Value</b>: $%{x:.2f}<br>' +
                      '<b>Orders</b>: %{y}<br>' +
                      '<extra></extra>'
    ))
# Add average order value lines
for customer_type, color in [('New', 'blue'), ('Returning', 'red')]:
    avg = df[df['customer_type'] == customer_type]['order_value'].mean()
    fig.add_vline(
        x=avg,
        line_dash="dash",
        line_color=color,
        annotation_text=f"{customer_type} Avg: ${avg:.2f}",
        annotation_position="top"
    )
fig.update_layout(
barmode='overlay',
title='Order Value Distribution by Customer Type',
xaxis_title='Order Value ($)',
yaxis_title='Number of Orders',
hovermode='x unified'
)
fig.show()
# Print summary statistics
print("\nSummary Statistics:")
print(df.groupby('customer_type')['order_value'].describe())
This example combines data preparation, a multi-group comparison, statistical annotations, and summary statistics in a single workflow, a pattern you can adapt for production analytics.
Conclusion and Best Practices
Effective histogram creation balances technical accuracy with visual clarity. Always start with automatic binning, then adjust based on what story your data tells. For skewed distributions like income or response times, consider log-scale axes or data transformations before binning.
Avoid these common mistakes: using too many bins (creates noise), comparing raw counts for different sample sizes (use normalization), and forgetting axis labels (context is critical). When overlaying multiple histograms, keep opacity between 0.6 and 0.75 and limit comparisons to two or three groups.
Choose Plotly Express for speed and exploratory analysis, but switch to Graph Objects when building dashboards or combining multiple visualization types. Export your histograms as HTML files with fig.write_html() to share interactive versions, or use fig.write_image() for static reports (static image export requires the separate kaleido package).
The histogram remains one of the most powerful tools for understanding data distributions—master it in Plotly, and you’ll communicate insights more effectively across your entire organization.