How to Create a Violin Plot in Plotly
Key Insights
- Violin plots combine box plots with kernel density estimation to reveal distribution shapes that box plots miss—critical for detecting bimodal distributions or skewness in your data
- Plotly’s interactive violin plots let users hover for exact values and zoom into regions of interest, making them superior to static matplotlib alternatives for exploratory analysis
- Use plotly.express for quick prototypes and plotly.graph_objects when you need fine-grained control over styling, annotations, and complex multi-group comparisons
Introduction to Violin Plots
Violin plots are superior to box plots for one simple reason: they show you the actual distribution shape. A box plot reduces your data to five numbers (min, Q1, median, Q3, max), hiding whether your distribution is bimodal, skewed, or has unusual density patterns. Violin plots solve this by overlaying a kernel density estimation on both sides of the box plot, creating a symmetrical shape that reveals the full story.
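To make the five-number limitation concrete, here is a small NumPy-only sketch. The two arrays (`flat` and `two_peaks`) and the fixed bandwidth of 5 are made up for illustration, and `gaussian_kde` is a hand-rolled helper, not the routine Plotly uses internally.

```python
import numpy as np

# Two hypothetical samples of 101 values each on a 0-100 scale.
flat = np.arange(101.0)                      # evenly spread values
two_peaks = np.concatenate([
    [0.0], np.full(49, 25.0),                # cluster at 25
    [50.0], np.full(49, 75.0),               # cluster at 75
    [100.0],
])

def five_number_summary(x):
    """Return min, Q1, median, Q3, max."""
    return np.percentile(x, [0, 25, 50, 75, 100])

def gaussian_kde(x, grid, bandwidth):
    """Plain Gaussian kernel density estimate evaluated on `grid`."""
    z = (grid[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (len(x) * bandwidth)

summary_flat = five_number_summary(flat)
summary_peaks = five_number_summary(two_peaks)

grid = np.array([25.0, 50.0, 75.0])
density_flat = gaussian_kde(flat, grid, bandwidth=5.0)
density_peaks = gaussian_kde(two_peaks, grid, bandwidth=5.0)

print(summary_flat, summary_peaks)   # the two summaries are identical
print(density_flat)                  # roughly constant across the grid
print(density_peaks)                 # high at 25 and 75, low at 50
```

A box plot would draw the same box for both samples, while the density estimate separates them clearly. That density curve is what forms the outline of a violin.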
Use violin plots when distribution shape matters—analyzing user behavior patterns, comparing experimental groups where you suspect multimodal distributions, or examining sensor data for anomalies. Plotly’s implementation adds interactive hover tooltips, zooming, and panning, making it trivial to explore outliers and density patterns without writing additional code.
Basic Violin Plot Setup
Install Plotly if you haven’t already:
pip install plotly pandas
Plotly offers two APIs: plotly.express for rapid prototyping and plotly.graph_objects for detailed control. Start with Express—it’s faster and handles most use cases.
import plotly.express as px
import pandas as pd
import numpy as np
# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
    'values': np.concatenate([
        np.random.normal(100, 15, 200),
        np.random.normal(130, 10, 150)
    ]),
    'category': ['Group A'] * 200 + ['Group B'] * 150
})
# Create basic violin plot
fig = px.violin(data, y='values', x='category', box=True, points='all')
fig.show()
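This creates an interactive violin plot with embedded box plots (box=True) and all data points visible (points='all'). Each group gets its own violin, so the difference between Group A (centered near 100, wider) and Group B (centered near 130, narrower) is visible at a glance. If you drop x='category' and pool both groups into a single violin, the combined shape shows a second hump or shoulder, something a box plot would flatten into a single box.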
For more control, use graph_objects:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Violin(
    y=data[data['category'] == 'Group A']['values'],
    name='Group A',
    box_visible=True,
    meanline_visible=True
))
fig.update_layout(
    title='Distribution Analysis',
    yaxis_title='Values',
    showlegend=True
)
fig.show()
Customizing Violin Plot Appearance
Violin plots need visual clarity. Default settings rarely cut it for production dashboards or publications.
import plotly.express as px
import pandas as pd
import numpy as np
# Create richer dataset
np.random.seed(42)
data = pd.DataFrame({
    'score': np.concatenate([
        np.random.gamma(2, 20, 100),
        np.random.gamma(5, 15, 100),
        np.random.gamma(3, 25, 100)
    ]),
    'group': ['Control'] * 100 + ['Treatment A'] * 100 + ['Treatment B'] * 100
})
fig = px.violin(
    data,
    y='score',
    x='group',
    box=True,             # Show box plot inside
    points='outliers',    # Only show outliers as points
    color='group',        # Color by group
    color_discrete_map={
        'Control': '#636EFA',
        'Treatment A': '#EF553B',
        'Treatment B': '#00CC96'
    }
)
# Customize appearance
fig.update_traces(
    meanline_visible=True,  # Show mean line
    width=0.6,              # Violin width
    opacity=0.7,            # Transparency
    marker=dict(size=3)     # Point size
)
fig.update_layout(
    title='Treatment Effect Analysis',
    yaxis_title='Performance Score',
    xaxis_title='Experimental Group',
    font=dict(size=12),
    height=500,
    showlegend=False        # X-axis labels are sufficient
)
fig.show()
The points parameter deserves attention. Options include:
- 'all': Show every data point (use for small datasets)
- 'outliers': Only show outliers (best for medium datasets)
- False: Hide points entirely (cleanest for large datasets)
Setting meanline_visible=True adds a dashed line at the mean, complementing the median line from the box plot. This is useful when comparing groups with different skewness, because the gap between mean and median widens as a distribution becomes more skewed.
Multi-Group Violin Plots
Comparing multiple groups side-by-side reveals patterns impossible to spot in separate plots.
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# Simulate A/B/C test results across different user segments
np.random.seed(42)
data = pd.DataFrame({
    'conversion_time': np.concatenate([
        np.random.lognormal(3, 0.5, 80),     # Variant A - Mobile
        np.random.lognormal(2.8, 0.6, 80),   # Variant A - Desktop
        np.random.lognormal(2.5, 0.4, 80),   # Variant B - Mobile
        np.random.lognormal(2.3, 0.5, 80),   # Variant B - Desktop
        np.random.lognormal(2.7, 0.55, 80),  # Variant C - Mobile
        np.random.lognormal(2.4, 0.45, 80)   # Variant C - Desktop
    ]),
    'variant': ['A']*160 + ['B']*160 + ['C']*160,
    'platform': (['Mobile']*80 + ['Desktop']*80) * 3
})
fig = go.Figure()
colors = {'Mobile': '#FF6B6B', 'Desktop': '#4ECDC4'}
for platform in ['Mobile', 'Desktop']:
    for variant in ['A', 'B', 'C']:
        subset = data[(data['platform'] == platform) & (data['variant'] == variant)]
        fig.add_trace(go.Violin(
            x=subset['variant'],
            y=subset['conversion_time'],
            name=platform,
            legendgroup=platform,
            scalegroup=platform,
            offsetgroup=platform,  # One slot per platform rather than one per trace
            line_color=colors[platform],
            box_visible=True,
            meanline_visible=True,
            showlegend=(variant == 'A')  # Only show legend once per platform
        ))
fig.update_layout(
    title='Conversion Time by Variant and Platform',
    xaxis_title='Test Variant',
    yaxis_title='Time to Conversion (seconds)',
    violinmode='group',  # Side-by-side violins
    height=500
)
fig.show()
The violinmode='group' parameter places violins side-by-side. Use violinmode='overlay' for overlapping violins, though this reduces clarity with more than two groups.
Advanced Customization
Split violins directly compare two groups by showing half of each distribution on opposite sides.
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# Compare male vs female salary distributions across departments
np.random.seed(42)
data = pd.DataFrame({
    'salary': np.concatenate([
        np.random.normal(75000, 12000, 100),  # Engineering - Female
        np.random.normal(78000, 13000, 120),  # Engineering - Male
        np.random.normal(68000, 10000, 90),   # Marketing - Female
        np.random.normal(70000, 11000, 85),   # Marketing - Male
        np.random.normal(82000, 15000, 80),   # Sales - Female
        np.random.normal(85000, 16000, 95)    # Sales - Male
    ]),
    'department': ['Engineering']*220 + ['Marketing']*175 + ['Sales']*175,
    'gender': (
        ['Female']*100 + ['Male']*120
        + ['Female']*90 + ['Male']*85
        + ['Female']*80 + ['Male']*95
    )
})
fig = go.Figure()
fig.add_trace(go.Violin(
    x=data[data['gender'] == 'Female']['department'],
    y=data[data['gender'] == 'Female']['salary'],
    name='Female',
    side='negative',
    line_color='#FF6B6B',
    fillcolor='#FF6B6B',
    opacity=0.6,
    box_visible=True,
    meanline_visible=True,
    hovertemplate='<b>Female</b><br>Dept: %{x}<br>Salary: $%{y:,.0f}<extra></extra>'
))
fig.add_trace(go.Violin(
    x=data[data['gender'] == 'Male']['department'],
    y=data[data['gender'] == 'Male']['salary'],
    name='Male',
    side='positive',
    line_color='#4ECDC4',
    fillcolor='#4ECDC4',
    opacity=0.6,
    box_visible=True,
    meanline_visible=True,
    hovertemplate='<b>Male</b><br>Dept: %{x}<br>Salary: $%{y:,.0f}<extra></extra>'
))
fig.update_layout(
    title='Salary Distribution by Department and Gender',
    xaxis_title='Department',
    yaxis_title='Annual Salary (USD)',
    violinmode='overlay',
    height=500
)
fig.show()
Custom hover templates (hovertemplate) transform user experience. The <extra></extra> removes the default secondary box. Use %{y:,.0f} for formatted numbers and <b> tags for bold text.
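The number formats inside %{...} use d3-format, whose syntax is modeled on Python's format mini-language. For the simple specifiers used here you can preview the result in plain Python before wiring it into a template. The sample values below are made up, and the two systems are not guaranteed to agree in every corner case.

```python
# Hypothetical values standing in for what a hover point would carry.
salary = 82345.678
response_hours = 1.27

preview_salary = f"${salary:,.0f}"         # same spec as $%{y:,.0f}
preview_hours = f"{response_hours:.1f}h"   # same spec as %{y:.1f}h

print(preview_salary)   # $82,346
print(preview_hours)    # 1.3h
```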
Real-World Use Case
Here’s a complete example analyzing customer support response times across ticket priorities and time zones.
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# Simulate realistic support ticket data
np.random.seed(42)
n_tickets = 500
data = pd.DataFrame({
    'response_time_hours': np.concatenate([
        np.random.exponential(2, 150),    # Low priority
        np.random.exponential(1.2, 200),  # Medium priority
        np.random.exponential(0.5, 150)   # High priority
    ]),
    'priority': ['Low']*150 + ['Medium']*200 + ['High']*150,
    'timezone': np.random.choice(['Americas', 'EMEA', 'APAC'], n_tickets)
})
fig = go.Figure()
colors = {'Low': '#95E1D3', 'Medium': '#F38181', 'High': '#AA4465'}
priorities = ['Low', 'Medium', 'High']
for priority in priorities:
    subset = data[data['priority'] == priority]
    fig.add_trace(go.Violin(
        x=subset['timezone'],
        y=subset['response_time_hours'],
        name=priority,
        legendgroup=priority,
        scalegroup=priority,
        line_color=colors[priority],
        fillcolor=colors[priority],
        opacity=0.6,
        box_visible=True,
        meanline_visible=True,
        points='outliers',
        hovertemplate=(
            f'<b>{priority} Priority</b><br>'
            'Region: %{x}<br>'
            'Response Time: %{y:.1f}h<extra></extra>'
        )
    ))
fig.update_layout(
    title='Support Response Time Analysis by Priority and Region',
    xaxis_title='Geographic Region',
    yaxis_title='Response Time (hours)',
    violinmode='group',
    height=600,
    font=dict(size=12),
    hovermode='closest',
    legend=dict(
        title='Priority Level',
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1
    )
)
# Add SLA reference line
fig.add_hline(
    y=4,
    line_dash="dash",
    line_color="red",
    annotation_text="4h SLA Target",
    annotation_position="right"
)
fig.show()
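On real data, a chart like this makes it easy to spot a region whose low-priority response times skew high, or a priority level whose tail regularly crosses the SLA line. In this simulation the regions are assigned at random, so any regional differences you see are noise. The priority differences are real: the long right tail of the low-priority violins extends past the 4h line far more often than the high-priority ones.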
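A chart suggests where to look, and a quick tabulation confirms it. The sketch below rebuilds a similar simulated dataset and computes the share of tickets above the 4-hour line for each priority. The column names mirror the example above, while `SLA_HOURS`, `tickets`, and `breach_rate` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Simulated tickets with the same priority mix as the example above.
rng = np.random.default_rng(42)
tickets = pd.DataFrame({
    'response_time_hours': np.concatenate([
        rng.exponential(2, 150),     # Low priority
        rng.exponential(1.2, 200),   # Medium priority
        rng.exponential(0.5, 150),   # High priority
    ]),
    'priority': ['Low'] * 150 + ['Medium'] * 200 + ['High'] * 150,
})

SLA_HOURS = 4  # assumed target, matching the reference line in the chart

# Fraction of tickets per priority whose response time exceeds the SLA.
breach_rate = (
    tickets.assign(breached=tickets['response_time_hours'] > SLA_HOURS)
    .groupby('priority')['breached']
    .mean()
)
print(breach_rate)
```

Reporting these rates alongside the plot keeps the conclusion tied to numbers rather than to a visual impression of the tails.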
Best Practices and Tips
When to use violin plots: Choose them over box plots when distribution shape matters—detecting bimodal patterns, comparing skewness between groups, or identifying density clusters. Avoid them for small datasets (n < 20) where kernel density estimation becomes unreliable.
Performance considerations: Plotly computes the density estimate and draws every visible point in the browser, so large datasets cost both transfer size and rendering time. As a rough guideline, for datasets over 10,000 points per group, set points=False and consider downsampling. Interactivity tends to degrade somewhere around 50,000 total points, depending on the browser and hardware.
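One way to downsample is to cap the number of rows kept per group before plotting. The helper below (`downsample_per_group`, a hypothetical name) is a minimal sketch using pandas random sampling. The 10,000-row cap is just the guideline figure from the text, not a hard limit.

```python
import numpy as np
import pandas as pd

def downsample_per_group(df, group_col, max_rows, seed=0):
    """Randomly keep at most `max_rows` rows from each group."""
    parts = [
        part.sample(n=min(len(part), max_rows), random_state=seed)
        for _, part in df.groupby(group_col)
    ]
    return pd.concat(parts).reset_index(drop=True)

# Made-up dataset with one oversized group and one small group.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'value': rng.normal(0, 1, 25_000),
    'group': ['A'] * 20_000 + ['B'] * 5_000,
})

small = downsample_per_group(big, 'group', max_rows=10_000)
sizes = small['group'].value_counts().to_dict()
print(sizes)
```

Groups already under the cap are left untouched, so small groups keep all of their data. Random sampling preserves the overall shape on average but can thin out rare extreme values, so check the tails if outliers matter.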
Color accessibility: Always use color palettes that work for colorblind users. Plotly’s default palette is decent, but test with tools like Color Oracle. Add patterns or textures for critical distinctions when color alone isn’t sufficient.
Statistical rigor: Violin plots show distributions beautifully but don’t replace statistical tests. They’re exploratory tools. When you spot differences, confirm them with appropriate hypothesis tests.
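As one example of such a follow-up, here is a minimal permutation test for a difference in medians, written with NumPy only. The function name, the choice of medians as the statistic, and the 2,000 permutations are assumptions made for this sketch. A library routine such as a Mann-Whitney U test may be a better fit for real analyses.

```python
import numpy as np

def permutation_test_median(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in medians."""
    rng = np.random.default_rng(seed)
    observed = abs(np.median(a) - np.median(b))
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(np.median(shuffled[:len(a)]) - np.median(shuffled[len(a):]))
        hits += int(diff >= observed)
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Two simulated groups with clearly different centers.
rng = np.random.default_rng(42)
group_a = rng.normal(100, 15, 200)
group_b = rng.normal(130, 10, 150)

p_value = permutation_test_median(group_a, group_b)
print(f"p = {p_value:.4f}")
```

The test shuffles group labels to see how often a median gap at least as large as the observed one arises by chance. A small p-value supports what the violins suggested visually.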
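Bandwidth selection: By default, Plotly chooses the kernel bandwidth automatically using Silverman's rule of thumb. For multimodal or heavy-tailed distributions the default can oversmooth. Pass bandwidth to go.Violin (or call fig.update_traces(bandwidth=...) on an Express figure) to set it explicitly, and use span and spanmode to control the range over which the density is drawn. If the shape still looks misleading, compare it against a histogram.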
Export your plots for presentations using fig.write_html('plot.html') for interactive versions or fig.write_image('plot.png', width=1200, height=600) for static images (requires kaleido: pip install kaleido).