How to Create a Violin Plot in Plotly

Key Insights

  • Violin plots combine box plots with kernel density estimation to reveal distribution shapes that box plots miss—critical for detecting bimodal distributions or skewness in your data
  • Plotly’s interactive violin plots let users hover for exact values and zoom into regions of interest, making them superior to static matplotlib alternatives for exploratory analysis
  • Use plotly.express for quick prototypes and plotly.graph_objects when you need fine-grained control over styling, annotations, and complex multi-group comparisons

Introduction to Violin Plots

Violin plots are superior to box plots for one simple reason: they show you the actual distribution shape. A box plot reduces your data to five numbers (min, Q1, median, Q3, max), hiding whether your distribution is bimodal, skewed, or has unusual density patterns. Violin plots solve this by overlaying a kernel density estimation on both sides of the box plot, creating a symmetrical shape that reveals the full story.
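A quick numeric sketch of that point, under made-up assumptions: the sample below is deliberately bimodal (the cluster centers 40 and 80 are invented for illustration). Its five-number summary looks unremarkable, while a coarse histogram, standing in for the density estimate, exposes the near-empty gap between the clusters.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bimodal sample: two well-separated clusters
sample = np.concatenate([rng.normal(40, 5, 500), rng.normal(80, 5, 500)])

# Five-number summary, i.e. what a box plot is built from
five_numbers = np.percentile(sample, [0, 25, 50, 75, 100])
print("min, Q1, median, Q3, max:", np.round(five_numbers, 1))

# A coarse histogram is a crude density estimate; the sparse middle
# bins are what a violin's KDE would draw as a narrow "waist"
counts, edges = np.histogram(sample, bins=8, range=(20, 100))
print("bin counts:", counts)
```

Note that the median falls in the sparse region between the clusters, which is exactly where a box plot would draw its center line.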

Use violin plots when distribution shape matters—analyzing user behavior patterns, comparing experimental groups where you suspect multimodal distributions, or examining sensor data for anomalies. Plotly’s implementation adds interactive hover tooltips, zooming, and panning, making it trivial to explore outliers and density patterns without writing additional code.

Basic Violin Plot Setup

Install Plotly if you haven’t already:

pip install plotly pandas

Plotly offers two APIs: plotly.express for rapid prototyping and plotly.graph_objects for detailed control. Start with Express—it’s faster and handles most use cases.

import plotly.express as px
import pandas as pd
import numpy as np

# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
    'values': np.concatenate([
        np.random.normal(100, 15, 200),
        np.random.normal(130, 10, 150)
    ]),
    'category': ['Group A'] * 200 + ['Group B'] * 150
})

# Create basic violin plot
fig = px.violin(data, y='values', x='category', box=True, points='all')
fig.show()

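This creates an interactive violin plot with an embedded box plot (box=True) and every data point drawn alongside (points='all'). Because x='category' splits the data, each group gets its own violin: Group A is centered near 100 with a wider spread, Group B near 130 with a narrower one. Each violin on its own is unimodal; the two-component structure only shows up if both groups are pooled into a single violin, which is exactly the kind of shape a box plot would obscure.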

For more control, use graph_objects:

import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Violin(
    y=data[data['category'] == 'Group A']['values'],
    name='Group A',
    box_visible=True,
    meanline_visible=True
))

fig.update_layout(
    title='Distribution Analysis',
    yaxis_title='Values',
    showlegend=True
)
fig.show()

Customizing Violin Plot Appearance

Violin plots need visual clarity. Default settings rarely cut it for production dashboards or publications.

import plotly.express as px
import pandas as pd
import numpy as np

# Create richer dataset
np.random.seed(42)
data = pd.DataFrame({
    'score': np.concatenate([
        np.random.gamma(2, 20, 100),
        np.random.gamma(5, 15, 100),
        np.random.gamma(3, 25, 100)
    ]),
    'group': ['Control'] * 100 + ['Treatment A'] * 100 + ['Treatment B'] * 100
})

fig = px.violin(
    data, 
    y='score', 
    x='group',
    box=True,           # Show box plot inside
    points='outliers',  # Only show outliers as points
    color='group',      # Color by group
    color_discrete_map={
        'Control': '#636EFA',
        'Treatment A': '#EF553B',
        'Treatment B': '#00CC96'
    }
)

# Customize appearance
fig.update_traces(
    meanline_visible=True,  # Show mean line
    width=0.6,              # Violin width
    opacity=0.7,            # Transparency
    marker=dict(size=3)     # Point size
)

fig.update_layout(
    title='Treatment Effect Analysis',
    yaxis_title='Performance Score',
    xaxis_title='Experimental Group',
    font=dict(size=12),
    height=500,
    showlegend=False  # X-axis labels are sufficient
)

fig.show()

The points parameter deserves attention. Options include:

  • 'all': Show every data point (use for small datasets)
  • 'outliers': Only show outliers (best for medium datasets)
  • False: Hide points entirely (cleanest for large datasets)

Setting meanline_visible=True adds a dashed line at the mean, complementing the median line from the box plot. This is crucial when comparing groups with different skewness.

Multi-Group Violin Plots

Comparing multiple groups side-by-side reveals patterns impossible to spot in separate plots.

import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Simulate A/B/C test results across different user segments
np.random.seed(42)
data = pd.DataFrame({
    'conversion_time': np.concatenate([
        np.random.lognormal(3, 0.5, 80),   # Variant A - Mobile
        np.random.lognormal(2.8, 0.6, 80), # Variant A - Desktop
        np.random.lognormal(2.5, 0.4, 80), # Variant B - Mobile
        np.random.lognormal(2.3, 0.5, 80), # Variant B - Desktop
        np.random.lognormal(2.7, 0.55, 80),# Variant C - Mobile
        np.random.lognormal(2.4, 0.45, 80) # Variant C - Desktop
    ]),
    'variant': ['A']*160 + ['B']*160 + ['C']*160,
    'platform': (['Mobile']*80 + ['Desktop']*80) * 3
})

fig = go.Figure()

colors = {'Mobile': '#FF6B6B', 'Desktop': '#4ECDC4'}

for platform in ['Mobile', 'Desktop']:
    for variant in ['A', 'B', 'C']:
        subset = data[(data['platform'] == platform) & (data['variant'] == variant)]
        fig.add_trace(go.Violin(
            x=subset['variant'],
            y=subset['conversion_time'],
            name=platform,
            legendgroup=platform,
            scalegroup=platform,
            offsetgroup=platform,  # Keep each platform in the same slot at every variant
            line_color=colors[platform],
            box_visible=True,
            meanline_visible=True,
            showlegend=(variant == 'A')  # Only show legend once per platform
        ))

fig.update_layout(
    title='Conversion Time by Variant and Platform',
    xaxis_title='Test Variant',
    yaxis_title='Time to Conversion (seconds)',
    violinmode='group',  # Side-by-side violins
    height=500
)

fig.show()

The violinmode='group' parameter places violins side-by-side. Use violinmode='overlay' for overlapping violins, though this reduces clarity with more than two groups.

Advanced Customization

Split violins directly compare two groups by showing half of each distribution on opposite sides.

import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Compare male vs female salary distributions across departments
np.random.seed(42)
data = pd.DataFrame({
    'salary': np.concatenate([
        np.random.normal(75000, 12000, 100),  # Engineering - Female
        np.random.normal(78000, 13000, 120),  # Engineering - Male
        np.random.normal(68000, 10000, 90),   # Marketing - Female
        np.random.normal(70000, 11000, 85),   # Marketing - Male
        np.random.normal(82000, 15000, 80),   # Sales - Female
        np.random.normal(85000, 16000, 95)    # Sales - Male
    ]),
    'department': ['Engineering']*220 + ['Marketing']*175 + ['Sales']*175,
    'gender': (['Female']*100 + ['Male']*120) + (['Female']*90 + ['Male']*85) + (['Female']*80 + ['Male']*95)
})

fig = go.Figure()

fig.add_trace(go.Violin(
    x=data[data['gender'] == 'Female']['department'],
    y=data[data['gender'] == 'Female']['salary'],
    name='Female',
    side='negative',
    line_color='#FF6B6B',
    fillcolor='#FF6B6B',
    opacity=0.6,
    box_visible=True,
    meanline_visible=True,
    hovertemplate='<b>Female</b><br>Dept: %{x}<br>Salary: $%{y:,.0f}<extra></extra>'
))

fig.add_trace(go.Violin(
    x=data[data['gender'] == 'Male']['department'],
    y=data[data['gender'] == 'Male']['salary'],
    name='Male',
    side='positive',
    line_color='#4ECDC4',
    fillcolor='#4ECDC4',
    opacity=0.6,
    box_visible=True,
    meanline_visible=True,
    hovertemplate='<b>Male</b><br>Dept: %{x}<br>Salary: $%{y:,.0f}<extra></extra>'
))

fig.update_layout(
    title='Salary Distribution by Department and Gender',
    xaxis_title='Department',
    yaxis_title='Annual Salary (USD)',
    violinmode='overlay',
    height=500
)

fig.show()

Custom hover templates (hovertemplate) transform user experience. The <extra></extra> removes the default secondary box. Use %{y:,.0f} for formatted numbers and <b> tags for bold text.
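The number formats inside %{...} follow d3-format, which is modeled on Python's format mini-language, so simple specs such as ',.0f' can usually be previewed with format() before they go into a template. The make_hovertemplate helper below is hypothetical, written only to show how the pieces fit together.

```python
# Preview the number formats in plain Python
print(format(78432.6, ',.0f'))  # 78,433
print(format(2.34, '.1f'))      # 2.3

def make_hovertemplate(label, x_label, y_label, y_format, y_prefix=''):
    """Hypothetical helper: assemble a hovertemplate string.

    Doubled braces in the f-strings produce the literal braces
    that Plotly expects around x and y.
    """
    return (f'<b>{label}</b><br>'
            f'{x_label}: %{{x}}<br>'
            f'{y_label}: {y_prefix}%{{y:{y_format}}}<extra></extra>')

template = make_hovertemplate('Female', 'Dept', 'Salary', ',.0f', y_prefix='$')
print(template)
```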

Real-World Use Case

Here’s a complete example analyzing customer support response times across ticket priorities and time zones.

import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Simulate realistic support ticket data
np.random.seed(42)
n_tickets = 500

data = pd.DataFrame({
    'response_time_hours': np.concatenate([
        np.random.exponential(2, 150),    # Low priority
        np.random.exponential(1.2, 200),  # Medium priority
        np.random.exponential(0.5, 150)   # High priority
    ]),
    'priority': ['Low']*150 + ['Medium']*200 + ['High']*150,
    'timezone': np.random.choice(['Americas', 'EMEA', 'APAC'], n_tickets)
})

fig = go.Figure()

colors = {'Low': '#95E1D3', 'Medium': '#F38181', 'High': '#AA4465'}
priorities = ['Low', 'Medium', 'High']

for priority in priorities:
    subset = data[data['priority'] == priority]
    fig.add_trace(go.Violin(
        x=subset['timezone'],
        y=subset['response_time_hours'],
        name=priority,
        legendgroup=priority,
        scalegroup=priority,
        line_color=colors[priority],
        fillcolor=colors[priority],
        opacity=0.6,
        box_visible=True,
        meanline_visible=True,
        points='outliers',
        hovertemplate=f'<b>{priority} Priority</b><br>' +
                      'Region: %{x}<br>' +
                      'Response Time: %{y:.1f}h<extra></extra>'
    ))

fig.update_layout(
    title='Support Response Time Analysis by Priority and Region',
    xaxis_title='Geographic Region',
    yaxis_title='Response Time (hours)',
    violinmode='group',
    height=600,
    font=dict(size=12),
    hovermode='closest',
    legend=dict(
        title='Priority Level',
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1
    )
)

# Add SLA reference line
fig.add_hline(
    y=4, 
    line_dash="dash", 
    line_color="red",
    annotation_text="4h SLA Target",
    annotation_position="right"
)

fig.show()

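With real ticket data, this layout makes it easy to spot a region whose response times skew high for one priority, or a priority level that regularly crosses the SLA line. In this simulation the regions are assigned at random, so the three regions look alike within each priority; what does stand out is that only the long right tail of low-priority tickets crosses the 4-hour line with any regularity, while high-priority responses stay well below it.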
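To back the visual reading with numbers, the share of tickets above the 4-hour line can be tabulated per priority and region. This is a sketch on the same simulated data; SLA_HOURS and the breached column are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame({
    'response_time_hours': np.concatenate([
        np.random.exponential(2, 150),    # Low priority
        np.random.exponential(1.2, 200),  # Medium priority
        np.random.exponential(0.5, 150)   # High priority
    ]),
    'priority': ['Low'] * 150 + ['Medium'] * 200 + ['High'] * 150,
    'timezone': np.random.choice(['Americas', 'EMEA', 'APAC'], 500)
})

SLA_HOURS = 4  # same threshold as the reference line in the chart

# Mean of a boolean column = fraction of tickets over the threshold
breach_rate = (
    data.assign(breached=data['response_time_hours'] > SLA_HOURS)
        .groupby(['priority', 'timezone'])['breached']
        .mean()
        .unstack('timezone')
)
print(breach_rate.round(3))
```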

Best Practices and Tips

When to use violin plots: Choose them over box plots when distribution shape matters—detecting bimodal patterns, comparing skewness between groups, or identifying density clusters. Avoid them for small datasets (n < 20) where kernel density estimation becomes unreliable.

Performance considerations: Violin plots cost more to compute and render than box plots, mainly because of the density estimate and any individual points drawn. As a rough rule of thumb, for datasets over 10,000 points per group, set points=False and consider downsampling; expect browser interactivity to degrade somewhere around 50,000 total points, depending on the machine and browser.
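One way to downsample is to cap the number of rows per group before plotting. The function name and the cap below are assumptions for illustration; groups already under the cap are left untouched.

```python
import numpy as np
import pandas as pd

def downsample_per_group(df, group_col, max_per_group, seed=0):
    """Randomly keep at most `max_per_group` rows from each group."""
    parts = [
        g.sample(n=min(len(g), max_per_group), random_state=seed)
        for _, g in df.groupby(group_col)
    ]
    return pd.concat(parts, ignore_index=True)

rng = np.random.default_rng(0)
big = pd.DataFrame({
    'value': rng.normal(size=30_000),
    'group': np.repeat(['A', 'B', 'C'], [25_000, 4_000, 1_000]),
})

small = downsample_per_group(big, 'group', max_per_group=2_000)
print(small['group'].value_counts().to_dict())
```

Random sampling keeps the overall shape but can drop rare extreme values, so check the tails against the full data if outliers matter.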

Color accessibility: Always use color palettes that work for colorblind users. Plotly’s default palette is decent, but test with tools like Color Oracle. Add patterns or textures for critical distinctions when color alone isn’t sufficient.

Statistical rigor: Violin plots show distributions beautifully but don’t replace statistical tests. They’re exploratory tools. When you spot differences, confirm them with appropriate hypothesis tests.
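As one example of such a follow-up, a permutation test on the difference in means needs only NumPy. This is a minimal sketch, not a recommendation of this test over others; the two samples are simulated stand-ins for a control and a treatment group.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean(a) - mean(b)) and a p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    # The +1 terms count the observed labeling itself, so p is never 0
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

rng = np.random.default_rng(42)
control = rng.gamma(2, 20, 100)    # simulated scores
treatment = rng.gamma(5, 15, 100)  # simulated scores

observed, p_value = permutation_test_mean_diff(control, treatment)
print(f"mean difference: {observed:.1f}, p-value: {p_value:.4f}")
```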

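Bandwidth selection: By default, Plotly picks the kernel density bandwidth automatically using Silverman's rule of thumb. For unusual distributions where that default oversmooths or undersmooths, set the bandwidth attribute on go.Violin (or via fig.update_traces(bandwidth=...) on a Plotly Express figure), and use spanmode or span to control how far the density extends beyond the data. If the shape still misleads, fall back to a histogram.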

Export your plots for presentations using fig.write_html('plot.html') for interactive versions or fig.write_image('plot.png', width=1200, height=600) for static images (requires kaleido: pip install kaleido).
