How to Create a Violin Plot in Seaborn
Key Insights
- Violin plots combine box plots and kernel density estimation to show both summary statistics and the full distribution shape, making them superior to box plots for revealing multimodal distributions and subtle patterns in your data
- The hue parameter enables powerful side-by-side comparisons by splitting violins or placing them adjacent to each other, essential for A/B testing and categorical analysis
- Overlaying swarm or strip plots on violin plots provides the best of both worlds: distribution shape plus individual data point visibility, particularly valuable for smaller datasets
Introduction to Violin Plots
Violin plots are one of the most underutilized visualization tools in data science. While box plots show you quartiles and outliers, they hide the actual distribution shape. Histograms show distribution but lack statistical summaries. Violin plots solve both problems by combining a box plot with a kernel density estimation (KDE) plot rotated and mirrored on both sides.
The width of the violin at any point represents the estimated density of data at that value. This makes violin plots particularly valuable when you’re dealing with multimodal distributions, something box plots completely miss. If your salary data has two peaks (junior and senior employees), a box plot shows a median and quartiles that sit between the two groups and describe neither, while a violin plot shows both peaks.
Use violin plots when you need to compare distributions across multiple categories, when your data might have multiple modes, or when you want to communicate both the shape and statistical summary of your data in a single visualization.
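To make the bimodal case concrete, here is a minimal sketch that draws the same data as a box plot and as a violin plot. The salary figures are synthetic and purely illustrative (the cluster means, spreads, and the name df are assumptions, not data from this article).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical salary sample with two clusters (junior and senior staff)
salaries = np.concatenate([
    rng.normal(60_000, 5_000, 200),   # junior cluster
    rng.normal(120_000, 8_000, 200),  # senior cluster
])
df = pd.DataFrame({"salary": salaries})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 5), sharey=True)

sns.boxplot(data=df, y="salary", ax=ax_box)
ax_box.set_title("Box plot")

sns.violinplot(data=df, y="salary", ax=ax_violin)
ax_violin.set_title("Violin plot")

fig.tight_layout()

# The median lands in the gap between the clusters, where almost nobody is paid
median_salary = df["salary"].median()
print(f"Median salary: {median_salary:,.0f}")
```

Call plt.show() (or fig.savefig with a file name) to view the figure. The box plot draws a single box spanning both groups, while the violin should show two separate bulges.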
Basic Violin Plot Setup
Let’s start with the fundamentals. You’ll need seaborn, matplotlib, and pandas. Seaborn handles the heavy lifting, matplotlib provides fine-tuned control, and pandas manages your data.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load sample dataset
tips = sns.load_dataset('tips')
# Create basic violin plot
plt.figure(figsize=(10, 6))
sns.violinplot(data=tips, x='day', y='total_bill')
plt.title('Distribution of Total Bill by Day')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill ($)')
plt.tight_layout()
plt.show()
This creates a violin plot showing how total bills are distributed across different days. The width at each point tells you where data concentrates. Notice how Thursday shows a different distribution shape compared to weekend days—this is information a box plot would obscure.
The default violin plot includes a small box plot inside showing quartiles and median. This combination gives you both the distribution shape and traditional statistical measures.
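The median and quartiles that the inner box marks can also be computed directly, which is a handy cross-check when reading a plot. The sketch below uses a tiny made-up table (the name bills and its values are assumptions, not the tips dataset) so the results are easy to verify by hand.

```python
import pandas as pd

# Tiny hypothetical sample so the quartiles are easy to verify by hand
bills = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Sat"] * 5,
    "total_bill": [10, 20, 30, 40, 50, 5, 15, 25, 35, 45],
})

# 25th, 50th, and 75th percentiles per day: the values the inner box marks
quartiles = (
    bills.groupby("day")["total_bill"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```

For Thur the three values are 20, 30, and 40; for Sat they are 15, 25, and 35.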
Customizing Violin Plot Appearance
Seaborn provides extensive customization options. The inner parameter controls what appears inside the violin: ‘box’ (default, a miniature box plot), ‘quart’ (lines at the quartiles; older seaborn releases document this option as ‘quartile’), ‘point’ (one point per observation), ‘stick’ (one line per observation), or None (nothing).
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Different inner representations
sns.violinplot(data=tips, x='day', y='total_bill', inner='box', ax=axes[0, 0])
axes[0, 0].set_title('Inner: Box (Default)')
sns.violinplot(data=tips, x='day', y='total_bill', inner='quartile', ax=axes[0, 1])
axes[0, 1].set_title('Inner: Quartile Lines')
sns.violinplot(data=tips, x='day', y='total_bill', inner='stick', ax=axes[1, 0])
axes[1, 0].set_title('Inner: Stick (One Line per Observation)')
# Custom color palette (seaborn 0.13+ expects hue whenever palette is set)
sns.violinplot(data=tips, x='day', y='total_bill', hue='day', palette='Set2',
               legend=False, inner='box', ax=axes[1, 1])
axes[1, 1].set_title('Custom Color Palette')
plt.tight_layout()
plt.show()
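The palette parameter accepts the name of a seaborn palette or matplotlib colormap, a list of colors, or a dict mapping category levels to colors. In seaborn 0.13 and later, passing palette without hue is deprecated, so assign the same variable to hue and set legend=False when you only want one color per category. For categorical comparisons, use qualitative palettes like ‘Set2’, ‘pastel’, or ‘Dark2’. For ordered categories, consider sequential palettes.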
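When the same categories appear in several figures, it can help to pin each one to a fixed color. The following is a small sketch of one way to do that: it pulls colors from a named palette and builds a dict keyed by category. The name day_palette is illustrative, and the day labels are assumed to match the tips dataset.

```python
import seaborn as sns

days = ["Thur", "Fri", "Sat", "Sun"]

# color_palette returns a list-like of RGB tuples, one per requested color
colors = sns.color_palette("Set2", n_colors=len(days))

# Map each category level to a fixed color
day_palette = dict(zip(days, colors))

for day, rgb in day_palette.items():
    print(day, tuple(round(channel, 2) for channel in rgb))
```

The resulting dict can be passed as palette=day_palette together with hue='day', so that each day keeps its color even if the category order changes.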
Orientation matters for readability. When category names are long, switch to horizontal:
plt.figure(figsize=(10, 6))
sns.violinplot(data=tips, x='total_bill', y='day', hue='day', orient='h',
               palette='muted', legend=False)
plt.title('Horizontal Orientation for Better Label Readability')
plt.show()
Grouping and Categorical Comparisons
The real power of violin plots emerges when comparing distributions across multiple dimensions. The hue parameter adds a second categorical variable, creating split or side-by-side violins.
plt.figure(figsize=(12, 6))
# Split violins for direct comparison
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex',
split=True, palette='Set1', inner='quartile')
plt.title('Total Bill Distribution by Day and Gender (Split)')
plt.legend(title='Gender', loc='upper right')
plt.tight_layout()
plt.show()
Split violins (split=True) place male and female distributions on opposite sides of the same vertical axis, making direct comparison intuitive. This is perfect for A/B testing scenarios where you want to see if treatment and control groups differ.
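As a rough illustration of the A/B testing case, the sketch below builds a synthetic experiment and draws control and treatment as the two halves of each violin. All column names, group labels, and effect sizes are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)
n = 150  # observations per group and segment

# Hypothetical A/B test: order values for two user segments
frames = []
for segment, shift in [("New users", 0.0), ("Returning users", 5.0)]:
    for group, lift in [("Control", 0.0), ("Treatment", 3.0)]:
        frames.append(pd.DataFrame({
            "segment": segment,
            "group": group,
            "order_value": rng.normal(30.0 + shift + lift, 6.0, n),
        }))
ab = pd.concat(frames, ignore_index=True)

fig, ax = plt.subplots(figsize=(8, 5))
# Each violin shows Control on one side and Treatment on the other
sns.violinplot(data=ab, x="segment", y="order_value", hue="group",
               split=True, inner="quart", ax=ax)
ax.set_title("Control vs. treatment (split violins)")
fig.tight_layout()
```

Because both halves share one axis, a shift in the treatment group shows up as a vertical offset between the two sides and between their quartile lines.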
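Without split=True, seaborn draws one violin per hue level side by side within each category. This is the layout to use when the hue variable has more than two levels, although it works for two as well. The time column used below has only two levels (Lunch and Dinner):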
plt.figure(figsize=(14, 6))
sns.violinplot(data=tips, x='day', y='total_bill', hue='time',
palette='coolwarm', inner='box')
plt.title('Total Bill Distribution by Day and Meal Time')
plt.legend(title='Meal Time')
plt.tight_layout()
plt.show()
Advanced Customization
Fine-tuning violin plots requires understanding the bandwidth and scale parameters. The bw_adjust parameter (added in seaborn 0.13; earlier releases use bw) multiplies the automatically chosen KDE bandwidth: values below 1 show more detail, values above 1 smooth out noise.
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
# Different bandwidth adjustments
sns.violinplot(data=tips, x='day', y='total_bill', bw_adjust=0.5, ax=axes[0])
axes[0].set_title('bw_adjust=0.5 (More Detail)')
sns.violinplot(data=tips, x='day', y='total_bill', bw_adjust=1.0, ax=axes[1])
axes[1].set_title('bw_adjust=1.0 (Default)')
sns.violinplot(data=tips, x='day', y='total_bill', bw_adjust=2.0, ax=axes[2])
axes[2].set_title('bw_adjust=2.0 (Smoother)')
plt.tight_layout()
plt.show()
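The scale side of this is controlled by density_norm in seaborn 0.13 and later (earlier releases call it scale). The sketch below compares its three options on synthetic data with deliberately unequal group sizes. The data and group names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(2)

# Hypothetical groups with very different sample sizes
df = pd.DataFrame({
    "group": ["small"] * 30 + ["large"] * 600,
    "value": np.concatenate([rng.normal(0, 1, 30), rng.normal(0, 1, 600)]),
})

norms = ["area", "count", "width"]
fig, axes = plt.subplots(1, len(norms), figsize=(15, 4), sharey=True)
for ax, norm in zip(axes, norms):
    # area: equal area per violin; count: width scales with sample size;
    # width: every violin reaches the same maximum width
    sns.violinplot(data=df, x="group", y="value", density_norm=norm, ax=ax)
    ax.set_title(f"density_norm='{norm}'")
fig.tight_layout()
```

With density_norm='count', the 30-observation group should appear much thinner than the 600-observation group, which is one way to keep sample size visible in the plot itself.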
Overlaying swarm or strip plots adds individual data points, crucial for smaller datasets or when you need to show actual observations:
plt.figure(figsize=(12, 6))
# Violin plot with overlaid strip plot
sns.violinplot(data=tips, x='day', y='total_bill', hue='day', inner=None,
               palette='pastel', alpha=0.6, legend=False)
sns.stripplot(data=tips, x='day', y='total_bill',
size=3, color='black', alpha=0.3)
plt.title('Violin Plot with Individual Data Points')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill ($)')
plt.grid(axis='y', alpha=0.3, linestyle='--')
plt.tight_layout()
plt.show()
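The alpha parameter on the violin plot makes it semi-transparent so the overlaid points and the grid lines stay clearly visible. This combination works well for datasets with roughly 50 to 500 observations per category: with fewer points the KDE is unreliable, and with many more the strip plot turns into a solid band.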
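For smaller groups, a swarm plot is an alternative overlay, since it spreads points sideways instead of stacking them on top of each other. The following is a minimal sketch on synthetic data. The frame small and its group labels are assumptions for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)

# Hypothetical small dataset: 40 observations in each of three groups
small = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 40),
    "value": np.concatenate([
        rng.normal(10, 2, 40),
        rng.normal(12, 3, 40),
        rng.normal(9, 1.5, 40),
    ]),
})

fig, ax = plt.subplots(figsize=(8, 5))
# Light violins with no inner box so the points stay readable
sns.violinplot(data=small, x="group", y="value", inner=None,
               color="lightgray", ax=ax)
# swarmplot nudges points sideways so they do not overlap
sns.swarmplot(data=small, x="group", y="value", size=3, color="black", ax=ax)
ax.set_title("Violin plot with swarm overlay")
fig.tight_layout()
```

Swarm plots get crowded quickly, so for larger groups the strip plot with a low alpha is usually the more practical choice.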
Practical Use Cases
Let’s examine a realistic scenario: analyzing employee salary distributions across departments and experience levels.
# Create realistic salary dataset
import numpy as np
np.random.seed(42)
departments = ['Engineering', 'Sales', 'Marketing', 'Support']
experience = ['Junior', 'Mid', 'Senior']
data = []
for dept in departments:
    for exp in experience:
        # Base salary varies by department and experience
        if dept == 'Engineering':
            base = {'Junior': 75000, 'Mid': 110000, 'Senior': 155000}
        elif dept == 'Sales':
            base = {'Junior': 65000, 'Mid': 95000, 'Senior': 140000}
        elif dept == 'Marketing':
            base = {'Junior': 60000, 'Mid': 85000, 'Senior': 120000}
        else:  # Support
            base = {'Junior': 55000, 'Mid': 75000, 'Senior': 100000}
        # Generate salaries with some variation
        salaries = np.random.normal(base[exp], base[exp] * 0.15, 30)
        for salary in salaries:
            data.append({
                'Department': dept,
                'Experience': exp,
                'Salary': max(salary, 40000)  # Floor at 40k
            })
salary_df = pd.DataFrame(data)
# Create comprehensive visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
# Compare departments
sns.violinplot(data=salary_df, x='Department', y='Salary',
hue='Experience', palette='viridis',
inner='quartile', ax=axes[0])
axes[0].set_title('Salary Distribution by Department and Experience',
fontsize=14, fontweight='bold')
axes[0].set_ylabel('Annual Salary ($)', fontsize=12)
axes[0].set_xlabel('Department', fontsize=12)
axes[0].legend(title='Experience Level', loc='upper left')
axes[0].grid(axis='y', alpha=0.3, linestyle='--')
# Compare experience levels within Engineering
engineering_df = salary_df[salary_df['Department'] == 'Engineering']
sns.violinplot(data=engineering_df, x='Experience', y='Salary',
               hue='Experience', palette='rocket', legend=False,
               inner='box', ax=axes[1])
axes[1].set_title('Engineering Salary Distribution by Experience',
fontsize=14, fontweight='bold')
axes[1].set_ylabel('Annual Salary ($)', fontsize=12)
axes[1].set_xlabel('Experience Level', fontsize=12)
axes[1].grid(axis='y', alpha=0.3, linestyle='--')
plt.tight_layout()
plt.show()
# Print summary statistics
print("\nSalary Statistics by Department and Experience:")
print(salary_df.groupby(['Department', 'Experience'])['Salary'].describe()[['mean', '50%', 'std']])
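Read this example with its construction in mind. Each group is drawn from a single normal distribution whose standard deviation is 15% of its base salary, so the violins widen in absolute terms as base pay rises, and senior Engineering shows the widest range. None of the groups is truly bimodal: with only 30 samples per group, small bumps in a violin are sampling noise rather than structure. On real salary data, the same plot could expose genuine features, such as a second peak for commission-heavy senior Sales roles, that summary statistics alone would miss.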
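A quick numeric check can confirm what the violin widths suggest. The sketch below regenerates a simplified stand-in for the salary data (two departments, no 40k floor, and a different random generator, all of which are assumptions) and compares the absolute spread with the spread relative to the mean.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simplified stand-in for the salary data: two departments, no 40k floor
base = {
    ("Engineering", "Junior"): 75_000,
    ("Engineering", "Senior"): 155_000,
    ("Support", "Junior"): 55_000,
    ("Support", "Senior"): 100_000,
}
rows = [
    {"Department": dept, "Experience": exp, "Salary": salary}
    for (dept, exp), mean in base.items()
    for salary in rng.normal(mean, mean * 0.15, 30)
]
check_df = pd.DataFrame(rows)

summary = (
    check_df.groupby(["Department", "Experience"])["Salary"]
    .agg(["mean", "std"])
)
# Coefficient of variation: spread relative to the mean
summary["cv"] = summary["std"] / summary["mean"]
print(summary.round(2))
```

The std column grows with the base salary while cv stays near 0.15 for every group, so the wider senior violins reflect higher pay levels and not a more variable group.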
Conclusion and Best Practices
Choose violin plots when distribution shape matters. If you only need median and quartiles, stick with box plots—they’re cleaner. If you need to show individual points with fewer than 50 observations, consider swarm plots instead.
Avoid these common mistakes: Don’t use violin plots with very small sample sizes (under 20 points)—the KDE becomes meaningless. Don’t compare violins with dramatically different sample sizes without noting this in your labels. Don’t over-smooth with high bw_adjust values, as you’ll hide real distribution features.
Always include context: label axes clearly, add titles that explain what’s being compared, and use color purposefully. Split violins work best for binary comparisons; use side-by-side for three or more categories. Overlay strip plots when showing individual points adds value without cluttering.
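One simple way to flag unequal sample sizes is to put the count into each category label. The sketch below does this on synthetic data. The group names and sizes are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(4)

# Hypothetical groups with very different sample sizes
df = pd.DataFrame({
    "group": ["A"] * 25 + ["B"] * 400,
    "value": np.concatenate([rng.normal(5, 1, 25), rng.normal(6, 1, 400)]),
})

counts = df["group"].value_counts()
order = sorted(counts.index)  # fix the category order so labels line up

fig, ax = plt.subplots(figsize=(6, 5))
sns.violinplot(data=df, x="group", y="value", order=order, ax=ax)

# Append the sample size to each category label
ax.set_xticks(range(len(order)))
ax.set_xticklabels([f"{name}\n(n={counts[name]})" for name in order])
fig.tight_layout()
```

Passing order explicitly keeps the violins and the relabeled ticks in the same sequence, which matters because the labels are set by position.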
Violin plots excel at revealing the unexpected: bimodal distributions, skewness, and subtle patterns that aggregate statistics conceal. Master them, and you’ll communicate your data’s true story more effectively than bar charts and box plots alone allow.