R ggplot2 - Violin Plot
Key Insights
• Violin plots pair a mirrored kernel density estimate with the summary role of a box plot, so they show the full shape of a distribution and reveal multimodality and density patterns that box plots miss
• The geom_violin() function in ggplot2 provides extensive customization options including trim, scale, and draw_quantiles parameters that control how distributions are rendered and compared across groups
• Layering violin plots with box plots, jitter points, or statistical summaries creates publication-ready visualizations that communicate both distribution shape and key statistical measures simultaneously
Understanding Violin Plots
Violin plots display the estimated probability density of the data at different values, mirrored to create a symmetrical shape. Unlike box plots, which show only summary statistics (median, quartiles, outliers), violin plots reveal the entire distribution, including peaks, valleys, and multiple modes.
The width of the violin at any point represents the estimated density of observations at that value. Wider sections indicate more data points, while narrow sections show fewer observations. In the sample data below, group C mixes two normal distributions, so its violin should show two bulges.
library(ggplot2)
# Create sample dataset: A and B are unimodal, C mixes two normal distributions
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 100),
  value = c(rnorm(100, mean = 10, sd = 2),
            rnorm(100, mean = 15, sd = 3),
            c(rnorm(50, mean = 12, sd = 1.5), rnorm(50, mean = 18, sd = 1.5)))
)
# Basic violin plot
ggplot(data, aes(x = group, y = value)) +
  geom_violin() +
  theme_minimal()
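To make the width-equals-density idea concrete, the sketch below computes a kernel density estimate with base R's density(), which uses the same default bandwidth rule ("nrd0") and Gaussian kernel that geom_violin() uses by default. It draws a fresh sample from the same mixture as group C (not the same random draws as the dataset above), and the names group_c, dens, peak, and area are illustrative only.

```r
# A fresh sample from the same 50/50 mixture used for group C
set.seed(123)
group_c <- c(rnorm(50, mean = 12, sd = 1.5), rnorm(50, mean = 18, sd = 1.5))

# Kernel density estimate with density() defaults (bw = "nrd0", Gaussian kernel)
dens <- density(group_c)

# dens$x is the grid of values, dens$y the estimated density at each grid point;
# the violin's half-width at a given value is proportional to dens$y
peak <- dens$x[which.max(dens$y)]

# A density curve should integrate to roughly 1 (simple Riemann sum on the grid)
area <- sum(dens$y) * diff(dens$x[1:2])

cat("Bandwidth:", round(dens$bw, 3), "\n")
cat("Highest density near value:", round(peak, 2), "\n")
cat("Approximate area under the curve:", round(area, 3), "\n")
```

Calling plot(dens) on the result draws the one-sided curve that the violin mirrors around its centre line.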
Essential Violin Plot Parameters
The geom_violin() function accepts several parameters that control how distributions are rendered. The trim parameter controls the tails: with trim = TRUE (the default) each violin stops at the minimum and maximum of the observed data, while trim = FALSE lets the density tails extend beyond that range. The scale parameter controls how violins are sized relative to each other.
# Compare trim options (each plot is stored; print p1, p2, ... to display it)
p1 <- ggplot(data, aes(x = group, y = value)) +
  geom_violin(trim = TRUE) +
  labs(title = "trim = TRUE") +
  theme_minimal()
p2 <- ggplot(data, aes(x = group, y = value)) +
  geom_violin(trim = FALSE) +
  labs(title = "trim = FALSE") +
  theme_minimal()
# Scale options: area (default), count, width
p3 <- ggplot(data, aes(x = group, y = value)) +
  geom_violin(scale = "area") +
  labs(title = "scale = 'area'") +
  theme_minimal()
p4 <- ggplot(data, aes(x = group, y = value)) +
  geom_violin(scale = "count") +
  labs(title = "scale = 'count'") +
  theme_minimal()
The scale = "area" option gives all violins the same area, which is useful for comparing shapes. The scale = "count" option scales each violin's area in proportion to its number of observations, so differences in sample size are visible at a glance; because every group in the sample data has 100 observations, p3 and p4 look the same here. The scale = "width" option gives all violins the same maximum width.
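Since the sample data has equal group sizes, the effect of scale is easier to see on unequal groups. The sketch below uses a hypothetical dataset (uneven) and a hypothetical helper (max_width) together with ggplot2's layer_data(), which returns the values computed for a layer, to compare the widest point of each violin under two scale settings and the plotted range under both trim settings.

```r
library(ggplot2)

# Hypothetical data: unequal group sizes (200, 50, 20) with the same spread
set.seed(42)
uneven <- data.frame(
  group = rep(c("A", "B", "C"), times = c(200, 50, 20)),
  value = rnorm(270, mean = 10, sd = 2)
)

base <- ggplot(uneven, aes(x = group, y = value))

# Widest point of each violin, read from the computed column violinwidth
max_width <- function(scale_option) {
  ld <- layer_data(base + geom_violin(scale = scale_option))
  unname(tapply(ld$violinwidth, ld$group, max))
}

widths_count <- max_width("count")  # widest point shrinks with group size
widths_width <- max_width("width")  # every violin reaches the same maximum

# Range of y values covered by the violins with and without trimming
y_trimmed   <- range(layer_data(base + geom_violin(trim = TRUE))$y)
y_untrimmed <- range(layer_data(base + geom_violin(trim = FALSE))$y)

print(round(widths_count, 2))
print(round(widths_width, 2))
print(rbind(trimmed = y_trimmed, untrimmed = y_untrimmed))
```

With scale = "count" the smallest group should come out much narrower than the largest, while scale = "width" equalizes the maxima; the untrimmed range should extend past the trimmed one at both ends.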
Adding Quantiles and Statistical Overlays
Violin plots benefit from overlaying statistical summaries. The draw_quantiles parameter adds horizontal lines at specified quantiles, while layering box plots or points adds reference information.
# Violin with quartile lines
ggplot(data, aes(x = group, y = value)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
  theme_minimal() +
  labs(y = "Value", x = "Group")
# Violin with embedded box plot
ggplot(data, aes(x = group, y = value)) +
  geom_violin(fill = "lightblue", alpha = 0.5) +
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +
  theme_minimal()
# Violin with mean points and standard-error bars (mean +/- 1 SE)
# The point uses the mean so that it sits at the centre of the mean_se interval
ggplot(data, aes(x = group, y = value)) +
  geom_violin(fill = "gray80") +
  stat_summary(fun = mean, geom = "point", size = 3, color = "red") +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.1, color = "red") +
  theme_minimal()
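The error bars above are produced by ggplot2's mean_se() helper, which returns the mean together with the mean minus and plus one standard error. The sketch below, using an illustrative vector x, reproduces that interval by hand. Because the interval is centred on the mean, a median marker drawn alongside it can sit off-centre.

```r
library(ggplot2)

# Illustrative sample, drawn from the same distribution as group A
set.seed(123)
x <- rnorm(100, mean = 10, sd = 2)

# mean_se() returns y (the mean), ymin and ymax (mean minus/plus one SE)
bar <- mean_se(x)

# The same interval computed by hand
se <- sd(x) / sqrt(length(x))
by_hand <- c(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)

print(bar)
print(round(by_hand, 3))
print(round(median(x), 3))  # generally not identical to the mean
```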
Customizing Appearance
Control fill colors, transparency, and borders to create polished visualizations. Use aesthetic mappings to color violins by groups automatically.
# Color by group
ggplot(data, aes(x = group, y = value, fill = group)) +
  geom_violin(alpha = 0.7) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  theme_minimal() +
  theme(legend.position = "none")
# Custom styling
# linewidth sets the outline thickness (ggplot2 3.4.0 and later; older versions use size)
ggplot(data, aes(x = group, y = value)) +
  geom_violin(fill = "#4ECDC4", color = "#1A535C",
              linewidth = 1, alpha = 0.8) +
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14, face = "bold")
  )
Side-by-Side Violins for Comparison
Dodged violins display two distributions side by side within the same x-axis category, ideal for before/after or treatment/control comparisons. ggplot2 has no built-in split-violin geom (two half violins sharing one axis position); position_dodge(), used here, draws two full violins next to each other instead.
# Create paired data; timepoint is a factor so "Before" is drawn before "After"
paired_data <- data.frame(
  category = rep(c("Control", "Treatment"), each = 200),
  timepoint = factor(rep(rep(c("Before", "After"), each = 100), 2),
                     levels = c("Before", "After")),
  value = c(rnorm(100, 10, 2), rnorm(100, 11, 2),     # Control
            rnorm(100, 10, 2), rnorm(100, 14, 2.5))   # Treatment
)
# Dodged (side-by-side) violin plot
ggplot(paired_data, aes(x = category, y = value, fill = timepoint)) +
  geom_violin(position = position_dodge(width = 0.9)) +
  scale_fill_manual(values = c("#F8766D", "#00BFC4")) +
  theme_minimal() +
  labs(y = "Measurement", x = "Group", fill = "Timepoint")
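Dodging places two full violins next to each other. For a true split violin, where each distribution occupies one half of a shared centre line, one option is to build the halves by hand. The sketch below is one possible approach, not a ggplot2 feature: the helper half_violin() and the vectors before and after are hypothetical, and the outlines are computed with density() and drawn with geom_polygon().

```r
library(ggplot2)

# Illustrative before/after samples for a single category
set.seed(1)
before <- rnorm(100, 10, 2)
after  <- rnorm(100, 14, 2.5)

# Build the outline of one half violin.
# side = -1 draws to the left of the centre line, side = +1 to the right.
half_violin <- function(values, centre, side, max_width = 0.4) {
  d <- density(values)
  w <- d$y / max(d$y) * max_width  # widest point equals max_width
  data.frame(
    x = c(centre, centre + side * w, centre),  # start and end on the centre line
    y = c(d$x[1], d$x, d$x[length(d$x)])
  )
}

halves <- rbind(
  cbind(half_violin(before, centre = 1, side = -1), timepoint = "Before"),
  cbind(half_violin(after,  centre = 1, side = +1), timepoint = "After")
)
halves$timepoint <- factor(halves$timepoint, levels = c("Before", "After"))

split_plot <- ggplot(halves, aes(x = x, y = y, fill = timepoint)) +
  geom_polygon(alpha = 0.7, color = "gray30") +
  scale_x_continuous(breaks = 1, labels = "Treatment", limits = c(0.4, 1.6)) +
  theme_minimal() +
  labs(x = "Group", y = "Measurement", fill = "Timepoint")
```

Printing split_plot renders the figure. Each half is scaled to the same maximum width here, which corresponds to scale = "width" rather than the geom_violin() default.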
Horizontal Violins
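Flip violins horizontally for better label readability or when dealing with many categories. Mapping the continuous variable to x and the grouping variable to y is enough: ggplot2 (version 3.3.0 and later) infers the orientation, so coord_flip() is not needed.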
ggplot(data, aes(x = value, y = group)) +
  geom_violin(fill = "steelblue", alpha = 0.7) +
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +
  theme_minimal() +
  labs(x = "Value", y = "Group")
Faceted Violin Plots
Use faceting to compare distributions across multiple dimensions simultaneously.
# Create multi-factor dataset
# dose is an ordered set of factor levels so the axis reads Low, Medium, High
# rather than alphabetically (High, Low, Medium)
multi_data <- data.frame(
  treatment = rep(c("Drug A", "Drug B"), each = 300),
  dose = factor(rep(rep(c("Low", "Medium", "High"), each = 100), 2),
                levels = c("Low", "Medium", "High")),
  response = c(rnorm(100, 10, 2), rnorm(100, 12, 2.5), rnorm(100, 14, 3),
               rnorm(100, 11, 2), rnorm(100, 15, 2.5), rnorm(100, 18, 3))
)
ggplot(multi_data, aes(x = dose, y = response, fill = dose)) +
  geom_violin(alpha = 0.7) +
  facet_wrap(~treatment) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(y = "Response", x = "Dose Level")
Combining with Raw Data Points
Overlay individual observations to show both distribution and raw data, particularly useful for smaller datasets.
# Violin with jittered points
ggplot(data, aes(x = group, y = value)) +
  geom_violin(fill = "gray90", color = "gray50") +
  geom_jitter(width = 0.1, alpha = 0.3, size = 1.5) +
  stat_summary(fun = median, geom = "point",
               size = 3, color = "red", shape = 18) +
  theme_minimal()
# Violin with sina plot (density-aware jitter; requires the ggforce package)
library(ggforce)
ggplot(data, aes(x = group, y = value)) +
  geom_violin(fill = "lightblue", alpha = 0.5) +
  geom_sina(alpha = 0.4, size = 1) +
  theme_minimal()
Real-World Example: Clinical Trial Data
# Simulate clinical trial data
set.seed(456)
clinical_data <- data.frame(
  patient_id = 1:240,
  # Factor levels keep the axis in dose order instead of alphabetical order
  treatment = factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 80),
                     levels = c("Placebo", "Low Dose", "High Dose")),
  baseline_score = rnorm(240, 50, 10),
  week_12_score = c(
    rnorm(80, 48, 12),  # Placebo: slight decline
    rnorm(80, 55, 10),  # Low dose: improvement
    rnorm(80, 62, 8)    # High dose: greater improvement
  )
)
# Calculate change from baseline
clinical_data$change <- clinical_data$week_12_score - clinical_data$baseline_score
# Comprehensive visualization
ggplot(clinical_data, aes(x = treatment, y = change, fill = treatment)) +
  geom_violin(alpha = 0.6, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  geom_jitter(width = 0.05, alpha = 0.2, size = 1) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(
    y = "Change from Baseline (points)",
    x = "Treatment Group",
    caption = "Dashed line indicates no change"
  )
Because baseline and week-12 scores are simulated independently, the change scores are widely spread in every group. Even so, the visualization should show the high-dose group shifted clearly above zero with the narrowest spread, the low-dose group showing moderate improvement, and the placebo group centered slightly below zero with the widest variability. The violin shape communicates distribution characteristics that summary statistics alone would miss.
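The reading of the plot can be checked against per-group summaries. The sketch below regenerates the simulated scores with the same seed and call order as above (so it should reproduce the same values) and tabulates the mean and standard deviation of the change score. From the simulation parameters, the expected mean changes are about -2, 5, and 12 points, with standard deviations of roughly 15.6, 14.1, and 12.8. The name summary_tbl is illustrative.

```r
# Regenerate the simulated trial data (same seed and draw order as above)
set.seed(456)
treatment <- factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 80),
                    levels = c("Placebo", "Low Dose", "High Dose"))
baseline <- rnorm(240, 50, 10)
week_12  <- c(rnorm(80, 48, 12), rnorm(80, 55, 10), rnorm(80, 62, 8))
change   <- week_12 - baseline

# Per-group sample size, mean change, and standard deviation of change
summary_tbl <- data.frame(
  treatment   = levels(treatment),
  n           = as.vector(table(treatment)),
  mean_change = as.vector(tapply(change, treatment, mean)),
  sd_change   = as.vector(tapply(change, treatment, sd))
)

print(summary_tbl, digits = 3)
```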