How to Create a Ridgeline Plot in ggplot2


Key Insights

  • Ridgeline plots excel at visualizing distribution changes across ordered categories like time periods or rankings, making patterns immediately visible that would be obscured in traditional side-by-side plots
  • The ggridges package extends ggplot2 with specialized geoms that handle the complex layering and spacing calculations automatically, but you need to understand the scale parameter to control overlap effectively
  • Always order your categories meaningfully (chronologically or by a statistical measure like median) and limit yourself to 15-20 categories maximum to maintain readability

What Are Ridgeline Plots and When Should You Use Them?

Ridgeline plots—also called joyplots—display multiple density distributions stacked vertically with controlled overlap. They’re named after the iconic Unknown Pleasures album cover by Joy Division. Unlike faceted density plots or violin plots, ridgeline plots leverage vertical space efficiently and make trends across categories immediately apparent.

Use ridgeline plots when you need to compare distributions across an ordered categorical variable: temperature patterns across months, response times across different hours of the day, or price distributions across product categories ranked by median price. The key word is “ordered”—ridgeline plots work best when your categories have a natural sequence.

The ggridges package by Claus O. Wilke provides the specialized geoms needed to create these visualizations in R’s ggplot2 ecosystem.

Setup and Required Packages

Install the necessary packages if you haven’t already:

install.packages(c("ggplot2", "ggridges", "dplyr"))

Load the libraries:

library(ggplot2)
library(ggridges)
library(dplyr)
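The examples below use after_stat(), which was introduced in ggplot2 3.3.0. Older code used stat(x) or ..x.. for the same purpose. If you want a script to fail early on an older installation, a quick check like this sketch works:

```r
library(ggplot2)
library(ggridges)

# after_stat() arrived in ggplot2 3.3.0; stop early if the installed version is older
stopifnot(packageVersion("ggplot2") >= "3.3.0")

# Report the installed versions for reference
packageVersion("ggplot2")
packageVersion("ggridges")
```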

Creating Your First Ridgeline Plot

Let’s start with the diamonds dataset to visualize price distributions across different cut qualities:

ggplot(diamonds, aes(x = price, y = cut)) +
  geom_density_ridges() +
  theme_ridges()

This creates a basic ridgeline plot with default settings. The theme_ridges() function provides a clean theme designed for ridgeline plots. Its grid argument controls the grid lines; pass grid = FALSE to drop them.

The syntax follows standard ggplot2 conventions: map your continuous variable to the x-axis and your categorical variable to the y-axis. The geom_density_ridges() function handles the density calculation and stacking automatically.
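To make "handles the density calculation" concrete: each ridge is a kernel density estimate of price for one cut. The sketch below approximates this with base R's density() applied to each group separately. It is not ggridges' internal code. In particular, geom_density_ridges() picks one joint bandwidth for all groups and prints the value it chose, whereas density() here picks a bandwidth per group.

```r
library(ggplot2)  # provides the diamonds dataset

# Estimate a kernel density of price separately for each cut level,
# roughly what each ridge in the plot represents
dens_by_cut <- lapply(split(diamonds$price, diamonds$cut), density)

# Each estimate is a grid of evenly spaced (x, y) points; the area under it is close to 1
areas <- sapply(dens_by_cut, function(d) sum(d$y) * diff(d$x[1:2]))
round(areas, 3)

# Price at which each cut's estimated density peaks
peaks <- sapply(dens_by_cut, function(d) d$x[which.max(d$y)])
round(peaks)
```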

For better readability, let’s add labels:

ggplot(diamonds, aes(x = price, y = cut)) +
  geom_density_ridges() +
  theme_ridges() +
  labs(
    title = "Diamond Price Distributions by Cut Quality",
    x = "Price (USD)",
    y = "Cut Quality"
  )

Customizing Aesthetics for Better Communication

The default plot works, but customization makes your visualization more effective. The scale parameter controls overlap: higher values make each ridge taller relative to the spacing between rows and therefore increase overlap, while lower values reduce it:

ggplot(diamonds, aes(x = price, y = cut, fill = cut)) +
  geom_density_ridges(
    alpha = 0.7,
    scale = 1.5,
    rel_min_height = 0.01
  ) +
  theme_ridges() +
  scale_fill_viridis_d() +
  labs(
    title = "Diamond Price Distributions by Cut Quality",
    x = "Price (USD)",
    y = "Cut Quality"
  ) +
  theme(legend.position = "none")

Here’s what each parameter does:

  • alpha = 0.7: Makes the fills 70% opaque (30% transparent), so overlapping regions stay visible
  • scale = 1.5: Sets ridge height relative to the spacing between baselines. At scale = 1 the tallest ridge just touches the baseline above it; at 1.5 it reaches halfway into the next row, and values below 1 leave gaps between rows
  • rel_min_height = 0.01: Trims the long tails by dropping density values below 1% of the overall maximum height
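Because scale is easiest to understand by eye, the sketch below builds the same plot at a few scale values. A value below 1 should leave visible gaps between rows, 1 makes the tallest ridge just reach the baseline above, and larger values stack the ridges more deeply. The particular values tried here are arbitrary; pick a range that suits your data.

```r
library(ggplot2)
library(ggridges)

# Build the same plot at several overlap levels to compare them
scales_to_try <- c(0.9, 1, 2, 3)
overlap_plots <- lapply(scales_to_try, function(s) {
  ggplot(diamonds, aes(x = price, y = cut)) +
    geom_density_ridges(scale = s, rel_min_height = 0.01) +
    theme_ridges() +
    labs(title = paste("scale =", s))
})

# Draw the plots one after another
for (p in overlap_plots) print(p)
```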

The scale_fill_viridis_d() function applies a colorblind-friendly palette. The legend is removed with theme(legend.position = "none") because the y-axis already labels each category.

For gradient fills based on the x-axis value, use geom_density_ridges_gradient():

ggplot(diamonds, aes(x = price, y = cut, fill = after_stat(x))) +
  geom_density_ridges_gradient(scale = 1.5) +
  theme_ridges() +
  scale_fill_viridis_c(name = "Price", option = "plasma") +
  labs(
    title = "Diamond Price Distributions by Cut Quality",
    x = "Price (USD)",
    y = "Cut Quality"
  )

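This maps fill to each point's x position, so color tracks price along every ridge and reinforces where each distribution's mass sits on the price axis. ggridges disables alpha transparency for gradient fills because of limitations in R's graphics system, so an alpha setting has no effect with this geom.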
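A related option, following a pattern shown in the ggridges documentation, shades each ridge by quartile instead of by raw x value. Setting calc_ecdf = TRUE asks stat_density_ridges() to compute a quantile variable that after_stat() can map to fill. Treat this as a sketch; the number of quantiles and the palette are choices, not requirements.

```r
library(ggplot2)
library(ggridges)

# calc_ecdf = TRUE makes stat_density_ridges() compute a "quantile" variable,
# which after_stat() maps to fill so each quartile gets its own color
p_quartile <- ggplot(diamonds, aes(x = price, y = cut, fill = factor(after_stat(quantile)))) +
  stat_density_ridges(
    geom = "density_ridges_gradient",
    calc_ecdf = TRUE,
    quantiles = 4,
    quantile_lines = TRUE
  ) +
  scale_fill_viridis_d(name = "Quartile") +
  theme_ridges()
p_quartile
```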

Advanced Techniques for Deeper Insights

Add quantile lines to show median and quartile positions:

ggplot(diamonds, aes(x = price, y = cut, fill = cut)) +
  geom_density_ridges(
    alpha = 0.7,
    scale = 1.5,
    quantile_lines = TRUE,
    quantiles = 2
  ) +
  theme_ridges() +
  scale_fill_viridis_d() +
  labs(
    title = "Diamond Price Distributions with Medians",
    x = "Price (USD)",
    y = "Cut Quality"
  ) +
  theme(legend.position = "none")

Setting quantiles = 2 draws a line at the median. Use quantiles = c(0.25, 0.5, 0.75) to show quartiles.
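The quantile lines sit at sample quantiles of each group, which ggridges computes with R's quantile() function by default. If you want the numbers behind the lines, a dplyr summary along these lines should match them. This is a sketch; the column names are arbitrary.

```r
library(ggplot2)
library(ggridges)
library(dplyr)

# Draw lines at the 25th, 50th, and 75th percentiles of each ridge
ggplot(diamonds, aes(x = price, y = cut)) +
  geom_density_ridges(
    scale = 1.5,
    quantile_lines = TRUE,
    quantiles = c(0.25, 0.5, 0.75)
  ) +
  theme_ridges()

# The same quartiles as numbers, one row per cut
price_quartiles <- diamonds %>%
  group_by(cut) %>%
  summarise(
    q25 = quantile(price, 0.25),
    median = median(price),
    q75 = quantile(price, 0.75)
  )
price_quartiles
```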

For datasets with discrete or sparse data, add jittered points:

set.seed(123)  # make the random sample reproducible
ggplot(slice_sample(diamonds, n = 2000), aes(x = price, y = cut, fill = cut)) +
  geom_density_ridges(
    alpha = 0.6,
    scale = 1.5,
    jittered_points = TRUE,
    point_alpha = 0.3,
    point_size = 0.5
  ) +
  theme_ridges() +
  scale_fill_viridis_d() +
  labs(
    title = "Diamond Price Distributions with Sample Points",
    x = "Price (USD)",
    y = "Cut Quality"
  ) +
  theme(legend.position = "none")

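Note that we sample 2,000 of the roughly 54,000 diamonds to limit overplotting. By default, geom_density_ridges() scatters the jittered points vertically within the area under each ridge, so every point sits inside the density curve it belongs to.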
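If you prefer the points beneath each ridge, in the style of a raincloud plot, ggridges provides a "raincloud" position. The sketch below repeats the sample-and-plot idea with a fixed seed. The choice of scale = 0.9 is an assumption meant to leave room under each ridge for the points.

```r
library(ggplot2)
library(ggridges)
library(dplyr)

set.seed(123)  # make the random sample reproducible
diamond_sample <- slice_sample(diamonds, n = 2000)

# position = "raincloud" draws the jittered points below each ridge;
# a scale under 1 keeps the ridges from covering them
p_rain <- ggplot(diamond_sample, aes(x = price, y = cut, fill = cut)) +
  geom_density_ridges(
    jittered_points = TRUE,
    position = "raincloud",
    alpha = 0.6,
    scale = 0.9,
    point_alpha = 0.3,
    point_size = 0.5
  ) +
  theme_ridges() +
  scale_fill_viridis_d() +
  theme(legend.position = "none")
p_rain
```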

Real-World Example: Temperature Patterns Across Months

Let’s create a practical example analyzing temperature distributions. We’ll simulate monthly temperature data:

set.seed(42)
temperature_data <- data.frame(
  month = rep(month.name, each = 100),
  temperature = c(
    rnorm(100, 32, 8),  # January
    rnorm(100, 35, 9),  # February
    rnorm(100, 45, 10), # March
    rnorm(100, 58, 8),  # April
    rnorm(100, 68, 7),  # May
    rnorm(100, 77, 6),  # June
    rnorm(100, 82, 5),  # July
    rnorm(100, 80, 5),  # August
    rnorm(100, 72, 6),  # September
    rnorm(100, 60, 8),  # October
    rnorm(100, 48, 9),  # November
    rnorm(100, 36, 8)   # December
  )
)

# Ensure months are ordered correctly
temperature_data$month <- factor(
  temperature_data$month,
  levels = month.name
)

# Create the ridgeline plot
ggplot(temperature_data, aes(x = temperature, y = month, fill = after_stat(x))) +
  geom_density_ridges_gradient(
    scale = 2.5,
    rel_min_height = 0.01,
    gradient_lwd = 0.5
  ) +
  theme_ridges(grid = FALSE) +
  scale_fill_viridis_c(name = "Temp (°F)", option = "inferno") +
  labs(
    title = "Annual Temperature Distribution Patterns",
    subtitle = "Showing seasonal variation and spread",
    x = "Temperature (°F)",
    y = NULL
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.text = element_text(size = 11)
  )

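This visualization makes the seasonal pattern easy to see: the summer months (June to August) show narrow distributions at high temperatures, while the winter months show wider spreads at lower temperatures. Those spreads are built into the simulation (standard deviations of 5 to 6 °F in summer versus 8 to 9 °F in winter), so the plot recovers what the data-generating code specified. The gradient fill emphasizes the temperature values themselves.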
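To check the visual impression numerically, a per-month summary helps. The sketch regenerates data with the same means and standard deviations as the simulation above, written compactly with vectorized rnorm() arguments, so treat the exact values as illustrative. It then computes each month's median and standard deviation.

```r
library(dplyr)

set.seed(42)
monthly_means <- c(32, 35, 45, 58, 68, 77, 82, 80, 72, 60, 48, 36)
monthly_sds   <- c(8, 9, 10, 8, 7, 6, 5, 5, 6, 8, 9, 8)

# 100 simulated readings per month, with months kept in calendar order
temperature_data <- data.frame(
  month = factor(rep(month.name, each = 100), levels = month.name),
  temperature = rnorm(
    1200,
    mean = rep(monthly_means, each = 100),
    sd = rep(monthly_sds, each = 100)
  )
)

# Center and spread per month: summer should be warmer and narrower
month_stats <- temperature_data %>%
  group_by(month) %>%
  summarise(median = median(temperature), sd = sd(temperature))
month_stats
```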

Common Pitfalls and Best Practices

Too Many Categories: Ridgeline plots become unreadable beyond 20 categories. If you have more, consider faceting by a higher-level grouping or filtering to the most relevant categories.
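As a sketch of the faceting approach: the diamonds data does not strictly need it, but it shows the mechanics, with one ridge per clarity grade split into one panel per cut. Substitute your own fine-grained and higher-level variables.

```r
library(ggplot2)
library(ggridges)

# One ridge per clarity grade, one panel per cut quality
p_facet <- ggplot(diamonds, aes(x = price, y = clarity)) +
  geom_density_ridges(scale = 1.5, rel_min_height = 0.01) +
  facet_wrap(~ cut) +
  theme_ridges()
p_facet
```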

Unordered Categories: Always order categories meaningfully. For non-temporal data, order by a statistic like median:

# Reorder cut levels by each group's median price (lowest median first)
diamonds_ordered <- diamonds %>%
  group_by(cut) %>%
  mutate(median_price = median(price)) %>%
  ungroup() %>%
  mutate(cut = reorder(cut, median_price))
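Base R's reorder() can also compute the per-group median itself through its FUN argument, which skips the helper column. The sketch below uses that shortcut as a substitute for the pipeline above, checks that the levels really are sorted by median, and plots the result:

```r
library(ggplot2)
library(ggridges)
library(dplyr)

# reorder() computes median(price) within each cut and sorts the levels by it
diamonds_ordered <- diamonds %>%
  mutate(cut = reorder(cut, price, FUN = median))

# Confirm the medians increase from the first level to the last
cut_medians <- tapply(diamonds_ordered$price, diamonds_ordered$cut, median)
cut_medians

# The first level is drawn at the bottom of the y-axis
ggplot(diamonds_ordered, aes(x = price, y = cut)) +
  geom_density_ridges(scale = 1.5, rel_min_height = 0.01) +
  theme_ridges()
```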

Excessive Overlap: While overlap is the point, too much (scale > 3) creates confusion. Start with scale = 1.5 to 2.5 and adjust based on your data.

Wrong Plot Choice: Ridgeline plots aren’t ideal for unordered categories or when you need precise value comparisons. Use violin plots or boxplots for unordered categories, and use faceted histograms when exact values matter more than overall patterns.
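For unordered categories, a violin plot with a narrow boxplot inside is one common alternative. The sketch uses ggplot2's built-in mpg dataset because its class column has no natural order; the fill color and boxplot width are arbitrary choices.

```r
library(ggplot2)

# Violins show each group's distribution shape; the inner boxplot marks median and IQR
p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(fill = "grey85") +
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  labs(x = "Vehicle class", y = "Highway MPG")
p_violin
```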

Ignoring Tail Behavior: Long-tailed distributions can dominate the plot. Use rel_min_height to trim tails or consider log-transforming your data if appropriate.
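For the log-transform option, one approach is to put price on a log10 axis. ggplot2 applies scale transformations before computing statistics, so the densities are estimated on the log scale and the long right tail is compressed. This sketch assumes the scales package, which is installed as a ggplot2 dependency, for dollar-formatted labels.

```r
library(ggplot2)
library(ggridges)

# scale_x_log10() transforms price before the density is estimated,
# so the ridges describe log10(price) while the axis shows dollar amounts
p_log <- ggplot(diamonds, aes(x = price, y = cut)) +
  geom_density_ridges(scale = 1.5, rel_min_height = 0.01) +
  scale_x_log10(labels = scales::label_dollar()) +
  theme_ridges() +
  labs(x = "Price (USD, log scale)", y = "Cut Quality")
p_log
```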

Ridgeline plots are powerful tools for revealing distribution patterns across ordered categories. Master the scale parameter, order your categories thoughtfully, and use color strategically to create visualizations that communicate complex distributional information at a glance.
