How to Create a Density Plot in ggplot2


Key Insights

  • Density plots smooth histograms into continuous curves using kernel density estimation, making them ideal for comparing distributions and identifying patterns in large datasets without arbitrary bin choices.
  • The adjust parameter in geom_density() scales the smoothing bandwidth: values below 1 reveal more detail, while values above 1 produce smoother curves. Start with the default and change it based on your data’s characteristics.
  • Overlaying multiple density plots with semi-transparent fills (alpha = 0.5-0.7) effectively reveals distribution differences between groups, but limit comparisons to 3-4 categories to maintain readability.

Introduction to Density Plots

Density plots represent the distribution of a continuous variable as a smooth curve rather than discrete bins. While histograms divide data into bins and count observations, density plots use kernel density estimation to estimate a continuous probability density function. The area under the curve equals one, so the curve can be read as a probability density.
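
The unit-area property can be checked numerically. The sketch below uses base R's density(), the kernel density estimator that ggplot2's density layer builds on by default, and integrates the result with a simple trapezoidal rule. The variable names are illustrative, not part of any API.

```r
# Kernel density estimate of mpg (Gaussian kernel, default bandwidth)
kde <- density(mtcars$mpg)

# Trapezoidal rule: grid spacing times the average height of each segment
step_widths <- diff(kde$x)
avg_heights <- (head(kde$y, -1) + tail(kde$y, -1)) / 2
area_under_curve <- sum(step_widths * avg_heights)

# Should be very close to 1; a tiny amount of mass lies beyond the grid
area_under_curve
```

The result is not exactly one because the estimate is evaluated on a finite grid that cuts off the far tails.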

Use density plots when you need to compare multiple distributions simultaneously, when your dataset is large enough that binning becomes arbitrary, or when you want to emphasize the overall shape rather than exact counts. They’re particularly effective for identifying modality (single vs. multiple peaks), skewness, and outliers.

Here’s a direct comparison:

library(ggplot2)
library(gridExtra)

# Create histogram
p1 <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10, fill = "steelblue", color = "white") +
  labs(title = "Histogram", y = "Count")

# Create density plot
p2 <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(fill = "steelblue", alpha = 0.7) +
  labs(title = "Density Plot", y = "Density")

grid.arrange(p1, p2, ncol = 2)

The density plot reveals the distribution’s shape without the visual noise of bin edges. Notice how the smooth curve makes it easier to identify the slight bimodal tendency in the MPG data.
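
The two views can also share one panel. The following sketch rescales the histogram bars from counts to density with after_stat(density) so that both layers use the same y-axis; the object name p_overlay and the styling choices are only illustrative.

```r
library(ggplot2)

p_overlay <- ggplot(mtcars, aes(x = mpg)) +
  # Rescale bar heights from counts to density so both layers are comparable
  geom_histogram(
    aes(y = after_stat(density)),
    bins = 10, fill = "steelblue", color = "white", alpha = 0.5
  ) +
  # Draw the kernel density estimate on top as an outline
  geom_density(color = "#2c3e50", linewidth = 1) +
  labs(title = "Histogram with Density Overlay", x = "MPG", y = "Density")

p_overlay
```

Overlaying makes it easier to judge whether a bump in the curve is supported by the binned data or is mostly a product of smoothing.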

Basic Density Plot with geom_density()

Creating a density plot in ggplot2 requires three components: your data frame, aesthetic mappings that specify which variable to plot, and the geom_density() layer that performs the kernel density estimation.

The basic syntax follows ggplot2’s grammar of graphics:

library(ggplot2)

# Basic density plot
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density()

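This produces a simple black outline showing the distribution of sepal lengths. The x-axis represents the variable’s values, while the y-axis shows density: probability per unit of the x variable. The height at a single point is therefore not itself a probability; probabilities correspond to areas under the curve.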
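
To make the "area, not height" reading concrete, here is a rough sketch that estimates the probability of a sepal length between 5 and 6 cm from the kernel density estimate and compares it with the observed share. It uses base R's density() rather than ggplot2, and the interval and variable names are assumptions chosen for illustration.

```r
sepal_length <- iris$Sepal.Length

# Evaluate the density estimate on a grid restricted to the interval [5, 6]
kde_interval <- density(sepal_length, from = 5, to = 6, n = 512)

# Trapezoidal rule gives the area under the curve over that interval
p_kde <- sum(
  diff(kde_interval$x) *
    (head(kde_interval$y, -1) + tail(kde_interval$y, -1)) / 2
)

# Share of observed flowers in the same interval, for comparison
p_observed <- mean(sepal_length >= 5 & sepal_length <= 6)

c(kde = p_kde, observed = p_observed)
```

The two numbers will not match exactly, because smoothing spreads each observation's weight over neighbouring values.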

You can add fill to make the distribution more visible:

ggplot(iris, aes(x = Sepal.Length)) +
  geom_density(fill = "coral", alpha = 0.5) +
  labs(
    title = "Distribution of Iris Sepal Length",
    x = "Sepal Length (cm)",
    y = "Density"
  )

The alpha parameter sets the opacity of the fill, from 0 (fully transparent) to 1 (fully opaque). This becomes crucial when overlaying multiple distributions.

Customizing Appearance

The geom_density() function offers several parameters to control appearance and calculation:

# Demonstrate different customization options
ggplot(mtcars, aes(x = mpg)) +
  geom_density(
    fill = "#69b3a2",      # Fill color
    color = "#2d5f4f",     # Outline color
    alpha = 0.6,           # Transparency
    linewidth = 1.2        # Line thickness
  ) +
  theme_minimal()

The adjust parameter scales the bandwidth, which determines how much smoothing is applied to the curve. It multiplies the default bandwidth, so the default of 1 leaves it unchanged; smaller values show more detail, larger values create smoother curves:

library(patchwork)

# Too smooth
p1 <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 2, fill = "steelblue", alpha = 0.6) +
  labs(title = "adjust = 2 (oversmoothed)")

# Default
p2 <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 1, fill = "steelblue", alpha = 0.6) +
  labs(title = "adjust = 1 (default)")

# More detail
p3 <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 0.5, fill = "steelblue", alpha = 0.6) +
  labs(title = "adjust = 0.5 (detailed)")

p1 / p2 / p3

For the Old Faithful eruption data, which has a clear bimodal distribution, adjust = 2 blurs the separation between the two peaks, while adjust = 0.5 shows them sharply. The default works well for most cases, but it is worth plotting your data with a few different values.
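
What adjust does to the bandwidth can be inspected directly. This sketch assumes the default bandwidth rule ("nrd0") and uses base R's density(), whose result records the bandwidth that was actually used; the variable names are illustrative.

```r
eruptions <- faithful$eruptions

# Rule-of-thumb bandwidth used by default (bw = "nrd0")
base_bw <- bw.nrd0(eruptions)

# Bandwidth actually used by density() for several adjust values
adjust_values <- c(0.5, 1, 2)
used_bw <- sapply(
  adjust_values,
  function(a) density(eruptions, adjust = a)$bw
)

# The ratio column reproduces the adjust values: adjust is a plain multiplier
data.frame(
  adjust = adjust_values,
  bandwidth = used_bw,
  ratio = used_bw / base_bw
)
```

Because adjust is relative, the same value produces different absolute bandwidths on different datasets, which is why it tends to be easier to tune than a raw bandwidth.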

Overlapping Multiple Distributions

Comparing distributions across groups is where density plots truly shine. Map your grouping variable to the fill aesthetic:

# Compare MPG distributions across cylinder counts
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.6) +
  scale_fill_manual(
    values = c("4" = "#2ecc71", "6" = "#3498db", "8" = "#e74c3c"),
    name = "Cylinders"
  ) +
  labs(
    title = "MPG Distribution by Cylinder Count",
    x = "Miles Per Gallon",
    y = "Density"
  ) +
  theme_minimal()

The overlapping areas show where distributions coincide. Here, you can clearly see that 4-cylinder cars cluster at higher MPG values, while 8-cylinder cars center around lower values.
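
One caveat when reading overlaid curves: each group's density integrates to one, so group sizes (11, 7, and 14 cars for 4, 6, and 8 cylinders) are not visible. If group size matters, one option is to plot the computed variable count, which is the density multiplied by the number of observations in the group. The sketch below shows this variant; the object name p_counts and the axis label are illustrative choices.

```r
library(ggplot2)

p_counts <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  # after_stat(count) scales each curve by its group's number of observations
  geom_density(aes(y = after_stat(count)), alpha = 0.6) +
  labs(
    title = "MPG Distribution by Cylinder Count, Scaled by Group Size",
    x = "Miles Per Gallon",
    y = "Density x group size",
    fill = "Cylinders"
  ) +
  theme_minimal()

p_counts
```

With this scaling, the area under each curve is approximately the number of cars in that group, so larger groups take up visibly more space.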

When overlap makes the plot hard to read, typically beyond three or four groups, consider faceting instead:

# Alternative: faceted density plots
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.7) +
  facet_wrap(~Species, ncol = 1) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(legend.position = "none")

Faceting eliminates overlap confusion and makes it easier to compare shapes directly, especially when distributions have similar ranges.

Advanced Styling and Annotations

Professional density plots often include reference lines for central tendency and custom styling:

# Calculate statistics
mean_mpg <- mean(mtcars$mpg)
median_mpg <- median(mtcars$mpg)

# One row per reference line, so each line is drawn once and gets a legend entry
stat_lines <- data.frame(
  statistic = c("Mean", "Median"),
  value = c(mean_mpg, median_mpg)
)

# Create polished plot with annotations
ggplot(mtcars, aes(x = mpg)) +
  geom_density(fill = "#3498db", alpha = 0.6, color = "#2c3e50", linewidth = 1) +
  geom_vline(
    data = stat_lines,
    aes(xintercept = value, linetype = statistic, color = statistic),
    linewidth = 1
  ) +
  scale_linetype_manual(
    name = "Statistics",
    values = c("Mean" = "dashed", "Median" = "dotted")
  ) +
  scale_color_manual(
    name = "Statistics",
    values = c("Mean" = "#e74c3c", "Median" = "#f39c12")
  ) +
  labs(
    title = "Distribution of Fuel Efficiency in 1974 Motor Trend Cars",
    subtitle = paste0("Mean: ", round(mean_mpg, 1), " mpg | Median: ", round(median_mpg, 1), " mpg"),
    x = "Miles Per Gallon",
    y = "Density",
    caption = "Data: 1974 Motor Trend US magazine"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(color = "#7f8c8d"),
    panel.grid.minor = element_blank()
  )

You can also combine density plots with rug plots to show individual data points:

ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  geom_rug(aes(color = Species), alpha = 0.7) +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal()

The rug plot adds tick marks along the x-axis showing actual observations, helping readers understand data density and identify potential outliers.

Common Use Cases and Best Practices

When to use density plots:

  • Comparing 2-4 distributions simultaneously
  • Large datasets (n > 100) where histogram bins become arbitrary
  • Emphasizing distribution shape over exact frequencies
  • Identifying multimodality or skewness

When to avoid them:

  • Small datasets (n < 30) where smoothing obscures actual data
  • Discrete or categorical variables
  • When exact counts matter more than overall shape
  • Audiences unfamiliar with probability density interpretation

Bandwidth selection matters significantly:

# Problematic bandwidth choices
p1 <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 0.2, fill = "steelblue", alpha = 0.6) +
  labs(title = "Too Rough (adjust = 0.2)")

p2 <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 1, fill = "steelblue", alpha = 0.6) +
  labs(title = "Appropriate (adjust = 1)")

p3 <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 5, fill = "steelblue", alpha = 0.6) +
  labs(title = "Too Smooth (adjust = 5)")

p1 + p2 + p3

The first plot shows excessive noise and the third hides the bimodal structure entirely, while the middle one shows the two peaks cleanly. Always experiment with bandwidth when first exploring your data.
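
The visual impression can be backed up with a rough count of local maxima in the estimated curve. The helper count_peaks() and its min_rel_height threshold are hypothetical, written only for this illustration. It relies on base R's density() and ignores bumps below a small fraction of the tallest point.

```r
# Hypothetical helper: count local maxima of a kernel density estimate
count_peaks <- function(x, adjust = 1, min_rel_height = 0.05) {
  kde <- density(x, adjust = adjust)
  y <- kde$y
  interior <- 2:(length(y) - 1)
  # A grid point is a peak if it is strictly higher than both neighbours
  is_peak <- y[interior] > y[interior - 1] & y[interior] > y[interior + 1]
  # Ignore tiny bumps below a fraction of the tallest point
  sum(is_peak & y[interior] >= min_rel_height * max(y))
}

adjust_values <- c(0.2, 1, 5)
peaks <- sapply(
  adjust_values,
  function(a) count_peaks(faithful$eruptions, adjust = a)
)

data.frame(adjust = adjust_values, peaks = peaks)
```

A peak count is a crude diagnostic, but a count that changes sharply between nearby adjust values suggests the apparent structure depends heavily on the smoothing.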

Key recommendations:

  • Start with default bandwidth (adjust = 1) and modify only if needed
  • Use transparency (alpha = 0.5-0.7) when overlaying distributions
  • Limit overlapping plots to 3-4 groups maximum
  • Always label axes with units
  • Consider your audience’s statistical literacy—add explanatory subtitles if needed
  • Combine with summary statistics (mean, median, quartiles) for context
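
As one way to follow the last recommendation, the sketch below marks the quartiles of mpg on a density plot. The data frame quartiles, the labels, and the label placement are illustrative assumptions, not a prescribed layout.

```r
library(ggplot2)

# Quartiles of mpg; unname() drops the "25%"-style names from quantile()
quartiles <- data.frame(
  label = c("Q1", "Median", "Q3"),
  value = unname(quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75)))
)

p_quartiles <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(fill = "steelblue", alpha = 0.6) +
  # One dashed reference line per quartile
  geom_vline(data = quartiles, aes(xintercept = value), linetype = "dashed") +
  # Text labels placed just above the x-axis, slightly right of each line
  geom_text(
    data = quartiles,
    aes(x = value, y = 0, label = label),
    vjust = -0.5, hjust = -0.1
  ) +
  labs(
    title = "MPG Density with Quartiles",
    x = "Miles Per Gallon",
    y = "Density"
  ) +
  theme_minimal()

p_quartiles
```

Keeping the statistics in a small data frame means each reference line is drawn once, and adding or removing a statistic only changes the data.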

Density plots are powerful tools for distribution analysis, but they require thoughtful parameterization. The smoothing inherent in kernel density estimation can both reveal patterns and hide important details, so always validate your visualization choices against the underlying data structure.
