How to Plot the Gamma Distribution in R
The gamma distribution is a continuous probability distribution that appears constantly in applied statistics. If you're modeling wait times, insurance claim amounts, rainfall totals, or any...
Key Insights
- R provides four core gamma functions—
dgamma(),pgamma(),qgamma(), andrgamma()—that handle density, cumulative probability, quantiles, and random sampling respectively. - The shape parameter controls the distribution’s skewness (higher values approach normality), while the rate parameter controls spread—understanding this relationship is essential for effective visualization.
- Overlaying theoretical density curves on histograms of random samples validates your understanding and helps communicate statistical concepts clearly.
Introduction to the Gamma Distribution
The gamma distribution is a continuous probability distribution that appears constantly in applied statistics. If you’re modeling wait times, insurance claim amounts, rainfall totals, or any positive-valued data with right skew, you’re likely working with gamma-distributed data.
Two parameters define the gamma distribution: shape (often denoted α or k) and rate (β) or scale (θ = 1/β). The shape parameter controls the distribution’s form—values below 1 create exponential-like decay, values around 1-3 produce classic right-skewed curves, and larger values push the distribution toward symmetry. The rate parameter controls how spread out the distribution is.
R makes working with the gamma distribution straightforward, but plotting it effectively requires understanding both the statistical functions and visualization tools. This article walks through everything you need to create publication-quality gamma distribution plots.
Understanding Gamma Distribution Functions in R
R provides four functions for working with the gamma distribution, following its standard naming convention for probability distributions:
| Function | Purpose | Returns |
|---|---|---|
dgamma() |
Probability density function (PDF) | Density at given x values |
pgamma() |
Cumulative distribution function (CDF) | P(X ≤ x) |
qgamma() |
Quantile function | x value for given probability |
rgamma() |
Random number generation | Random samples |
Here’s how each function works in practice:
# dgamma(): Get density at specific x values
dgamma(x = 2, shape = 3, rate = 1)
# [1] 0.2706706
# pgamma(): Get cumulative probability P(X <= x)
pgamma(q = 2, shape = 3, rate = 1)
# [1] 0.3233236
# qgamma(): Get x value for a given probability (inverse of pgamma)
qgamma(p = 0.5, shape = 3, rate = 1)
# [1] 2.674051
# rgamma(): Generate random samples
set.seed(42)
rgamma(n = 5, shape = 3, rate = 1)
# [1] 1.628670 4.494618 2.889857 3.967657 1.528863
Note that R uses rate by default, but you can specify scale instead. These are reciprocals: rate = 1/scale. Pick one convention and stick with it to avoid confusion.
Plotting the Probability Density Function (PDF)
The PDF shows the relative likelihood of different values. Let’s start with base R’s curve() function for a quick visualization:
# Simple PDF plot with base R
curve(dgamma(x, shape = 3, rate = 1),
from = 0, to = 10,
ylab = "Density",
xlab = "x",
main = "Gamma Distribution (shape=3, rate=1)",
col = "steelblue",
lwd = 2)
For comparing multiple parameter combinations, ggplot2 offers more flexibility:
library(ggplot2)
# Create a data frame with multiple gamma distributions
x_vals <- seq(0, 15, length.out = 500)
gamma_data <- data.frame(
x = rep(x_vals, 4),
density = c(
dgamma(x_vals, shape = 1, rate = 0.5),
dgamma(x_vals, shape = 2, rate = 0.5),
dgamma(x_vals, shape = 3, rate = 0.5),
dgamma(x_vals, shape = 5, rate = 1)
),
params = rep(c("shape=1, rate=0.5",
"shape=2, rate=0.5",
"shape=3, rate=0.5",
"shape=5, rate=1"),
each = length(x_vals))
)
ggplot(gamma_data, aes(x = x, y = density, color = params)) +
geom_line(linewidth = 1.2) +
labs(
title = "Gamma Distribution PDF",
subtitle = "Effect of shape and rate parameters",
x = "x",
y = "Density",
color = "Parameters"
) +
theme_minimal() +
scale_color_brewer(palette = "Set1")
This comparison reveals the gamma distribution’s flexibility. With shape = 1, it becomes an exponential distribution. As shape increases, the distribution becomes more symmetric and bell-shaped.
Plotting the Cumulative Distribution Function (CDF)
The CDF shows P(X ≤ x)—the probability that a random variable falls at or below a given value. This is often more useful than the PDF for practical applications like determining thresholds or computing probabilities.
Base R approach:
# CDF with base R
curve(pgamma(x, shape = 3, rate = 1),
from = 0, to = 10,
ylab = "Cumulative Probability",
xlab = "x",
main = "Gamma CDF (shape=3, rate=1)",
col = "darkred",
lwd = 2)
# Add reference lines for median
abline(h = 0.5, lty = 2, col = "gray50")
abline(v = qgamma(0.5, shape = 3, rate = 1), lty = 2, col = "gray50")
The ggplot2 version using stat_function():
ggplot(data.frame(x = c(0, 10)), aes(x = x)) +
stat_function(
fun = pgamma,
args = list(shape = 3, rate = 1),
linewidth = 1.2,
color = "darkred"
) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "gray50") +
geom_vline(
xintercept = qgamma(0.5, shape = 3, rate = 1),
linetype = "dashed",
color = "gray50"
) +
annotate(
"text",
x = qgamma(0.5, shape = 3, rate = 1) + 0.3,
y = 0.45,
label = paste("Median =", round(qgamma(0.5, shape = 3, rate = 1), 2)),
hjust = 0
) +
labs(
title = "Gamma CDF with Median",
x = "x",
y = "P(X ≤ x)"
) +
theme_minimal()
Visualizing Random Samples with Histograms
Generating random samples and comparing them to the theoretical distribution is essential for validating models and teaching statistical concepts. Here’s how to overlay the theoretical density on a histogram of samples:
set.seed(123)
n_samples <- 1000
shape_param <- 3
rate_param <- 1
# Generate random samples
samples <- rgamma(n_samples, shape = shape_param, rate = rate_param)
# Create histogram with density overlay
ggplot(data.frame(x = samples), aes(x = x)) +
geom_histogram(
aes(y = after_stat(density)),
bins = 40,
fill = "lightblue",
color = "white",
alpha = 0.7
) +
stat_function(
fun = dgamma,
args = list(shape = shape_param, rate = rate_param),
linewidth = 1.2,
color = "darkblue"
) +
labs(
title = "Random Gamma Samples vs Theoretical Density",
subtitle = paste("n =", n_samples, "| shape =", shape_param, "| rate =", rate_param),
x = "Value",
y = "Density"
) +
theme_minimal()
The critical detail here is aes(y = after_stat(density)). This scales the histogram to show density rather than counts, allowing direct comparison with the PDF curve. Without this, your overlay will be meaningless.
For smaller sample sizes, you’ll see more deviation from the theoretical curve—this is expected and illustrates sampling variability:
# Compare different sample sizes
set.seed(456)
sample_sizes <- c(50, 200, 1000, 5000)
sample_data <- do.call(rbind, lapply(sample_sizes, function(n) {
data.frame(
x = rgamma(n, shape = 3, rate = 1),
n = paste("n =", n)
)
}))
ggplot(sample_data, aes(x = x)) +
geom_histogram(
aes(y = after_stat(density)),
bins = 30,
fill = "steelblue",
alpha = 0.6
) +
stat_function(
fun = dgamma,
args = list(shape = 3, rate = 1),
color = "red",
linewidth = 1
) +
facet_wrap(~n, scales = "free_y") +
labs(
title = "Convergence to Theoretical Distribution",
x = "Value",
y = "Density"
) +
theme_minimal()
Customizing Your Plots
For publication or presentation, you need polished graphics. Here’s a complete example with professional styling:
library(ggplot2)
library(scales)
# Define parameters
params <- list(
list(shape = 1, rate = 0.5, label = "Exponential-like (α=1)"),
list(shape = 3, rate = 1, label = "Moderate skew (α=3)"),
list(shape = 9, rate = 2, label = "Near-symmetric (α=9)")
)
# Build data
x_seq <- seq(0, 12, length.out = 500)
plot_data <- do.call(rbind, lapply(params, function(p) {
data.frame(
x = x_seq,
density = dgamma(x_seq, shape = p$shape, rate = p$rate),
Distribution = p$label
)
}))
# Custom color palette
custom_colors <- c("#E63946", "#457B9D", "#2A9D8F")
# Create publication-ready plot
ggplot(plot_data, aes(x = x, y = density, color = Distribution)) +
geom_line(linewidth = 1.3) +
scale_color_manual(values = custom_colors) +
scale_y_continuous(labels = label_number(accuracy = 0.01)) +
labs(
title = "Gamma Distribution: Shape Parameter Effects",
subtitle = "Higher shape values produce more symmetric distributions",
x = "Random Variable (x)",
y = "Probability Density f(x)",
caption = "Rate parameter held constant where applicable"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(color = "gray40", size = 11),
legend.position = "bottom",
legend.title = element_blank(),
panel.grid.minor = element_blank(),
axis.title = element_text(size = 11)
) +
guides(color = guide_legend(nrow = 1))
Key customization principles: use theme_minimal() as a clean starting point, remove minor gridlines for clarity, position legends at the bottom for wide figures, and always include informative titles and axis labels.
Conclusion
Plotting the gamma distribution in R comes down to mastering four functions and applying standard visualization techniques. Use dgamma() for density plots, pgamma() for CDFs, and rgamma() with histogram overlays to compare samples against theory.
The gamma distribution connects to other important distributions worth exploring. When shape = 1, it reduces to the exponential distribution. When shape = n/2 and rate = 1/2, it becomes the chi-squared distribution with n degrees of freedom. Understanding these relationships deepens your statistical intuition and helps you recognize gamma distributions in the wild.
Start with base R’s curve() for quick exploration, then move to ggplot2 when you need publication-quality graphics or complex multi-panel comparisons. The code patterns shown here transfer directly to other continuous distributions—just swap the function names and parameters.