How to Plot the Exponential Distribution in R
Key Insights
- R provides four core functions for the exponential distribution (dexp, pexp, qexp, rexp) that handle density, cumulative probability, quantiles, and random generation, respectively
- The rate parameter λ sets the distribution's scale: higher rates produce steeper curves that decay faster, while lower rates create flatter, more spread-out distributions
- Overlaying theoretical density curves on simulated data histograms validates your understanding and helps communicate statistical concepts effectively
Introduction to the Exponential Distribution
The exponential distribution models the time between events in a Poisson process. If events occur continuously and independently at a constant average rate, the waiting time until the next event follows an exponential distribution. This makes it fundamental in reliability engineering, queuing theory, and survival analysis.
The distribution has a single parameter: the rate λ (lambda). A higher rate means events occur more frequently, so waiting times are shorter. The mean of an exponential distribution is 1/λ, and its variance is 1/λ². The distribution is also memoryless: the probability of waiting another t units does not depend on how long you have already waited. Among continuous distributions, the exponential is the only one with this property.
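You can check both the memoryless property and the mean and variance formulas numerically. Memorylessness says P(X > s + t | X > s) = P(X > t), and that ratio can be computed with pexp() using lower.tail = FALSE. The sketch below uses arbitrary values for the rate, the time already waited, and the extra wait. The simulation part is only approximate.

```r
# Memorylessness: P(X > s + extra | X > s) should equal P(X > extra)
rate <- 0.8   # arbitrary example rate
s <- 3        # time already waited
extra <- 2    # additional waiting time

# Survival probability P(X > x), taken from the upper tail of the CDF
surv <- function(x) pexp(x, rate = rate, lower.tail = FALSE)

conditional <- surv(s + extra) / surv(s)
unconditional <- surv(extra)
print(c(conditional = conditional, unconditional = unconditional))

# Simulation check of mean 1/rate and variance 1/rate^2
set.seed(1)
draws <- rexp(1e5, rate = rate)
print(c(sample_mean = mean(draws), theory_mean = 1 / rate,
        sample_var = var(draws), theory_var = 1 / rate^2))
```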
Common applications include modeling:
- Time until a machine fails
- Time between customer arrivals at a service counter
- Radioactive decay intervals
- Time until the next earthquake in a region
R provides excellent tools for working with this distribution. Let’s explore them systematically.
Understanding R’s Exponential Functions
R follows a consistent naming convention for probability distributions. For the exponential distribution, you get four functions:
# dexp() - Probability Density Function (PDF)
# Returns the height of the density curve at a given point
dexp(x = 2, rate = 1)
# [1] 0.1353353
# pexp() - Cumulative Distribution Function (CDF)
# Returns P(X <= x), the probability of observing a value less than or equal to x
pexp(q = 2, rate = 1)
# [1] 0.8646647
# qexp() - Quantile Function (inverse CDF)
# Returns the value x such that P(X <= x) = p
qexp(p = 0.5, rate = 1)
# [1] 0.6931472
# rexp() - Random Generation
# Generates n random samples from the distribution
set.seed(42)
rexp(n = 5, rate = 1)
# [1] 1.4299282 0.8803239 0.2418025 0.8696498 0.9261889
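The four functions are consistent with one another. In particular, qexp() is the inverse of pexp(), so the two round-trip. This short sketch uses an arbitrary rate of 2 and is a handy sanity check when you are unsure which function to reach for.

```r
rate <- 2                        # arbitrary example rate
p <- c(0.1, 0.5, 0.9)            # probabilities to invert

q <- qexp(p, rate = rate)        # quantiles for those probabilities
p_back <- pexp(q, rate = rate)   # applying the CDF recovers p

print(rbind(p = p, quantile = q, p_back = p_back))

# The median has a closed form: log(2) / rate
median_closed <- log(2) / rate
```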
The rate parameter defaults to 1 if omitted. Note that some textbooks parameterize the exponential distribution using the scale parameter β = 1/λ. R uses the rate parameterization, so be careful when translating formulas from different sources.
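If a source gives the scale β (the mean waiting time) instead of the rate, pass rate = 1 / β to R's functions. Here is a minimal sketch that assumes a scale of β = 4 for illustration. It compares dexp() against the density written in scale form, (1/β) exp(-x/β). It also shows that passing β as the rate gives a different, wrong answer.

```r
beta_scale <- 4          # scale parameter (mean waiting time), assumed for illustration
rate <- 1 / beta_scale   # R's dexp()/pexp() expect the rate

x <- c(0, 2, 8)

# Density written in the scale form: (1 / beta) * exp(-x / beta)
dens_scale_form <- (1 / beta_scale) * exp(-x / beta_scale)
dens_r <- dexp(x, rate = rate)

# A common mistake: passing the scale where the rate is expected
dens_wrong <- dexp(x, rate = beta_scale)

print(cbind(x, dens_r, dens_scale_form, dens_wrong))
```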
Plotting the Probability Density Function (PDF)
The PDF shows how probability density is distributed across possible values. For the exponential distribution, the density starts at λ when x = 0 and decays exponentially toward zero.
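The density is f(x) = λ exp(-λx) for x ≥ 0 and 0 for x < 0, which is why the curve starts at height λ. This sketch compares dexp() with that formula at a few points, using an arbitrary rate of 1.5.

```r
rate <- 1.5                        # arbitrary example rate
x <- c(0, 0.5, 1, 3)

manual <- rate * exp(-rate * x)    # f(x) = rate * exp(-rate * x) for x >= 0
built_in <- dexp(x, rate = rate)

print(cbind(x, built_in, manual))

# Density at 0 equals the rate; negative x has zero density
at_zero <- dexp(0, rate = rate)
negative <- dexp(-1, rate = rate)
```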
Base R Approach
The curve() function provides the quickest path to a basic plot:
# Simple PDF plot with base R
curve(dexp(x, rate = 1),
from = 0,
to = 6,
main = "Exponential Distribution PDF (λ = 1)",
xlab = "x",
ylab = "Density",
col = "steelblue",
lwd = 2)
# Add a grid for readability
grid()
ggplot2 Approach
For publication-quality graphics, ggplot2 offers more control:
library(ggplot2)
ggplot(data.frame(x = c(0, 6)), aes(x = x)) +
stat_function(fun = dexp,
args = list(rate = 1),
color = "steelblue",
linewidth = 1.2) +
labs(title = "Exponential Distribution PDF (λ = 1)",
x = "x",
y = "Density") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
The stat_function() layer evaluates the specified function across the x-axis range. The args parameter passes additional arguments to your function—in this case, the rate parameter.
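Conceptually, stat_function() evaluates fun on an evenly spaced grid across the x range. The grid has 101 points by default, controlled by its n argument. It then adds the contents of args to each call. The base R sketch below mimics that computation without ggplot2. It is an approximation of the idea, not ggplot2's actual source, and the variable names are illustrative.

```r
# Mimic stat_function(): evaluate a function on an evenly spaced grid
fun <- dexp
extra_args <- list(rate = 1)   # plays the role of stat_function()'s args
n <- 101                       # stat_function()'s default number of points
x_grid <- seq(0, 6, length.out = n)

# do.call() splices the extra arguments into the function call
y_grid <- do.call(fun, c(list(x_grid), extra_args))

curve_data <- data.frame(x = x_grid, y = y_grid)
print(head(curve_data, 3))
```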
Plotting the Cumulative Distribution Function (CDF)
The CDF shows the probability that a random variable takes a value less than or equal to x. For the exponential distribution, this probability increases from 0 and asymptotically approaches 1.
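The exponential CDF has the closed form F(x) = 1 - exp(-λx). It is also the integral of the density from 0 to x. This sketch checks both against pexp(), assuming the rate of 1 used in this section and an arbitrary evaluation point.

```r
rate <- 1          # same rate as the plots in this section
x <- 2.5           # arbitrary evaluation point

closed_form <- 1 - exp(-rate * x)
# integrate() passes rate through to dexp()
integrated <- integrate(dexp, lower = 0, upper = x, rate = rate)$value
built_in <- pexp(x, rate = rate)

print(c(closed_form = closed_form, integrated = integrated, pexp = built_in))
```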
Base R Approach
# CDF plot with base R
curve(pexp(x, rate = 1),
from = 0,
to = 6,
main = "Exponential Distribution CDF (λ = 1)",
xlab = "x",
ylab = "Cumulative Probability",
col = "darkred",
lwd = 2)
# Add reference lines at cumulative probabilities 0.5 and 0.95
abline(h = c(0.5, 0.95), lty = 2, col = "gray50")
grid()
ggplot2 Approach
ggplot(data.frame(x = c(0, 6)), aes(x = x)) +
stat_function(fun = pexp,
args = list(rate = 1),
color = "darkred",
linewidth = 1.2) +
geom_hline(yintercept = c(0.5, 0.95),
linetype = "dashed",
color = "gray50",
alpha = 0.7) +
labs(title = "Exponential Distribution CDF (λ = 1)",
x = "x",
y = "Cumulative Probability") +
scale_y_continuous(breaks = seq(0, 1, 0.25)) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
The curve crosses the horizontal reference lines at 0.5 and 0.95 at the median and the 95th percentile, so readers can locate both values visually.
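The x-values where the CDF crosses those reference lines are exactly what qexp() returns. For λ = 1 they have simple closed forms: log(2) for the median, and -log(0.05) = log(20) for the 95th percentile. This short sketch confirms both.

```r
rate <- 1
probs <- c(0.5, 0.95)

crossings <- qexp(probs, rate = rate)
closed_form <- c(log(2), log(20)) / rate

print(round(crossings, 4))
# Vertical guides at these x-values could be added to the base R plot with
# abline(v = crossings, lty = 3, col = "gray50")
```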
Comparing Multiple Rate Parameters
Understanding how λ stretches or compresses the distribution is crucial. Higher rates concentrate probability mass near zero, while lower rates spread it out.
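One way to make "concentrate probability mass near zero" concrete is to compare P(X ≤ 1) across the three rates. Every exponential density is also a rescaled copy of the λ = 1 curve: f(x; λ) = λ f(λx; 1). That is why changing λ stretches the curve rather than changing its basic form. Here is a sketch of both points.

```r
rates <- c(0.5, 1, 2)

# Probability of a waiting time of at most 1 unit for each rate
p_within_1 <- pexp(1, rate = rates)
names(p_within_1) <- paste0("rate_", rates)
print(round(p_within_1, 3))

# Rescaling identity: dexp(x, rate) == rate * dexp(rate * x, rate = 1)
x <- seq(0, 6, by = 0.5)
lhs <- dexp(x, rate = 2)
rhs <- 2 * dexp(2 * x, rate = 1)
```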
# Create data for multiple rates
x_vals <- seq(0, 6, length.out = 200)
rates <- c(0.5, 1, 2)
# Build a data frame for ggplot
df <- data.frame(
x = rep(x_vals, times = length(rates)),
rate = factor(rep(rates, each = length(x_vals)))
)
df$density <- dexp(df$x, rate = as.numeric(as.character(df$rate)))
# Plot with ggplot2
ggplot(df, aes(x = x, y = density, color = rate)) +
geom_line(linewidth = 1.2) +
scale_color_manual(values = c("0.5" = "#E69F00",
"1" = "#56B4E9",
"2" = "#009E73"),
labels = c("λ = 0.5", "λ = 1", "λ = 2")) +
labs(title = "Exponential Distribution: Effect of Rate Parameter",
x = "x",
y = "Density",
color = "Rate") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
legend.position = c(0.8, 0.8))
For a base R alternative that’s equally effective:
# Base R multi-curve plot
curve(dexp(x, rate = 0.5), from = 0, to = 6,
col = "#E69F00", lwd = 2,
main = "Exponential Distribution: Effect of Rate Parameter",
xlab = "x", ylab = "Density",
ylim = c(0, 2))
curve(dexp(x, rate = 1), add = TRUE, col = "#56B4E9", lwd = 2)
curve(dexp(x, rate = 2), add = TRUE, col = "#009E73", lwd = 2)
legend("topright",
legend = c("λ = 0.5", "λ = 1", "λ = 2"),
col = c("#E69F00", "#56B4E9", "#009E73"),
lwd = 2)
Notice how λ = 2 produces a steep curve concentrated near zero (mean = 0.5), while λ = 0.5 creates a flatter curve extending further right (mean = 2).
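The means quoted above follow from E[X] = 1/λ. As a cross-check, numerically integrating x·f(x) over [0, ∞) with integrate() recovers them. In this sketch, the helper name is illustrative, not part of any package.

```r
rates <- c(0.5, 1, 2)

# Hypothetical helper: E[X] = integral of x * f(x) from 0 to Inf
mean_by_integration <- function(rate) {
  integrate(function(x) x * dexp(x, rate = rate), lower = 0, upper = Inf)$value
}

numeric_means <- vapply(rates, mean_by_integration, numeric(1))
print(cbind(rate = rates, numeric_mean = numeric_means, theory = 1 / rates))
```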
Plotting Simulated Data with Histogram Overlay
Combining simulated samples with theoretical curves validates your understanding and creates compelling visualizations for presentations.
# Generate random samples
set.seed(123)
n_samples <- 1000
rate_param <- 1.5
samples <- rexp(n_samples, rate = rate_param)
# Create histogram with density overlay using ggplot2
ggplot(data.frame(x = samples), aes(x = x)) +
geom_histogram(aes(y = after_stat(density)),
bins = 30,
fill = "lightblue",
color = "white",
alpha = 0.7) +
stat_function(fun = dexp,
args = list(rate = rate_param),
color = "darkblue",
linewidth = 1.2) +
labs(title = paste0("Simulated Exponential Data (n = ", n_samples, ", λ = ", rate_param, ")"),
subtitle = "Histogram with theoretical density overlay",
x = "Value",
y = "Density") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, color = "gray50"))
The key detail here is aes(y = after_stat(density)). This scales the histogram to show density rather than counts, allowing direct comparison with the theoretical PDF.
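Density scaling divides each bin's count by n times the bin width, so the bar areas sum to 1, just like the area under the PDF. Base R's hist() exposes the same quantity when called with plot = FALSE. This sketch regenerates the same samples to show it. Note that hist() treats breaks = 30 as a suggestion, so the actual number of bins may differ slightly.

```r
set.seed(123)
samples <- rexp(1000, rate = 1.5)

h <- hist(samples, breaks = 30, plot = FALSE)  # compute bins without drawing
bin_widths <- diff(h$breaks)

# Density = count / (n * bin width)
manual_density <- h$counts / (length(samples) * bin_widths)
total_area <- sum(h$density * bin_widths)

print(total_area)
```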
For base R:
# Base R histogram with overlay
hist(samples,
breaks = 30,
freq = FALSE, # Show density, not frequency
ylim = c(0, rate_param), # Leave room for the curve's peak, f(0) = λ
main = paste0("Simulated Exponential Data (n = ", n_samples, ", λ = ", rate_param, ")"),
xlab = "Value",
col = "lightblue",
border = "white")
curve(dexp(x, rate = rate_param),
add = TRUE,
col = "darkblue",
lwd = 2)
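A histogram depends on the bin choice. A bin-free companion view is the empirical CDF plotted against pexp(). Both ecdf() and ks.test() are in base R's stats package. The Kolmogorov-Smirnov statistic summarizes the largest vertical gap between the two curves. Here is a sketch that reuses the same simulated samples.

```r
set.seed(123)
rate_param <- 1.5
samples <- rexp(1000, rate = rate_param)

# Empirical CDF (step function) with the theoretical CDF overlaid
plot(ecdf(samples),
     main = "Empirical vs. theoretical CDF",
     xlab = "Value", ylab = "Cumulative Probability",
     col = "lightblue", do.points = FALSE)
curve(pexp(x, rate = rate_param), add = TRUE, col = "darkblue", lwd = 2)

# Largest vertical gap between the two curves (KS statistic)
ks <- ks.test(samples, "pexp", rate = rate_param)
print(ks$statistic)
```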
Practical Applications and Customization Tips
Styling for Publication
When preparing plots for papers or reports, consider these adjustments:
# Publication-ready theme
theme_publication <- theme_minimal() +
theme(
text = element_text(family = "serif", size = 12),
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title = element_text(size = 11),
legend.position = "bottom",
panel.grid.minor = element_blank()
)
# Apply to your plot
ggplot(data.frame(x = c(0, 6)), aes(x = x)) +
stat_function(fun = dexp, args = list(rate = 1), linewidth = 1) +
labs(x = "Time (hours)", y = "Probability Density") +
theme_publication
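To write a figure to a file in base R, open a graphics device, draw the plot, then close the device with dev.off(). This minimal sketch assumes PDF output and uses a temporary file path for illustration. Swap in your own file name and dimensions (in inches). For ggplot2 figures, ggsave() serves the same purpose.

```r
out_file <- file.path(tempdir(), "exponential_pdf.pdf")  # illustrative path

# Open a PDF device with a serif (Times) font
pdf(out_file, width = 6, height = 4, family = "Times")
curve(dexp(x, rate = 1), from = 0, to = 6,
      xlab = "Time (hours)", ylab = "Probability Density",
      lwd = 2)
invisible(dev.off())  # closing the device writes the file

print(file.exists(out_file))
```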
Adding Annotations
Annotations help readers interpret results:
# Add mean and median annotations
rate_param <- 1.5 # same rate as the simulated data above
mean_val <- 1 / rate_param
median_val <- qexp(0.5, rate = rate_param)
ggplot(data.frame(x = c(0, 5)), aes(x = x)) +
stat_function(fun = dexp, args = list(rate = rate_param), linewidth = 1.2) +
geom_vline(xintercept = mean_val, linetype = "dashed", color = "red") +
geom_vline(xintercept = median_val, linetype = "dotted", color = "blue") +
annotate("text", x = mean_val + 0.3, y = 1.2, label = "Mean", color = "red") +
annotate("text", x = median_val + 0.3, y = 1.0, label = "Median", color = "blue") +
theme_minimal()
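In base R, the same annotations use abline() for the guides and text() for the labels. The median, ln(2)/λ, is always smaller than the mean, 1/λ, so the distribution's right skew shows up directly in the plot. In this sketch rate_param is set to 1.5 to match the simulation above, and the label positions are hand-tuned for that rate.

```r
rate_param <- 1.5
mean_val <- 1 / rate_param
median_val <- qexp(0.5, rate = rate_param)   # equals log(2) / rate_param

curve(dexp(x, rate = rate_param), from = 0, to = 5,
      xlab = "x", ylab = "Density", lwd = 2)
abline(v = mean_val, lty = 2, col = "red")      # dashed line at the mean
abline(v = median_val, lty = 3, col = "blue")   # dotted line at the median
text(mean_val + 0.3, 1.2, "Mean", col = "red")
text(median_val + 0.3, 1.0, "Median", col = "blue")
```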
Real-World Interpretation
When presenting exponential distribution analyses, translate statistical results into domain language. Instead of saying “λ = 0.5,” explain that “on average, one event occurs every 2 hours.” Calculate probabilities that answer practical questions:
# What's the probability a customer waits more than 5 minutes?
# Assume waits are exponential with a service rate of 3 customers per minute (λ = 3, so the mean wait is 1/3 minute)
1 - pexp(5, rate = 3)
# [1] 3.059023e-07 (essentially zero)
# Which waiting time do 90% of customers not exceed?
qexp(0.90, rate = 3)
# [1] 0.7675284 minutes (about 46 seconds)
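For upper-tail questions like the first one, pexp(q, rate, lower.tail = FALSE) computes P(X > q) directly. Here it gives the same answer as 1 - pexp(). Far into the tail, though, 1 - pexp() rounds to exactly 0, while the lower.tail = FALSE version stays accurate. This sketch uses the same λ = 3 customers per minute.

```r
rate <- 3  # customers per minute, as in the example above

# Same answer both ways for a 5-minute wait
p_5_subtract <- 1 - pexp(5, rate = rate)
p_5_upper <- pexp(5, rate = rate, lower.tail = FALSE)

# For a 15-minute wait, P(X > 15) = exp(-45), too small to survive 1 - pexp()
p_15_subtract <- 1 - pexp(15, rate = rate)               # rounds to 0
p_15_upper <- pexp(15, rate = rate, lower.tail = FALSE)  # stays positive

print(c(p_15_subtract = p_15_subtract, p_15_upper = p_15_upper))
```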
These calculations, combined with clear visualizations, make your statistical analyses accessible to non-technical stakeholders. The exponential distribution’s simplicity—one parameter controlling everything—makes it an excellent teaching tool and a practical workhorse for modeling waiting times and failure rates.