Rayleigh Distribution in R: Complete Guide

Key Insights

The Rayleigh distribution is defined by a single scale parameter σ and naturally arises when modeling the magnitude of two-dimensional vectors with independent Gaussian components—making it ideal for wind speed, signal processing, and medical imaging applications.
While base R lacks built-in Rayleigh functions, implementing drayleigh(), prayleigh(), qrayleigh(), and rrayleigh() from scratch requires only a few lines of code using the distribution’s straightforward mathematical formulas.
Parameter estimation via maximum likelihood has a closed-form solution for the Rayleigh distribution, enabling fast and reliable fitting to real-world data without iterative optimization.

Introduction to the Rayleigh Distribution

The Rayleigh distribution describes the magnitude of a two-dimensional vector whose components are independent, zero-mean Gaussian random variables with equal variance. This makes it a natural choice for modeling phenomena where you’re measuring distance from an origin in a noisy environment.

The probability density function (PDF) is:

$$f(x; \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x \geq 0$$

The cumulative distribution function (CDF) is:

$$F(x; \sigma) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

The single parameter σ (sigma) controls the scale. The mean is $\sigma\sqrt{\pi/2}$ and the mode occurs at σ.

Real-world applications include wind speed modeling (where horizontal velocity components are approximately Gaussian), wireless signal envelope detection, and MRI image noise characterization.

# Visualize Rayleigh PDF for different scale parameters
x <- seq(0, 10, length.out = 200)

# Define Rayleigh PDF
drayleigh <- function(x, sigma) {
  (x / sigma^2) * exp(-x^2 / (2 * sigma^2))
}

# Plot for different sigma values
plot(x, drayleigh(x, 1), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "Density", main = "Rayleigh Distribution PDF")
lines(x, drayleigh(x, 2), lwd = 2, col = "red")
lines(x, drayleigh(x, 3), lwd = 2, col = "green")
legend("topright", legend = c("σ = 1", "σ = 2", "σ = 3"),
       col = c("blue", "red", "green"), lwd = 2)

Implementing Rayleigh Functions in Base R

R doesn’t include Rayleigh distribution functions out of the box, but building them is straightforward. Following R’s naming conventions, we create four functions: density (d), distribution (p), quantile (q), and random generation (r).

# Density function
drayleigh <- function(x, sigma, log = FALSE) {
  if (any(sigma <= 0)) stop("sigma must be positive")
  
  # Handle negative x values
  density <- ifelse(x < 0, 0, 
                    (x / sigma^2) * exp(-x^2 / (2 * sigma^2)))
  
  if (log) log(density) else density
}

# Distribution function (CDF)
prayleigh <- function(q, sigma, lower.tail = TRUE, log.p = FALSE) {
  if (any(sigma <= 0)) stop("sigma must be positive")
  
  p <- ifelse(q < 0, 0, 1 - exp(-q^2 / (2 * sigma^2)))
  
  if (!lower.tail) p <- 1 - p
  if (log.p) log(p) else p
}

# Quantile function (inverse CDF)
qrayleigh <- function(p, sigma, lower.tail = TRUE, log.p = FALSE) {
  if (any(sigma <= 0)) stop("sigma must be positive")
  
  if (log.p) p <- exp(p)
  if (!lower.tail) p <- 1 - p
  
  if (any(p < 0 | p > 1)) stop("p must be in [0, 1]")
  
  sigma * sqrt(-2 * log(1 - p))
}

# Random generation
rrayleigh <- function(n, sigma) {
  if (any(sigma <= 0)) stop("sigma must be positive")
  
  # Use inverse transform sampling
  u <- runif(n)
  sigma * sqrt(-2 * log(u))
}

# Test the functions
set.seed(42)
samples <- rrayleigh(1000, sigma = 2)
cat("Sample mean:", mean(samples), "\n")
cat("Theoretical mean:", 2 * sqrt(pi/2), "\n")
cat("P(X < 3):", prayleigh(3, sigma = 2), "\n")
cat("95th percentile:", qrayleigh(0.95, sigma = 2), "\n")

The random generation function uses inverse transform sampling—generate uniform random numbers and pass them through the quantile function. This approach is both elegant and efficient.

Using R Packages for Rayleigh Distribution

Several packages provide Rayleigh distribution functions. Here’s how they compare:

# Install if needed
# install.packages(c("VGAM", "extraDistr", "flexsurv"))

library(VGAM)
library(extraDistr)

# VGAM uses a different parameterization
# VGAM's scale = sigma * sqrt(2)
sigma <- 2
vgam_scale <- sigma * sqrt(2)

# Generate samples
set.seed(123)
samples_vgam <- rrayleigh(1000, scale = vgam_scale)  # VGAM
samples_extra <- rrayleigh(1000, sigma = sigma)       # extraDistr (our custom)

# Compare densities at x = 3
cat("VGAM density:", VGAM::drayleigh(3, scale = vgam_scale), "\n")
cat("extraDistr density:", extraDistr::drayleigh(3, sigma = sigma), "\n")

# Compute probabilities
cat("VGAM P(X < 3):", VGAM::prayleigh(3, scale = vgam_scale), "\n")
cat("extraDistr P(X < 3):", extraDistr::prayleigh(3, sigma = sigma), "\n")

Watch the parameterization carefully. VGAM uses a scale parameter equal to σ√2, while extraDistr uses σ directly. Always check package documentation to avoid errors.

For survival analysis, flexsurv offers the Rayleigh as a special case of the generalized gamma family, which is useful when comparing nested models.

Parameter Estimation and Fitting

The maximum likelihood estimator for σ has a closed-form solution:

$$\hat{\sigma}{MLE} = \sqrt{\frac{1}{2n}\sum{i=1}^{n} x_i^2}$$

This is remarkably convenient—no iterative optimization required.

# MLE for Rayleigh scale parameter
rayleigh_mle <- function(x) {
  sqrt(sum(x^2) / (2 * length(x)))
}

# Method of moments estimator (using the mean)
rayleigh_mom <- function(x) {
  mean(x) / sqrt(pi / 2)
}

# Simulate wind speed data (true sigma = 4)
set.seed(456)
true_sigma <- 4
wind_speeds <- rrayleigh(500, sigma = true_sigma)

# Estimate parameters
sigma_mle <- rayleigh_mle(wind_speeds)
sigma_mom <- rayleigh_mom(wind_speeds)

cat("True sigma:", true_sigma, "\n")
cat("MLE estimate:", sigma_mle, "\n")
cat("MoM estimate:", sigma_mom, "\n")

# Using fitdistrplus for a more complete analysis
library(fitdistrplus)

# Define custom distribution for fitdist
drayleigh_fit <- function(x, sigma) {
  (x / sigma^2) * exp(-x^2 / (2 * sigma^2))
}

prayleigh_fit <- function(q, sigma) {
  1 - exp(-q^2 / (2 * sigma^2))
}

qrayleigh_fit <- function(p, sigma) {
  sigma * sqrt(-2 * log(1 - p))
}

fit <- fitdist(wind_speeds, "rayleigh_fit", 
               start = list(sigma = 1),
               method = "mle")
summary(fit)

The MLE is generally preferred over the method of moments because it’s more efficient (lower variance) and has known asymptotic properties.

Visualization and Diagnostics

Good diagnostic plots reveal whether your data actually follows a Rayleigh distribution. Build a panel that includes a histogram with fitted density, Q-Q plot, and empirical vs. theoretical CDF comparison.

library(ggplot2)
library(patchwork)

# Generate data and fit
set.seed(789)
data <- rrayleigh(300, sigma = 3)
sigma_hat <- rayleigh_mle(data)

# Create data frame for plotting
df <- data.frame(x = data)

# Histogram with fitted density
p1 <- ggplot(df, aes(x = x)) +

  geom_histogram(aes(y = after_stat(density)), 
                 bins = 30, fill = "steelblue", alpha = 0.7) +
  stat_function(fun = drayleigh, args = list(sigma = sigma_hat),
                color = "red", linewidth = 1.2) +
  labs(title = "Histogram with Fitted Rayleigh",
       x = "Value", y = "Density") +
  theme_minimal()

# Q-Q plot
theoretical_quantiles <- qrayleigh(ppoints(length(data)), sigma = sigma_hat)
qq_df <- data.frame(
  theoretical = theoretical_quantiles,
  sample = sort(data)
)

p2 <- ggplot(qq_df, aes(x = theoretical, y = sample)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_abline(slope = 1, intercept = 0, color = "red", linewidth = 1) +
  labs(title = "Q-Q Plot",
       x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal()

# Empirical vs theoretical CDF
ecdf_data <- data.frame(
  x = sort(data),
  empirical = ecdf(data)(sort(data)),
  theoretical = prayleigh(sort(data), sigma = sigma_hat)
)

p3 <- ggplot(ecdf_data, aes(x = x)) +
  geom_step(aes(y = empirical), color = "steelblue", linewidth = 1) +
  geom_line(aes(y = theoretical), color = "red", linewidth = 1) +
  labs(title = "Empirical vs Theoretical CDF",
       x = "Value", y = "Cumulative Probability") +
  theme_minimal()

# Combine plots
(p1 | p2) / p3

Points falling along the diagonal in the Q-Q plot indicate good fit. Systematic deviations suggest the Rayleigh may not be appropriate.

Practical Application: Wind Speed Analysis

Let’s work through a complete analysis pipeline using realistic wind speed data.

library(ggplot2)
library(fitdistrplus)

# Simulate realistic hourly wind speeds (m/s) for one year
set.seed(2024)
n_hours <- 8760  # hours in a year
true_sigma <- 4.2

# Add some measurement noise and ensure positive values
wind_data <- rrayleigh(n_hours, sigma = true_sigma)
wind_data <- pmax(wind_data, 0.1)  # Minimum measurable speed

# Exploratory analysis
cat("Wind Speed Summary Statistics\n")
cat("-----------------------------\n")
cat("Mean:", round(mean(wind_data), 2), "m/s\n")
cat("Median:", round(median(wind_data), 2), "m/s\n")
cat("Max:", round(max(wind_data), 2), "m/s\n")
cat("SD:", round(sd(wind_data), 2), "m/s\n")

# Fit Rayleigh distribution
sigma_mle <- rayleigh_mle(wind_data)
cat("\nEstimated scale parameter (σ):", round(sigma_mle, 3), "\n")

# Goodness-of-fit tests
# Kolmogorov-Smirnov test
ks_result <- ks.test(wind_data, "prayleigh", sigma = sigma_mle)
cat("\nKolmogorov-Smirnov Test\n")
cat("D statistic:", round(ks_result$statistic, 4), "\n")
cat("p-value:", round(ks_result$p.value, 4), "\n")

# Anderson-Darling test (requires goftest package)
library(goftest)

# Custom CDF wrapper for AD test
ad_result <- ad.test(wind_data, null = "prayleigh", sigma = sigma_mle)
cat("\nAnderson-Darling Test\n")
cat("A statistic:", round(ad_result$statistic, 4), "\n")
cat("p-value:", round(ad_result$p.value, 4), "\n")

# Calculate practical quantities
prob_above_10 <- 1 - prayleigh(10, sigma = sigma_mle)
cat("\nPractical Results\n")
cat("-----------------\n")
cat("P(wind > 10 m/s):", round(prob_above_10 * 100, 1), "%\n")
cat("95th percentile:", round(qrayleigh(0.95, sigma = sigma_mle), 2), "m/s\n")

# Expected power output (assuming P ∝ v³ for wind turbines)
# E[V³] for Rayleigh = 3σ³√(π/2)
expected_v_cubed <- 3 * sigma_mle^3 * sqrt(pi/2)
cat("E[V³] for power estimation:", round(expected_v_cubed, 1), "\n")

The Kolmogorov-Smirnov and Anderson-Darling tests provide formal hypothesis testing. A high p-value suggests the Rayleigh model is appropriate; a low p-value indicates you should consider alternatives.

Summary and Further Resources

The Rayleigh distribution is a practical choice when modeling non-negative data arising from the magnitude of bivariate Gaussian processes. Its single-parameter simplicity makes estimation straightforward, and the closed-form MLE eliminates computational overhead.

When should you consider alternatives? If your data shows heavier tails, the Weibull distribution (which includes Rayleigh as a special case with shape = 2) offers more flexibility. For data with a non-zero mean in the underlying Gaussian components, the Rice distribution is more appropriate.

Key resources for further study:

The VGAM package documentation for extended distribution families
fitdistrplus vignettes for comprehensive distribution fitting workflows
Johnson, Kotz, and Balakrishnan’s Continuous Univariate Distributions for theoretical depth

Start with the custom functions presented here for learning purposes, then transition to package implementations for production code. The mathematical transparency of hand-rolled functions builds intuition that pays dividends when debugging more complex statistical workflows.