Rayleigh Distribution in Python: Complete Guide

The Rayleigh distribution emerges naturally when you take the magnitude of a two-dimensional vector whose components are independent, zero-mean Gaussian random variables with equal variance. If X and...

Key Insights

  • The Rayleigh distribution models the magnitude of two-dimensional vectors with independent, identically distributed Gaussian components—making it essential for signal processing, wind speed analysis, and wireless communications.
  • SciPy’s scipy.stats.rayleigh uses a scale parameter equal to σ (not σ²), which directly controls the distribution’s spread and determines all statistical moments.
  • Parameter estimation via maximum likelihood provides the most statistically efficient estimates, but method of moments offers a quick, intuitive alternative when computational simplicity matters.

Introduction to the Rayleigh Distribution

The Rayleigh distribution emerges naturally when you take the magnitude of a two-dimensional vector whose components are independent, zero-mean Gaussian random variables with equal variance. If X and Y are both N(0, σ²), then R = √(X² + Y²) follows a Rayleigh distribution.

This mathematical property makes the Rayleigh distribution appear across engineering and science. In wireless communications, it models signal envelope fading. In wind energy, it approximates wind speed distributions. In image processing, it describes noise magnitude in complex-valued data.

The probability density function (PDF) is:

$$f(x; \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x \geq 0$$

The cumulative distribution function (CDF) has a closed form:

$$F(x; \sigma) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

The single parameter σ (scale) controls everything about the distribution’s shape and spread.

Implementing Rayleigh Distribution with SciPy

SciPy provides a complete implementation through scipy.stats.rayleigh. The key detail: SciPy’s scale parameter equals σ directly, not σ².

import numpy as np
from scipy import stats

# Create a Rayleigh distribution with scale parameter σ = 2
sigma = 2.0
rayleigh_dist = stats.rayleigh(scale=sigma)

# Generate random samples
np.random.seed(42)
samples = rayleigh_dist.rvs(size=10000)

# Calculate PDF and CDF at specific points
x_values = np.array([1.0, 2.0, 3.0, 4.0])
pdf_values = rayleigh_dist.pdf(x_values)
cdf_values = rayleigh_dist.cdf(x_values)

print("x\t\tPDF\t\tCDF")
print("-" * 40)
for x, pdf, cdf in zip(x_values, pdf_values, cdf_values):
    print(f"{x:.1f}\t\t{pdf:.4f}\t\t{cdf:.4f}")

# Inverse CDF (quantile function)
percentiles = [0.25, 0.50, 0.75, 0.95]
quantiles = rayleigh_dist.ppf(percentiles)
print("\nQuantiles:")
for p, q in zip(percentiles, quantiles):
    print(f"  {int(p*100)}th percentile: {q:.3f}")

Output:

x		PDF		CDF
----------------------------------------
1.0		0.2206		0.1175
2.0		0.3033		0.3935
3.0		0.2371		0.6753
4.0		0.1353		0.8647

Quantiles:
  25th percentile: 1.517
  50th percentile: 2.355
  75th percentile: 3.325
  95th percentile: 4.892

Visualizing the Rayleigh Distribution

Visualization reveals how the scale parameter affects distribution shape. Larger σ values spread the distribution rightward and reduce the peak height.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Define scale parameters to compare
scales = [0.5, 1.0, 2.0, 3.0]
colors = ['#2ecc71', '#3498db', '#9b59b6', '#e74c3c']
x = np.linspace(0, 10, 500)

# PDF plot
ax1 = axes[0]
for sigma, color in zip(scales, colors):
    dist = stats.rayleigh(scale=sigma)
    ax1.plot(x, dist.pdf(x), color=color, linewidth=2, label=f'σ = {sigma}')

ax1.set_xlabel('x', fontsize=12)
ax1.set_ylabel('Probability Density', fontsize=12)
ax1.set_title('Rayleigh PDF for Different Scale Parameters', fontsize=13)
ax1.legend()
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 1.2)
ax1.grid(True, alpha=0.3)

# CDF plot
ax2 = axes[1]
for sigma, color in zip(scales, colors):
    dist = stats.rayleigh(scale=sigma)
    ax2.plot(x, dist.cdf(x), color=color, linewidth=2, label=f'σ = {sigma}')

ax2.set_xlabel('x', fontsize=12)
ax2.set_ylabel('Cumulative Probability', fontsize=12)
ax2.set_title('Rayleigh CDF for Different Scale Parameters', fontsize=13)
ax2.legend()
ax2.set_xlim(0, 10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('rayleigh_pdf_cdf.png', dpi=150)
plt.show()

# Histogram with theoretical overlay
fig, ax = plt.subplots(figsize=(8, 5))
sigma = 2.0
samples = stats.rayleigh(scale=sigma).rvs(size=5000, random_state=42)

ax.hist(samples, bins=50, density=True, alpha=0.7, 
        color='#3498db', edgecolor='white', label='Samples (n=5000)')
ax.plot(x, stats.rayleigh(scale=sigma).pdf(x), 
        'r-', linewidth=2.5, label='Theoretical PDF')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title(f'Rayleigh Distribution: Samples vs Theory (σ = {sigma})', fontsize=13)
ax.legend()
ax.set_xlim(0, 10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('rayleigh_histogram.png', dpi=150)
plt.show()

Statistical Properties and Calculations

The Rayleigh distribution’s moments have closed-form expressions tied to σ:

  • Mean: μ = σ√(π/2) ≈ 1.253σ
  • Mode: σ (the peak of the PDF)
  • Median: σ√(2ln2) ≈ 1.177σ
  • Variance: (4 - π)/2 × σ² ≈ 0.429σ²
  • Standard deviation: σ√((4-π)/2) ≈ 0.655σ
import numpy as np
from scipy import stats

sigma = 2.5
dist = stats.rayleigh(scale=sigma)

# SciPy's built-in methods
mean_scipy = dist.mean()
var_scipy = dist.var()
std_scipy = dist.std()
median_scipy = dist.median()

# Manual calculations from formulas
mean_manual = sigma * np.sqrt(np.pi / 2)
var_manual = ((4 - np.pi) / 2) * sigma**2
std_manual = np.sqrt(var_manual)
median_manual = sigma * np.sqrt(2 * np.log(2))
mode_manual = sigma  # Mode always equals sigma

print(f"Scale parameter σ = {sigma}")
print(f"\n{'Property':<15} {'SciPy':<12} {'Formula':<12} {'Match'}")
print("-" * 50)
print(f"{'Mean':<15} {mean_scipy:<12.6f} {mean_manual:<12.6f} {np.isclose(mean_scipy, mean_manual)}")
print(f"{'Variance':<15} {var_scipy:<12.6f} {var_manual:<12.6f} {np.isclose(var_scipy, var_manual)}")
print(f"{'Std Dev':<15} {std_scipy:<12.6f} {std_manual:<12.6f} {np.isclose(std_scipy, std_manual)}")
print(f"{'Median':<15} {median_scipy:<12.6f} {median_manual:<12.6f} {np.isclose(median_scipy, median_manual)}")
print(f"{'Mode':<15} {'N/A':<12} {mode_manual:<12.6f}")

# Higher moments: skewness and kurtosis
skewness = dist.stats(moments='s')
kurtosis = dist.stats(moments='k')
print(f"\nSkewness: {float(skewness):.6f}")
print(f"Excess Kurtosis: {float(kurtosis):.6f}")

Parameter Estimation and Fitting

When you have observed data, estimating σ requires choosing an estimation method. Maximum likelihood estimation (MLE) provides optimal statistical properties.

The MLE for σ is:

$$\hat{\sigma}{MLE} = \sqrt{\frac{1}{2n}\sum{i=1}^{n} x_i^2}$$

The method of moments estimator uses the sample mean:

$$\hat{\sigma}_{MoM} = \bar{x} \sqrt{\frac{2}{\pi}}$$

import numpy as np
from scipy import stats

# Generate synthetic data with known σ
true_sigma = 3.0
np.random.seed(123)
data = stats.rayleigh(scale=true_sigma).rvs(size=500)

# Method 1: SciPy's fit() function (uses MLE)
loc_fit, scale_fit = stats.rayleigh.fit(data, floc=0)  # Fix location at 0
print(f"True σ: {true_sigma}")
print(f"\nSciPy fit (MLE): σ = {scale_fit:.4f}")

# Method 2: Manual MLE calculation
sigma_mle = np.sqrt(np.sum(data**2) / (2 * len(data)))
print(f"Manual MLE: σ = {sigma_mle:.4f}")

# Method 3: Method of Moments
sigma_mom = np.mean(data) * np.sqrt(2 / np.pi)
print(f"Method of Moments: σ = {sigma_mom:.4f}")

# Compare estimation errors
print(f"\nEstimation Errors:")
print(f"  MLE error: {abs(sigma_mle - true_sigma):.4f} ({abs(sigma_mle - true_sigma)/true_sigma*100:.2f}%)")
print(f"  MoM error: {abs(sigma_mom - true_sigma):.4f} ({abs(sigma_mom - true_sigma)/true_sigma*100:.2f}%)")

# Goodness-of-fit test
ks_stat, p_value = stats.kstest(data, 'rayleigh', args=(0, scale_fit))
print(f"\nKolmogorov-Smirnov test:")
print(f"  Statistic: {ks_stat:.4f}")
print(f"  p-value: {p_value:.4f}")
print(f"  Conclusion: {'Good fit' if p_value > 0.05 else 'Poor fit'} (α=0.05)")

Practical Application: Wind Speed Analysis

Wind engineers frequently use the Rayleigh distribution to model wind speeds. This example demonstrates a complete analysis workflow.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulate realistic wind speed data (m/s)
# In practice, you'd load actual measurements
np.random.seed(42)
true_sigma = 4.5  # Typical for moderate wind sites
wind_speeds = stats.rayleigh(scale=true_sigma).rvs(size=8760)  # One year hourly
wind_speeds = np.clip(wind_speeds, 0, 25)  # Physical bounds

# Fit the Rayleigh distribution
_, sigma_est = stats.rayleigh.fit(wind_speeds, floc=0)
fitted_dist = stats.rayleigh(scale=sigma_est)

print("Wind Speed Analysis Results")
print("=" * 40)
print(f"Sample size: {len(wind_speeds)} hours")
print(f"Estimated σ: {sigma_est:.3f} m/s")
print(f"\nDescriptive Statistics:")
print(f"  Mean wind speed: {np.mean(wind_speeds):.2f} m/s")
print(f"  Median wind speed: {np.median(wind_speeds):.2f} m/s")
print(f"  Max wind speed: {np.max(wind_speeds):.2f} m/s")

# Calculate exceedance probabilities (critical for turbine selection)
thresholds = [3, 5, 10, 15, 20]  # m/s
print(f"\nExceedance Probabilities:")
for v in thresholds:
    prob = 1 - fitted_dist.cdf(v)
    hours_per_year = prob * 8760
    print(f"  P(V > {v:2d} m/s) = {prob:.3f} ({hours_per_year:.0f} hours/year)")

# Calculate capacity factor estimate (simplified)
rated_speed = 12  # m/s
cut_in = 3  # m/s
cut_out = 25  # m/s

prob_operating = fitted_dist.cdf(cut_out) - fitted_dist.cdf(cut_in)
print(f"\nTurbine Operating Probability: {prob_operating:.1%}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Histogram with fit
ax1 = axes[0]
x = np.linspace(0, 20, 200)
ax1.hist(wind_speeds, bins=40, density=True, alpha=0.7, 
         color='#3498db', edgecolor='white', label='Observed')
ax1.plot(x, fitted_dist.pdf(x), 'r-', linewidth=2.5, 
         label=f'Rayleigh fit (σ={sigma_est:.2f})')
ax1.axvline(fitted_dist.mean(), color='green', linestyle='--', 
            linewidth=2, label=f'Mean = {fitted_dist.mean():.2f} m/s')
ax1.set_xlabel('Wind Speed (m/s)', fontsize=12)
ax1.set_ylabel('Probability Density', fontsize=12)
ax1.set_title('Wind Speed Distribution', fontsize=13)
ax1.legend()
ax1.set_xlim(0, 20)
ax1.grid(True, alpha=0.3)

# Exceedance curve
ax2 = axes[1]
speeds = np.linspace(0, 20, 200)
exceedance = 1 - fitted_dist.cdf(speeds)
ax2.plot(speeds, exceedance * 100, 'b-', linewidth=2.5)
ax2.fill_between(speeds, exceedance * 100, alpha=0.3)
ax2.set_xlabel('Wind Speed (m/s)', fontsize=12)
ax2.set_ylabel('Exceedance Probability (%)', fontsize=12)
ax2.set_title('Wind Speed Exceedance Curve', fontsize=13)
ax2.set_xlim(0, 20)
ax2.set_ylim(0, 100)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('wind_analysis.png', dpi=150)
plt.show()

The Rayleigh distribution belongs to a family of related distributions. Understanding when to use each is crucial for accurate modeling.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.linspace(0, 8, 300)
sigma = 1.5

fig, ax = plt.subplots(figsize=(10, 6))

# Rayleigh distribution
rayleigh = stats.rayleigh(scale=sigma)
ax.plot(x, rayleigh.pdf(x), 'b-', linewidth=2.5, label='Rayleigh (σ=1.5)')

# Chi distribution with 2 degrees of freedom (equivalent to Rayleigh when scaled)
chi2 = stats.chi(df=2, scale=sigma)
ax.plot(x, chi2.pdf(x), 'g--', linewidth=2.5, label='Chi (df=2, scale=1.5)')

# Rice distribution (generalizes Rayleigh with non-centrality)
rice = stats.rice(b=0, scale=sigma)  # b=0 reduces to Rayleigh
ax.plot(x, rice.pdf(x), 'r:', linewidth=2.5, label='Rice (ν=0, σ=1.5)')

# Weibull with shape=2 (equivalent to Rayleigh)
weibull_scale = sigma * np.sqrt(2)  # Conversion factor
weibull = stats.weibull_min(c=2, scale=weibull_scale)
ax.plot(x, weibull.pdf(x), 'm-.', linewidth=2.5, label=f'Weibull (k=2, λ={weibull_scale:.2f})')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('Probability Density', fontsize=12)
ax.set_title('Rayleigh and Related Distributions', fontsize=13)
ax.legend(fontsize=11)
ax.set_xlim(0, 8)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('distribution_comparison.png', dpi=150)
plt.show()

print("Distribution Selection Guide:")
print("-" * 50)
print("Rayleigh: Magnitude of 2D Gaussian vector, zero mean")
print("Rice: Magnitude of 2D Gaussian vector, non-zero mean")
print("Chi: Magnitude of n-dimensional Gaussian vector")
print("Weibull: Flexible shape parameter; k=2 equals Rayleigh")

When to choose each:

  • Rayleigh: Your data represents the magnitude of a 2D signal with zero-mean Gaussian components. Classic examples include envelope detection in communications and wind speed modeling.
  • Rice: Similar to Rayleigh but with a dominant signal component (non-zero mean). Use for fading channels with line-of-sight.
  • Weibull: When you need flexibility in the tail behavior. The shape parameter k lets you model heavier or lighter tails than Rayleigh allows.
  • Chi: When working with higher-dimensional Gaussian vectors (3D, 4D, etc.).

The Rayleigh distribution’s simplicity—a single parameter with closed-form expressions for everything—makes it the default choice when its assumptions hold. When they don’t, reach for its generalizations.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.