Cauchy Distribution in Python: Complete Guide
The Cauchy distribution is the troublemaker of probability theory. It looks deceptively similar to the normal distribution but breaks nearly every assumption you've learned about statistics.
Key Insights
- The Cauchy distribution has no defined mean or variance, making standard statistical measures like sample means meaningless and prone to wild fluctuations regardless of sample size.
- Unlike normal distributions, Cauchy samples don’t converge to a stable average—the law of large numbers doesn’t apply, which has profound implications for Monte Carlo simulations and financial risk modeling.
- SciPy’s scipy.stats.cauchy module provides complete functionality for working with Cauchy distributions, but parameter estimation requires robust methods like the median and interquartile range rather than traditional moment-based approaches.
Introduction to the Cauchy Distribution
The probability density function (PDF) is defined as:
$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$$
Where $x_0$ is the location parameter (peak of the distribution) and $\gamma$ is the scale parameter (half-width at half-maximum).
The cumulative distribution function (CDF) is:
$$F(x; x_0, \gamma) = \frac{1}{\pi}\arctan\left(\frac{x - x_0}{\gamma}\right) + \frac{1}{2}$$
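These closed forms are easy to sanity-check against SciPy. The sketch below evaluates the PDF and CDF formulas directly (with an arbitrary choice of $x_0 = 2$, $\gamma = 0.5$) and compares them to scipy.stats.cauchy:

```python
import numpy as np
from scipy import stats

# Evaluate the closed-form PDF and CDF at a few points and compare
# against scipy.stats.cauchy with loc=x0, scale=gamma.
x0, gamma = 2.0, 0.5
x = np.linspace(-5, 5, 11)
pdf_formula = 1 / (np.pi * gamma * (1 + ((x - x0) / gamma) ** 2))
cdf_formula = np.arctan((x - x0) / gamma) / np.pi + 0.5
print(np.allclose(pdf_formula, stats.cauchy.pdf(x, loc=x0, scale=gamma)))  # True
print(np.allclose(cdf_formula, stats.cauchy.cdf(x, loc=x0, scale=gamma)))  # True
```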
The defining characteristic of the Cauchy distribution is that its mean and variance are undefined. Not infinite—undefined. The integrals simply don’t converge. This isn’t a mathematical curiosity; it has real consequences for anyone working with heavy-tailed data.
You’ll encounter Cauchy distributions in physics (resonance phenomena, spectral line shapes), finance (modeling extreme market movements), and signal processing (ratio of two independent normal random variables). If you’re building systems that need to handle outliers gracefully, understanding Cauchy behavior is essential.
Cauchy vs. Normal Distribution
At first glance, Cauchy and normal distributions look similar—both are symmetric and bell-shaped. The critical difference lies in the tails. Cauchy tails decay polynomially (like $1/x^2$), while normal tails decay exponentially (like $e^{-x^2/2}$). This means Cauchy distributions generate extreme values far more frequently.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.linspace(-10, 10, 1000)
# Standard normal and standard Cauchy
normal_pdf = stats.norm.pdf(x, loc=0, scale=1)
cauchy_pdf = stats.cauchy.pdf(x, loc=0, scale=1)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Linear scale comparison
axes[0].plot(x, normal_pdf, label='Normal(0, 1)', linewidth=2)
axes[0].plot(x, cauchy_pdf, label='Cauchy(0, 1)', linewidth=2)
axes[0].set_xlabel('x')
axes[0].set_ylabel('Probability Density')
axes[0].set_title('PDF Comparison (Linear Scale)')
axes[0].legend()
axes[0].set_ylim(0, 0.5)
# Log scale to show tail behavior
axes[1].semilogy(x, normal_pdf, label='Normal(0, 1)', linewidth=2)
axes[1].semilogy(x, cauchy_pdf, label='Cauchy(0, 1)', linewidth=2)
axes[1].set_xlabel('x')
axes[1].set_ylabel('Probability Density (log scale)')
axes[1].set_title('PDF Comparison (Log Scale)')
axes[1].legend()
plt.tight_layout()
plt.show()
The log-scale plot reveals the truth: at $x = 5$, the normal PDF is essentially zero ($\approx 1.5 \times 10^{-6}$), while the Cauchy PDF is still $\approx 0.012$. This difference of roughly four orders of magnitude in tail density explains why Cauchy samples produce extreme outliers regularly.
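The tail values at $x = 5$ can be spot-checked numerically with a few lines:

```python
import numpy as np
from scipy import stats

# Compare tail densities of the standard normal and standard Cauchy at x = 5
x = 5.0
normal_tail = stats.norm.pdf(x)    # ~1.5e-6
cauchy_tail = stats.cauchy.pdf(x)  # 1 / (pi * 26) ~ 0.012
print(f"normal pdf(5) = {normal_tail:.2e}")
print(f"cauchy pdf(5) = {cauchy_tail:.4f}")
print(f"ratio ~ {cauchy_tail / normal_tail:.0f}x")
```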
Standard statistical measures fail spectacularly with Cauchy data. The sample mean doesn’t converge to anything meaningful, and the sample variance explodes. If you apply normal-distribution assumptions to Cauchy data, your confidence intervals, hypothesis tests, and regression models will produce garbage.
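A quick illustration of the variance problem: unlike for a normal sample, the sample standard deviation of Cauchy data does not settle down as the sample grows, because a single extreme draw can dominate the whole estimate. A minimal sketch (seed and sample sizes are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# The sample standard deviation of Cauchy data does not stabilize
# with sample size -- one extreme draw can dominate the estimate.
for n in [100, 1000, 10000, 100000]:
    s = stats.cauchy.rvs(size=n, random_state=rng)
    print(f"n={n:>6}: sample std = {np.std(s):.1f}")
```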
Implementing Cauchy Distribution with SciPy
SciPy’s scipy.stats.cauchy provides a complete implementation following the standard frozen distribution pattern. The loc parameter corresponds to $x_0$, and scale corresponds to $\gamma$.
from scipy import stats
import numpy as np
# Create a Cauchy distribution with location=2, scale=0.5
cauchy_dist = stats.cauchy(loc=2, scale=0.5)
# Evaluate PDF and CDF at specific points
x_values = np.array([1.0, 2.0, 3.0, 4.0])
print("PDF values:", cauchy_dist.pdf(x_values))
print("CDF values:", cauchy_dist.cdf(x_values))
# Calculate percentiles (inverse CDF)
percentiles = [0.25, 0.50, 0.75, 0.95]
quantiles = cauchy_dist.ppf(percentiles)
print("\nPercentiles:")
for p, q in zip(percentiles, quantiles):
    print(f" {p*100:.0f}th percentile: {q:.4f}")
# Interval containing 90% of the probability mass
lower, upper = cauchy_dist.interval(0.90)
print(f"\n90% interval: [{lower:.4f}, {upper:.4f}]")
# Fit Cauchy to sample data
np.random.seed(42)
sample_data = cauchy_dist.rvs(size=1000)
fitted_loc, fitted_scale = stats.cauchy.fit(sample_data)
print(f"\nFitted parameters: loc={fitted_loc:.4f}, scale={fitted_scale:.4f}")
print(f"True parameters: loc=2.0000, scale=0.5000")
Note that the interval method returns symmetric quantiles around the median, not a confidence interval in the frequentist sense. For Cauchy distributions, the 90% interval is surprisingly wide because of the heavy tails.
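Because the CDF above inverts in closed form, the quantile function is simply $Q(p) = x_0 + \gamma \tan(\pi(p - \tfrac{1}{2}))$, and you can verify that ppf matches it (using the same loc=2, scale=0.5 as above):

```python
import numpy as np
from scipy import stats

# Closed-form quantile from inverting the CDF:
#   Q(p) = x0 + gamma * tan(pi * (p - 0.5))
x0, gamma = 2.0, 0.5
p = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
closed_form = x0 + gamma * np.tan(np.pi * (p - 0.5))
scipy_ppf = stats.cauchy.ppf(p, loc=x0, scale=gamma)
print(np.allclose(closed_form, scipy_ppf))  # True
```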
Generating Cauchy Random Samples
Beyond SciPy’s built-in generator, there are two common methods for generating Cauchy samples by hand: the inverse transform method and the ratio-of-normals method.
The inverse transform uses the fact that if $U \sim \text{Uniform}(0,1)$, then $x_0 + \gamma \tan(\pi(U - 0.5))$ follows a Cauchy distribution.
The ratio method exploits the property that if $X, Y \sim N(0,1)$ independently, then $X/Y$ follows a standard Cauchy distribution.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
np.random.seed(42)
n_samples = 50000
loc, scale = 0, 1
# Method 1: SciPy's built-in generator
samples_scipy = stats.cauchy.rvs(loc=loc, scale=scale, size=n_samples)
# Method 2: Inverse transform method
u = np.random.uniform(0, 1, n_samples)
samples_inverse = loc + scale * np.tan(np.pi * (u - 0.5))
# Method 3: Ratio of independent normals
x_normal = np.random.standard_normal(n_samples)
y_normal = np.random.standard_normal(n_samples)
samples_ratio = loc + scale * (x_normal / y_normal)
# Compare histograms to theoretical PDF
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
x = np.linspace(-10, 10, 500)
theoretical_pdf = stats.cauchy.pdf(x, loc=loc, scale=scale)
methods = [
    ('SciPy', samples_scipy),
    ('Inverse Transform', samples_inverse),
    ('Ratio of Normals', samples_ratio)
]
for ax, (name, samples) in zip(axes, methods):
    # Clip extreme values for visualization
    clipped = samples[(samples > -10) & (samples < 10)]
    ax.hist(clipped, bins=100, density=True, alpha=0.7, label='Samples')
    ax.plot(x, theoretical_pdf, 'r-', linewidth=2, label='Theoretical PDF')
    ax.set_xlabel('x')
    ax.set_ylabel('Density')
    ax.set_title(f'{name} Method')
    ax.legend()
    ax.set_xlim(-10, 10)
plt.tight_layout()
plt.show()
All three methods produce equivalent results. SciPy uses optimized C code internally, so prefer stats.cauchy.rvs() for production code. The manual implementations are useful for understanding and for environments where SciPy isn’t available.
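Note that plain NumPy also ships a standard Cauchy sampler, which covers the common case where SciPy isn’t installed. A small sketch (the loc/scale values are arbitrary):

```python
import numpy as np

# NumPy's Generator has a built-in standard Cauchy sampler;
# shift and scale it for general (loc, scale).
rng = np.random.default_rng(42)
loc, scale = 2.0, 0.5
samples = loc + scale * rng.standard_cauchy(size=10000)
# The sample median is a consistent estimator of loc for Cauchy data
print(f"sample median ~ {np.median(samples):.2f}")
```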
Parameter Estimation Challenges
Maximum likelihood estimation for Cauchy distributions works differently than for normal distributions. The likelihood function has a well-defined maximum, but there’s no closed-form solution. SciPy uses numerical optimization internally.
More importantly, moment-based estimators (sample mean and variance) are useless. Instead, use robust estimators:
- Location: Use the sample median, which estimates $x_0$ consistently.
- Scale: Use half the interquartile range, which estimates $\gamma$ consistently.
import numpy as np
from scipy import stats
np.random.seed(123)
true_loc, true_scale = 3.0, 2.0
# Generate samples
samples = stats.cauchy.rvs(loc=true_loc, scale=true_scale, size=5000)
# Method 1: SciPy's MLE fit
mle_loc, mle_scale = stats.cauchy.fit(samples)
# Method 2: Robust estimators
robust_loc = np.median(samples)
q75, q25 = np.percentile(samples, [75, 25])
robust_scale = (q75 - q25) / 2 # Half IQR estimates gamma
# Method 3: Demonstrate why moments fail
sample_mean = np.mean(samples)
sample_std = np.std(samples)
print("True parameters:")
print(f" Location: {true_loc}, Scale: {true_scale}")
print("\nMLE estimates (scipy.stats.cauchy.fit):")
print(f" Location: {mle_loc:.4f}, Scale: {mle_scale:.4f}")
print("\nRobust estimates (median, half-IQR):")
print(f" Location: {robust_loc:.4f}, Scale: {robust_scale:.4f}")
print("\nMoment-based estimates (WRONG approach):")
print(f" Sample mean: {sample_mean:.4f}")
print(f" Sample std: {sample_std:.4f}")
# Show instability of sample mean across multiple runs
print("\nSample means from 10 independent samples of size 1000:")
means = [np.mean(stats.cauchy.rvs(loc=true_loc, scale=true_scale, size=1000))
         for _ in range(10)]
print(f" {[f'{m:.1f}' for m in means]}")
print(" Notice the wild variation - the sample mean doesn't converge!")
The output demonstrates that while MLE and robust estimators give stable, accurate results, the sample mean varies wildly between runs. This isn’t a bug—it’s the fundamental nature of Cauchy distributions.
Practical Applications and Simulations
The most striking property of the Cauchy distribution is that the law of large numbers doesn’t apply. For normal distributions, the sample mean converges to the population mean as sample size increases. For Cauchy, it doesn’t converge at all.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
np.random.seed(42)
max_n = 10000
sample_sizes = np.arange(1, max_n + 1)
# Generate large samples once, then compute running means
normal_samples = stats.norm.rvs(loc=0, scale=1, size=max_n)
cauchy_samples = stats.cauchy.rvs(loc=0, scale=1, size=max_n)
# Compute cumulative means
normal_running_mean = np.cumsum(normal_samples) / sample_sizes
cauchy_running_mean = np.cumsum(cauchy_samples) / sample_sizes
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
axes[0].plot(sample_sizes, normal_running_mean, linewidth=0.5)
axes[0].axhline(y=0, color='r', linestyle='--', label='True mean = 0')
axes[0].set_ylabel('Running Sample Mean')
axes[0].set_title('Normal Distribution: Sample Mean Converges')
axes[0].legend()
axes[0].set_ylim(-1, 1)
axes[1].plot(sample_sizes, cauchy_running_mean, linewidth=0.5)
axes[1].axhline(y=0, color='r', linestyle='--', label='Location parameter = 0')
axes[1].set_xlabel('Sample Size')
axes[1].set_ylabel('Running Sample Mean')
axes[1].set_title('Cauchy Distribution: Sample Mean Does NOT Converge')
axes[1].legend()
plt.tight_layout()
plt.show()
# Monte Carlo simulation: estimate probability of extreme events
n_simulations = 10000
threshold = 10
normal_extreme_prob = np.mean(np.abs(stats.norm.rvs(size=n_simulations)) > threshold)
cauchy_extreme_prob = np.mean(np.abs(stats.cauchy.rvs(size=n_simulations)) > threshold)
print(f"\nProbability of |X| > {threshold}:")
print(f" Normal: {normal_extreme_prob:.6f} (theoretical: {2*stats.norm.sf(threshold):.2e})")
print(f" Cauchy: {cauchy_extreme_prob:.4f} (theoretical: {2*stats.cauchy.sf(threshold):.4f})")
This simulation is crucial for anyone doing Monte Carlo work. If your underlying data has Cauchy-like tails, naive averaging won’t give you reliable estimates no matter how many samples you collect. You need robust aggregation methods like medians or trimmed means.
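To make the contrast concrete, here is a minimal sketch comparing the mean against two robust aggregates (the median and SciPy’s trim_mean with 10% trimmed from each tail) as the sample grows; seed and sizes are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Robust aggregation: the median and trimmed mean stabilize near the
# location parameter (0 here) even though the raw mean does not.
samples = stats.cauchy.rvs(loc=0, scale=1, size=100000, random_state=rng)
for n in [100, 1000, 10000, 100000]:
    chunk = samples[:n]
    print(f"n={n:>6}: mean={np.mean(chunk):>8.2f}  "
          f"median={np.median(chunk):>6.3f}  "
          f"trimmed mean (10%)={stats.trim_mean(chunk, 0.1):>6.3f}")
```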
Summary and Best Practices
Use the Cauchy distribution when modeling phenomena with genuinely heavy tails: resonance peaks in spectroscopy, ratio-based financial metrics, or any system where extreme events occur far more frequently than a normal distribution predicts.
Key practices:
- Never use sample means or variances with Cauchy data. Use medians and interquartile ranges instead.
- Use scipy.stats.cauchy.fit() for parameter estimation; it handles the numerical optimization correctly.
- When visualizing Cauchy samples, clip extreme values or use log scales. Raw histograms will be dominated by outliers.
- In Monte Carlo simulations with heavy-tailed data, report medians and quantiles rather than means and standard deviations.
- Test your data for heavy tails before assuming normality. A simple check: if your sample mean changes dramatically when you remove a few points, you might have Cauchy-like behavior.
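The heavy-tail check in the last bullet can be sketched as a simple diagnostic. The mean_sensitivity helper below is a hypothetical illustration, not a standard test: it measures how much the sample mean moves when the few most extreme points per side are dropped, normalized by the trimmed spread.

```python
import numpy as np
from scipy import stats

# Hypothetical diagnostic: how sensitive is the sample mean to the
# k most extreme observations on each side?
def mean_sensitivity(data, k=5):
    trimmed = np.sort(data)[k:-k]  # drop k extreme points per side
    return abs(np.mean(data) - np.mean(trimmed)) / (np.std(trimmed) + 1e-12)

rng = np.random.default_rng(1)
normal_data = stats.norm.rvs(size=2000, random_state=rng)
cauchy_data = stats.cauchy.rvs(size=2000, random_state=rng)
print(f"normal sensitivity: {mean_sensitivity(normal_data):.3f}")
print(f"cauchy sensitivity: {mean_sensitivity(cauchy_data):.3f}")
```

For light-tailed data the sensitivity stays near zero; a large value is a warning sign that the mean is being driven by a handful of outliers.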
The Cauchy distribution teaches a valuable lesson: not all data behaves nicely. Building robust systems means understanding when standard assumptions fail and having the tools to handle those cases correctly.