Weibull Distribution in Python: Complete Guide

Key Insights

  • The Weibull distribution’s shape parameter (k) determines failure behavior: k < 1 indicates early-life failures, k = 1 gives constant failure rate (exponential), and k > 1 models wear-out failures—making it essential for reliability engineering.
  • SciPy’s weibull_min uses a different parameterization than textbooks; understanding the c (shape) and scale parameters prevents common implementation errors.
  • Parameter estimation via Maximum Likelihood Estimation (MLE) with scipy.stats.weibull_min.fit() provides production-ready Weibull fitting, but always validate fits with probability plots and goodness-of-fit tests.

Introduction to the Weibull Distribution

The Weibull distribution is the workhorse of reliability engineering and survival analysis. Named after Swedish mathematician Waloddi Weibull, it models time-to-failure data with remarkable flexibility. Unlike the normal distribution, Weibull handles the asymmetric, bounded-at-zero nature of lifetime data naturally.

Two parameters control its behavior: the shape parameter (k, also called β) and the scale parameter (λ). The shape parameter determines whether failure rates increase, decrease, or remain constant over time. The scale parameter stretches or compresses the distribution along the time axis, representing characteristic life—the time by which approximately 63.2% of units will have failed.
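
That 63.2% figure is not arbitrary: setting x = λ in the CDF gives F(λ) = 1 − e⁻¹ ≈ 0.632 for every shape k. A quick numerical check:

```python
from scipy import stats

# At x = lambda, the CDF is 1 - exp(-1) ~ 0.6321, whatever the shape k
for k in [0.5, 1.0, 2.5]:
    print(f"k={k}: F(lambda) = {stats.weibull_min.cdf(1.0, c=k, scale=1.0):.4f}")
```

The same holds for any scale, since the CDF depends on x only through x/λ.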

You’ll encounter Weibull distributions in manufacturing quality control, wind speed modeling, survival analysis in medicine, and anywhere you need to predict when things break.

Mathematical Foundation

The Weibull probability density function (PDF) is:

$$f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$$

The cumulative distribution function (CDF) has a clean closed form:

$$F(x; k, \lambda) = 1 - e^{-(x/\lambda)^k}$$

The hazard function (instantaneous failure rate) reveals why Weibull is so useful:

$$h(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}$$

When k = 1, you get a constant hazard rate—the exponential distribution. When k = 2, you get the Rayleigh distribution. This flexibility from a single parameter family is Weibull’s superpower.
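
Both special cases are easy to verify numerically. With scale λ, k = 1 reproduces an exponential with mean λ, and k = 2 matches SciPy's Rayleigh with σ = λ/√2:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 5, 50)
lam = 1.5

# k = 1: Weibull reduces to the exponential distribution
np.testing.assert_allclose(
    stats.weibull_min.pdf(x, c=1, scale=lam),
    stats.expon.pdf(x, scale=lam))

# k = 2: Weibull reduces to the Rayleigh distribution
# (SciPy's rayleigh scale is sigma = lambda / sqrt(2))
np.testing.assert_allclose(
    stats.weibull_min.pdf(x, c=2, scale=lam),
    stats.rayleigh.pdf(x, scale=lam / np.sqrt(2)))
print("special cases match")
```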

Key statistical properties:

  • Mean: $\lambda \cdot \Gamma(1 + 1/k)$
  • Variance: $\lambda^2 \left[\Gamma(1 + 2/k) - \Gamma^2(1 + 1/k)\right]$
  • Median: $\lambda \cdot (\ln 2)^{1/k}$

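
These closed forms are easy to verify against SciPy's built-in moments (Γ here is scipy.special.gamma):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

k, lam = 2.5, 100.0
dist = stats.weibull_min(c=k, scale=lam)

# Closed-form mean, variance, and median from the formulas above
mean = lam * gamma(1 + 1/k)
var = lam**2 * (gamma(1 + 2/k) - gamma(1 + 1/k)**2)
median = lam * np.log(2)**(1/k)

np.testing.assert_allclose([mean, var, median],
                           [dist.mean(), dist.var(), dist.median()])
print("closed forms agree with scipy")
```
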
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.special import gamma

# Visualize how shape parameter affects the distribution
x = np.linspace(0, 3, 500)
scale = 1.0  # lambda

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

shape_params = [0.5, 1.0, 1.5, 2.0, 3.0]
colors = plt.cm.viridis(np.linspace(0, 0.9, len(shape_params)))

for k, color in zip(shape_params, colors):
    # PDF
    pdf = stats.weibull_min.pdf(x, c=k, scale=scale)
    axes[0].plot(x, pdf, label=f'k={k}', color=color, linewidth=2)
    
    # CDF
    cdf = stats.weibull_min.cdf(x, c=k, scale=scale)
    axes[1].plot(x, cdf, label=f'k={k}', color=color, linewidth=2)

axes[0].set_xlabel('x')
axes[0].set_ylabel('f(x)')
axes[0].set_title('Weibull PDF')
axes[0].legend()
axes[0].set_ylim(0, 2.5)

axes[1].set_xlabel('x')
axes[1].set_ylabel('F(x)')
axes[1].set_title('Weibull CDF')
axes[1].legend()

plt.tight_layout()
plt.savefig('weibull_pdf_cdf.png', dpi=150)
plt.show()

Implementing Weibull in Python with SciPy

SciPy’s scipy.stats.weibull_min is your primary tool for Weibull analysis. A critical warning: SciPy parameterizes the Weibull differently than most textbooks. The c parameter is the shape (k), and scale is λ. There’s no loc parameter in the standard Weibull, but SciPy includes it for shifting—set it to 0 for the standard two-parameter Weibull.
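
To confirm the mapping, the textbook PDF from the previous section matches weibull_min.pdf exactly when you pass c = k and scale = λ:

```python
import numpy as np
from scipy import stats

k, lam = 2.0, 100.0
x = np.linspace(1, 300, 50)

# Textbook PDF: (k/lam) * (x/lam)^(k-1) * exp(-(x/lam)^k)
textbook_pdf = (k / lam) * (x / lam)**(k - 1) * np.exp(-(x / lam)**k)
np.testing.assert_allclose(textbook_pdf,
                           stats.weibull_min.pdf(x, c=k, scale=lam))
print("parameterizations match")
```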

from scipy import stats
import numpy as np

# Define Weibull parameters
shape = 2.0  # k (shape parameter)
scale = 100.0  # lambda (scale parameter, e.g., hours)

# Create a frozen distribution object
weibull = stats.weibull_min(c=shape, scale=scale)

# Generate random samples (e.g., simulated failure times)
np.random.seed(42)
samples = weibull.rvs(size=1000)

print(f"Sample mean: {samples.mean():.2f}")
print(f"Theoretical mean: {weibull.mean():.2f}")
print(f"Sample std: {samples.std():.2f}")
print(f"Theoretical std: {weibull.std():.2f}")

# Probability calculations
time = 80  # hours

# What's the probability of failure before 80 hours?
prob_failure = weibull.cdf(time)
print(f"\nP(T < {time}) = {prob_failure:.4f}")

# What's the survival probability at 80 hours?
survival_prob = weibull.sf(time)  # sf = survival function = 1 - cdf
print(f"P(T > {time}) = {survival_prob:.4f}")

# What time corresponds to 10% failure probability?
time_10pct = weibull.ppf(0.10)  # percent point function (inverse CDF)
print(f"B10 life (10% failure): {time_10pct:.2f} hours")

# PDF value at a specific point
pdf_value = weibull.pdf(time)
print(f"PDF at t={time}: {pdf_value:.6f}")

Parameter Estimation from Data

Real-world analysis starts with data, not parameters. Maximum Likelihood Estimation (MLE) finds the parameters that make your observed data most probable. SciPy’s fit() method handles this automatically.

import numpy as np
from scipy import stats

# Simulate realistic failure time data (in practice, this comes from testing)
np.random.seed(123)
true_shape = 2.5
true_scale = 150.0
failure_times = stats.weibull_min.rvs(c=true_shape, scale=true_scale, size=50)

# Fit Weibull distribution to data
# floc=0 fixes location at 0 (standard 2-parameter Weibull)
shape_fit, loc_fit, scale_fit = stats.weibull_min.fit(failure_times, floc=0)

print(f"True parameters: shape={true_shape}, scale={true_scale}")
print(f"Fitted parameters: shape={shape_fit:.3f}, scale={scale_fit:.3f}")

# Goodness-of-fit: Kolmogorov-Smirnov test
ks_statistic, p_value = stats.kstest(
    failure_times, 
    'weibull_min', 
    args=(shape_fit, loc_fit, scale_fit)
)
print(f"\nKolmogorov-Smirnov test:")
print(f"KS statistic: {ks_statistic:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value > 0.05:
    print("Fail to reject null hypothesis: data consistent with Weibull")
else:
    print("Reject null hypothesis: data may not follow Weibull")

# Anderson-Darling test (more sensitive to tail behavior)
# Note: scipy's anderson doesn't directly support Weibull, 
# so we transform to exponential
transformed = (failure_times / scale_fit) ** shape_fit
ad_result = stats.anderson(transformed, dist='expon')
print(f"\nAnderson-Darling statistic: {ad_result.statistic:.4f}")
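
The raw statistic is hard to interpret on its own; compare it to the critical values the test returns. A self-contained sketch that repeats the transform step on the same simulated data:

```python
import numpy as np
from scipy import stats

# Recreate the transform-to-exponential step from above
np.random.seed(123)
failure_times = stats.weibull_min.rvs(c=2.5, scale=150.0, size=50)
shape_fit, _, scale_fit = stats.weibull_min.fit(failure_times, floc=0)
ad_result = stats.anderson((failure_times / scale_fit) ** shape_fit,
                           dist='expon')

# Reject at a given level when the statistic exceeds that critical value
for crit, sig in zip(ad_result.critical_values,
                     ad_result.significance_level):
    verdict = "reject" if ad_result.statistic > crit else "cannot reject"
    print(f"{sig:>5.1f}% level: critical value {crit:.3f} -> {verdict}")
```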

Practical Application: Reliability Analysis

Let’s work through a complete reliability engineering scenario. You’re analyzing bearing failures in industrial equipment, and you have 30 recorded failure times in thousands of hours.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Bearing failure times (thousands of hours)
failure_times = np.array([
    2.1, 3.4, 4.2, 5.1, 5.8, 6.3, 7.1, 7.9, 8.4, 9.2,
    9.8, 10.5, 11.2, 12.1, 13.0, 14.2, 15.1, 16.5, 18.2, 19.8,
    21.5, 23.4, 26.1, 29.3, 33.2, 38.1, 44.5, 52.3, 65.2, 89.1
])

# Fit Weibull distribution
shape, loc, scale = stats.weibull_min.fit(failure_times, floc=0)
weibull = stats.weibull_min(c=shape, scale=scale)

print("="*50)
print("RELIABILITY ANALYSIS REPORT")
print("="*50)
print(f"\nWeibull Parameters:")
print(f"  Shape (β): {shape:.3f}")
print(f"  Scale (η): {scale:.3f} thousand hours")

# Interpret shape parameter (the fitted shape is a float, so compare
# against a band around 1 rather than testing exact equality)
if shape < 0.95:
    failure_mode = "Infant mortality (decreasing failure rate)"
elif shape < 1.05:
    failure_mode = "Random failures (approximately constant failure rate)"
else:
    failure_mode = "Wear-out failures (increasing failure rate)"
print(f"  Failure mode: {failure_mode}")

# Key reliability metrics
mttf = weibull.mean()
median_life = weibull.median()
b10_life = weibull.ppf(0.10)  # 10% failure
b50_life = weibull.ppf(0.50)  # 50% failure (median)

print(f"\nReliability Metrics:")
print(f"  MTTF (Mean Time To Failure): {mttf:.2f} thousand hours")
print(f"  Median Life (B50): {median_life:.2f} thousand hours")
print(f"  B10 Life (90% survival): {b10_life:.2f} thousand hours")

# Reliability at specific times
warranty_period = 5.0  # thousand hours
design_life = 20.0

r_warranty = weibull.sf(warranty_period)
r_design = weibull.sf(design_life)

print(f"\nSurvival Probabilities:")
print(f"  R({warranty_period}k hrs) = {r_warranty:.4f} ({r_warranty*100:.1f}% survive warranty)")
print(f"  R({design_life}k hrs) = {r_design:.4f} ({r_design*100:.1f}% survive design life)")

# Hazard rate at design life
hazard = (shape / scale) * (design_life / scale) ** (shape - 1)
print(f"\nHazard rate at {design_life}k hrs: {hazard:.4f} failures per thousand hours")

# Plot survival curve
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

t = np.linspace(0, 100, 500)

# Survival function
axes[0].plot(t, weibull.sf(t), 'b-', linewidth=2, label='Fitted Weibull')
axes[0].axhline(y=0.9, color='g', linestyle='--', alpha=0.7, label='90% Reliability')
axes[0].axvline(x=b10_life, color='g', linestyle='--', alpha=0.7)
axes[0].set_xlabel('Time (thousand hours)')
axes[0].set_ylabel('Reliability R(t)')
axes[0].set_title('Survival Curve')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Hazard function
hazard_values = (shape / scale) * (t / scale) ** (shape - 1)
axes[1].plot(t, hazard_values, 'r-', linewidth=2)
axes[1].set_xlabel('Time (thousand hours)')
axes[1].set_ylabel('Hazard rate h(t)')
axes[1].set_title('Hazard Function (Failure Rate)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('reliability_analysis.png', dpi=150)
plt.show()
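
A closely related quantity is conditional reliability: the probability that a unit which has already survived to age t completes a further mission of length s, R(t + s)/R(t). A minimal sketch (the parameter values here are illustrative, not the fitted ones above):

```python
from scipy import stats

# Illustrative Weibull parameters (thousand hours)
weibull = stats.weibull_min(c=1.1, scale=20.0)

def conditional_reliability(dist, age, mission):
    """P(T > age + mission | T > age) = R(age + mission) / R(age)."""
    return dist.sf(age + mission) / dist.sf(age)

# A bearing that has already run 10k hours: chance of lasting 5k more
print(f"{conditional_reliability(weibull, 10.0, 5.0):.4f}")
```

With an increasing hazard rate (shape > 1), this conditional probability is lower than the unconditional R(s): an aged unit is more fragile than a new one.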

Visualization Techniques

Weibull probability plots are essential for assessing whether your data follows a Weibull distribution. On a properly scaled plot, Weibull-distributed data appears as a straight line.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def weibull_probability_plot(data, ax=None):
    """Create a Weibull probability plot."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 6))
    
    # Sort data and compute plotting positions (median rank)
    sorted_data = np.sort(data)
    n = len(data)
    # Bernard's approximation for median rank
    ranks = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    
    # Transform for Weibull plot: ln(t) vs ln(ln(1/(1-F)))
    x = np.log(sorted_data)
    y = np.log(-np.log(1 - ranks))
    
    # Fit line to transformed data
    slope, intercept = np.polyfit(x, y, 1)
    
    # Back-calculate Weibull parameters
    shape_est = slope
    scale_est = np.exp(-intercept / slope)
    
    # Plot data and fitted line
    ax.scatter(x, y, c='blue', alpha=0.6, label='Data')
    x_fit = np.linspace(x.min() - 0.5, x.max() + 0.5, 100)
    y_fit = slope * x_fit + intercept
    ax.plot(x_fit, y_fit, 'r-', linewidth=2, 
            label=f'Fit: β={shape_est:.2f}, η={scale_est:.2f}')
    
    ax.set_xlabel('ln(Time)')
    ax.set_ylabel('ln(ln(1/(1-F)))')
    ax.set_title('Weibull Probability Plot')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    return shape_est, scale_est

# Example with our failure data
failure_times = np.array([
    2.1, 3.4, 4.2, 5.1, 5.8, 6.3, 7.1, 7.9, 8.4, 9.2,
    9.8, 10.5, 11.2, 12.1, 13.0, 14.2, 15.1, 16.5, 18.2, 19.8,
    21.5, 23.4, 26.1, 29.3, 33.2, 38.1, 44.5, 52.3, 65.2, 89.1
])

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Probability plot
shape_graphical, scale_graphical = weibull_probability_plot(failure_times, axes[0])

# Compare empirical vs fitted CDF
shape_mle, _, scale_mle = stats.weibull_min.fit(failure_times, floc=0)

sorted_data = np.sort(failure_times)
empirical_cdf = np.arange(1, len(failure_times) + 1) / len(failure_times)

t = np.linspace(0, 100, 500)
fitted_cdf = stats.weibull_min.cdf(t, c=shape_mle, scale=scale_mle)

axes[1].step(sorted_data, empirical_cdf, where='post', 
             label='Empirical CDF', linewidth=2)
axes[1].plot(t, fitted_cdf, 'r-', label='Fitted Weibull', linewidth=2)
axes[1].set_xlabel('Time (thousand hours)')
axes[1].set_ylabel('Cumulative Probability')
axes[1].set_title('Empirical vs Fitted Distribution')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('weibull_validation.png', dpi=150)
plt.show()

print(f"Graphical estimation: shape={shape_graphical:.3f}, scale={scale_graphical:.3f}")
print(f"MLE estimation: shape={shape_mle:.3f}, scale={scale_mle:.3f}")

Summary and Further Resources

The Weibull distribution is indispensable for reliability engineering and survival analysis. Here’s what you need to remember:

Core SciPy functions:

  • stats.weibull_min(c, scale) — create distribution object
  • .rvs(size) — generate random samples
  • .pdf(x), .cdf(x), .sf(x) — probability functions
  • .fit(data, floc=0) — parameter estimation
  • .ppf(q) — quantile function (inverse CDF)

Parameterization warning: SciPy uses c for shape and scale for scale. Always set floc=0 when fitting standard two-parameter Weibull.

For basic random number generation without the full statistical machinery, NumPy offers numpy.random.weibull(a, size), but note it only provides shape parameter control with scale=1.
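
Because numpy.random.weibull samples the standard (λ = 1) distribution, multiply the draws by your scale parameter to recover the two-parameter form; the modern Generator API works the same way:

```python
import numpy as np

rng = np.random.default_rng(42)
k, lam = 2.0, 100.0

# NumPy draws standard Weibull (scale = 1); multiply by lambda to scale
samples = lam * rng.weibull(k, size=10000)
print(f"sample mean: {samples.mean():.1f}")  # theoretical mean ~ 88.6
```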

For production reliability engineering, consider the reliability package (pip install reliability), which provides specialized tools including Weibull mixture models, competing risks analysis, and proper handling of censored data—common in real-world testing where not all units have failed by the end of observation.
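
If pulling in an extra dependency isn't an option, right-censored data can still be fit with SciPy alone by maximizing the censored log-likelihood directly: failed units contribute log pdf, units still running at the censoring time contribute log sf. A minimal sketch on synthetic data:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
true_k, true_lam = 2.0, 100.0
t = stats.weibull_min.rvs(c=true_k, scale=true_lam, size=200,
                          random_state=rng)

# Right-censor every unit still running at 120 hours
censor_time = 120.0
observed = np.minimum(t, censor_time)
failed = t <= censor_time  # False => survived past censor_time

def neg_log_likelihood(params):
    k, lam = params
    ll = stats.weibull_min.logpdf(observed[failed], c=k, scale=lam).sum()
    ll += stats.weibull_min.logsf(observed[~failed], c=k, scale=lam).sum()
    return -ll

res = optimize.minimize(neg_log_likelihood, x0=[1.0, observed.mean()],
                        bounds=[(1e-3, None), (1e-3, None)])
print(f"censored MLE: shape={res.x[0]:.2f}, scale={res.x[1]:.2f}")
```

Ignoring the censoring (fitting only the observed values) would bias both parameters; the log-sf terms are what credit the survivors.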
