Weibull Distribution in R: Complete Guide

The Weibull distribution is a continuous probability distribution that models time-to-failure data better than almost any other distribution. Named after Swedish mathematician Waloddi Weibull, it's...

Key Insights

  • The Weibull distribution’s shape parameter determines whether failure rates increase, decrease, or remain constant over time—making it the go-to choice for reliability engineering and survival analysis.
  • R provides four core functions (dweibull, pweibull, qweibull, rweibull) that handle all Weibull calculations, while packages like fitdistrplus and survival enable real-world parameter estimation and modeling.
  • Always validate your Weibull fit using Q-Q plots and goodness-of-fit tests; a poor fit can lead to catastrophically wrong predictions about failure times or survival probabilities.

Introduction to the Weibull Distribution

The Weibull distribution is a continuous probability distribution that models time-to-failure data better than almost any other distribution. Named after Swedish mathematician Waloddi Weibull, it’s the workhorse of reliability engineering, survival analysis, and any domain where you need to predict when things break.

What makes Weibull special is its flexibility. By adjusting two parameters, you can model infant mortality (early failures), random failures, or wear-out failures. This single distribution captures failure patterns that would otherwise require multiple models.

Common applications include:

  • Manufacturing: Predicting product lifespans and warranty claims
  • Medical research: Modeling patient survival times
  • Engineering: Planning maintenance schedules for equipment
  • Wind energy: Characterizing wind speed distributions

Understanding Weibull Parameters

The Weibull distribution has two parameters that control its behavior:

Shape parameter (k): This determines the failure rate pattern:

  • k < 1: Decreasing failure rate (infant mortality)
  • k = 1: Constant failure rate (reduces to exponential distribution)
  • k > 1: Increasing failure rate (wear-out failures)

Scale parameter (λ): This stretches or compresses the distribution along the time axis. It represents the characteristic life—the time at which 63.2% of units have failed.

Let’s visualize how these parameters affect the distribution:

library(ggplot2)
library(dplyr)
library(tidyr)

# Create data for multiple Weibull curves
x <- seq(0, 5, length.out = 200)

# Varying shape parameter (scale = 1)
shape_data <- data.frame(
  x = rep(x, 4),
  density = c(
    dweibull(x, shape = 0.5, scale = 1),
    dweibull(x, shape = 1, scale = 1),
    dweibull(x, shape = 2, scale = 1),
    dweibull(x, shape = 5, scale = 1)
  ),
  parameter = rep(c("k = 0.5", "k = 1", "k = 2", "k = 5"), each = 200)
)

ggplot(shape_data, aes(x = x, y = density, color = parameter)) +
  geom_line(linewidth = 1.2) +
  labs(
    title = "Weibull PDF: Effect of Shape Parameter",
    x = "Time",
    y = "Density",
    color = "Shape (k)"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

# Varying scale parameter (shape = 2)
scale_data <- data.frame(
  x = rep(x, 3),
  density = c(
    dweibull(x, shape = 2, scale = 0.5),
    dweibull(x, shape = 2, scale = 1),
    dweibull(x, shape = 2, scale = 2)
  ),
  parameter = rep(c("λ = 0.5", "λ = 1", "λ = 2"), each = 200)
)

ggplot(scale_data, aes(x = x, y = density, color = parameter)) +
  geom_line(linewidth = 1.2) +
  labs(
    title = "Weibull PDF: Effect of Scale Parameter",
    x = "Time",
    y = "Density",
    color = "Scale (λ)"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set2")

Core R Functions for Weibull Distribution

R provides four built-in functions following the standard d/p/q/r naming convention:

dweibull(): Probability Density Function

# Probability density at specific points
dweibull(x = 2, shape = 1.5, scale = 3)
# [1] 0.2218417

# Density across a range
x_vals <- seq(0, 10, by = 0.5)
densities <- dweibull(x_vals, shape = 1.5, scale = 3)

pweibull(): Cumulative Distribution Function

This calculates the probability that a failure occurs before time t:

# Probability of failure before time = 5
pweibull(q = 5, shape = 2, scale = 4)
# [1] 0.7768698

# Probability of surviving beyond time = 5
pweibull(q = 5, shape = 2, scale = 4, lower.tail = FALSE)
# [1] 0.2231302

# What percentage of components fail within warranty period (2 years)?
warranty_failures <- pweibull(2, shape = 1.8, scale = 10)
sprintf("%.1f%% of components fail within warranty", warranty_failures * 100)

qweibull(): Quantile Function

The inverse of the CDF—find the time at which a given proportion has failed:

# Time at which 50% of units have failed (median lifetime)
qweibull(p = 0.5, shape = 2, scale = 4)
# [1] 3.330169

# Time at which 90% have failed
qweibull(p = 0.9, shape = 2, scale = 4)
# [1] 6.070184

# Calculate B10 life (time at which 10% have failed)
b10_life <- qweibull(0.1, shape = 2.5, scale = 1000)
sprintf("B10 life: %.1f hours", b10_life)

rweibull(): Random Number Generation

Generate random samples from a Weibull distribution:

set.seed(42)

# Generate 1000 random failure times
failure_times <- rweibull(n = 1000, shape = 2, scale = 100)

# Summary statistics
summary(failure_times)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   2.016  62.020  86.740  88.620 112.500 217.100

Fitting Weibull Distributions to Data

Real-world analysis starts with data, not parameters. Here’s how to estimate Weibull parameters from observed failure times.

Using MASS::fitdistr()

library(MASS)

# Simulate failure data (true shape = 2, scale = 100)
set.seed(123)
observed_failures <- rweibull(200, shape = 2, scale = 100)

# Fit Weibull distribution
fit_mass <- fitdistr(observed_failures, "weibull")
print(fit_mass)
#      shape        scale   
#   2.0547925   99.7tried546 
#  (0.1082345) (3.5012789)

# Extract parameters
estimated_shape <- fit_mass$estimate["shape"]
estimated_scale <- fit_mass$estimate["scale"]

Using fitdistrplus::fitdist()

The fitdistrplus package provides more diagnostic tools:

library(fitdistrplus)

# Fit with additional diagnostics
fit_plus <- fitdist(observed_failures, "weibull")
summary(fit_plus)

# Compare estimation methods
fit_mle <- fitdist(observed_failures, "weibull", method = "mle")
fit_mme <- fitdist(observed_failures, "weibull", method = "mme")

# Goodness-of-fit statistics
gofstat(list(fit_mle, fit_mme), 
        fitnames = c("MLE", "MME"))

Visualizing Weibull Distributions

Publication-quality visualizations are essential for communicating results.

Histogram with Fitted Overlay

library(ggplot2)

# Create histogram with fitted Weibull overlay
ggplot(data.frame(x = observed_failures), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), 
                 bins = 30, 
                 fill = "steelblue", 
                 alpha = 0.7) +
  stat_function(
    fun = dweibull,
    args = list(shape = fit_plus$estimate["shape"],
                scale = fit_plus$estimate["scale"]),
    color = "red",
    linewidth = 1.2
  ) +
  labs(
    title = "Observed Failures with Fitted Weibull Distribution",
    x = "Time to Failure",
    y = "Density"
  ) +
  theme_minimal()

Q-Q Plot for Goodness-of-Fit

# Q-Q plot using fitdistrplus
plot(fit_plus, which = "qq")

# Custom Q-Q plot with ggplot2
n <- length(observed_failures)
theoretical_quantiles <- qweibull(
  ppoints(n),
  shape = fit_plus$estimate["shape"],
  scale = fit_plus$estimate["scale"]
)

qq_data <- data.frame(
  theoretical = theoretical_quantiles,
  observed = sort(observed_failures)
)

ggplot(qq_data, aes(x = theoretical, y = observed)) +
  geom_point(alpha = 0.6) +
  geom_abline(slope = 1, intercept = 0, color = "red", linewidth = 1) +
  labs(
    title = "Q-Q Plot: Weibull Fit Assessment",
    x = "Theoretical Quantiles",
    y = "Observed Quantiles"
  ) +
  theme_minimal()

CDF Comparison Plot

# Empirical vs fitted CDF
plot(fit_plus, which = "cdf")

Practical Application: Survival Analysis

Let’s work through a complete survival analysis using the Weibull distribution.

library(survival)
library(survminer)

# Simulate patient survival data
set.seed(456)
n_patients <- 150

survival_data <- data.frame(
  time = rweibull(n_patients, shape = 1.2, scale = 24),
  status = sample(c(0, 1), n_patients, replace = TRUE, prob = c(0.2, 0.8)),
  treatment = rep(c("Control", "Treatment"), each = n_patients/2)
)

# Create survival object
surv_obj <- Surv(time = survival_data$time, event = survival_data$status)

# Fit Weibull regression model
weibull_model <- survreg(
  surv_obj ~ treatment,
  data = survival_data,
  dist = "weibull"
)

summary(weibull_model)

# Extract Weibull parameters
# Note: survreg uses a different parameterization
scale_param <- exp(coef(weibull_model)[1])
shape_param <- 1 / weibull_model$scale

cat("Estimated shape:", round(shape_param, 3), "\n")
cat("Estimated scale (Control):", round(scale_param, 3), "\n")

Hazard Function Analysis

The hazard function shows the instantaneous failure rate:

# Weibull hazard function: h(t) = (k/λ) * (t/λ)^(k-1)
weibull_hazard <- function(t, shape, scale) {
  (shape / scale) * (t / scale)^(shape - 1)
}

# Plot hazard functions for different shapes
t_vals <- seq(0.1, 10, length.out = 100)

hazard_data <- data.frame(
  t = rep(t_vals, 3),
  hazard = c(
    weibull_hazard(t_vals, shape = 0.5, scale = 2),
    weibull_hazard(t_vals, shape = 1, scale = 2),
    weibull_hazard(t_vals, shape = 2, scale = 2)
  ),
  type = rep(c("Decreasing (k=0.5)", "Constant (k=1)", "Increasing (k=2)"), 
             each = 100)
)

ggplot(hazard_data, aes(x = t, y = hazard, color = type)) +
  geom_line(linewidth = 1.2) +
  labs(
    title = "Weibull Hazard Functions",
    x = "Time",
    y = "Hazard Rate",
    color = "Failure Pattern"
  ) +
  theme_minimal() +
  coord_cartesian(ylim = c(0, 2))

Summary and Best Practices

The Weibull distribution earns its place as the default choice for time-to-event modeling. Here’s what to remember:

When to use Weibull:

  • Modeling failure times or survival data
  • When failure rates change over time (unlike exponential)
  • Reliability engineering and warranty analysis
  • Wind speed and other natural phenomena

When to consider alternatives:

  • Log-normal: When multiplicative effects dominate
  • Gamma: For waiting times with integer shape parameters
  • Exponential: When failure rate is truly constant (simpler model)

Common pitfalls to avoid:

  1. Ignoring censored data: Use the survival package for censored observations—don’t just drop them.
  2. Assuming Weibull without checking: Always validate with Q-Q plots and goodness-of-fit tests.
  3. Confusing parameterizations: R uses shape/scale, but some software uses shape/rate. Check documentation.
  4. Extrapolating beyond data range: Weibull predictions become unreliable far outside observed time ranges.

The Weibull distribution’s flexibility comes from its two-parameter structure. Master the interpretation of shape and scale, use R’s built-in functions for calculations, and always validate your fits visually. Your reliability predictions will be far more trustworthy as a result.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.