How to Plot the Beta Distribution in R

The beta distribution is one of the most useful probability distributions in applied statistics, yet it often gets overlooked in introductory courses. It's a continuous distribution defined on the...

Key Insights

  • The beta distribution is defined on [0,1] and controlled by two shape parameters α and β, making it ideal for modeling probabilities, conversion rates, and Bayesian priors.
  • R provides four core functions for working with beta distributions: dbeta() for density, pbeta() for cumulative probability, qbeta() for quantiles, and rbeta() for random sampling.
  • Both base R’s curve() function and ggplot2’s stat_function() can plot beta distributions effectively, but ggplot2 offers superior control for publication-ready visualizations.

Introduction to the Beta Distribution

The beta distribution is one of the most useful probability distributions in applied statistics, yet it often gets overlooked in introductory courses. It’s a continuous distribution defined on the interval [0,1], which makes it naturally suited for modeling probabilities, proportions, and rates.

Two shape parameters control the beta distribution: α (alpha) and β (beta). When both parameters equal 1, you get a uniform distribution. As you increase α, the distribution shifts toward 1. Increase β, and it shifts toward 0. When both are greater than 1, you get a unimodal distribution. When both are less than 1, you get a U-shaped distribution with mass at the extremes.

This flexibility makes the beta distribution invaluable in several contexts. In Bayesian inference, it serves as the conjugate prior for binomial proportions—meaning your posterior is also a beta distribution, which simplifies calculations enormously. In A/B testing, you can model conversion rates as beta-distributed random variables. In project management, PERT analysis uses beta distributions to model task duration uncertainty.

Understanding how to visualize beta distributions helps you build intuition about these applications and communicate results effectively.

Understanding Beta Distribution Functions in R

R provides four functions for working with the beta distribution, following its standard naming convention for probability distributions:

# dbeta(): Probability density function (PDF)
# Returns the height of the density curve at a given point
dbeta(0.3, shape1 = 2, shape2 = 5)
# [1] 1.8522

# pbeta(): Cumulative distribution function (CDF)
# Returns P(X <= x), the probability of being at or below a value
pbeta(0.3, shape1 = 2, shape2 = 5)
# [1] 0.5798

# qbeta(): Quantile function (inverse CDF)
# Returns the value x such that P(X <= x) = p
qbeta(0.5, shape1 = 2, shape2 = 5)
# [1] 0.2826

# rbeta(): Random number generation
# Generates random samples from the distribution
set.seed(42)
rbeta(5, shape1 = 2, shape2 = 5)
# [1] 0.1532 0.3853 0.2248 0.2571 0.3307

For plotting, you’ll primarily use dbeta() since you want to visualize the density curve. The pbeta() function becomes useful when you need to shade regions representing probabilities. The qbeta() function helps when marking specific quantiles on your plots. And rbeta() is essential for simulations and histogram comparisons.

Basic Plotting with Base R

The fastest way to plot a beta distribution is using base R’s curve() function combined with dbeta():

# Simple beta distribution plot
curve(dbeta(x, shape1 = 2, shape2 = 5), 
      from = 0, to = 1,
      xlab = "x",
      ylab = "Density",
      main = "Beta(2, 5) Distribution",
      col = "steelblue",
      lwd = 2)

# Add a grid for readability
grid(col = "gray90")

The curve() function automatically creates a sequence of x values and evaluates the expression for each one. The from and to arguments define the range—for beta distributions, this should always be 0 to 1.

You can customize the appearance further:

curve(dbeta(x, shape1 = 2, shape2 = 5), 
      from = 0, to = 1,
      xlab = "Probability",
      ylab = "Density",
      main = "Beta(2, 5) Distribution",
      col = "#2E86AB",
      lwd = 3,
      las = 1,           # Horizontal axis labels
      bty = "l",         # L-shaped box
      xaxs = "i",        # No padding on x-axis
      yaxs = "i")        # No padding on y-axis

Comparing Multiple Beta Distributions

The real power of visualization comes when comparing different parameter combinations. This builds intuition about how α and β affect the distribution’s shape:

# Set up the plot with the first distribution
curve(dbeta(x, 0.5, 0.5), 
      from = 0, to = 1,
      ylim = c(0, 3),
      xlab = "x",
      ylab = "Density",
      main = "Beta Distributions with Different Parameters",
      col = "#E63946",
      lwd = 2,
      las = 1)

# Add additional distributions
curve(dbeta(x, 1, 1), add = TRUE, col = "#457B9D", lwd = 2)
curve(dbeta(x, 2, 2), add = TRUE, col = "#2A9D8F", lwd = 2)
curve(dbeta(x, 2, 5), add = TRUE, col = "#E9C46A", lwd = 2)
curve(dbeta(x, 5, 2), add = TRUE, col = "#9B5DE5", lwd = 2)

# Add legend
legend("top", 
       legend = c("Beta(0.5, 0.5)", "Beta(1, 1)", "Beta(2, 2)", 
                  "Beta(2, 5)", "Beta(5, 2)"),
       col = c("#E63946", "#457B9D", "#2A9D8F", "#E9C46A", "#9B5DE5"),
       lwd = 2,
       bty = "n",
       ncol = 2)

This comparison reveals key patterns: Beta(0.5, 0.5) creates a U-shape with mass at the extremes, Beta(1, 1) is uniform, Beta(2, 2) is symmetric and bell-shaped, and asymmetric parameters create skewed distributions.

Creating Publication-Ready Plots with ggplot2

For professional publications and reports, ggplot2 provides superior control over aesthetics. The stat_function() geom is your tool for plotting mathematical functions:

library(ggplot2)

# Create a data frame defining the distributions to plot
distributions <- data.frame(
  alpha = c(0.5, 1, 2, 2, 5),
  beta = c(0.5, 1, 2, 5, 2)
)
distributions$label <- paste0("Beta(", distributions$alpha, ", ", 
                               distributions$beta, ")")

# Build the plot
p <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = dbeta, args = list(shape1 = 0.5, shape2 = 0.5),
                aes(color = "Beta(0.5, 0.5)"), linewidth = 1) +
  stat_function(fun = dbeta, args = list(shape1 = 1, shape2 = 1),
                aes(color = "Beta(1, 1)"), linewidth = 1) +
  stat_function(fun = dbeta, args = list(shape1 = 2, shape2 = 2),
                aes(color = "Beta(2, 2)"), linewidth = 1) +
  stat_function(fun = dbeta, args = list(shape1 = 2, shape2 = 5),
                aes(color = "Beta(2, 5)"), linewidth = 1) +
  stat_function(fun = dbeta, args = list(shape1 = 5, shape2 = 2),
                aes(color = "Beta(5, 2)"), linewidth = 1) +
  scale_color_manual(
    name = "Distribution",
    values = c("Beta(0.5, 0.5)" = "#E63946",
               "Beta(1, 1)" = "#457B9D",
               "Beta(2, 2)" = "#2A9D8F",
               "Beta(2, 5)" = "#E9C46A",
               "Beta(5, 2)" = "#9B5DE5")
  ) +
  labs(x = "x", y = "Density",
       title = "Beta Distribution Shape Variations",
       subtitle = "How α and β parameters affect distribution shape") +
  coord_cartesian(ylim = c(0, 3)) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )

print(p)

For even cleaner code when plotting many distributions, you can use a programmatic approach:

library(ggplot2)
library(dplyr)

# Generate data for multiple distributions
params <- list(
  list(a = 0.5, b = 0.5),
  list(a = 2, b = 2),
  list(a = 2, b = 5),
  list(a = 5, b = 2)
)

x_seq <- seq(0.001, 0.999, length.out = 500)

plot_data <- do.call(rbind, lapply(params, function(p) {
  data.frame(
    x = x_seq,
    density = dbeta(x_seq, p$a, p$b),
    distribution = sprintf("Beta(%g, %g)", p$a, p$b)
  )
}))

ggplot(plot_data, aes(x = x, y = density, color = distribution)) +
  geom_line(linewidth = 1.1) +
  scale_color_brewer(palette = "Set1", name = "Distribution") +
  labs(x = "x", y = "Density") +
  theme_minimal() +
  theme(legend.position = "bottom")

Interactive Visualization with Shiny

For exploratory analysis and teaching, an interactive Shiny app lets users manipulate α and β in real-time:

library(shiny)
library(ggplot2)

ui <- fluidPage(
  titlePanel("Beta Distribution Explorer"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("alpha", "α (alpha):", 
                  min = 0.1, max = 10, value = 2, step = 0.1),
      sliderInput("beta", "β (beta):", 
                  min = 0.1, max = 10, value = 5, step = 0.1),
      hr(),
      verbatimTextOutput("stats")
    ),
    mainPanel(
      plotOutput("betaPlot", height = "400px")
    )
  )
)

server <- function(input, output) {
  output$betaPlot <- renderPlot({
    ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
      stat_function(fun = dbeta, 
                    args = list(shape1 = input$alpha, shape2 = input$beta),
                    color = "#2E86AB", linewidth = 1.5) +
      labs(title = sprintf("Beta(%.1f, %.1f)", input$alpha, input$beta),
           x = "x", y = "Density") +
      theme_minimal(base_size = 14)
  })
  
  output$stats <- renderPrint({
    mean_val <- input$alpha / (input$alpha + input$beta)
    mode_val <- ifelse(input$alpha > 1 & input$beta > 1,
                       (input$alpha - 1) / (input$alpha + input$beta - 2),
                       NA)
    cat(sprintf("Mean: %.3f\nMode: %s", 
                mean_val, 
                ifelse(is.na(mode_val), "undefined", sprintf("%.3f", mode_val))))
  })
}

shinyApp(ui, server)

Practical Example: Visualizing Bayesian Prior and Posterior

Here’s where beta distribution visualization becomes genuinely useful. Suppose you’re running an A/B test and want to estimate a conversion rate. You start with a prior belief (Beta(2, 2), representing mild uncertainty centered at 50%), then observe 10 successes out of 16 trials.

The posterior is Beta(2 + 10, 2 + 6) = Beta(12, 8):

library(ggplot2)

ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  # Prior distribution
  stat_function(fun = dbeta, args = list(shape1 = 2, shape2 = 2),
                aes(color = "Prior: Beta(2, 2)"), 
                linewidth = 1.2, linetype = "dashed") +
  # Posterior distribution
  stat_function(fun = dbeta, args = list(shape1 = 12, shape2 = 8),
                aes(color = "Posterior: Beta(12, 8)"), 
                linewidth = 1.2) +
  # Mark the MLE (10/16 = 0.625)
  geom_vline(xintercept = 10/16, linetype = "dotted", color = "gray40") +
  annotate("text", x = 0.625, y = 3.2, label = "MLE = 0.625", 
           hjust = -0.1, size = 3.5) +
  scale_color_manual(
    name = "",
    values = c("Prior: Beta(2, 2)" = "#457B9D", 
               "Posterior: Beta(12, 8)" = "#E63946")
  ) +
  labs(x = "Conversion Rate", y = "Density",
       title = "Bayesian Update: Prior to Posterior",
       subtitle = "After observing 10 successes in 16 trials") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

This visualization immediately communicates how the data updated your beliefs: the posterior is more concentrated (less uncertain) and shifted toward the observed success rate.

The beta distribution’s conjugacy with the binomial makes these visualizations straightforward to create and interpret. Master these plotting techniques, and you’ll have a powerful tool for communicating probabilistic reasoning in any domain where you’re modeling rates or proportions.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.