How to Plot the Binomial Distribution in R

Key Insights

  • The binomial distribution has four core R functions (dbinom, pbinom, qbinom, rbinom) that serve distinct purposes—master these before attempting any visualization.
  • Base R’s barplot() works fine for quick PMF plots, but ggplot2 gives you the control needed for publication-quality visualizations and multi-distribution comparisons.
  • Always annotate your plots with the theoretical mean (n×p) and standard deviation (√(np(1-p))) to provide statistical context and catch potential errors in your analysis.

Introduction to the Binomial Distribution

The binomial distribution models a simple but powerful scenario: you run n independent trials, each with the same probability p of success, and count how many successes you get. Coin flips, A/B test conversions, defective items in manufacturing—these all follow binomial distributions.

Two parameters define the distribution completely:

  • n: the number of trials (must be a positive integer)
  • p: the probability of success on each trial (between 0 and 1)

If you’re running 20 trials with a 30% success rate, you’re working with Binomial(n=20, p=0.3). The distribution tells you the probability of getting exactly 0 successes, exactly 1 success, and so on up to 20.
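To make that concrete, here is a minimal sketch that evaluates the full PMF for Binomial(n=20, p=0.3) and checks it against the closed-form expression choose(n, k) * p^k * (1 - p)^(n - k). The variable names are illustrative only.

```r
n <- 20
p <- 0.3
k <- 0:n

# Probability of exactly k successes, for every k from 0 to n
pmf <- dbinom(k, size = n, prob = p)

# The same probabilities from the closed-form binomial formula
pmf_manual <- choose(n, k) * p^k * (1 - p)^(n - k)

all.equal(pmf, pmf_manual)  # TRUE, up to floating-point tolerance
sum(pmf)                    # the probabilities over 0..n sum to 1
```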

Understanding how to visualize this distribution is essential for communicating statistical concepts, validating your models, and exploring how parameter changes affect outcomes.

Core R Functions for Binomial Distribution

R provides four functions for working with the binomial distribution. Each serves a specific purpose, and you’ll use all of them when creating comprehensive visualizations.

# dbinom: Probability Mass Function (PMF)
# "What's the probability of exactly k successes?"
dbinom(x = 5, size = 10, prob = 0.5)  # P(X = 5) when n=10, p=0.5
# [1] 0.2460938

# pbinom: Cumulative Distribution Function (CDF)
# "What's the probability of k or fewer successes?"
pbinom(q = 5, size = 10, prob = 0.5)  # P(X <= 5)
# [1] 0.6230469

# qbinom: Quantile Function (inverse CDF)
# "What is the smallest number of successes k with P(X <= k) >= p?"
qbinom(p = 0.5, size = 10, prob = 0.5)  # Median
# [1] 5

# rbinom: Random Generation
# "Generate random samples from this distribution"
rbinom(n = 10, size = 10, prob = 0.5)  # 10 random samples
# [1] 6 4 5 7 3 5 6 4 5 8  (example output; random draws differ from run to run)
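Because rbinom() output is random, one way to sanity-check it is to compare simulated frequencies with the theoretical probabilities from dbinom(). The sketch below assumes an arbitrary seed and sample size; both are illustrative choices, not recommendations.

```r
set.seed(123)  # arbitrary seed so the draws are reproducible

n <- 10
p <- 0.5
draws <- rbinom(n = 100000, size = n, prob = p)

# Empirical relative frequency of each outcome 0..n
empirical <- as.numeric(table(factor(draws, levels = 0:n))) / length(draws)
theoretical <- dbinom(0:n, size = n, prob = p)

round(cbind(k = 0:n, empirical, theoretical), 4)
max(abs(empirical - theoretical))  # should be small for a sample this large
```

Wrapping the draws in factor(..., levels = 0:n) keeps outcomes that never occurred in the table with a count of zero, so the two vectors line up.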

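The naming convention follows R’s standard: d for density (for a discrete distribution, the probability mass), p for cumulative probability, q for quantile, and r for random generation. The same prefixes apply to the other distributions built into R, such as dnorm, pnorm, qnorm, and rnorm for the normal distribution.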
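The d, p, and q functions are tied together, and a short check can make the relationships explicit. This is a sketch with an arbitrary target probability of 0.9.

```r
n <- 10
p <- 0.5
k <- 0:n

# pbinom() is the running sum of dbinom()
all.equal(pbinom(k, size = n, prob = p),
          cumsum(dbinom(k, size = n, prob = p)))  # TRUE

# qbinom() returns the smallest k whose cumulative probability reaches the target
target <- 0.9
q <- qbinom(target, size = n, prob = p)
pbinom(q, size = n, prob = p) >= target      # TRUE
pbinom(q - 1, size = n, prob = p) < target   # TRUE
```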

Plotting the Probability Mass Function (PMF)

The PMF shows the probability of each possible outcome. Since the binomial distribution is discrete, a bar plot is the appropriate visualization—not a continuous line.

Base R Approach

# Define parameters
n <- 20
p <- 0.3

# Calculate probabilities for all possible outcomes (0 to n)
x <- 0:n
probabilities <- dbinom(x, size = n, prob = p)

# Create the bar plot
barplot(
  probabilities,
  names.arg = x,
  col = "steelblue",
  border = "white",
  main = paste0("Binomial Distribution (n = ", n, ", p = ", p, ")"),
  xlab = "Number of Successes",
  ylab = "Probability",
  ylim = c(0, max(probabilities) * 1.1)
)

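This gets the job done, but barplot() has limitations. It positions bars at its own midpoints rather than at the values in x, so overlays such as a vertical line at the mean do not land where you might expect, and further customization means working against the function’s defaults.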
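One workaround, sketched here under the assumption that you want a vertical line at the mean, is to capture the bar midpoints that barplot() returns and translate the data value onto that scale. The names mids and mean_pos are illustrative.

```r
n <- 20
p <- 0.3
x <- 0:n
probabilities <- dbinom(x, size = n, prob = p)

# barplot() invisibly returns the x-coordinate of each bar's midpoint
mids <- barplot(
  probabilities,
  names.arg = x,
  col = "steelblue",
  border = "white",
  xlab = "Number of Successes",
  ylab = "Probability"
)

# Map the mean (n * p = 6) from the data scale onto the bar coordinates.
# Bars are evenly spaced, so linear interpolation between midpoints works.
mean_pos <- approx(x, mids, xout = n * p)$y
abline(v = mean_pos, col = "red", lwd = 2)
```

Interpolating also handles a mean that falls between two integers, where there is no single bar to look up.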

ggplot2 Approach

library(ggplot2)

# Create a data frame (ggplot2 requires this)
n <- 20
p <- 0.3

df <- data.frame(
  x = 0:n,
  probability = dbinom(0:n, size = n, prob = p)
)

# Build the plot
ggplot(df, aes(x = x, y = probability)) +
  geom_col(fill = "steelblue", color = "white", width = 0.8) +
  scale_x_continuous(breaks = seq(0, n, by = 2)) +
  labs(
    title = paste0("Binomial Distribution (n = ", n, ", p = ", p, ")"),
    x = "Number of Successes",
    y = "Probability"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    panel.grid.minor = element_blank()
  )

The ggplot2 version gives you precise control over aesthetics and scales naturally when you need to add layers or create faceted plots.
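One way to take advantage of this is to wrap the plot in a small function and add layers to whatever it returns. The helper below, plot_binom_pmf(), is a hypothetical name invented for this sketch; it is not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns a PMF bar chart
plot_binom_pmf <- function(n, p, fill = "steelblue") {
  df <- data.frame(
    x = 0:n,
    probability = dbinom(0:n, size = n, prob = p)
  )
  ggplot(df, aes(x = x, y = probability)) +
    geom_col(fill = fill, color = "white", width = 0.8) +
    labs(
      title = paste0("Binomial Distribution (n = ", n, ", p = ", p, ")"),
      x = "Number of Successes",
      y = "Probability"
    ) +
    theme_minimal()
}

# The result is an ordinary ggplot object, so more layers can be added
pmf_plot <- plot_binom_pmf(20, 0.3) +
  geom_vline(xintercept = 20 * 0.3, color = "red", linetype = "dashed")
pmf_plot
```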

Plotting the Cumulative Distribution Function (CDF)

The CDF answers “what’s the probability of getting k or fewer successes?” This is useful for calculating p-values and understanding the distribution’s cumulative behavior.

Since the binomial distribution is discrete, the CDF is a step function—it jumps at each integer value and stays flat between them.
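Both properties can be checked numerically. The sketch below evaluates pbinom() at a non-integer point and compares the size of each jump with the PMF.

```r
n <- 20
p <- 0.3

# Flat between integers: the CDF at 3.7 equals the CDF at 3
pbinom(3.7, size = n, prob = p) == pbinom(3, size = n, prob = p)  # TRUE

# The jump at each integer k equals the PMF at k
k <- 1:n
jumps <- pbinom(k, size = n, prob = p) - pbinom(k - 1, size = n, prob = p)
all.equal(jumps, dbinom(k, size = n, prob = p))  # TRUE
```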

# Base R step plot
n <- 20
p <- 0.3

x <- 0:n
cdf <- pbinom(x, size = n, prob = p)

# Create step plot
plot(
  x, cdf,
  type = "s",  # Step function
  col = "darkred",
  lwd = 2,
  main = paste0("Binomial CDF (n = ", n, ", p = ", p, ")"),
  xlab = "Number of Successes",
  ylab = "Cumulative Probability",
  ylim = c(0, 1)
)

# Add points at the actual values
points(x, cdf, pch = 19, col = "darkred", cex = 0.8)

# Add horizontal reference lines
abline(h = c(0.25, 0.5, 0.75), lty = 2, col = "gray60")

For a ggplot2 version, use geom_step():

library(ggplot2)

# Same parameters as the base R example
n <- 20
p <- 0.3

df <- data.frame(
  x = 0:n,
  cdf = pbinom(0:n, size = n, prob = p)
)

ggplot(df, aes(x = x, y = cdf)) +
  geom_step(color = "darkred", linewidth = 1) +
  geom_point(color = "darkred", size = 2) +
  geom_hline(yintercept = c(0.25, 0.5, 0.75), linetype = "dashed", color = "gray60") +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.25)) +
  labs(
    title = paste0("Binomial CDF (n = ", n, ", p = ", p, ")"),
    x = "Number of Successes",
    y = "Cumulative Probability"
  ) +
  theme_minimal()
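The dashed reference lines at 0.25, 0.5, and 0.75 mark the quartile levels. If you want the corresponding numbers of successes rather than reading them off the plot, qbinom() returns them directly. A minimal sketch:

```r
n <- 20
p <- 0.3
probs <- c(0.25, 0.5, 0.75)

# Smallest number of successes whose cumulative probability reaches each level
quartiles <- qbinom(probs, size = n, prob = p)

data.frame(
  level = probs,
  k = quartiles,
  cdf_at_k = pbinom(quartiles, size = n, prob = p),
  cdf_below = pbinom(quartiles - 1, size = n, prob = p)
)
```

In each row, cdf_below sits under the level and cdf_at_k sits at or above it, which is where the step function crosses the dashed line.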

Comparing Multiple Distributions

Real analysis often requires comparing distributions with different parameters. How does increasing n affect the spread? What happens when p moves toward 0.5?
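Before plotting, it can help to tabulate the theoretical summaries. The sketch below holds n fixed and varies p, using the standard formulas for the binomial mean, standard deviation, and skewness, (1 - 2p) / sqrt(np(1-p)). The chosen p values are arbitrary.

```r
n <- 20
p_values <- c(0.1, 0.3, 0.5, 0.7, 0.9)

sd_by_p <- sqrt(n * p_values * (1 - p_values))
skew_by_p <- (1 - 2 * p_values) / sd_by_p

data.frame(
  p = p_values,
  mean = n * p_values,
  sd = round(sd_by_p, 3),
  skewness = round(skew_by_p, 3)
)
```

The standard deviation peaks at p = 0.5, where the skewness is zero; for p below 0.5 the distribution has a longer right tail, and for p above 0.5 a longer left tail.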

Overlaid Distributions

library(ggplot2)

# Define multiple parameter sets
params <- list(
  list(n = 20, p = 0.3),
  list(n = 20, p = 0.5),
  list(n = 20, p = 0.7)
)

# Build combined data frame
df_combined <- do.call(rbind, lapply(params, function(par) {
  data.frame(
    x = 0:par$n,
    probability = dbinom(0:par$n, size = par$n, prob = par$p),
    params = paste0("n = ", par$n, ", p = ", par$p)
  )
}))

# Plot with different colors
ggplot(df_combined, aes(x = x, y = probability, fill = params)) +
  geom_col(position = "dodge", width = 0.7, alpha = 0.8) +
  scale_fill_brewer(palette = "Set1") +
  labs(
    title = "Comparing Binomial Distributions with Different p Values",
    x = "Number of Successes",
    y = "Probability",
    fill = "Parameters"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Faceted Comparison

When comparing many distributions, faceting prevents visual clutter:

# Compare effect of n with fixed p
params <- list(
  list(n = 10, p = 0.5),
  list(n = 30, p = 0.5),
  list(n = 50, p = 0.5),
  list(n = 100, p = 0.5)
)

df_facet <- do.call(rbind, lapply(params, function(par) {
  data.frame(
    x = 0:par$n,
    probability = dbinom(0:par$n, size = par$n, prob = par$p),
    n = par$n
  )
}))

ggplot(df_facet, aes(x = x, y = probability)) +
  geom_col(fill = "steelblue", width = 0.8) +
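  # Facet on the numeric n so panels sort as 10, 30, 50, 100;
  # faceting on the pasted string would sort "n = 100" before "n = 30"
  facet_wrap(~ n, scales = "free", ncol = 2,
             labeller = as_labeller(function(v) paste("n =", v))) +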
  labs(
    title = "Effect of Sample Size on Binomial Distribution (p = 0.5)",
    x = "Number of Successes",
    y = "Probability"
  ) +
  theme_minimal()

Notice how the distribution becomes more bell-shaped and relatively narrower as n increases—this is the normal approximation emerging.
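"Relatively narrower" can be quantified: the standard deviation grows with n in absolute terms, but shrinks as a fraction of the range 0 to n. A short sketch using the same n values as the faceted plot:

```r
p <- 0.5
n_values <- c(10, 30, 50, 100)

sd_abs <- sqrt(n_values * p * (1 - p))  # spread in number of successes
sd_rel <- sd_abs / n_values             # spread as a fraction of the range 0..n

data.frame(n = n_values, sd = round(sd_abs, 2), sd_over_n = round(sd_rel, 3))
```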

Adding Statistical Annotations

Raw plots are incomplete without statistical context. At minimum, mark the expected value. For deeper analysis, add standard deviation bounds and highlight probability regions.

library(ggplot2)

n <- 30
p <- 0.4

# Calculate theoretical statistics
mean_val <- n * p
sd_val <- sqrt(n * p * (1 - p))

df <- data.frame(
  x = 0:n,
  probability = dbinom(0:n, size = n, prob = p)
)

# Identify region of interest (e.g., within 1 SD of mean)
df$region <- ifelse(
  df$x >= (mean_val - sd_val) & df$x <= (mean_val + sd_val),
  "Within 1 SD",
  "Outside 1 SD"
)

ggplot(df, aes(x = x, y = probability, fill = region)) +
  geom_col(width = 0.8, alpha = 0.8) +
  scale_fill_manual(values = c("Within 1 SD" = "steelblue", "Outside 1 SD" = "gray70")) +
  
  # Add mean line
  geom_vline(xintercept = mean_val, color = "red", linewidth = 1, linetype = "solid") +
  
  # Add SD boundaries
  geom_vline(xintercept = c(mean_val - sd_val, mean_val + sd_val), 
             color = "darkred", linewidth = 0.8, linetype = "dashed") +
  
  # Annotate
  annotate("text", x = mean_val + 1, y = max(df$probability) * 0.95,
           label = paste0("μ = ", round(mean_val, 2)), color = "red", hjust = 0) +
  annotate("text", x = mean_val + sd_val + 0.5, y = max(df$probability) * 0.85,
           label = paste0("σ = ", round(sd_val, 2)), color = "darkred", hjust = 0) +
  
  labs(
    title = paste0("Binomial Distribution (n = ", n, ", p = ", p, ")"),
    subtitle = "Red line = mean, dashed lines = ±1 standard deviation",
    x = "Number of Successes",
    y = "Probability",
    fill = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

This visualization immediately communicates where most probability mass lies and provides the theoretical parameters for validation.
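If you also want a number for the highlighted region, the probability inside the ±1 SD band can be summed directly from the PMF. This sketch reuses the same parameters and the same inclusion rule as the plot.

```r
n <- 30
p <- 0.4
mean_val <- n * p
sd_val <- sqrt(n * p * (1 - p))

x <- 0:n
inside <- x >= (mean_val - sd_val) & x <= (mean_val + sd_val)

x[inside]                                   # outcomes counted as "Within 1 SD"
sum(dbinom(x[inside], size = n, prob = p))  # total probability of those outcomes
```

Because the outcomes are integers, the band covers whole bars only, so the total will generally differ from the roughly 68% associated with a normal distribution.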

Conclusion

Plotting binomial distributions in R comes down to choosing the right tool for your purpose:

  • PMF bar plots show the probability of each outcome—use these when explaining the distribution or comparing specific probabilities.
  • CDF step plots show cumulative probabilities—use these for p-value calculations or when “at most k” questions matter.
  • Multi-distribution comparisons reveal how parameters affect shape—essential for teaching or exploratory analysis.
  • Annotated plots add statistical rigor—always include these in reports or publications.
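For the "at most k" and p-value cases, the main thing to watch is the off-by-one in the upper tail. The sketch below uses an arbitrary cutoff of k = 10 to show both directions.

```r
n <- 20
p <- 0.3
k <- 10

# "At most k": P(X <= k)
at_most <- pbinom(k, size = n, prob = p)

# "At least k": P(X >= k) = 1 - P(X <= k - 1), note the k - 1
at_least <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)

# The two events overlap at X = k, so together they exceed 1 by exactly P(X = k)
all.equal(at_most + at_least - 1, dbinom(k, size = n, prob = p))  # TRUE
```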

For large n with p not too close to 0 or 1, the binomial distribution is well approximated by a normal distribution with mean np and variance np(1-p). A common rule of thumb asks for both np and n(1-p) to be at least 5 (some texts use 10), which is more informative than a cutoff on n alone. When you see your faceted plots becoming increasingly bell-shaped, you’re watching this convergence happen. This connection matters for hypothesis testing and confidence intervals, where normal approximations simplify calculations.
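To see how good the approximation is for a given n and p, you can compare the exact CDF with the normal CDF. The helper max_cdf_gap() below is a hypothetical name for this sketch, and the 0.5 shift is the usual continuity correction.

```r
# Hypothetical helper: largest gap between the exact binomial CDF
# and the normal approximation with a continuity correction
max_cdf_gap <- function(n, p) {
  k <- 0:n
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  exact <- pbinom(k, size = n, prob = p)
  approx_cdf <- pnorm(k + 0.5, mean = mu, sd = sigma)
  max(abs(exact - approx_cdf))
}

gap_large <- max_cdf_gap(100, 0.5)  # large n, symmetric case
gap_small <- max_cdf_gap(10, 0.1)   # small n, p near 0

c(large_n = gap_large, small_n = gap_small)
```

The gap is far larger in the second case, where np = 1 falls well short of the usual rules of thumb.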

Start with base R for quick exploration, but invest in ggplot2 fluency for anything you’ll share. The grammar of graphics pays dividends when your visualization requirements inevitably grow more complex.
