How to Plot the Poisson Distribution in R

Key Insights

The Poisson distribution models count data and rare events using a single parameter (λ), and R provides dpois(), ppois(), and rpois() for working with probability mass, cumulative distribution, and random sampling respectively.
Bar plots are the correct visualization choice for Poisson data because it’s a discrete distribution—avoid line plots or smooth curves that imply continuity between integer values.
Comparing multiple λ values on a single plot reveals how the distribution shifts from right-skewed (low λ) toward symmetric (high λ), which is essential for understanding your data’s behavior.

Introduction to the Poisson Distribution

The Poisson distribution models the number of events occurring in a fixed interval of time or space. Think customer arrivals per hour, server errors per day, or radioactive decay events per second. It’s the go-to distribution when you’re counting discrete occurrences of relatively rare, independent events.

The distribution has one parameter: lambda (λ), which is both the mean and the variance. This single-parameter simplicity makes Poisson particularly elegant. If your count data has roughly equal mean and variance, Poisson is a reasonable candidate. If the variance is well above the mean (overdispersion), Poisson probably won't fit.

The probability mass function gives you the probability of observing exactly k events:

P(X = k) = (λ^k × e^(-λ)) / k!

R handles this math for you. Your job is understanding when to use it and how to visualize it effectively.
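As a quick sanity check, you can evaluate the formula directly and compare it with dpois(). This is only an illustration. For large k the direct version breaks down, because factorial(k) overflows to Inf for k above 170. That is one reason to rely on dpois() in practice.

```r
# Evaluate the PMF formula by hand for lambda = 5 and k = 0..10
lambda <- 5
k <- 0:10
manual <- lambda^k * exp(-lambda) / factorial(k)

# Compare with R's built-in PMF; any difference is floating-point noise
builtin <- dpois(k, lambda = lambda)
all.equal(manual, builtin)
# [1] TRUE

# P(X = 3) from the formula, which should match dpois(3, lambda = 5)
manual[k == 3]
```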

Generating Poisson Data in R

R provides three core functions for the Poisson distribution:

dpois(x, lambda): Returns the probability mass function (PMF) value
ppois(q, lambda): Returns the cumulative distribution function (CDF) value
rpois(n, lambda): Generates random samples

Here’s how to use each:

# Probability of exactly 3 events when lambda = 5
dpois(3, lambda = 5)
# [1] 0.1403739

# Probability of 3 or fewer events when lambda = 5
ppois(3, lambda = 5)
# [1] 0.2650259

# Generate 1000 random samples from Poisson(lambda = 5)
set.seed(42)
samples <- rpois(1000, lambda = 5)
head(samples)
# [1] 7 7 3 6 4 5

# Verify the mean approximates lambda
mean(samples)
# [1] 4.994

For plotting the theoretical distribution, you’ll primarily use dpois(). For simulating real-world scenarios or testing statistical methods, rpois() is your tool.
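Because λ is both the mean and the variance, simulated samples can be checked against the second property as well. The sketch below does two things:

- It computes the variance-to-mean ratio (the index of dispersion), which should sit near 1 for Poisson data.
- It shows ppois()'s lower.tail argument for upper-tail questions such as P(X > 3).

The exact simulated values depend on the seed and on R's random number generator.

```r
set.seed(42)
samples <- rpois(1000, lambda = 5)

# Index of dispersion: variance / mean, close to 1 for Poisson data
dispersion <- var(samples) / mean(samples)
dispersion

# P(X > 3) directly; lower.tail = FALSE returns the upper tail
p_more_than_3 <- ppois(3, lambda = 5, lower.tail = FALSE)
p_more_than_3

# The same quantity via the complement of the CDF
1 - ppois(3, lambda = 5)
```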

Creating a Basic Bar Plot

Start with base R’s barplot() function. It’s fast, requires no dependencies, and works well for quick exploration.

# Define the range of possible values and lambda
x <- 0:15
lambda <- 5

# Calculate probabilities for each value
probs <- dpois(x, lambda = lambda)

# Create the bar plot
barplot(
  probs,
  names.arg = x,
  main = paste0("Poisson Distribution (λ = ", lambda, ")"),
  xlab = "Number of Events (k)",
  ylab = "Probability P(X = k)",
  col = "steelblue",
  border = "white"
)

This produces a clean bar chart showing the probability of each count value. The distribution peaks at k = 4 and k = 5 and tapers off on both sides, with a slight right skew. The two peaks are exactly equal because, for an integer λ, P(X = λ - 1) = P(X = λ).

A common mistake is using plot() with type = "l" for discrete distributions. Don’t do this. Lines imply that values between integers are meaningful—they’re not. You can’t have 3.5 customer arrivals. Bars or points are the correct choice.
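If you prefer base R's plot(), type = "h" is the discrete-friendly option. It draws a vertical line from zero up to each value, so nothing appears between the integers. A minimal sketch using the same λ = 5 setup:

```r
x <- 0:15
probs <- dpois(x, lambda = 5)

# type = "h" draws a vertical line from 0 to each probability,
# so ink appears only at the integer values of k
plot(
  x, probs,
  type = "h", lwd = 4, col = "steelblue",
  main = "Poisson Distribution (λ = 5)",
  xlab = "Number of Events (k)",
  ylab = "Probability P(X = k)"
)

# Mark the top of each line with a filled point
points(x, probs, pch = 16, col = "steelblue")
```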

Visualizing with ggplot2

For publication-quality graphics or more complex visualizations, ggplot2 is the standard. Here’s how to create a polished Poisson plot:

library(ggplot2)

# Create a data frame for plotting
lambda <- 5
df <- data.frame(
  k = 0:15,
  probability = dpois(0:15, lambda = lambda)
)

# Build the plot
ggplot(df, aes(x = k, y = probability)) +
  geom_col(fill = "#2563eb", width = 0.7) +
  scale_x_continuous(breaks = 0:15) +
  labs(
    title = paste0("Poisson Distribution (λ = ", lambda, ")"),
    x = "Number of Events (k)",
    y = "Probability P(X = k)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  )

Use geom_col() rather than geom_bar() when you already have the computed probabilities. The geom_bar() function expects raw data and computes counts itself—not what you want here.

The scale_x_continuous(breaks = 0:15) ensures every integer gets a tick mark. For discrete distributions, this clarity matters.
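geom_bar() is the right tool when you start from raw counts, such as samples from rpois(). The sketch below overlays simulated proportions (gray bars) with the theoretical dpois() values (blue points). The after_stat(prop) with group = 1 idiom rescales counts to proportions, so both layers share the y-axis. after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
lambda <- 5
sim <- data.frame(k = rpois(1000, lambda = lambda))

# Theoretical probabilities over the range of simulated values
theory <- data.frame(k = 0:max(sim$k))
theory$probability <- dpois(theory$k, lambda = lambda)

p <- ggplot(sim, aes(x = k)) +
  # geom_bar() counts the raw samples itself; after_stat(prop) with group = 1
  # turns those counts into proportions of all 1000 samples
  geom_bar(aes(y = after_stat(prop), group = 1), fill = "gray70", width = 0.7) +
  # Theoretical PMF from dpois() for comparison
  geom_point(data = theory, aes(y = probability), color = "#2563eb", size = 2.5) +
  labs(
    title = paste0("Simulated vs. Theoretical Poisson (λ = ", lambda, ")"),
    x = "Number of Events (k)",
    y = "Proportion / Probability"
  ) +
  theme_minimal(base_size = 12)
p
```

With 1000 samples the bars should track the points closely. Large, systematic gaps would suggest the data are not Poisson.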

Comparing Multiple Distributions

The real insight comes from comparing distributions with different λ values. This shows how the shape evolves:

library(ggplot2)
library(dplyr)

# Create data for multiple lambda values
lambdas <- c(2, 5, 10)
x_range <- 0:20

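df <- expand.grid(k = x_range, lambda = lambdas) %>%
  mutate(
    probability = dpois(k, lambda = lambda),
    # Use a factor so the legend and facets follow numeric order (2, 5, 10);
    # as plain text, "λ = 10" would sort before "λ = 2"
    lambda_label = factor(paste("λ =", lambda), levels = paste("λ =", lambdas))
  )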

# Overlaid plot with transparency
ggplot(df, aes(x = k, y = probability, fill = lambda_label)) +
  geom_col(position = "identity", alpha = 0.6, width = 0.8) +
  scale_x_continuous(breaks = seq(0, 20, by = 2)) +
  scale_fill_manual(values = c("#ef4444", "#22c55e", "#3b82f6")) +
  labs(
    title = "Poisson Distribution Comparison",
    x = "Number of Events (k)",
    y = "Probability P(X = k)",
    fill = "Parameter"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "top",
    panel.grid.minor = element_blank()
  )

For cleaner separation, use faceting:

ggplot(df, aes(x = k, y = probability)) +
  geom_col(fill = "#2563eb", width = 0.7) +
  facet_wrap(~lambda_label, ncol = 1, scales = "free_y") +
  scale_x_continuous(breaks = seq(0, 20, by = 2)) +
  labs(
    title = "Poisson Distribution by Lambda",
    x = "Number of Events (k)",
    y = "Probability"
  ) +
  theme_minimal(base_size = 11)

Notice how λ = 2 is heavily right-skewed, λ = 5 is moderately skewed, and λ = 10 approaches symmetry. As λ increases, the Poisson distribution approaches a normal distribution with mean λ and variance λ. This is a useful property for large-count scenarios.
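These shape claims can be made concrete. The skewness of a Poisson distribution is 1/√λ, so it falls as λ grows. The sketch below prints that value for the three λs used above. As a rough gauge of the normal approximation, it also reports the largest absolute gap between dpois() and a Normal(λ, √λ) density evaluated at the integers. The comparison range (out to λ + 6·⌈√λ⌉) is an arbitrary choice for illustration.

```r
lambdas <- c(2, 5, 10)

# Poisson skewness is 1 / sqrt(lambda): it shrinks as lambda grows
skewness <- 1 / sqrt(lambdas)
round(skewness, 3)

# Largest absolute gap between the Poisson PMF and a normal density with
# mean lambda and standard deviation sqrt(lambda), checked at integer k
max_gap <- sapply(lambdas, function(l) {
  k <- 0:(l + 6 * ceiling(sqrt(l)))
  max(abs(dpois(k, lambda = l) - dnorm(k, mean = l, sd = sqrt(l))))
})
round(max_gap, 4)
```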

Adding the Cumulative Distribution

The cumulative distribution function (CDF) answers “what’s the probability of k or fewer events?” This is often more useful for practical decision-making than the PMF.

library(ggplot2)
library(patchwork)

lambda <- 5
df <- data.frame(
  k = 0:15,
  pmf = dpois(0:15, lambda = lambda),
  cdf = ppois(0:15, lambda = lambda)
)

# PMF plot
p1 <- ggplot(df, aes(x = k, y = pmf)) +
  geom_col(fill = "#2563eb", width = 0.7) +
  scale_x_continuous(breaks = 0:15) +
  labs(
    title = "Probability Mass Function",
    x = "k",
    y = "P(X = k)"
  ) +
  theme_minimal()

# CDF plot - use step function for accuracy
p2 <- ggplot(df, aes(x = k, y = cdf)) +
  geom_step(color = "#dc2626", linewidth = 1) +
  geom_point(color = "#dc2626", size = 2) +
  scale_x_continuous(breaks = 0:15) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(
    title = "Cumulative Distribution Function",
    x = "k",
    y = "P(X ≤ k)"
  ) +
  theme_minimal()

# Combine with patchwork
p1 + p2 + plot_annotation(
  title = paste0("Poisson Distribution (λ = ", lambda, ")"),
  theme = theme(plot.title = element_text(hjust = 0.5, size = 14))
)

For the CDF, geom_step() is appropriate because the cumulative probability jumps at each integer value and remains constant between them. Adding points at each integer clarifies where the defined values are.
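For decision-making, qpois() runs the CDF in reverse: it returns the smallest k with P(X ≤ k) ≥ p. The sketch below finds a count that is not exceeded at least 95% of the time when λ = 5. That is the kind of threshold you might use for capacity planning, though this use case is only illustrative.

```r
lambda <- 5

# Smallest k such that P(X <= k) >= 0.95
k95 <- qpois(0.95, lambda = lambda)
k95

# Check the definition with ppois() on either side of the threshold
ppois(k95 - 1, lambda = lambda)  # below 0.95
ppois(k95, lambda = lambda)      # at or above 0.95
```

The same k95 could be marked on the CDF plot, for example with geom_vline() or geom_hline(yintercept = 0.95).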

Practical Tips and Customization

Choosing your x-axis range: A good rule of thumb is to plot from 0 to λ + 3√λ. This captures the bulk of the probability mass without excessive whitespace.

lambda <- 10
x_max <- ceiling(lambda + 3 * sqrt(lambda))
# x_max = 20; ppois(20, lambda = 10) is about 0.998, so this covers ~99.8% of the distribution
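Because the rule of thumb is only approximate, it is worth checking how much mass it captures for your λ. The sketch below compares the rule's upper limit with qpois(0.999, λ), which asks directly for the 99.9th percentile. The λ values are arbitrary examples.

```r
lambdas <- c(1, 2, 5, 10, 50)

# Upper plotting limit from the lambda + 3 * sqrt(lambda) rule of thumb
x_max_rule <- ceiling(lambdas + 3 * sqrt(lambdas))

# Probability mass actually captured by plotting 0..x_max_rule
coverage <- ppois(x_max_rule, lambda = lambdas)

# Alternative: ask qpois() for the 99.9th percentile directly
x_max_q <- qpois(0.999, lambda = lambdas)

data.frame(lambda = lambdas, x_max_rule, coverage = round(coverage, 4), x_max_q)
```

Using qpois() makes the coverage explicit, whereas the rule's coverage varies somewhat with λ.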

Color considerations: Use a single color for single distributions. For comparisons, choose a colorblind-friendly palette. The viridis package or manual selection of distinct hues works well.
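ggplot2 itself ships viridis scales, so no extra package is needed for the comparison plot. scale_fill_viridis_d() handles discrete fills and has been available since ggplot2 3.0.0.

The sketch below rebuilds the multi-λ data without dplyr. end = 0.9 is an optional tweak: it skips the palette's palest yellow, which can be hard to see on a white background.

```r
library(ggplot2)

lambdas <- c(2, 5, 10)
df <- expand.grid(k = 0:20, lambda = lambdas)
df$probability <- dpois(df$k, lambda = df$lambda)
# Factor levels in numeric order so the legend reads 2, 5, 10
df$lambda_label <- factor(paste("λ =", df$lambda), levels = paste("λ =", lambdas))

p <- ggplot(df, aes(x = k, y = probability, fill = lambda_label)) +
  geom_col(position = "identity", alpha = 0.6, width = 0.8) +
  # Colorblind-friendly discrete viridis palette built into ggplot2
  scale_fill_viridis_d(end = 0.9) +
  labs(
    x = "Number of Events (k)",
    y = "Probability P(X = k)",
    fill = "Parameter"
  ) +
  theme_minimal(base_size = 12)
p
```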

Points versus bars: Bars are standard for PMFs, but lollipop charts (points with stems) can work when you have many values or want a cleaner look:

ggplot(df, aes(x = k, y = pmf)) +
  geom_segment(aes(xend = k, yend = 0), color = "gray60") +
  geom_point(color = "#2563eb", size = 3) +
  scale_x_continuous(breaks = 0:15) +
  theme_minimal()

Interactive plots: For dashboards or exploratory work, plotly converts ggplot2 objects with one line:

library(plotly)
ggplotly(p1)

Alternative packages: The lattice package offers barchart() with formula syntax, useful if you’re already in that ecosystem. For quick exploration, base R is fastest. For anything you’ll share or publish, invest the time in ggplot2.
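Here is a minimal lattice sketch for the λ = 5 PMF. lattice ships with R as a recommended package. A few notes on the arguments:

- With the numeric response on the left of the formula and a factor on the right, barchart() draws vertical bars.
- origin = 0 anchors the bars at zero rather than at the bottom of the panel.
- The explicit print() matters because lattice plots are not drawn automatically inside scripts or functions.

```r
library(lattice)

df <- data.frame(k = 0:15, probability = dpois(0:15, lambda = 5))

# Numeric response ~ factor gives one vertical bar per value of k
p <- barchart(
  probability ~ factor(k), data = df,
  origin = 0, col = "steelblue",
  main = "Poisson Distribution (λ = 5)",
  xlab = "Number of Events (k)",
  ylab = "Probability P(X = k)"
)
print(p)
```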

The Poisson distribution is foundational for count data analysis. Master these visualization techniques, and you’ll communicate your statistical findings clearly—whether you’re presenting to stakeholders or debugging your own models.