How to Plot the Normal Distribution in R
Key Insights
- Use dnorm() with seq() for smooth theoretical bell curves, and rnorm() when you need random samples for histograms or simulations.
- Base R’s curve() function offers the fastest path to a basic normal distribution plot, but ggplot2’s stat_function() provides superior customization for publication-ready graphics.
- Shading areas under the curve turns abstract probability concepts into visual intuition. Master polygon() in base R or geom_area() in ggplot2 for this essential technique.
Introduction to the Normal Distribution
The normal distribution is the workhorse of statistics. Whether you’re running hypothesis tests, building confidence intervals, or checking regression assumptions, you’ll encounter this bell-shaped curve constantly. Understanding how to visualize it in R isn’t just an academic exercise—it’s a practical skill you’ll use repeatedly.
The normal distribution is defined by two parameters: the mean (μ), which determines where the curve centers, and the standard deviation (σ), which controls its spread. A distribution with mean 0 and standard deviation 1 is called the standard normal distribution, and it serves as the reference point for most statistical tables and functions.
Visualization matters because it transforms abstract probability density into something tangible. When you can see that 95% of values fall within roughly two standard deviations of the mean, the concept sticks. Let’s build that intuition with code.
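Before plotting anything, the "roughly two standard deviations" figure can be checked numerically. The short sketch below uses pnorm(), R's cumulative distribution function for the normal distribution; the variable names are only illustrative.

```r
# Probability mass within k standard deviations of the mean, for k = 1, 2, 3
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
names(coverage) <- paste0("within_", k, "_sd")

# Rounded values: 0.6827, 0.9545, 0.9973
round(coverage, 4)
```

The middle value shows that two standard deviations cover slightly more than 95% of the distribution.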
Generating Normal Distribution Data in R
R provides two essential functions for working with normal distributions: rnorm() generates random samples, while dnorm() calculates the probability density at specific points.
Use rnorm() when you need simulated data:
# Generate 1000 random values from a normal distribution
set.seed(42) # For reproducibility
random_samples <- rnorm(n = 1000, mean = 0, sd = 1)
# Check the results
summary(random_samples)
The output will show values centered around 0, with most falling between -3 and 3.
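If you prefer explicit numbers over reading them off summary(), a small check along these lines may help. It is a sketch with illustrative variable names, and the exact values depend on the seed and sample size.

```r
set.seed(42)
random_samples <- rnorm(n = 1000, mean = 0, sd = 1)

# Sample estimates should land near the requested parameters
sample_mean <- mean(random_samples)  # expected to be near 0
sample_sd <- sd(random_samples)      # expected to be near 1

# Share of draws inside [-3, 3]; theory says about 99.7%
share_within_3 <- mean(abs(random_samples) <= 3)

c(mean = sample_mean, sd = sample_sd, within_3 = share_within_3)
```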
For plotting smooth density curves, combine seq() with dnorm():
# Create a sequence of x values
x_values <- seq(from = -4, to = 4, by = 0.01)
# Calculate the density at each point
y_values <- dnorm(x_values, mean = 0, sd = 1)
# Preview the data
head(data.frame(x = x_values, density = y_values))
This approach gives you precise control over the curve’s smoothness. The by = 0.01 argument creates 801 points (both endpoints are included), which is plenty for a smooth visual.
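As a sanity check on any grid like this, the density values multiplied by the grid spacing should sum to roughly 1. The sketch below assumes a coarser grid built with length.out, which fixes the number of points directly instead of the step size; the names x_grid, step, and approx_area are illustrative.

```r
# Illustrative check: a Riemann sum of the density over [-4, 4] should be close to 1
x_grid <- seq(from = -4, to = 4, length.out = 401)  # 401 points, spacing 0.02
step <- x_grid[2] - x_grid[1]
approx_area <- sum(dnorm(x_grid)) * step

# Exact probability mass on the same interval, for comparison
exact_area <- pnorm(4) - pnorm(-4)

c(approx = approx_area, exact = exact_area)
```

Even at half the resolution, the approximation agrees closely with the exact value, which suggests that a very fine grid matters more for visual smoothness than for accuracy.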
Basic Plotting with Base R
Base R graphics get the job done quickly. Start with a histogram of random data:
set.seed(42)
samples <- rnorm(1000, mean = 50, sd = 10)
hist(samples,
     breaks = 30,
     probability = TRUE, # Important: use density, not frequency
     main = "Histogram of Normal Data",
     xlab = "Value",
     col = "lightblue",
     border = "white")
Setting probability = TRUE scales the y-axis to density, which allows you to overlay a theoretical curve.
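To see what density scaling means in practice, you can inspect the object that hist() returns. The following sketch, with illustrative variable names, computes the bins without drawing them and confirms that the bar areas add up to 1, which is the property that makes the histogram comparable to a density curve.

```r
set.seed(42)
samples <- rnorm(1000, mean = 50, sd = 10)

# plot = FALSE returns the histogram object without drawing it
h <- hist(samples, breaks = 30, plot = FALSE)

# Area of each bar = density (height) * bin width
bar_areas <- h$density * diff(h$breaks)

sum(bar_areas)  # total area under the density-scaled histogram
sum(h$counts)   # the raw counts still add up to the sample size
```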
For a clean bell curve without histogram bars, use plot() directly:
x <- seq(-4, 4, by = 0.01)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y,
     type = "l", # Line plot
     lwd = 2, # Line width
     col = "darkblue",
     main = "Standard Normal Distribution",
     xlab = "z",
     ylab = "Density")
Even simpler, the curve() function handles the sequence generation automatically:
curve(dnorm(x, mean = 0, sd = 1),
      from = -4, to = 4,
      lwd = 2,
      col = "darkblue",
      main = "Standard Normal Distribution",
      xlab = "z",
      ylab = "Density")
Now combine both approaches—overlay the theoretical curve on your histogram:
set.seed(42)
samples <- rnorm(1000, mean = 50, sd = 10)
hist(samples,
     breaks = 30,
     probability = TRUE,
     main = "Sample Data with Theoretical Curve",
     xlab = "Value",
     col = "lightblue",
     border = "white")
# Add the theoretical curve
x_curve <- seq(min(samples), max(samples), length.out = 100)
lines(x_curve, dnorm(x_curve, mean = 50, sd = 10),
      col = "red", lwd = 2)
This comparison between empirical data and theoretical distribution is invaluable for checking normality assumptions.
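With real data you usually do not know the true mean and standard deviation. One possible variation, sketched here as an assumption rather than as part of the original example, is to estimate both from the sample and overlay the fitted curve using curve() with add = TRUE. The names est_mean and est_sd are illustrative.

```r
set.seed(42)
samples <- rnorm(1000, mean = 50, sd = 10)

# Estimate the parameters from the data instead of assuming them
est_mean <- mean(samples)
est_sd <- sd(samples)

hist(samples, breaks = 30, probability = TRUE,
     main = "Sample Data with Fitted Curve", xlab = "Value",
     col = "lightblue", border = "white")

# add = TRUE draws the curve on top of the existing histogram
curve(dnorm(x, mean = est_mean, sd = est_sd),
      add = TRUE, col = "red", lwd = 2)
```

If the histogram departs visibly from the fitted curve (strong skew, heavy tails, multiple peaks), that is a hint the normality assumption deserves a closer look.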
Enhanced Visualization with ggplot2
Base R works, but ggplot2 produces cleaner graphics with less fiddling. Install it if you haven’t: install.packages("ggplot2").
Create a histogram with density overlay:
library(ggplot2)
set.seed(42)
df <- data.frame(value = rnorm(1000, mean = 0, sd = 1))
ggplot(df, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30,
                 fill = "steelblue",
                 color = "white",
                 alpha = 0.7) +
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Normal Distribution with Density Overlay",
       x = "Value",
       y = "Density") +
  theme_minimal()
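The after_stat(density) mapping puts the histogram on a density scale, so the bars and the red line share one y-axis. Note that geom_density() draws a kernel density estimate computed from the sample, not the theoretical normal curve.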
For a pure theoretical curve without sample data, use stat_function():
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  stat_function(fun = dnorm,
                args = list(mean = 0, sd = 1),
                linewidth = 1.2,
                color = "darkblue") +
  labs(title = "Standard Normal Distribution",
       x = "z",
       y = "Density") +
  theme_classic()
This approach is cleaner because you don’t need to pre-generate data—ggplot2 handles it internally.
Visualizing Different Parameters
Comparing distributions with different parameters builds intuition about what mean and standard deviation actually do.
Plot multiple curves with different means:
ggplot(data.frame(x = c(-6, 10)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1),
                aes(color = "μ = 0"), linewidth = 1) +
  stat_function(fun = dnorm, args = list(mean = 2, sd = 1),
                aes(color = "μ = 2"), linewidth = 1) +
  stat_function(fun = dnorm, args = list(mean = 4, sd = 1),
                aes(color = "μ = 4"), linewidth = 1) +
  scale_color_manual(name = "Mean",
                     values = c("μ = 0" = "blue",
                                "μ = 2" = "green",
                                "μ = 4" = "red")) +
  labs(title = "Effect of Mean on Normal Distribution",
       x = "Value",
       y = "Density") +
  theme_minimal()
The curves shift horizontally but maintain identical shapes.
Now compare different standard deviations:
ggplot(data.frame(x = c(-8, 8)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 0.5),
                aes(color = "σ = 0.5"), linewidth = 1) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1),
                aes(color = "σ = 1"), linewidth = 1) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 2),
                aes(color = "σ = 2"), linewidth = 1) +
  scale_color_manual(name = "Std Dev",
                     values = c("σ = 0.5" = "blue",
                                "σ = 1" = "green",
                                "σ = 2" = "red")) +
  labs(title = "Effect of Standard Deviation on Normal Distribution",
       x = "Value",
       y = "Density") +
  theme_minimal()
Larger standard deviations produce wider, flatter curves. The total area under each curve remains 1.
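Both claims can be verified numerically. The sketch below, using illustrative variable names, integrates each density with base R's integrate() and compares the peak height at the mean with the closed-form value 1 / (σ√(2π)).

```r
sds <- c(0.5, 1, 2)

# Numerically integrate each density over the whole real line
areas <- sapply(sds, function(s) {
  integrate(dnorm, lower = -Inf, upper = Inf, mean = 0, sd = s)$value
})

# Peak height at the mean, from dnorm() and from the closed-form expression
peaks <- dnorm(0, mean = 0, sd = sds)
peaks_formula <- 1 / (sds * sqrt(2 * pi))

data.frame(sd = sds, area = areas, peak = peaks, peak_formula = peaks_formula)
```

Doubling σ halves the peak height, which is exactly what keeps the area fixed at 1 as the curve widens.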
Shading Areas Under the Curve
Shading regions under the curve visualizes probabilities. This is essential for understanding p-values, confidence intervals, and hypothesis testing.
In base R, use polygon():
x <- seq(-4, 4, by = 0.01)
y <- dnorm(x)
plot(x, y, type = "l", lwd = 2,
     main = "Shaded Area: P(-1 < Z < 1)",
     xlab = "z", ylab = "Density")
# Define the region to shade
x_shade <- seq(-1, 1, by = 0.01)
y_shade <- dnorm(x_shade)
# Create polygon coordinates
polygon(c(-1, x_shade, 1),
        c(0, y_shade, 0),
        col = rgb(0, 0, 1, 0.3),
        border = NA)
# Add text annotation
text(0, 0.15, paste("Area =", round(pnorm(1) - pnorm(-1), 3)))
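If you shade regions often, the steps above can be wrapped in a small function. The helper below is hypothetical: shade_normal() is not part of base R or any package, and its name, arguments, and the choice of plotting four standard deviations either side of the mean are assumptions made for illustration.

```r
# Hypothetical helper: draw a normal curve, shade P(lower < X < upper),
# and return the shaded probability invisibly
shade_normal <- function(lower, upper, mean = 0, sd = 1) {
  x <- seq(mean - 4 * sd, mean + 4 * sd, length.out = 801)
  plot(x, dnorm(x, mean, sd), type = "l", lwd = 2,
       xlab = "x", ylab = "Density")

  # Polygon vertices: down to the baseline at both ends, along the curve in between
  x_shade <- seq(lower, upper, length.out = 200)
  polygon(c(lower, x_shade, upper),
          c(0, dnorm(x_shade, mean, sd), 0),
          col = rgb(0, 0, 1, 0.3), border = NA)

  prob <- pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
  invisible(prob)
}

# Usage: one standard deviation either side of the mean for N(50, 10)
p <- shade_normal(40, 60, mean = 50, sd = 10)
round(p, 3)
```

Returning the probability alongside the plot keeps the shaded region and the reported number tied to the same bounds.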
For tail probabilities in ggplot2, use geom_area() with filtered data:
# Create data for the full curve and shaded region
df_curve <- data.frame(x = seq(-4, 4, by = 0.01))
df_curve$y <- dnorm(df_curve$x)
# Upper tail: P(Z > 1.96)
df_tail <- df_curve[df_curve$x >= 1.96, ]
ggplot() +
  geom_line(data = df_curve, aes(x = x, y = y),
            linewidth = 1, color = "darkblue") +
  geom_area(data = df_tail, aes(x = x, y = y),
            fill = "red", alpha = 0.5) +
  annotate("text", x = 2.5, y = 0.05,
           label = paste("P(Z > 1.96) =", round(1 - pnorm(1.96), 4))) +
  labs(title = "Upper Tail Probability",
       x = "z",
       y = "Density") +
  theme_minimal()
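The annotation computes the tail as 1 - pnorm(1.96). An equivalent option is pnorm() with lower.tail = FALSE, which returns the upper tail directly. The sketch below compares the two; for moderate z they agree, while far out in the tail the subtraction loses all precision.

```r
# Two ways to get P(Z > 1.96)
tail_by_subtraction <- 1 - pnorm(1.96)
tail_direct <- pnorm(1.96, lower.tail = FALSE)
round(c(tail_by_subtraction, tail_direct), 4)  # both round to 0.025

# Far in the tail, pnorm(10) is indistinguishable from 1 in double precision
far_by_subtraction <- 1 - pnorm(10)             # evaluates to exactly 0
far_direct <- pnorm(10, lower.tail = FALSE)     # tiny but positive
c(far_by_subtraction, far_direct)
```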
To shade a region using stat_function() alone, add a second stat_function() layer with geom = "area" and restrict its range with xlim:
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  stat_function(fun = dnorm, linewidth = 1) +
  stat_function(fun = dnorm,
                xlim = c(-1.96, 1.96),
                geom = "area",
                fill = "steelblue",
                alpha = 0.5) +
  annotate("text", x = 0, y = 0.15,
           label = "95% of data", fontface = "bold") +
  labs(title = "95% Confidence Region",
       x = "z",
       y = "Density") +
  theme_classic()
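The bounds of ±1.96 are hard-coded in the plot above. As a sketch of how they could be derived instead, qnorm() (the inverse of pnorm()) returns the cutoffs for any coverage level. The variable names level, alpha, z_bounds, and label_text are illustrative.

```r
level <- 0.95          # desired central coverage
alpha <- 1 - level     # probability left over for the two tails combined

# Lower and upper cutoffs that leave alpha / 2 in each tail
z_bounds <- qnorm(c(alpha / 2, 1 - alpha / 2))
round(z_bounds, 2)     # -1.96 and 1.96

# Confirm the coverage and build an annotation label from it
central_area <- pnorm(z_bounds[2]) - pnorm(z_bounds[1])
label_text <- sprintf("%.0f%% of data", 100 * central_area)
label_text             # "95% of data"
```

Changing level to 0.99 would yield wider bounds that could be passed to xlim in the same way.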
Conclusion
You now have a complete toolkit for visualizing normal distributions in R. Use base R’s curve() for quick exploratory plots and ggplot2’s stat_function() for polished, publication-ready graphics.
The key functions to remember: dnorm() gives you density values for plotting curves, rnorm() generates random samples for histograms, and pnorm() calculates cumulative probabilities for shading regions.
Start with the simplest approach that meets your needs, then add complexity only when necessary. A basic curve(dnorm(x), -4, 4) often communicates just as effectively as a heavily styled ggplot—choose based on your audience and purpose.