R - Normal Distribution (dnorm, pnorm, qnorm, rnorm)

Key Insights

• R provides four core functions for working with normal distributions: dnorm() for probability density, pnorm() for cumulative probability, qnorm() for quantiles, and rnorm() for random generation
• Understanding the relationship between these functions is essential for statistical analysis, hypothesis testing, and simulation work in R
• All four functions accept mean and sd parameters, with defaults of 0 and 1 respectively for the standard normal distribution

Understanding the Normal Distribution Functions

R’s normal distribution functions follow the same naming pattern as R’s other built-in probability distributions (for example dbinom(), punif(), qt(), and rexp()). The prefix indicates the function type: d for density, p for cumulative probability, q for quantile, and r for random generation. Each serves a distinct purpose in statistical computing.

The normal distribution is characterized by two parameters: mean (μ) and standard deviation (σ). The standard normal distribution uses μ=0 and σ=1, which R uses as defaults. Note that the sd argument is the standard deviation, not the variance.
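
As a quick check of these defaults, the sketch below confirms that omitting mean and sd is the same as passing 0 and 1, and that any normal value can be standardized to a z-score. The variable names are illustrative and not part of the original examples.

```r
# Omitting mean and sd gives the standard normal
default_call  <- dnorm(0.5)
explicit_call <- dnorm(0.5, mean = 0, sd = 1)
identical(default_call, explicit_call)  # TRUE

# Any normal value can be standardized: z = (x - mean) / sd
x <- 115
mu <- 100
sigma <- 15
z <- (x - mu) / sigma  # 1
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))  # TRUE

# sd is a standard deviation: a variance of 4 means sd = sqrt(4) = 2
variance <- 4
dnorm(0, mean = 0, sd = sqrt(variance))  # 0.1994711
```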

dnorm: Probability Density Function

The dnorm() function returns the height of the probability density curve at a given point. This value represents the relative likelihood of observing values near that point, though it’s not a probability itself.

# Basic usage - standard normal distribution
dnorm(0)  # Returns 0.3989423 (peak of the curve)
dnorm(1)  # Returns 0.2419707
dnorm(-1) # Returns 0.2419707 (symmetric)

# Custom mean and standard deviation
dnorm(100, mean = 100, sd = 15)  # Returns 0.02659615
dnorm(115, mean = 100, sd = 15)  # Returns 0.01613138

# Vectorized input
x <- seq(-4, 4, by = 0.1)
densities <- dnorm(x)

# Plotting the density curve
plot(x, densities, type = "l", 
     main = "Standard Normal Distribution",
     xlab = "x", ylab = "Density",
     lwd = 2, col = "blue")
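
To make the point that a density is not a probability more concrete, here is a minimal sketch. The narrow sd of 0.1 and the interval width of 0.01 are arbitrary choices for illustration. A density can exceed 1, yet the area under the curve is still 1, and density times a small width approximates the probability of landing in that small interval.

```r
# With a small standard deviation the curve is tall and narrow,
# so the density at the mean is greater than 1
narrow_peak <- dnorm(0, mean = 0, sd = 0.1)
narrow_peak  # 3.989423

# The area under the curve is still 1 (numerical integration over +/- 10 sd)
area <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 0.1)
area$value  # approximately 1

# Density times a small interval width approximates a probability
width <- 0.01
approx_prob <- dnorm(1) * width
exact_prob  <- pnorm(1 + width / 2) - pnorm(1 - width / 2)
all.equal(approx_prob, exact_prob, tolerance = 1e-4)  # TRUE
```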

The density function is crucial for visualizing distributions and understanding likelihood. Overlaying several density curves on one plot makes it easy to compare normal distributions with different parameters:

x <- seq(-10, 20, by = 0.1)

# Three different normal distributions
d1 <- dnorm(x, mean = 0, sd = 2)
d2 <- dnorm(x, mean = 5, sd = 3)
d3 <- dnorm(x, mean = 10, sd = 1.5)

plot(x, d1, type = "l", col = "red", lwd = 2,
     ylim = c(0, max(c(d1, d2, d3))),
     main = "Comparing Normal Distributions",
     xlab = "x", ylab = "Density")
lines(x, d2, col = "blue", lwd = 2)
lines(x, d3, col = "green", lwd = 2)
legend("topright", 
       legend = c("mean=0, sd=2", "mean=5, sd=3", "mean=10, sd=1.5"),
       col = c("red", "blue", "green"), lwd = 2)
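
The peak of each curve sits at its mean, with height 1 / (sd * sqrt(2 * pi)), so a smaller standard deviation gives a taller, narrower curve. The sketch below checks this for the three distributions in the plot; the vector names are illustrative.

```r
means <- c(0, 5, 10)
sds   <- c(2, 3, 1.5)

# Density evaluated at each distribution's own mean (the peak)
peak_heights <- dnorm(means, mean = means, sd = sds)
round(peak_heights, 4)  # 0.1995 0.1330 0.2660

# Closed-form peak height for comparison
all.equal(peak_heights, 1 / (sds * sqrt(2 * pi)))  # TRUE
```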

pnorm: Cumulative Distribution Function

The pnorm() function calculates the probability that a random variable is less than or equal to a given value. This is the area under the density curve to the left of that point.

# Probability of getting a value <= 0 in standard normal
pnorm(0)  # Returns 0.5 (50%)

# Probability of getting a value <= 1
pnorm(1)  # Returns 0.8413447 (84.13%)

# Right tail probability using lower.tail parameter
pnorm(1.96, lower.tail = FALSE)  # Returns 0.0249979 (2.5%)
# Equivalent to:
1 - pnorm(1.96)  # Returns 0.0249979

# Real-world example: IQ scores (mean=100, sd=15)
# What percentage of people have IQ <= 115?
pnorm(115, mean = 100, sd = 15)  # Returns 0.8413447 (84.13%)

# What percentage have IQ > 130?
pnorm(130, mean = 100, sd = 15, lower.tail = FALSE)  # Returns 0.0227501 (2.28%)
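
Because pnorm() gives the area to the left of a point, the probability of falling inside an interval is the difference of two pnorm() calls. A short sketch, reusing the IQ parameters from above (the helper name prob_between is hypothetical):

```r
# P(lower < X <= upper) for a normal distribution
prob_between <- function(lower, upper, mean = 0, sd = 1) {
  pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd)
}

# Share of IQ scores between 85 and 115 (within one sd of the mean)
prob_between(85, 115, mean = 100, sd = 15)  # 0.6826895

# The 68-95-99.7 rule for the standard normal
round(prob_between(-(1:3), 1:3), 4)  # 0.6827 0.9545 0.9973
```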

For hypothesis testing, pnorm() calculates p-values:

# Z-test example: observed z-score of 2.5
z_score <- 2.5

# Two-tailed p-value
p_value_two_tailed <- 2 * pnorm(-abs(z_score))
cat("Two-tailed p-value:", p_value_two_tailed, "\n")  # 0.0124

# One-tailed p-value (upper tail)
p_value_one_tailed <- pnorm(z_score, lower.tail = FALSE)
cat("One-tailed p-value:", p_value_one_tailed, "\n")  # 0.0062
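
For context, here is a sketch of where a z-score like this might come from. The sample values, the null mean of 100, and the known population sd of 15 are all made-up assumptions for illustration.

```r
# Hypothetical sample; population sd assumed known (required for a z-test)
scores <- c(108, 112, 97, 105, 119, 101, 110, 95, 114, 109)
mu0    <- 100  # mean under the null hypothesis
sigma  <- 15   # assumed known population standard deviation

# z = (sample mean - null mean) / standard error
z <- (mean(scores) - mu0) / (sigma / sqrt(length(scores)))

# Two-tailed p-value: probability of a result at least this extreme
p_value <- 2 * pnorm(-abs(z))
cat("z =", round(z, 3), " p =", round(p_value, 4), "\n")
```

Using -abs(z) keeps the calculation in the lower tail, which works whether the observed z-score is positive or negative.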

qnorm: Quantile Function (Inverse CDF)

The qnorm() function is the inverse of pnorm(). Given a probability, it returns the corresponding value from the distribution. This is essential for calculating confidence intervals and critical values.

# What value has 50% of the distribution below it?
qnorm(0.5)  # Returns 0 (the median)

# What value has 95% below it?
qnorm(0.95)  # Returns 1.644854

# Critical values for hypothesis testing
qnorm(0.975)  # Returns 1.959964 (two-tailed 5% significance)
qnorm(0.025)  # Returns -1.959964

# Confidence interval calculation
mean_val <- 100
se <- 5
confidence_level <- 0.95
alpha <- 1 - confidence_level

# 95% confidence interval
lower_bound <- mean_val + qnorm(alpha/2) * se
upper_bound <- mean_val + qnorm(1 - alpha/2) * se
cat("95% CI: [", lower_bound, ",", upper_bound, "]\n")
# Output: 95% CI: [ 90.20018 , 109.7998 ]
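
Since qnorm() and pnorm() are inverses, applying one after the other should return the starting value. A minimal check (variable names are illustrative):

```r
# Probability -> quantile -> probability
p <- c(0.025, 0.5, 0.975)
all.equal(pnorm(qnorm(p)), p)  # TRUE

# Value -> probability -> value, with custom parameters
x <- c(85, 100, 130)
round_trip <- qnorm(pnorm(x, mean = 100, sd = 15), mean = 100, sd = 15)
all.equal(round_trip, x)  # TRUE

# Probabilities of 0 and 1 map to the ends of the real line
qnorm(c(0, 1))  # -Inf  Inf
```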

Practical application for percentile calculations:

# SAT scores: mean=1050, sd=200
# What score represents the 90th percentile?
sat_90th <- qnorm(0.90, mean = 1050, sd = 200)
cat("90th percentile SAT score:", round(sat_90th), "\n")  # 1306

# What scores bound the middle 95% of test-takers?
sat_lower <- qnorm(0.025, mean = 1050, sd = 200)
sat_upper <- qnorm(0.975, mean = 1050, sd = 200)
cat("Middle 95%: [", round(sat_lower), ",", round(sat_upper), "]\n")
# Output: Middle 95%: [ 658 , 1442 ]
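
Quantiles of any normal distribution are a shifted and scaled version of standard normal quantiles: mean + sd * qnorm(p). The sketch below confirms this with the SAT parameters above and answers the reverse question with pnorm(); the score of 1300 is an arbitrary example.

```r
sat_mean <- 1050
sat_sd   <- 200

# Same 90th percentile, computed two ways
direct <- qnorm(0.90, mean = sat_mean, sd = sat_sd)
scaled <- sat_mean + sat_sd * qnorm(0.90)
all.equal(direct, scaled)  # TRUE

# Reverse question: what percentile is a score of 1300?
percentile_1300 <- pnorm(1300, mean = sat_mean, sd = sat_sd)
round(percentile_1300 * 100, 1)  # 89.4
```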

rnorm: Random Number Generation

The rnorm() function generates random samples from a normal distribution. This is fundamental for simulations, Monte Carlo methods, and parametric bootstrap procedures.

# Generate 10 random values from standard normal
set.seed(123)  # For reproducibility
random_values <- rnorm(10)
print(random_values)

# Generate 1000 values with custom parameters
set.seed(456)
sample_data <- rnorm(1000, mean = 50, sd = 10)

# Verify the sample statistics
cat("Sample mean:", mean(sample_data), "\n")      # ~50
cat("Sample SD:", sd(sample_data), "\n")          # ~10
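
Two properties of rnorm() are worth checking directly: the same seed reproduces the same draws, and the sample mean tends to get closer to the true mean as the sample grows. A sketch, where the seed and sample sizes are arbitrary choices:

```r
# Same seed, same draws
set.seed(42)
first_run <- rnorm(5)
set.seed(42)
second_run <- rnorm(5)
identical(first_run, second_run)  # TRUE

# The standard error of the mean is sd / sqrt(n), so the estimation
# error tends to shrink as n grows
set.seed(42)
for (n in c(100, 1000, 10000)) {
  draws <- rnorm(n, mean = 50, sd = 10)
  cat("n =", n, " abs error of mean:", abs(mean(draws) - 50),
      " standard error:", 10 / sqrt(n), "\n")
}
```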

Monte Carlo simulation example:

# Simulate portfolio returns
set.seed(789)
n_simulations <- 10000
annual_return <- 0.07
annual_sd <- 0.15

# Simulate one year of daily returns (252 trading days)
daily_return <- annual_return / 252
daily_sd <- annual_sd / sqrt(252)

portfolio_outcomes <- numeric(n_simulations)
initial_investment <- 10000

for (i in seq_len(n_simulations)) {
  daily_returns <- rnorm(252, mean = daily_return, sd = daily_sd)
  final_value <- initial_investment * prod(1 + daily_returns)
  portfolio_outcomes[i] <- final_value
}

# Analyze results
cat("Mean final value:", mean(portfolio_outcomes), "\n")
cat("Median final value:", median(portfolio_outcomes), "\n")
cat("5th percentile:", quantile(portfolio_outcomes, 0.05), "\n")
cat("95th percentile:", quantile(portfolio_outcomes, 0.95), "\n")

# Probability of losing money
prob_loss <- mean(portfolio_outcomes < initial_investment)
cat("Probability of loss:", prob_loss, "\n")
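
The same simulation can also be written without an explicit for loop. This is a sketch of one vectorized alternative, not a replacement for the loop above: it draws all daily returns at once into a matrix with one column per simulated year. The matrix layout and the n_days variable are choices made for this example.

```r
set.seed(789)
n_simulations <- 10000
n_days <- 252
daily_return <- 0.07 / n_days
daily_sd <- 0.15 / sqrt(n_days)
initial_investment <- 10000

# One column per simulated year, one row per trading day
returns <- matrix(rnorm(n_days * n_simulations,
                        mean = daily_return, sd = daily_sd),
                  nrow = n_days, ncol = n_simulations)

# Compound each column: sum the log growth factors, then exponentiate
portfolio_outcomes <- initial_investment * exp(colSums(log1p(returns)))

cat("Mean final value:", mean(portfolio_outcomes), "\n")
cat("Probability of loss:", mean(portfolio_outcomes < initial_investment), "\n")
```

Summing logs and exponentiating is mathematically the same as taking the product of (1 + daily return), and it avoids calling prod() once per simulation.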

Practical Integration: Complete Workflow

Combining all four functions in a quality control scenario:

# Manufacturing process: bolt diameter should be 10mm with sd=0.2mm
spec_mean <- 10
spec_sd <- 0.2
tolerance <- 0.5  # Acceptable range: 9.5 to 10.5mm
lower_spec <- spec_mean - tolerance  # 9.5
upper_spec <- spec_mean + tolerance  # 10.5

# 1. Calculate probability of out-of-spec parts (pnorm)
prob_too_small <- pnorm(lower_spec, mean = spec_mean, sd = spec_sd)
prob_too_large <- pnorm(upper_spec, mean = spec_mean, sd = spec_sd, 
                        lower.tail = FALSE)
prob_defective <- prob_too_small + prob_too_large
cat("Defect rate:", round(prob_defective * 100, 2), "%\n")
# Output: Defect rate: 1.24 %

# 2. Find specification limits for 99% yield (qnorm)
lower_limit <- qnorm(0.005, mean = spec_mean, sd = spec_sd)
upper_limit <- qnorm(0.995, mean = spec_mean, sd = spec_sd)
cat("99% yield limits: [", round(lower_limit, 3), ",", 
    round(upper_limit, 3), "]\n")
# Output: 99% yield limits: [ 9.485 , 10.515 ]

# 3. Simulate production batch (rnorm)
set.seed(100)
batch_size <- 1000
batch <- rnorm(batch_size, mean = spec_mean, sd = spec_sd)

# 4. Analyze batch quality
defects <- sum(batch < 9.5 | batch > 10.5)
cat("Actual defects in batch:", defects, "\n")

# 5. Visualize (dnorm)
x <- seq(9, 11, by = 0.01)
# Named theoretical_density to avoid masking base R's density() function
theoretical_density <- dnorm(x, mean = spec_mean, sd = spec_sd)

hist(batch, breaks = 30, probability = TRUE, 
     main = "Batch Distribution vs Specification",
     xlab = "Diameter (mm)", col = "lightblue")
lines(x, theoretical_density, col = "red", lwd = 2)
abline(v = c(9.5, 10.5), col = "darkred", lty = 2, lwd = 2)
legend("topright", 
       legend = c("Theoretical", "Spec Limits"),
       col = c("red", "darkred"), lwd = 2, lty = c(1, 2))
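
One way to sanity-check the simulated batch is to compare its defect count with what the theoretical defect rate predicts. The sketch below assumes parts are independent, so the defect count per batch follows a binomial distribution; that assumption and the use of qbinom() are additions for illustration, not part of the workflow above.

```r
spec_mean <- 10
spec_sd <- 0.2
batch_size <- 1000

# Theoretical probability that a part falls outside 9.5 to 10.5 mm
prob_defective <- pnorm(9.5, mean = spec_mean, sd = spec_sd) +
  pnorm(10.5, mean = spec_mean, sd = spec_sd, lower.tail = FALSE)

# Expected number of defects per batch
expected_defects <- batch_size * prob_defective
round(expected_defects, 1)  # 12.4

# Range that should contain roughly 95% of batch defect counts
defect_range <- qbinom(c(0.025, 0.975), size = batch_size, prob = prob_defective)
defect_range
```

A simulated count far outside this range would suggest a mistake in the simulation or the specification limits.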

These four functions form the foundation of normal distribution analysis in R. Once you are comfortable with them, the same d/p/q/r pattern carries over to R’s other built-in distributions, so most of what you learned here applies there as well.