R - Normal Distribution (dnorm, pnorm, qnorm, rnorm)
Key Insights
• R provides four core functions for working with normal distributions: dnorm() for probability density, pnorm() for cumulative probability, qnorm() for quantiles, and rnorm() for random generation
• Understanding the relationship between these functions is essential for statistical analysis, hypothesis testing, and simulation work in R
• All four functions accept mean and sd parameters, with defaults of 0 and 1 respectively for the standard normal distribution
Understanding the Normal Distribution Functions
R’s normal distribution functions follow the naming pattern that R uses for its built-in probability distributions: a one-letter prefix followed by the distribution name (here, norm). The prefix indicates the function type: d for density, p for cumulative probability, q for quantile, and r for random generation. Each serves a distinct purpose in statistical computing.
The normal distribution is characterized by two parameters: mean (μ) and standard deviation (σ). The standard normal distribution uses μ=0 and σ=1, which R uses as defaults. Note that R parameterizes these functions by the standard deviation (sd), not the variance.
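As a small sketch of how the mean and sd arguments relate to the standard normal defaults, the snippet below checks that a general normal evaluated at x agrees with the standard normal evaluated at z = (x - μ) / σ. The parameter values (100 and 15) and the variable names are assumptions chosen for illustration.

```r
# Illustrative parameters (assumed for this sketch)
mu <- 100
sigma <- 15
x <- c(85, 100, 115, 130)

# Standardize: z = (x - mu) / sigma
z <- (x - mu) / sigma

# CDF: the general and standardized forms give the same probabilities
p_general <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)
all.equal(p_general, p_standard)  # TRUE

# Density: the general density is the standard density divided by sigma
d_general <- dnorm(x, mean = mu, sd = sigma)
d_standard <- dnorm(z) / sigma
all.equal(d_general, d_standard)  # TRUE
```

The division by sigma in the density case is why densities shrink as the distribution gets wider, while cumulative probabilities are unaffected by the rescaling.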
dnorm: Probability Density Function
The dnorm() function returns the height of the probability density curve at a given point. This value indicates the relative likelihood of observing values near that point, but it is not a probability itself: a density can exceed 1, and probabilities come only from the area under the curve over an interval.
# Basic usage - standard normal distribution
dnorm(0) # Returns 0.3989423 (peak of the curve)
dnorm(1) # Returns 0.2419707
dnorm(-1) # Returns 0.2419707 (symmetric)
# Custom mean and standard deviation
dnorm(100, mean = 100, sd = 15) # Returns 0.02659615
dnorm(115, mean = 100, sd = 15) # Returns 0.01613138
# Vectorized input
x <- seq(-4, 4, by = 0.1)
densities <- dnorm(x)
# Plotting the density curve
plot(x, densities, type = "l",
     main = "Standard Normal Distribution",
     xlab = "x", ylab = "Density",
     lwd = 2, col = "blue")
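To make the distinction between a density and a probability concrete, here is a minimal sketch. It uses base R's integrate() to compute the area under dnorm() and compares it with the difference of two pnorm() values; the narrow sd of 0.1 is an assumption picked only to produce a density above 1.

```r
# A narrow normal distribution has a peak density well above 1
peak <- dnorm(0, mean = 0, sd = 0.1)
peak  # about 3.989

# Probabilities come from integrating the density over an interval
area <- integrate(dnorm, lower = -1, upper = 1)
area$value  # about 0.6827

# The same probability obtained from the CDF
cdf_diff <- pnorm(1) - pnorm(-1)
all.equal(area$value, cdf_diff, tolerance = 1e-6)
```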
Because dnorm() is vectorized, it is well suited to plotting. Drawing several density curves on the same axes shows how the mean shifts a normal distribution and the standard deviation changes its spread:
x <- seq(-10, 20, by = 0.1)
# Three different normal distributions
d1 <- dnorm(x, mean = 0, sd = 2)
d2 <- dnorm(x, mean = 5, sd = 3)
d3 <- dnorm(x, mean = 10, sd = 1.5)
plot(x, d1, type = "l", col = "red", lwd = 2,
     ylim = c(0, max(c(d1, d2, d3))),
     main = "Comparing Normal Distributions",
     xlab = "x", ylab = "Density")
lines(x, d2, col = "blue", lwd = 2)
lines(x, d3, col = "green", lwd = 2)
# Label by mean and sd explicitly, since N(a, b) often denotes mean and variance
legend("topright",
       legend = c("mean = 0, sd = 2", "mean = 5, sd = 3", "mean = 10, sd = 1.5"),
       col = c("red", "blue", "green"), lwd = 2)
pnorm: Cumulative Distribution Function
The pnorm() function calculates the probability that a random variable is less than or equal to a given value. This is the area under the density curve to the left of that point.
# Probability of getting a value <= 0 in standard normal
pnorm(0) # Returns 0.5 (50%)
# Probability of getting a value <= 1
pnorm(1) # Returns 0.8413447 (84.13%)
# Right tail probability using lower.tail parameter
pnorm(1.96, lower.tail = FALSE) # Returns 0.0249979 (2.5%)
# Equivalent to:
1 - pnorm(1.96) # Returns 0.0249979
# Real-world example: IQ scores (mean=100, sd=15)
# What percentage of people have IQ <= 115?
pnorm(115, mean = 100, sd = 15) # Returns 0.8413447 (84.13%)
# What percentage have IQ > 130?
pnorm(130, mean = 100, sd = 15, lower.tail = FALSE) # Returns 0.0227501 (2.28%)
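The probability of falling inside an interval follows from the same idea: subtract the area to the left of the lower bound from the area to the left of the upper bound. The sketch below reuses the IQ parameters from the example above; the variable names are illustrative.

```r
iq_mean <- 100
iq_sd <- 15

# P(85 < IQ <= 115) = F(115) - F(85)
p_between <- pnorm(115, mean = iq_mean, sd = iq_sd) -
  pnorm(85, mean = iq_mean, sd = iq_sd)
p_between  # about 0.6827

# The 68-95-99.7 rule: probability of landing within k standard deviations
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)  # 0.6827 0.9545 0.9973
```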
In hypothesis testing, pnorm() converts a z-score into a p-value:
# Z-test example: observed z-score of 2.5
z_score <- 2.5
# Two-tailed p-value
p_value_two_tailed <- 2 * pnorm(-abs(z_score))
cat("Two-tailed p-value:", p_value_two_tailed, "\n") # 0.0124
# One-tailed p-value (upper tail)
p_value_one_tailed <- pnorm(z_score, lower.tail = FALSE)
cat("One-tailed p-value:", p_value_one_tailed, "\n") # 0.0062
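In practice the z-score is usually computed from data first. The following sketch runs a one-sample z-test from start to finish. All input numbers are hypothetical, and it assumes the population standard deviation is known. It also shows why lower.tail = FALSE is preferable to subtracting from 1 for very small tail probabilities.

```r
# Hypothetical inputs (assumed for illustration)
x_bar <- 52    # observed sample mean
mu_0 <- 50     # mean under the null hypothesis
sigma <- 10    # population standard deviation, assumed known
n <- 100       # sample size

se <- sigma / sqrt(n)       # standard error of the mean: 1
z <- (x_bar - mu_0) / se    # z-score: 2

p_two_tailed <- 2 * pnorm(-abs(z))
p_two_tailed  # about 0.0455
reject_at_5_percent <- p_two_tailed < 0.05  # TRUE

# For extreme z-scores, 1 - pnorm(z) underflows to 0,
# while lower.tail = FALSE keeps the tiny tail probability
naive_tail <- 1 - pnorm(10)                   # 0
precise_tail <- pnorm(10, lower.tail = FALSE) # about 7.6e-24
```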
qnorm: Quantile Function (Inverse CDF)
The qnorm() function is the inverse of pnorm(). Given a probability, it returns the corresponding value from the distribution. This is essential for calculating confidence intervals and critical values.
# What value has 50% of the distribution below it?
qnorm(0.5) # Returns 0 (the median)
# What value has 95% below it?
qnorm(0.95) # Returns 1.644854
# Critical values for hypothesis testing
qnorm(0.975) # Returns 1.959964 (two-tailed 5% significance)
qnorm(0.025) # Returns -1.959964
# Confidence interval calculation
mean_val <- 100
se <- 5
confidence_level <- 0.95
alpha <- 1 - confidence_level
# 95% confidence interval
lower_bound <- mean_val + qnorm(alpha/2) * se
upper_bound <- mean_val + qnorm(1 - alpha/2) * se
cat("95% CI: [", lower_bound, ",", upper_bound, "]\n")
# Output: 95% CI: [ 90.20018 , 109.7998 ]
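The inverse relationship between qnorm() and pnorm() can be checked directly with a round trip. This is a minimal sketch; the probabilities and the mean and sd values are arbitrary choices for illustration.

```r
# Probability -> quantile -> probability
p <- c(0.025, 0.5, 0.9, 0.975)
q <- qnorm(p)
all.equal(pnorm(q), p)  # TRUE

# Value -> probability -> value
x <- c(-2, 0, 1.5)
all.equal(qnorm(pnorm(x)), x)  # TRUE

# The round trip also holds for non-default parameters,
# as long as both calls use the same mean and sd
score <- 1200
p_score <- pnorm(score, mean = 1050, sd = 200)
all.equal(qnorm(p_score, mean = 1050, sd = 200), score)  # TRUE
```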
Practical application for percentile calculations:
# SAT scores: mean=1050, sd=200
# What score represents the 90th percentile?
sat_90th <- qnorm(0.90, mean = 1050, sd = 200)
cat("90th percentile SAT score:", round(sat_90th), "\n") # 1306
# What scores bound the middle 95% of test-takers?
sat_lower <- qnorm(0.025, mean = 1050, sd = 200)
sat_upper <- qnorm(0.975, mean = 1050, sd = 200)
cat("Middle 95%: [", round(sat_lower), ",", round(sat_upper), "]\n")
# Output: Middle 95%: [ 658 , 1442 ]
rnorm: Random Number Generation
The rnorm() function generates random samples from a normal distribution. This is fundamental for simulations, Monte Carlo methods, and parametric bootstrap procedures.
# Generate 10 random values from standard normal
set.seed(123) # For reproducibility
random_values <- rnorm(10)
print(random_values)
# Generate 1000 values with custom parameters
set.seed(456)
sample_data <- rnorm(1000, mean = 50, sd = 10)
# Verify the sample statistics
cat("Sample mean:", mean(sample_data), "\n") # ~50
cat("Sample SD:", sd(sample_data), "\n") # ~10
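Beyond the mean and standard deviation, a simulated sample can be compared with the exact values from pnorm() and qnorm(). The sketch below reuses the seed and parameters from the example above. The cutoff of 60 and the 90th percentile are arbitrary choices, and the simulated figures will only approximate the exact ones.

```r
set.seed(456)
sample_data <- rnorm(1000, mean = 50, sd = 10)

# Share of simulated values at or below 60 vs. the exact probability
empirical_p <- mean(sample_data <= 60)
theoretical_p <- pnorm(60, mean = 50, sd = 10)  # 0.8413447

# Sample 90th percentile vs. the exact quantile
empirical_q <- unname(quantile(sample_data, 0.90))
theoretical_q <- qnorm(0.90, mean = 50, sd = 10)  # about 62.82

cat("P(X <= 60): simulated", empirical_p, "exact", theoretical_p, "\n")
cat("90th percentile: simulated", empirical_q, "exact", theoretical_q, "\n")
```

Increasing the sample size should bring the simulated values closer to the exact ones.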
Monte Carlo simulation example:
# Simulate portfolio returns
set.seed(789)
n_simulations <- 10000
annual_return <- 0.07
annual_sd <- 0.15
# Simulate one year of daily returns (252 trading days)
daily_return <- annual_return / 252
daily_sd <- annual_sd / sqrt(252)
portfolio_outcomes <- numeric(n_simulations)
initial_investment <- 10000
for (i in seq_len(n_simulations)) {
  daily_returns <- rnorm(252, mean = daily_return, sd = daily_sd)
  final_value <- initial_investment * prod(1 + daily_returns)
  portfolio_outcomes[i] <- final_value
}
# Analyze results
cat("Mean final value:", mean(portfolio_outcomes), "\n")
cat("Median final value:", median(portfolio_outcomes), "\n")
cat("5th percentile:", quantile(portfolio_outcomes, 0.05), "\n")
cat("95th percentile:", quantile(portfolio_outcomes, 0.95), "\n")
# Probability of losing money
prob_loss <- mean(portfolio_outcomes < initial_investment)
cat("Probability of loss:", prob_loss, "\n")
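Because rnorm() is vectorized, the same simulation can be written without an explicit loop by drawing every daily return in a single call and arranging the draws in a matrix. This is one possible restructuring under the same assumptions as the loop above, not the only way to do it. The names n_days, returns, and growth are introduced here for illustration.

```r
set.seed(789)
n_simulations <- 10000
n_days <- 252
daily_return <- 0.07 / n_days
daily_sd <- 0.15 / sqrt(n_days)
initial_investment <- 10000

# One column per simulated year, one row per trading day
returns <- matrix(
  rnorm(n_days * n_simulations, mean = daily_return, sd = daily_sd),
  nrow = n_days, ncol = n_simulations
)

# Compound the daily returns within each column
growth <- apply(1 + returns, 2, prod)
portfolio_outcomes <- initial_investment * growth

cat("Mean final value:", mean(portfolio_outcomes), "\n")
cat("Probability of loss:", mean(portfolio_outcomes < initial_investment), "\n")
```

The matrix holds about 2.5 million values, which is modest for current machines, but for much larger simulations the loop may be the more memory-friendly choice.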
Practical Integration: Complete Workflow
Combining all four functions in a quality control scenario:
# Manufacturing process: bolt diameter should be 10mm with sd=0.2mm
spec_mean <- 10
spec_sd <- 0.2
tolerance <- 0.5 # Acceptable range: 9.5 to 10.5mm
lower_spec <- spec_mean - tolerance
upper_spec <- spec_mean + tolerance
# 1. Calculate probability of out-of-spec parts (pnorm)
prob_too_small <- pnorm(lower_spec, mean = spec_mean, sd = spec_sd)
prob_too_large <- pnorm(upper_spec, mean = spec_mean, sd = spec_sd,
                        lower.tail = FALSE)
prob_defective <- prob_too_small + prob_too_large
cat("Defect rate:", round(prob_defective * 100, 2), "%\n") # 1.24 %
# 2. Find symmetric limits that would contain 99% of parts (qnorm)
lower_limit <- qnorm(0.005, mean = spec_mean, sd = spec_sd)
upper_limit <- qnorm(0.995, mean = spec_mean, sd = spec_sd)
cat("99% yield limits: [", round(lower_limit, 3), ",",
    round(upper_limit, 3), "]\n")
# 3. Simulate production batch (rnorm)
set.seed(100)
batch_size <- 1000
batch <- rnorm(batch_size, mean = spec_mean, sd = spec_sd)
# 4. Analyze batch quality
defects <- sum(batch < lower_spec | batch > upper_spec)
cat("Actual defects in batch:", defects, "\n")
# 5. Visualize (dnorm)
x <- seq(9, 11, by = 0.01)
# Named theoretical_density to avoid masking base R's density() function
theoretical_density <- dnorm(x, mean = spec_mean, sd = spec_sd)
hist(batch, breaks = 30, probability = TRUE,
     main = "Batch Distribution vs Specification",
     xlab = "Diameter (mm)", col = "lightblue")
lines(x, theoretical_density, col = "red", lwd = 2)
abline(v = c(lower_spec, upper_spec), col = "darkred", lty = 2, lwd = 2)
legend("topright",
       legend = c("Theoretical", "Spec Limits"),
       col = c("red", "darkred"), lwd = 2, lty = c(1, 2))
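The theoretical defect rate and the simulated batch can be tied together by comparing the expected number of defects with the number observed. The sketch below repeats the parameters from the workflow so it runs on its own; the names lower_spec, upper_spec, expected_defects, and observed_defects are illustrative.

```r
spec_mean <- 10
spec_sd <- 0.2
lower_spec <- 9.5
upper_spec <- 10.5
batch_size <- 1000

# Theoretical defect probability: both tails outside the spec limits
prob_defective <- pnorm(lower_spec, mean = spec_mean, sd = spec_sd) +
  pnorm(upper_spec, mean = spec_mean, sd = spec_sd, lower.tail = FALSE)

# Expected number of defects in a batch of this size
expected_defects <- batch_size * prob_defective
expected_defects  # about 12.4

# Defects observed in one simulated batch
set.seed(100)
batch <- rnorm(batch_size, mean = spec_mean, sd = spec_sd)
observed_defects <- sum(batch < lower_spec | batch > upper_spec)

cat("Expected defects:", round(expected_defects, 1), "\n")
cat("Observed defects:", observed_defects, "\n")
```

A single batch will rarely match the expected count exactly; the gap reflects ordinary sampling variation.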
These four functions form the foundation of normal distribution analysis in R. Because R’s other distributions follow the same d/p/q/r pattern (for example dbinom(), pt(), qchisq(), and runif()), the same approach carries over well beyond the normal case.