How to Perform a Z-Test in R
Key Insights
- Z-tests are appropriate when you have large samples (n ≥ 30) and know the population standard deviation—conditions rarely met in practice, making t-tests more common in real-world analysis.
- The BSDA package provides a convenient z.test() function, but understanding the manual calculation using pnorm() gives you flexibility and deeper statistical intuition.
- Always verify your assumptions before running a z-test; using it inappropriately (small samples, unknown variance) produces misleading results that can derail your analysis.
Introduction to Z-Tests
The z-test is a statistical hypothesis test that determines whether there’s a significant difference between sample and population means, or between two sample means. It relies on the standard normal distribution to calculate probabilities and make inferences.
Here’s the critical distinction: use a z-test when you know the population standard deviation and have a large sample size (typically n ≥ 30). Use a t-test when the population standard deviation is unknown and you’re estimating it from your sample.
In practice, you’ll rarely know the true population standard deviation. This makes z-tests more common in academic exercises, quality control scenarios with established process parameters, or when working with standardized test scores where population parameters are published. Despite their limited real-world application, understanding z-tests builds the foundation for grasping more complex statistical methods.
The z-statistic follows a simple formula:
z = (x̄ - μ) / (σ / √n)
Where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.
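To make the formula concrete, here is a quick hand calculation with illustrative numbers (not the bolt data used later):

```r
# Illustrative z-statistic calculation
x_bar <- 52    # sample mean
mu    <- 50    # hypothesized population mean
sigma <- 4     # known population standard deviation
n     <- 64    # sample size

z <- (x_bar - mu) / (sigma / sqrt(n))
z  # (52 - 50) / (4 / 8) = 4
```

A z of 4 means the sample mean sits four standard errors above the hypothesized mean—far out in the tail of the standard normal distribution.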
Prerequisites and Setup
R doesn’t include a built-in z-test function in its base statistics package. The BSDA (Basic Statistics and Data Analysis) package fills this gap with a straightforward z.test() function.
# Install BSDA if you haven't already
install.packages("BSDA")
# Load required packages
library(BSDA)
library(ggplot2) # For visualization later
# Set seed for reproducibility
set.seed(42)
Before running any z-test, verify these assumptions:
- Known population standard deviation: You must have the true σ, not an estimate from your sample.
- Random sampling: Your data should be randomly selected from the population.
- Independence: Observations must be independent of each other.
- Normal distribution: The population should be normally distributed, or your sample size should be large enough (n ≥ 30) for the Central Limit Theorem to apply.
Let’s create sample data for our examples:
# Simulating quality control data
# A factory produces bolts with target length 50mm and known σ = 2mm
population_mean <- 50
population_sd <- 2
# Sample of 40 bolts from today's production
sample_bolts <- rnorm(40, mean = 50.8, sd = 2)
# Two production lines for comparison
line_a <- rnorm(35, mean = 51.2, sd = 2)
line_b <- rnorm(38, mean = 50.5, sd = 2)
One-Sample Z-Test
The one-sample z-test answers a straightforward question: does my sample come from a population with a specific mean? In our quality control example, we’re asking whether today’s bolt production differs from the target specification.
Formulating hypotheses:
- H₀ (null): The sample mean equals the population mean (μ = 50)
- H₁ (alternative): The sample mean differs from the population mean (μ ≠ 50)
# One-sample z-test
# Testing if sample mean differs from population mean of 50
result_one_sample <- z.test(
  x = sample_bolts,
  mu = population_mean,      # Hypothesized population mean
  sigma.x = population_sd,   # Known population standard deviation
  alternative = "two.sided"  # Two-tailed test
)
print(result_one_sample)
Output:
One-sample z-Test
data: sample_bolts
z = 2.529, p-value = 0.01143
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
50.17994 51.41956
sample estimates:
mean of x
50.79975
Interpreting the results:
The z-statistic of 2.53 tells us the sample mean is approximately 2.5 standard errors above the hypothesized mean. With a p-value of about 0.011, we reject the null hypothesis at the α = 0.05 significance level. The 95% confidence interval (50.18, 51.42) doesn’t contain 50, confirming our conclusion.
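The reported p-value is simply the two-tailed probability of the z-statistic under the standard normal distribution, so you can reproduce it with pnorm(). For a z-statistic of about 2.5:

```r
# Two-tailed p-value for a z-statistic of 2.5
z <- 2.5
p_two_sided <- 2 * pnorm(-abs(z))
round(p_two_sided, 4)  # approximately 0.0124
```

This is exactly the calculation the manual implementation later in this article performs.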
For one-tailed tests, change the alternative parameter:
# Testing if bolts are longer than specification (one-tailed)
result_greater <- z.test(
  x = sample_bolts,
  mu = population_mean,
  sigma.x = population_sd,
  alternative = "greater"  # Right-tailed test
)
# Testing if bolts are shorter than specification
result_less <- z.test(
  x = sample_bolts,
  mu = population_mean,
  sigma.x = population_sd,
  alternative = "less"     # Left-tailed test
)
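When the z-statistic is positive, the right-tailed p-value is exactly half the two-tailed one, and the left-tailed p-value is its complement—a useful sanity check on any test output. A minimal illustration with base R:

```r
# Relationship between one- and two-tailed p-values (for z > 0)
z <- 2.5
p_two   <- 2 * pnorm(-abs(z))
p_right <- pnorm(z, lower.tail = FALSE)
p_left  <- pnorm(z)

all.equal(p_right, p_two / 2)   # TRUE
all.equal(p_left, 1 - p_right)  # TRUE
```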
Two-Sample Z-Test
When comparing two independent groups, the two-sample z-test determines whether their means differ significantly. This requires knowing the population standard deviation for both groups.
# Two-sample z-test comparing production lines
result_two_sample <- z.test(
  x = line_a,
  y = line_b,
  mu = 0,                   # Testing if difference equals 0
  sigma.x = population_sd,  # Known σ for line A
  sigma.y = population_sd,  # Known σ for line B
  alternative = "two.sided"
)
print(result_two_sample)
Output:
Two-sample z-Test
data: line_a and line_b
z = 1.1914, p-value = 0.2335
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3601230 1.4766030
sample estimates:
mean of x mean of y
51.08432 50.52608
Here, the p-value of 0.234 exceeds our α = 0.05 threshold. We fail to reject the null hypothesis—there’s insufficient evidence to conclude the production lines differ in mean bolt length.
When population standard deviations differ between groups:
# Different known standard deviations for each group
sigma_a <- 1.8
sigma_b <- 2.2
result_unequal_var <- z.test(
  x = line_a,
  y = line_b,
  mu = 0,
  sigma.x = sigma_a,
  sigma.y = sigma_b,
  alternative = "two.sided"
)
Manual Z-Test Calculation
Understanding the underlying calculation helps when you need custom implementations or want to avoid package dependencies. Here’s how to perform z-tests from scratch:
# Manual one-sample z-test function
manual_z_test <- function(sample_data, pop_mean, pop_sd,
                          alternative = "two.sided") {
  n <- length(sample_data)
  sample_mean <- mean(sample_data)
  # Calculate standard error
  se <- pop_sd / sqrt(n)
  # Calculate z-statistic
  z_stat <- (sample_mean - pop_mean) / se
  # Calculate p-value based on alternative hypothesis
  p_value <- switch(alternative,
    "two.sided" = 2 * pnorm(-abs(z_stat)),
    "greater" = pnorm(z_stat, lower.tail = FALSE),
    "less" = pnorm(z_stat)
  )
  # Calculate confidence interval (95%)
  ci_lower <- sample_mean - 1.96 * se
  ci_upper <- sample_mean + 1.96 * se
  # Return results as a list
  list(
    z_statistic = z_stat,
    p_value = p_value,
    sample_mean = sample_mean,
    standard_error = se,
    ci = c(ci_lower, ci_upper),
    alternative = alternative
  )
}
# Test our manual function
manual_result <- manual_z_test(sample_bolts, 50, 2, "two.sided")
print(manual_result)
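Because the calculation needs nothing beyond base R, the same arithmetic works as a cross-check on any z-test implementation. A minimal sketch, regenerating the bolt sample (assuming the seed of 42 set earlier, so exact values may differ if other random draws happened in between):

```r
# Recompute the one-sample z-test quantities with base R only
set.seed(42)
x <- rnorm(40, mean = 50.8, sd = 2)  # same simulation as sample_bolts

se <- 2 / sqrt(length(x))            # known sigma = 2
z  <- (mean(x) - 50) / se
p  <- 2 * pnorm(-abs(z))
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se

c(z = z, p = p)
```

If these values disagree with your package output beyond rounding, something is wrong with the inputs, not the package.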
For two-sample tests:
# Manual two-sample z-test
manual_two_sample_z <- function(x, y, sigma_x, sigma_y,
                                alternative = "two.sided") {
  n_x <- length(x)
  n_y <- length(y)
  mean_x <- mean(x)
  mean_y <- mean(y)
  # Standard error of the difference in means
  se_diff <- sqrt((sigma_x^2 / n_x) + (sigma_y^2 / n_y))
  # Z-statistic for difference
  z_stat <- (mean_x - mean_y) / se_diff
  # P-value calculation
  p_value <- switch(alternative,
    "two.sided" = 2 * pnorm(-abs(z_stat)),
    "greater" = pnorm(z_stat, lower.tail = FALSE),
    "less" = pnorm(z_stat)
  )
  list(
    z_statistic = z_stat,
    p_value = p_value,
    mean_difference = mean_x - mean_y,
    standard_error = se_diff
  )
}
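Plugging in the quality-control setup (σ = 2 for both lines, n = 35 and n = 38) shows how the standard error of the difference comes together:

```r
# Standard error of the difference in means for the two production lines
sigma <- 2
n_a <- 35
n_b <- 38
se_diff <- sqrt(sigma^2 / n_a + sigma^2 / n_b)
round(se_diff, 4)  # about 0.4686
```

Dividing the observed difference in sample means by this value gives the two-sample z-statistic.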
Interpreting Results and Visualization
Visualizing your z-test results makes them more intuitive and easier to communicate. Here’s how to create an informative plot showing the test statistic relative to the rejection regions:
# Create visualization of z-test result
visualize_z_test <- function(z_stat, alpha = 0.05,
                             alternative = "two.sided") {
  # Create sequence for normal curve
  x_vals <- seq(-4, 4, length.out = 1000)
  y_vals <- dnorm(x_vals)
  df <- data.frame(x = x_vals, y = y_vals)
  # Determine critical values
  if (alternative == "two.sided") {
    crit_lower <- qnorm(alpha / 2)
    crit_upper <- qnorm(1 - alpha / 2)
  } else if (alternative == "greater") {
    crit_lower <- -Inf
    crit_upper <- qnorm(1 - alpha)
  } else {
    crit_lower <- qnorm(alpha)
    crit_upper <- Inf
  }
  # Build the plot
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_line(linewidth = 1) +
    # Shade rejection regions
    geom_area(data = subset(df, x <= crit_lower),
              aes(x = x, y = y), fill = "red", alpha = 0.3) +
    geom_area(data = subset(df, x >= crit_upper),
              aes(x = x, y = y), fill = "red", alpha = 0.3) +
    # Add vertical line for test statistic
    geom_vline(xintercept = z_stat, color = "blue",
               linewidth = 1.2, linetype = "dashed") +
    # Labels
    annotate("text", x = z_stat, y = 0.42,
             label = paste("z =", round(z_stat, 3)),
             color = "blue", fontface = "bold") +
    labs(
      title = "Z-Test Visualization",
      subtitle = paste("α =", alpha, "| Alternative:", alternative),
      x = "Z-Score",
      y = "Density"
    ) +
    theme_minimal() +
    theme(plot.title = element_text(face = "bold"))
  return(p)
}
# Visualize our one-sample test result
visualize_z_test(result_one_sample$statistic, 0.05, "two.sided")
This visualization clearly shows whether your test statistic falls within the rejection region (shaded red areas), making the decision to reject or fail to reject the null hypothesis visually obvious.
Common Pitfalls and Best Practices
When z-tests are inappropriate:
Don’t use a z-test when you’re estimating the population standard deviation from your sample. This is the most common mistake. If you calculated σ from your data, use a t-test instead—it accounts for the additional uncertainty in that estimate.
Small samples (n < 30) require the t-distribution unless you’re certain the population is normally distributed. The t-distribution has heavier tails that provide more conservative estimates when sample sizes are small.
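You can see those heavier tails directly by comparing critical values: the two-sided 5% cutoff from the t-distribution shrinks toward the normal value of about 1.96 as degrees of freedom grow.

```r
# Two-sided 5% critical values: t-distribution vs. standard normal
crit_z    <- qnorm(0.975)       # about 1.96
crit_t9   <- qt(0.975, df = 9)  # about 2.26 (n = 10: noticeably wider)
crit_t29  <- qt(0.975, df = 29) # about 2.05 (n = 30: close to normal)
c(z = crit_z, t9 = crit_t9, t29 = crit_t29)
```

At n = 30 the gap is already small, which is where the common rule of thumb comes from.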
Sample size considerations:
While n ≥ 30 is the common rule of thumb, consider the shape of your population distribution. Heavily skewed distributions may require larger samples (n ≥ 50 or more) for the Central Limit Theorem to adequately normalize the sampling distribution.
# Check for normality in your sample
shapiro.test(sample_bolts) # p > 0.05: no evidence against normality
# Visual check with Q-Q plot
qqnorm(sample_bolts)
qqline(sample_bolts, col = "red")
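A small simulation makes the point about skewed populations concrete: sample means drawn from a heavily right-skewed distribution (here exponential, used purely as an illustration) look far more symmetric at n = 50 than at n = 5.

```r
set.seed(123)
# 2000 sample means from a skewed (exponential) population
means_n5  <- replicate(2000, mean(rexp(5)))
means_n50 <- replicate(2000, mean(rexp(50)))

# Compare skewness of the two sampling distributions
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew(means_n5)   # still clearly right-skewed
skew(means_n50)  # much closer to 0 (roughly symmetric)
```

The Central Limit Theorem is doing the work here, but it needs more observations when the underlying population is far from normal.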
Alternatives when assumptions fail:
- Unknown population variance → Use t.test() instead
- Non-normal data with small samples → Consider the Wilcoxon signed-rank test (wilcox.test())
- Comparing proportions → Use prop.test() or a z-test for proportions
# When you don't know population SD, use t-test
t.test(sample_bolts, mu = 50, alternative = "two.sided")
Practical advice:
Report effect sizes alongside p-values. A statistically significant result doesn’t always mean a practically meaningful difference. Cohen’s d provides context for how large the difference actually is:
# Calculate Cohen's d for effect size
cohens_d <- (mean(sample_bolts) - population_mean) / population_sd
print(paste("Cohen's d:", round(cohens_d, 3)))
Z-tests remain valuable for understanding statistical inference fundamentals, even if t-tests dominate practical applications. Master both, and you’ll know exactly which tool fits each situation.