How to Perform a Z-Test in R
Key Insights
- Z-tests are appropriate when you have large samples (n ≥ 30) and know the population standard deviation—conditions rarely met in practice, making t-tests more common in real-world analysis.
- The BSDA package provides a convenient z.test() function, but understanding the manual calculation using pnorm() gives you flexibility and deeper statistical intuition.
- Always verify your assumptions before running a z-test; using it inappropriately (small samples, unknown variance) produces misleading results that can derail your analysis.
Introduction to Z-Tests
The z-test is a statistical hypothesis test that determines whether there’s a significant difference between sample and population means, or between two sample means. It relies on the standard normal distribution to calculate probabilities and make inferences.
Here’s the critical distinction: use a z-test when you know the population standard deviation and have a large sample size (typically n ≥ 30). Use a t-test when the population standard deviation is unknown and you’re estimating it from your sample.
In practice, you’ll rarely know the true population standard deviation. This makes z-tests more common in academic exercises, quality control scenarios with established process parameters, or when working with standardized test scores where population parameters are published. Despite their limited real-world application, understanding z-tests builds the foundation for grasping more complex statistical methods.
The z-statistic follows a simple formula:
z = (x̄ - μ) / (σ / √n)
Where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.
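To make the formula concrete, here is a quick hand calculation with illustrative numbers (not the bolt data used later):

```r
# Illustrative z-statistic calculation
x_bar <- 52    # sample mean
mu    <- 50    # hypothesized population mean
sigma <- 4     # known population standard deviation
n     <- 64    # sample size

z <- (x_bar - mu) / (sigma / sqrt(n))
z  # (52 - 50) / (4 / 8) = 4
```

A z of 4 means the sample mean sits four standard errors above the hypothesized mean—far out in the tail of the standard normal distribution.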
Prerequisites and Setup
R doesn’t include a built-in z-test function in its base statistics package. The BSDA (Basic Statistics and Data Analysis) package fills this gap with a straightforward z.test() function.
# Install BSDA if you haven't already
install.packages("BSDA")
# Load required packages
library(BSDA)
library(ggplot2) # For visualization later
# Set seed for reproducibility
set.seed(42)
Before running any z-test, verify these assumptions:
- Known population standard deviation: You must have the true σ, not an estimate from your sample.
- Random sampling: Your data should be randomly selected from the population.
- Independence: Observations must be independent of each other.
- Normal distribution: The population should be normally distributed, or your sample size should be large enough (n ≥ 30) for the Central Limit Theorem to apply.
Let’s create sample data for our examples:
# Simulating quality control data
# A factory produces bolts with target length 50mm and known σ = 2mm
population_mean <- 50
population_sd <- 2
# Sample of 40 bolts from today's production
sample_bolts <- rnorm(40, mean = 50.8, sd = 2)
# Two production lines for comparison
line_a <- rnorm(35, mean = 51.2, sd = 2)
line_b <- rnorm(38, mean = 50.5, sd = 2)
One-Sample Z-Test
The one-sample z-test answers a straightforward question: does my sample come from a population with a specific mean? In our quality control example, we’re asking whether today’s bolt production differs from the target specification.
Formulating hypotheses:
- H₀ (null): The sample mean equals the population mean (μ = 50)
- H₁ (alternative): The sample mean differs from the population mean (μ ≠ 50)
# One-sample z-test
# Testing if sample mean differs from population mean of 50
result_one_sample <- z.test(
  x = sample_bolts,
  mu = population_mean,      # Hypothesized population mean
  sigma.x = population_sd,   # Known population standard deviation
  alternative = "two.sided"  # Two-tailed test
)
print(result_one_sample)
Output:
One-sample z-Test
data: sample_bolts
z = 2.529, p-value = 0.01143
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
50.17994 51.41956
sample estimates:
mean of x
50.79975
Interpreting the results:
The z-statistic of 2.53 tells us the sample mean is approximately 2.5 standard errors above the hypothesized mean. With a p-value of about 0.011, we reject the null hypothesis at the α = 0.05 significance level. The 95% confidence interval (50.18, 51.42) doesn’t contain 50, confirming our conclusion.
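The reported p-value is simply the two-tailed probability of the z-statistic under the standard normal distribution, so you can reproduce it with pnorm(). For a z-statistic of about 2.5:

```r
# Two-tailed p-value for a z-statistic of 2.5
z <- 2.5
p_two_sided <- 2 * pnorm(-abs(z))
round(p_two_sided, 4)  # approximately 0.0124
```

This is exactly the calculation the manual implementation later in this article performs.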
For one-tailed tests, change the alternative parameter:
# Testing if bolts are longer than specification (one-tailed)
result_greater <- z.test(
  x = sample_bolts,
  mu = population_mean,
  sigma.x = population_sd,
  alternative = "greater"  # Right-tailed test
)
# Testing if bolts are shorter than specification
result_less <- z.test(
  x = sample_bolts,
  mu = population_mean,
  sigma.x = population_sd,
  alternative = "less"     # Left-tailed test
)
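When the z-statistic is positive, the right-tailed p-value is exactly half the two-tailed one, and the left-tailed p-value is its complement—a useful sanity check on any test output. A minimal illustration with base R:

```r
# Relationship between one- and two-tailed p-values (for z > 0)
z <- 2.5
p_two   <- 2 * pnorm(-abs(z))
p_right <- pnorm(z, lower.tail = FALSE)
p_left  <- pnorm(z)

all.equal(p_right, p_two / 2)   # TRUE
all.equal(p_left, 1 - p_right)  # TRUE
```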
Two-Sample Z-Test
When comparing two independent groups, the two-sample z-test determines whether their means differ significantly. This requires knowing the population standard deviation for both groups.
# Two-sample z-test comparing production lines
result_two_sample <- z.test(
  x = line_a,
  y = line_b,
  mu = 0,                   # Testing if difference equals 0
  sigma.x = population_sd,  # Known σ for line A
  sigma.y = population_sd,  # Known σ for line B
  alternative = "two.sided"
)
print(result_two_sample)
Output:
Two-sample z-Test
data: line_a and line_b
z = 1.1914, p-value = 0.2335
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3601230 1.4766030
sample estimates:
mean of x mean of y
51.08432 50.52608
Here, the p-value of 0.234 exceeds our α = 0.05 threshold. We fail to reject the null hypothesis—there’s insufficient evidence to conclude the production lines differ in mean bolt length.
When population standard deviations differ between groups:
# Different known standard deviations for each group
sigma_a <- 1.8
sigma_b <- 2.2
result_unequal_var <- z.test(
  x = line_a,
  y = line_b,
  mu = 0,
  sigma.x = sigma_a,
  sigma.y = sigma_b,
  alternative = "two.sided"
)
Manual Z-Test Calculation
Understanding the underlying calculation helps when you need custom implementations or want to avoid package dependencies. Here’s how to perform z-tests from scratch:
# Manual one-sample z-test function
manual_z_test <- function(sample_data, pop_mean, pop_sd,
                          alternative = "two.sided") {
  n <- length(sample_data)
  sample_mean <- mean(sample_data)
  # Calculate standard error
  se <- pop_sd / sqrt(n)
  # Calculate z-statistic
  z_stat <- (sample_mean - pop_mean) / se
  # Calculate p-value based on alternative hypothesis
  p_value <- switch(alternative,
    "two.sided" = 2 * pnorm(-abs(z_stat)),
    "greater" = pnorm(z_stat, lower.tail = FALSE),
    "less" = pnorm(z_stat)
  )
  # Calculate confidence interval (95%)
  ci_lower <- sample_mean - 1.96 * se
  ci_upper <- sample_mean + 1.96 * se
  # Return results as a list
  list(
    z_statistic = z_stat,
    p_value = p_value,
    sample_mean = sample_mean,
    standard_error = se,
    ci = c(ci_lower, ci_upper),
    alternative = alternative
  )
}
# Test our manual function
manual_result <- manual_z_test(sample_bolts, 50, 2, "two.sided")
print(manual_result)
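Because the calculation needs nothing beyond base R, the same arithmetic works as a cross-check on any z-test implementation. A minimal sketch, regenerating the bolt sample (assuming the seed of 42 set earlier, so exact values may differ if other random draws happened in between):

```r
# Recompute the one-sample z-test quantities with base R only
set.seed(42)
x <- rnorm(40, mean = 50.8, sd = 2)  # same simulation as sample_bolts

se <- 2 / sqrt(length(x))            # known sigma = 2
z  <- (mean(x) - 50) / se
p  <- 2 * pnorm(-abs(z))
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se

c(z = z, p = p)
```

If these values disagree with your package output beyond rounding, something is wrong with the inputs, not the package.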
For two-sample tests:
# Manual two-sample z-test
manual_two_sample_z <- function(x, y, sigma_x, sigma_y,
                                alternative = "two.sided") {
  n_x <- length(x)
  n_y <- length(y)
  mean_x <- mean(x)
  mean_y <- mean(y)
  # Standard error of the difference in means
  se_diff <- sqrt((sigma_x^2 / n_x) + (sigma_y^2 / n_y))
  # Z-statistic for difference
  z_stat <- (mean_x - mean_y) / se_diff
  # P-value calculation
  p_value <- switch(alternative,
    "two.sided" = 2 * pnorm(-abs(z_stat)),
    "greater" = pnorm(z_stat, lower.tail = FALSE),
    "less" = pnorm(z_stat)
  )
  list(
    z_statistic = z_stat,
    p_value = p_value,
    mean_difference = mean_x - mean_y,
    standard_error = se_diff
  )
}
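Plugging in the quality-control setup (σ = 2 for both lines, n = 35 and n = 38) shows how the standard error of the difference comes together:

```r
# Standard error of the difference in means for the two production lines
sigma <- 2
n_a <- 35
n_b <- 38
se_diff <- sqrt(sigma^2 / n_a + sigma^2 / n_b)
round(se_diff, 4)  # about 0.4686
```

Dividing the observed difference in sample means by this value gives the two-sample z-statistic.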
Interpreting Results and Visualization
Visualizing your z-test results makes them more intuitive and easier to communicate. Here’s how to create an informative plot showing the test statistic relative to the rejection regions:
# Create visualization of z-test result
visualize_z_test <- function(z_stat, alpha = 0.05,
                             alternative = "two.sided") {
  # Create sequence for normal curve
  x_vals <- seq(-4, 4, length.out = 1000)
  y_vals <- dnorm(x_vals)
  df <- data.frame(x = x_vals, y = y_vals)
  # Determine critical values
  if (alternative == "two.sided") {
    crit_lower <- qnorm(alpha / 2)
    crit_upper <- qnorm(1 - alpha / 2)
  } else if (alternative == "greater") {
    crit_lower <- -Inf
    crit_upper <- qnorm(1 - alpha)
  } else {
    crit_lower <- qnorm(alpha)
    crit_upper <- Inf
  }
  # Build the plot
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_line(linewidth = 1) +
    # Shade rejection regions
    geom_area(data = subset(df, x <= crit_lower),
              aes(x = x, y = y), fill = "red", alpha = 0.3) +
    geom_area(data = subset(df, x >= crit_upper),
              aes(x = x, y = y), fill = "red", alpha = 0.3) +
    # Add vertical line for test statistic
    geom_vline(xintercept = z_stat, color = "blue",
               linewidth = 1.2, linetype = "dashed") +
    # Labels
    annotate("text", x = z_stat, y = 0.42,
             label = paste("z =", round(z_stat, 3)),
             color = "blue", fontface = "bold") +
    labs(
      title = "Z-Test Visualization",
      subtitle = paste("α =", alpha, "| Alternative:", alternative),
      x = "Z-Score",
      y = "Density"
    ) +
    theme_minimal() +
    theme(plot.title = element_text(face = "bold"))
  return(p)
}
# Visualize our one-sample test result
visualize_z_test(result_one_sample$statistic, 0.05, "two.sided")
This visualization clearly shows whether your test statistic falls within the rejection region (shaded red areas), making the decision to reject or fail to reject the null hypothesis visually obvious.
Common Pitfalls and Best Practices
When z-tests are inappropriate:
Don’t use a z-test when you’re estimating the population standard deviation from your sample. This is the most common mistake. If you calculated σ from your data, use a t-test instead—it accounts for the additional uncertainty in that estimate.
Small samples (n < 30) require the t-distribution unless you’re certain the population is normally distributed. The t-distribution has heavier tails that provide more conservative estimates when sample sizes are small.
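You can see those heavier tails directly by comparing critical values: the two-sided 5% cutoff from the t-distribution shrinks toward the normal value of about 1.96 as degrees of freedom grow.

```r
# Two-sided 5% critical values: t-distribution vs. standard normal
crit_z    <- qnorm(0.975)       # about 1.96
crit_t9   <- qt(0.975, df = 9)  # about 2.26 (n = 10: noticeably wider)
crit_t29  <- qt(0.975, df = 29) # about 2.05 (n = 30: close to normal)
c(z = crit_z, t9 = crit_t9, t29 = crit_t29)
```

At n = 30 the gap is already small, which is where the common rule of thumb comes from.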
Sample size considerations:
While n ≥ 30 is the common rule of thumb, consider the shape of your population distribution. Heavily skewed distributions may require larger samples (n ≥ 50 or more) for the Central Limit Theorem to adequately normalize the sampling distribution.
# Check for normality in your sample
shapiro.test(sample_bolts) # p > 0.05: no evidence against normality
# Visual check with Q-Q plot
qqnorm(sample_bolts)
qqline(sample_bolts, col = "red")
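A small simulation makes the point about skewed populations concrete: sample means drawn from a heavily right-skewed distribution (here exponential, used purely as an illustration) look far more symmetric at n = 50 than at n = 5.

```r
set.seed(123)
# 2000 sample means from a skewed (exponential) population
means_n5  <- replicate(2000, mean(rexp(5)))
means_n50 <- replicate(2000, mean(rexp(50)))

# Compare skewness of the two sampling distributions
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew(means_n5)   # still clearly right-skewed
skew(means_n50)  # much closer to 0 (roughly symmetric)
```

The Central Limit Theorem is doing the work here, but it needs more observations when the underlying population is far from normal.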
Alternatives when assumptions fail:
- Unknown population variance → Use t.test() instead
- Non-normal data with small samples → Consider the Wilcoxon signed-rank test (wilcox.test())
- Comparing proportions → Use prop.test() or a z-test for proportions
# When you don't know population SD, use t-test
t.test(sample_bolts, mu = 50, alternative = "two.sided")
Practical advice:
Report effect sizes alongside p-values. A statistically significant result doesn’t always mean a practically meaningful difference. Cohen’s d provides context for how large the difference actually is:
# Calculate Cohen's d for effect size
cohens_d <- (mean(sample_bolts) - population_mean) / population_sd
print(paste("Cohen's d:", round(cohens_d, 3)))
Z-tests remain valuable for understanding statistical inference fundamentals, even if t-tests dominate practical applications. Master both, and you’ll know exactly which tool fits each situation.