How to Perform a One-Proportion Z-Test in R
Key Insights
- The one-proportion z-test determines whether a sample proportion significantly differs from a hypothesized population proportion—essential for quality control, A/B testing, and survey validation.
- R’s built-in prop.test() function handles the heavy lifting, but understanding the manual calculation helps you interpret results correctly and troubleshoot edge cases.
- Always verify your assumptions (random sampling, binary outcomes, sufficient sample size) before trusting your p-values—violated assumptions produce misleading conclusions.
Introduction
The one-proportion z-test answers a simple but powerful question: does my observed proportion differ significantly from what I expected? You’re comparing a single sample proportion against a known or hypothesized population proportion.
This test appears constantly in applied statistics. Quality engineers use it to determine if defect rates exceed acceptable thresholds. Marketing analysts test whether conversion rates match industry benchmarks. Researchers validate survey responses against known population demographics. If you’re working with binary outcomes and a reference proportion, this is your tool.
R makes this straightforward with prop.test(), but understanding the mechanics behind the function separates competent analysts from those who blindly trust output. Let’s build that understanding.
Assumptions and Requirements
Before running any hypothesis test, verify your assumptions. The one-proportion z-test requires four conditions:
Random sampling: Your observations must be randomly selected from the population of interest. Convenience samples or self-selected respondents violate this assumption and bias your results.
Binary outcome: Each observation falls into exactly one of two categories—success or failure, defective or non-defective, clicked or didn’t click. No middle ground.
Sufficient sample size: The normal approximation underlying the z-test requires both np ≥ 10 and n(1-p) ≥ 10, where n is your sample size and p is the hypothesized proportion. This ensures the sampling distribution is approximately normal.
Independence: Each observation must be independent of others. In practice, this typically means sampling without replacement from a population at least 10 times larger than your sample.
Violating these assumptions doesn’t necessarily invalidate your analysis, but it does require caution. Small samples may need exact binomial tests instead. Non-random sampling requires careful qualification of your conclusions. When in doubt, be conservative in your interpretations.
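When the sample-size condition fails, the standard fallback mentioned above is an exact binomial test, which base R provides as binom.test(). Here is a minimal sketch with illustrative numbers (3 defectives in 20 items, a case where np = 1 falls far below the threshold):

```r
# Exact binomial test: safe when the np >= 10 rule fails (illustrative numbers)
# np = 20 * 0.05 = 1, so the normal approximation is unreliable here
exact_result <- binom.test(x = 3, n = 20, p = 0.05,
                           alternative = "two.sided")
exact_result$p.value  # exact p-value; no normal approximation involved
```

Because it computes binomial probabilities directly, binom.test() is valid at any sample size; its main limitation is that it only handles the one-sample case, whereas prop.test() also covers multi-group comparisons.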
The Mathematical Foundation
The z-test statistic measures how many standard errors your sample proportion falls from the hypothesized proportion:
z = (p̂ - p₀) / √(p₀(1-p₀)/n)
Where:
- p̂ (p-hat) is your observed sample proportion
- p₀ is the hypothesized population proportion
- n is your sample size
- The denominator is the standard error under the null hypothesis
The null hypothesis states that the true population proportion equals the hypothesized value (H₀: p = p₀). The alternative hypothesis depends on your research question—it can be two-tailed (p ≠ p₀) or one-tailed (p > p₀ or p < p₀).
Here’s how to calculate this manually in R:
# Manual one-proportion z-test calculation
# Scenario: 58 successes out of 100 trials, testing against p = 0.50
n <- 100 # sample size
x <- 58 # number of successes
p_hat <- x / n # sample proportion (0.58)
p_0 <- 0.50 # hypothesized proportion
# Calculate z-statistic
standard_error <- sqrt(p_0 * (1 - p_0) / n)
z_stat <- (p_hat - p_0) / standard_error
# Calculate two-tailed p-value
p_value <- 2 * (1 - pnorm(abs(z_stat)))
# Display results
cat("Sample proportion:", p_hat, "\n")
cat("Z-statistic:", round(z_stat, 4), "\n")
cat("P-value (two-tailed):", round(p_value, 4), "\n")
Output:
Sample proportion: 0.58
Z-statistic: 1.6
P-value (two-tailed): 0.1096
This manual approach clarifies exactly what’s happening. The sample proportion of 0.58 is 1.6 standard errors above the hypothesized 0.50, yielding a p-value of approximately 0.11—not significant at the conventional α = 0.05 level.
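You can verify the manual arithmetic against R’s built-in function: with the continuity correction switched off, prop.test() reports a chi-squared statistic that is exactly the square of the z-statistic computed above.

```r
# Cross-check: prop.test() without Yates' correction reproduces z^2
uncorrected <- prop.test(x = 58, n = 100, p = 0.50, correct = FALSE)
unname(uncorrected$statistic)  # 2.56, which is 1.6^2
uncorrected$p.value            # ~0.1096, matching the manual two-tailed p-value
```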
Using prop.test() in Base R
While manual calculation builds understanding, prop.test() is your production tool. It’s built into base R, handles edge cases gracefully, and provides confidence intervals automatically.
# Basic prop.test() syntax
# Testing if defect rate differs from 5% acceptable threshold
defective <- 18 # defective items found
total_inspected <- 250 # total items inspected
result <- prop.test(
x = defective,
n = total_inspected,
p = 0.05, # hypothesized proportion (5%)
alternative = "two.sided",
correct = TRUE # Yates' continuity correction (default)
)
print(result)
Output:
1-sample proportions test with continuity correction
data: defective out of total_inspected, null probability 0.05
X-squared = 2.1053, df = 1, p-value = 0.1468
alternative hypothesis: true p is not equal to 0.05
95 percent confidence interval:
0.04461498 0.11172217
sample estimates:
p
0.072
Key parameters explained:
- x: Number of successes (or events of interest)
- n: Total number of trials
- p: The null hypothesis proportion you’re testing against
- alternative: Direction of test ("two.sided", "greater", or "less")
- correct: Whether to apply Yates’ continuity correction (recommended for small samples)
Note that prop.test() reports a chi-squared statistic rather than a z-statistic. They’re mathematically equivalent: the chi-squared value equals z², with the same continuity correction applied to both. In this case, √2.1053 ≈ 1.45, which is the continuity-corrected z-statistic.
The p-value of 0.147 indicates that, despite an observed defect rate of 7.2% against the 5% threshold, the evidence isn’t strong enough to reject the null at α = 0.05. Consistent with this, the 95% confidence interval (4.5% to 11.2%) includes 5%. With only 250 items inspected, a gap this size could plausibly be sampling noise; more data is needed before declaring a quality problem.
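Yates’ correction deliberately makes the test conservative by shrinking the deviation from the null by half an observation. A quick side-by-side on the same data shows its effect (the difference fades as n grows):

```r
# Compare p-values with and without Yates' continuity correction
corrected_p   <- prop.test(18, 250, p = 0.05, correct = TRUE)$p.value
uncorrected_p <- prop.test(18, 250, p = 0.05, correct = FALSE)$p.value
cat("With correction:   ", round(corrected_p, 4), "\n")
cat("Without correction:", round(uncorrected_p, 4), "\n")
# The corrected p-value is always at least as large (more conservative)
```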
One-Tailed vs. Two-Tailed Tests
Your research question dictates which test to use. Two-tailed tests ask “is there a difference?” One-tailed tests ask “is it specifically higher?” or “is it specifically lower?”
Use one-tailed tests when:
- You have a directional hypothesis before seeing the data
- Only one direction of effect matters practically
- You want more statistical power in that specific direction
Use two-tailed tests when:
- You’re exploring whether any difference exists
- Deviations in either direction would be meaningful
- You’re unsure about the direction of effect
# Comparing all three alternative hypothesis options
# Scenario: Testing if customer satisfaction (72/100) exceeds 65% benchmark
successes <- 72
trials <- 100
benchmark <- 0.65
# Two-tailed: Is satisfaction different from 65%?
two_tailed <- prop.test(successes, trials, p = benchmark,
alternative = "two.sided")
# Greater: Is satisfaction higher than 65%?
greater <- prop.test(successes, trials, p = benchmark,
alternative = "greater")
# Less: Is satisfaction lower than 65%?
less <- prop.test(successes, trials, p = benchmark,
alternative = "less")
# Compare p-values
cat("Two-tailed p-value:", round(two_tailed$p.value, 4), "\n")
cat("Greater p-value:", round(greater$p.value, 4), "\n")
cat("Less p-value:", round(less$p.value, 4), "\n")
Output:
Two-tailed p-value: 0.173
Greater p-value: 0.0865
Less p-value: 0.9135
Notice the relationship: the one-tailed p-value in the direction of the observed effect (greater) is exactly half the two-tailed p-value, and the two one-tailed p-values sum to 1. The opposite direction yields a large p-value because the data strongly contradict that hypothesis.
Confidence Intervals and Effect Size
Statistical significance tells you whether an effect exists. Confidence intervals and effect sizes tell you whether it matters.
Extract the confidence interval from prop.test() output:
# Extracting and interpreting confidence intervals
result <- prop.test(x = 156, n = 400, p = 0.35)
# Access specific components
observed_prop <- result$estimate
conf_interval <- result$conf.int
conf_level <- attr(result$conf.int, "conf.level")
cat("Observed proportion:", observed_prop, "\n")
cat(conf_level * 100, "% CI: [",
round(conf_interval[1], 4), ", ",
round(conf_interval[2], 4), "]\n", sep = "")
# Cohen's h for effect size
# Measures the difference between two proportions on an arcsine scale
p_observed <- 156 / 400 # 0.39
p_null <- 0.35
cohens_h <- 2 * asin(sqrt(p_observed)) - 2 * asin(sqrt(p_null))
cat("Cohen's h:", round(cohens_h, 4), "\n")
cat("Effect size interpretation: ",
ifelse(abs(cohens_h) < 0.2, "small",
ifelse(abs(cohens_h) < 0.5, "small-to-medium",
ifelse(abs(cohens_h) < 0.8, "medium", "large"))), "\n")
Output:
Observed proportion: 0.39
95% CI: [0.3421, 0.4403]
Cohen's h: 0.0829
Effect size interpretation: small
The confidence interval (34.2% to 44.0%) contains the null value of 35%, consistent with the non-significant p-value. Cohen’s h of 0.08 indicates a trivially small effect—even if this were statistically significant, the practical difference is negligible.
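Effect size also drives sample-size planning. A sketch of the standard normal-approximation formula n ≈ (z₁₋α/₂ + z₁₋β)² / h² shows why trivially small effects like this one are expensive to detect (the pwr package’s pwr.p.test() performs the same calculation if you prefer a packaged version):

```r
# Sketch: sample size needed to detect a given Cohen's h with given power
# Uses the normal-approximation formula n = (z_{1-alpha/2} + z_{1-beta})^2 / h^2
sample_size_for_h <- function(h, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)  # 1.96 for a two-tailed 5% test
  z_beta  <- qnorm(power)          # 0.84 for 80% power
  ceiling((z_alpha + z_beta)^2 / h^2)
}
sample_size_for_h(0.08)  # ~1227 observations for a trivial effect
sample_size_for_h(0.50)  # ~32 observations for a medium effect
```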
Complete Worked Example
Let’s walk through a realistic analysis from start to finish. An e-commerce company claims their website converts 3.5% of visitors to customers. After a site redesign, you collect data on 2,000 visitors and observe 84 conversions. Did the redesign change the conversion rate?
# =============================================================
# One-Proportion Z-Test: Website Conversion Rate Analysis
# =============================================================
# --- Step 1: Define the problem and hypotheses ---
# H0: p = 0.035 (conversion rate equals historical 3.5%)
# Ha: p ≠ 0.035 (conversion rate has changed)
# Significance level: α = 0.05
conversions <- 84
visitors <- 2000
historical_rate <- 0.035
# --- Step 2: Check assumptions ---
# Random sampling: Visitors during test period (assumed representative)
# Binary outcome: Converted or didn't convert ✓
# Sample size check:
np <- visitors * historical_rate
nq <- visitors * (1 - historical_rate)
cat("Assumption check:\n")
cat(" np =", np, "(need ≥ 10) ✓\n")
cat(" n(1-p) =", nq, "(need ≥ 10) ✓\n\n")
# --- Step 3: Conduct the test ---
test_result <- prop.test(
x = conversions,
n = visitors,
p = historical_rate,
alternative = "two.sided",
correct = TRUE
)
print(test_result)
# --- Step 4: Extract and interpret results ---
cat("\n--- Results Summary ---\n")
cat("Observed conversion rate:",
round(test_result$estimate * 100, 2), "%\n")
cat("Historical benchmark:", historical_rate * 100, "%\n")
cat("Difference:",
round((test_result$estimate - historical_rate) * 100, 2),
"percentage points\n")
cat("95% CI:",
round(test_result$conf.int[1] * 100, 2), "% to",
round(test_result$conf.int[2] * 100, 2), "%\n")
cat("P-value:", round(test_result$p.value, 4), "\n")
# --- Step 5: Calculate effect size ---
cohens_h <- 2 * asin(sqrt(conversions/visitors)) -
2 * asin(sqrt(historical_rate))
cat("Cohen's h:", round(cohens_h, 3), "\n")
# --- Step 6: State conclusion ---
alpha <- 0.05
if (test_result$p.value < alpha) {
cat("\nConclusion: Reject H0. The conversion rate has significantly changed.\n")
} else {
cat("\nConclusion: Fail to reject H0. No significant change detected.\n")
}
Output:
Assumption check:
np = 70 (need ≥ 10) ✓
n(1-p) = 1930 (need ≥ 10) ✓
1-sample proportions test with continuity correction
data: conversions out of visitors, null probability 0.035
X-squared = 2.698, df = 1, p-value = 0.1005
alternative hypothesis: true p is not equal to 0.035
95 percent confidence interval:
0.03384498 0.05175498
sample estimates:
p
0.042
--- Results Summary ---
Observed conversion rate: 4.2 %
Historical benchmark: 3.5 %
Difference: 0.7 percentage points
95% CI: 3.38 % to 5.18 %
P-value: 0.1005 
Cohen's h: 0.036 
Conclusion: Fail to reject H0. No significant change detected.
The observed conversion rate of 4.2% is higher than the historical 3.5%, but this difference isn’t statistically significant (p = 0.1005). The 95% confidence interval includes the historical rate, and Cohen’s h of 0.036 indicates a trivial effect size. The redesign hasn’t demonstrably changed conversion rates—at least not with this sample size.
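A natural follow-up question is whether the study ever had a realistic chance of detecting a lift this small. A quick Monte Carlo sketch, assuming the observed 4.2% were the true rate, estimates the test’s power at n = 2,000:

```r
# Monte Carlo power sketch: if the true conversion rate were 4.2%,
# how often would this test (n = 2000, H0: p = 0.035) reject at alpha = 0.05?
set.seed(42)  # arbitrary seed, for reproducibility only
rejections <- replicate(2000, {
  simulated_x <- rbinom(1, size = 2000, prob = 0.042)
  prop.test(simulated_x, 2000, p = 0.035)$p.value < 0.05
})
mean(rejections)  # estimated power -- well below the conventional 0.80 target
```

Low power means a non-significant result here is weak evidence of "no change"; a longer test window or a larger traffic allocation would be needed before concluding the redesign had no effect.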
The one-proportion z-test is a foundational tool that you’ll use repeatedly. Master prop.test(), understand its assumptions, and always pair statistical significance with practical significance through confidence intervals and effect sizes.