How to Perform a One-Proportion Z-Test in Python
Key Insights
- The one-proportion z-test determines whether a sample proportion significantly differs from a hypothesized population proportion—essential for A/B testing, quality control, and survey analysis
- Use statsmodels.stats.proportion.proportions_ztest() for production code, but understanding the manual calculation helps you debug issues and explain results to stakeholders
- The test requires large samples (np ≥ 10 and n(1-p) ≥ 10); for smaller samples, use the exact binomial test instead
Introduction to One-Proportion Z-Tests
The one-proportion z-test answers a simple question: does my observed proportion differ significantly from an expected value? You’re not comparing two groups—you’re comparing one sample against a known or hypothesized standard.
Consider these scenarios where this test applies:
- A/B testing: Your historical conversion rate is 12%. After a site redesign, you observe 15% conversion on 500 visitors. Is this improvement real or random noise?
- Quality control: A manufacturer claims 2% defect rate. You inspect 1,000 units and find 35 defects (3.5%). Should you reject the shipment?
- Survey analysis: A politician claims 60% approval. Your poll of 800 voters shows 54% approval. Is the claim statistically unsupported?
The test relies on three assumptions:
- Binary outcomes: Each observation falls into exactly one of two categories (success/failure, yes/no, defect/no defect)
- Random sampling: Observations are independent and randomly selected
- Large sample size: The normal approximation holds when np ≥ 10 and n(1-p) ≥ 10
When these assumptions hold, the sampling distribution of proportions approximates a normal distribution, making the z-test valid.
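You can verify the sample-size condition in one line before reaching for the test. Here's a minimal check using the quality-control scenario above:

```python
# Rule of thumb for the normal approximation: n*p0 >= 10 and n*(1-p0) >= 10
n, p0 = 1000, 0.02  # quality-control scenario: 1,000 units, claimed 2% defect rate
print(n * p0, n * (1 - p0))  # 20.0 and 980.0 -> both clear the threshold
```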
The Math Behind the Test
The z-statistic measures how many standard errors your sample proportion lies from the hypothesized proportion:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
Where:
- p̂ (p-hat) = sample proportion (successes / sample size)
- p₀ = hypothesized population proportion
- n = sample size
The denominator is the standard error under the null hypothesis—we use p₀, not p̂, because we assume the null is true when calculating the test statistic.
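Plugging in the A/B-testing numbers from the scenario above (75 conversions out of 500 visitors, so p̂ = 0.15, against a 12% baseline) makes the formula concrete:

$$z = \frac{0.15 - 0.12}{\sqrt{\frac{0.12 \times 0.88}{500}}} = \frac{0.03}{0.01453} \approx 2.06$$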
One-tailed vs. two-tailed tests depend on your research question:
- Two-tailed: “Is the proportion different from p₀?” (H₁: p ≠ p₀)
- Right-tailed: “Is the proportion greater than p₀?” (H₁: p > p₀)
- Left-tailed: “Is the proportion less than p₀?” (H₁: p < p₀)
Here’s the manual calculation in Python:
```python
import math

from scipy.stats import norm


def manual_z_test(successes, n, p0, alternative='two-sided'):
    """
    Perform a one-proportion z-test manually.

    Parameters
    ----------
    successes : int - Number of successes in sample
    n : int - Sample size
    p0 : float - Hypothesized population proportion
    alternative : str - 'two-sided', 'larger', or 'smaller'

    Returns
    -------
    tuple: (z_statistic, p_value)
    """
    # Sample proportion
    p_hat = successes / n

    # Standard error under the null hypothesis
    se = math.sqrt(p0 * (1 - p0) / n)

    # Z-statistic
    z = (p_hat - p0) / se

    # P-value from the standard normal CDF
    if alternative == 'two-sided':
        p_value = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == 'larger':
        p_value = 1 - norm.cdf(z)
    elif alternative == 'smaller':
        p_value = norm.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'larger', or 'smaller'")

    return z, p_value


# Example: 75 successes out of 500 trials, testing against p0 = 0.12
z_stat, p_val = manual_z_test(75, 500, 0.12, 'larger')
print(f"Sample proportion: {75/500:.3f}")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_val:.4f}")
```
Output:

```
Sample proportion: 0.150
Z-statistic: 2.0643
P-value: 0.0195
```
Implementation with statsmodels
For production code, use statsmodels. It’s well-tested, handles edge cases, and provides additional functionality like confidence intervals.
```python
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Scenario: Testing if website conversion rate exceeds historical 12%
# Observed: 75 conversions out of 500 visitors
count = 75              # Number of successes
nobs = 500              # Number of observations
null_proportion = 0.12  # Hypothesized proportion

# Perform the test. By default, proportions_ztest estimates the standard
# error from the sample proportion; passing prop_var uses the null
# proportion instead, matching the textbook formula above.
z_stat, p_value = proportions_ztest(
    count=count,
    nobs=nobs,
    value=null_proportion,
    alternative='larger',     # Testing if proportion > 0.12
    prop_var=null_proportion  # Standard error under the null
)

# Calculate confidence interval
ci_low, ci_high = proportion_confint(
    count=count,
    nobs=nobs,
    alpha=0.05,
    method='normal'
)

print("One-Proportion Z-Test Results")
print("=" * 40)
print(f"Sample proportion: {count/nobs:.4f}")
print(f"Null hypothesis: p = {null_proportion}")
print(f"Alternative: p > {null_proportion}")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print()

alpha = 0.05
if p_value < alpha:
    print(f"Result: Reject H₀ at α = {alpha}")
    print("Conclusion: Significant evidence that conversion rate exceeds 12%")
else:
    print(f"Result: Fail to reject H₀ at α = {alpha}")
    print("Conclusion: Insufficient evidence that conversion rate exceeds 12%")
```
Output:

```
One-Proportion Z-Test Results
========================================
Sample proportion: 0.1500
Null hypothesis: p = 0.12
Alternative: p > 0.12
Z-statistic: 2.0643
P-value: 0.0195
95% CI: [0.1187, 0.1813]

Result: Reject H₀ at α = 0.05
Conclusion: Significant evidence that conversion rate exceeds 12%
```
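The 'normal' (Wald) interval used above can misbehave when the proportion sits near 0 or 1. statsmodels also implements the Wilson score interval, which has better coverage in those regimes; only the method argument changes:

```python
# Wilson score interval: better coverage than Wald for extreme proportions
ci_low_w, ci_high_w = proportion_confint(count=75, nobs=500, alpha=0.05, method='wilson')
print(f"95% Wilson CI: [{ci_low_w:.4f}, {ci_high_w:.4f}]")
```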
Manual Implementation from Scratch
Sometimes you need a lightweight solution without heavy dependencies, or you want complete control over the implementation:
```python
import warnings
from typing import Literal

from scipy.stats import norm


def one_proportion_ztest(
    successes: int,
    n: int,
    p0: float,
    alternative: Literal['two-sided', 'larger', 'smaller'] = 'two-sided',
    alpha: float = 0.05
) -> dict:
    """
    Perform a one-proportion z-test with comprehensive output.

    Parameters
    ----------
    successes : int - Number of successes observed
    n : int - Total sample size
    p0 : float - Hypothesized population proportion (0 < p0 < 1)
    alternative : str - Type of alternative hypothesis
    alpha : float - Significance level

    Returns
    -------
    dict with test results
    """
    # Input validation
    if not 0 < p0 < 1:
        raise ValueError("p0 must be between 0 and 1")
    if successes > n or successes < 0:
        raise ValueError("successes must be between 0 and n")

    # Check sample size assumptions
    if n * p0 < 10 or n * (1 - p0) < 10:
        warnings.warn(
            "Sample size may be too small for normal approximation. "
            "Consider using exact binomial test."
        )

    # Calculate statistics
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5
    z_stat = (p_hat - p0) / se

    # Calculate p-value based on alternative
    if alternative == 'two-sided':
        p_value = 2 * (1 - norm.cdf(abs(z_stat)))
    elif alternative == 'larger':
        p_value = 1 - norm.cdf(z_stat)
    else:  # smaller
        p_value = norm.cdf(z_stat)

    # Confidence interval (always two-sided)
    z_crit = norm.ppf(1 - alpha / 2)
    se_ci = (p_hat * (1 - p_hat) / n) ** 0.5  # Use p_hat for the CI
    ci_lower = p_hat - z_crit * se_ci
    ci_upper = p_hat + z_crit * se_ci

    return {
        'sample_proportion': p_hat,
        'null_proportion': p0,
        'z_statistic': z_stat,
        'p_value': p_value,
        'alpha': alpha,
        'reject_null': p_value < alpha,
        'confidence_interval': (max(0, ci_lower), min(1, ci_upper)),
        'alternative': alternative
    }


# Test it
results = one_proportion_ztest(75, 500, 0.12, alternative='larger')
for key, value in results.items():
    print(f"{key}: {value}")
```
Interpreting and Visualizing Results
Visualization helps communicate results to non-technical stakeholders. Here’s how to plot the test with rejection regions:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


def visualize_z_test(z_stat, alternative='two-sided', alpha=0.05):
    """Visualize the z-test with rejection regions."""
    fig, ax = plt.subplots(figsize=(10, 6))

    # Generate x values for the standard normal distribution
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x)

    # Plot the distribution
    ax.plot(x, y, 'b-', linewidth=2, label='Standard Normal')
    ax.fill_between(x, y, alpha=0.1, color='blue')

    # Determine critical values and shade rejection regions
    if alternative == 'two-sided':
        z_crit = norm.ppf(1 - alpha / 2)
        # Right tail
        x_reject_right = x[x >= z_crit]
        ax.fill_between(x_reject_right, norm.pdf(x_reject_right),
                        color='red', alpha=0.4, label=f'Rejection region (α={alpha})')
        # Left tail
        x_reject_left = x[x <= -z_crit]
        ax.fill_between(x_reject_left, norm.pdf(x_reject_left),
                        color='red', alpha=0.4)
        ax.axvline(-z_crit, color='red', linestyle='--', linewidth=1.5)
        ax.axvline(z_crit, color='red', linestyle='--', linewidth=1.5)
    elif alternative == 'larger':
        z_crit = norm.ppf(1 - alpha)
        x_reject = x[x >= z_crit]
        ax.fill_between(x_reject, norm.pdf(x_reject),
                        color='red', alpha=0.4, label=f'Rejection region (α={alpha})')
        ax.axvline(z_crit, color='red', linestyle='--', linewidth=1.5)
    else:  # smaller
        z_crit = norm.ppf(alpha)
        x_reject = x[x <= z_crit]
        ax.fill_between(x_reject, norm.pdf(x_reject),
                        color='red', alpha=0.4, label=f'Rejection region (α={alpha})')
        ax.axvline(z_crit, color='red', linestyle='--', linewidth=1.5)

    # Plot the observed z-statistic
    ax.axvline(z_stat, color='green', linewidth=2, label=f'Observed z = {z_stat:.3f}')
    ax.plot(z_stat, norm.pdf(z_stat), 'go', markersize=10)

    ax.set_xlabel('Z-score', fontsize=12)
    ax.set_ylabel('Probability Density', fontsize=12)
    ax.set_title('One-Proportion Z-Test Visualization', fontsize=14)
    ax.legend(loc='upper right')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('z_test_visualization.png', dpi=150)
    plt.show()


# Visualize our test result
visualize_z_test(2.0643, alternative='larger', alpha=0.05)
```
Practical Example: End-to-End Analysis
Let’s work through a complete real-world scenario:
"""
Scenario: Your product team launched a new onboarding flow.
Historical signup completion rate: 34%
After 2 weeks, you have data from 1,247 new users.
478 completed the signup process.
Question: Did the new flow improve signup completion?
"""
from statsmodels.stats.proportion import proportions_ztest, proportion_confint
import numpy as np
# Data
completed = 478
total_users = 1247
historical_rate = 0.34
# Step 1: State hypotheses
print("HYPOTHESIS TEST: New Onboarding Flow")
print("=" * 50)
print(f"H₀: p = {historical_rate} (no improvement)")
print(f"H₁: p > {historical_rate} (improvement)")
print(f"Significance level: α = 0.05")
print()
# Step 2: Check assumptions
print("ASSUMPTION CHECKS")
print("-" * 50)
np_check = total_users * historical_rate
nq_check = total_users * (1 - historical_rate)
print(f"n × p₀ = {np_check:.1f} {'✓' if np_check >= 10 else '✗'} (need ≥ 10)")
print(f"n × (1-p₀) = {nq_check:.1f} {'✓' if nq_check >= 10 else '✗'} (need ≥ 10)")
print()
# Step 3: Perform the test
z_stat, p_value = proportions_ztest(
count=completed,
nobs=total_users,
value=historical_rate,
alternative='larger'
)
ci_low, ci_high = proportion_confint(completed, total_users, alpha=0.05)
# Step 4: Results
print("RESULTS")
print("-" * 50)
print(f"Sample size: {total_users}")
print(f"Observed completions: {completed}")
print(f"Sample proportion: {completed/total_users:.4f} ({completed/total_users*100:.2f}%)")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.6f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print()
# Step 5: Decision and interpretation
print("CONCLUSION")
print("-" * 50)
alpha = 0.05
if p_value < alpha:
print(f"✓ Reject H₀ (p-value {p_value:.4f} < α {alpha})")
print()
lift = (completed/total_users - historical_rate) / historical_rate * 100
print(f"The new onboarding flow shows a statistically significant")
print(f"improvement in signup completion rate.")
print(f"Observed lift: {lift:.1f}% relative improvement")
else:
print(f"✗ Fail to reject H₀ (p-value {p_value:.4f} ≥ α {alpha})")
print("Insufficient evidence to conclude the new flow improved signups.")
Common Pitfalls and Best Practices
Sample size requirements matter. The rule of thumb (np ≥ 10 and n(1-p) ≥ 10) exists because the normal approximation breaks down with small samples. When your counts fall below that threshold, use the exact binomial test:
```python
from scipy.stats import binomtest  # requires SciPy >= 1.7

# For small samples, use the exact binomial test
# Example: 3 successes out of 20 trials, testing against p0 = 0.25
result = binomtest(3, 20, 0.25, alternative='less')
print(f"Exact binomial p-value: {result.pvalue:.4f}")
```
Multiple testing inflates false positives. If you run 20 tests at α = 0.05, you expect one false positive by chance. Apply corrections like Bonferroni when testing multiple proportions.
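Here's a minimal sketch using statsmodels' multipletests, with hypothetical p-values standing in for five separate proportion tests:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from several independent proportion tests
p_values = [0.003, 0.021, 0.047, 0.19, 0.62]

# Bonferroni: each p-value is effectively compared against alpha / number_of_tests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
print(reject)      # which nulls survive the correction
print(p_adjusted)  # p-values scaled by the number of tests (capped at 1)
```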
Effect size matters more than p-values. A statistically significant result with a tiny effect size may not be practically meaningful. Always report confidence intervals and consider the business impact of the observed difference.
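For proportions, Cohen's h is a common effect-size measure, and statsmodels provides it via proportion_effectsize. A quick sketch using the conversion example from earlier:

```python
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h for 15% observed vs. 12% hypothesized conversion
h = proportion_effectsize(0.15, 0.12)
print(f"Cohen's h: {h:.3f}")  # roughly 0.09, below Cohen's 'small' benchmark of 0.2
```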
Don’t confuse statistical significance with practical significance. A 0.5% improvement might be statistically significant with a large enough sample, but is it worth the engineering effort? Let business context guide your decisions.
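To make that concrete, here is a back-of-the-envelope sample-size calculation using the standard normal-approximation formula for a one-proportion test, with assumed values of 80% power and one-sided α = 0.05:

```python
from scipy.stats import norm

# Approximate n needed to detect a 0.5% absolute lift over a 12% baseline
# (normal-approximation formula; 80% power, one-sided alpha = 0.05)
p0, p1 = 0.12, 0.125
z_alpha, z_beta = norm.ppf(0.95), norm.ppf(0.80)
n = ((z_alpha * (p0 * (1 - p0)) ** 0.5 + z_beta * (p1 * (1 - p1)) ** 0.5)
     / (p1 - p0)) ** 2
print(f"Required n: {n:.0f}")  # on the order of tens of thousands of observations
```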