How to Perform a One-Proportion Z-Test in Python
Key Insights
- The one-proportion z-test determines whether a sample proportion significantly differs from a hypothesized population proportion—essential for A/B testing, quality control, and survey analysis
- Use statsmodels.stats.proportion.proportions_ztest() for production code, but understanding the manual calculation helps you debug issues and explain results to stakeholders
- The test requires large samples (np ≥ 10 and n(1-p) ≥ 10); for smaller samples, use the exact binomial test instead
Introduction to One-Proportion Z-Tests
The one-proportion z-test answers a simple question: does my observed proportion differ significantly from an expected value? You’re not comparing two groups—you’re comparing one sample against a known or hypothesized standard.
Consider these scenarios where this test applies:
- A/B testing: Your historical conversion rate is 12%. After a site redesign, you observe 15% conversion on 500 visitors. Is this improvement real or random noise?
- Quality control: A manufacturer claims 2% defect rate. You inspect 1,000 units and find 35 defects (3.5%). Should you reject the shipment?
- Survey analysis: A politician claims 60% approval. Your poll of 800 voters shows 54% approval. Is the claim statistically unsupported?
The test relies on three assumptions:
- Binary outcomes: Each observation falls into exactly one of two categories (success/failure, yes/no, defect/no defect)
- Random sampling: Observations are independent and randomly selected
- Large sample size: The normal approximation holds when np ≥ 10 and n(1-p) ≥ 10
When these assumptions hold, the sampling distribution of proportions approximates a normal distribution, making the z-test valid.
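You can verify the sample-size condition in one line before reaching for the test. Here's a minimal check using the quality-control scenario above:

```python
# Rule of thumb for the normal approximation: n*p0 >= 10 and n*(1-p0) >= 10
n, p0 = 1000, 0.02  # quality-control scenario: 1,000 units, claimed 2% defect rate
print(n * p0, n * (1 - p0))  # 20.0 and 980.0 -> both clear the threshold
```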
The Math Behind the Test
The z-statistic measures how many standard errors your sample proportion lies from the hypothesized proportion:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
Where:
- p̂ (p-hat) = sample proportion (successes / sample size)
- p₀ = hypothesized population proportion
- n = sample size
The denominator is the standard error under the null hypothesis—we use p₀, not p̂, because we assume the null is true when calculating the test statistic.
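Plugging in the A/B-testing numbers from the scenario above (75 conversions out of 500 visitors, so p̂ = 0.15, against a 12% baseline) makes the formula concrete:

$$z = \frac{0.15 - 0.12}{\sqrt{\frac{0.12 \times 0.88}{500}}} = \frac{0.03}{0.01453} \approx 2.06$$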
One-tailed vs. two-tailed tests depend on your research question:
- Two-tailed: “Is the proportion different from p₀?” (H₁: p ≠ p₀)
- Right-tailed: “Is the proportion greater than p₀?” (H₁: p > p₀)
- Left-tailed: “Is the proportion less than p₀?” (H₁: p < p₀)
Here’s the manual calculation in Python:
```python
import math

from scipy.stats import norm


def manual_z_test(successes, n, p0, alternative='two-sided'):
    """
    Perform a one-proportion z-test manually.

    Parameters
    ----------
    successes : int - Number of successes in sample
    n : int - Sample size
    p0 : float - Hypothesized population proportion
    alternative : str - 'two-sided', 'larger', or 'smaller'

    Returns
    -------
    tuple: (z_statistic, p_value)
    """
    # Sample proportion
    p_hat = successes / n

    # Standard error under the null hypothesis
    se = math.sqrt(p0 * (1 - p0) / n)

    # Z-statistic
    z = (p_hat - p0) / se

    # P-value from the standard normal CDF
    if alternative == 'two-sided':
        p_value = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == 'larger':
        p_value = 1 - norm.cdf(z)
    elif alternative == 'smaller':
        p_value = norm.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'larger', or 'smaller'")

    return z, p_value


# Example: 75 successes out of 500 trials, testing against p0 = 0.12
z_stat, p_val = manual_z_test(75, 500, 0.12, 'larger')
print(f"Sample proportion: {75/500:.3f}")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_val:.4f}")
```
Output:

```
Sample proportion: 0.150
Z-statistic: 2.0643
P-value: 0.0195
```
Implementation with statsmodels
For production code, use statsmodels. It’s well-tested, handles edge cases, and provides additional functionality like confidence intervals.
```python
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Scenario: Testing if website conversion rate exceeds historical 12%
# Observed: 75 conversions out of 500 visitors
count = 75              # Number of successes
nobs = 500              # Number of observations
null_proportion = 0.12  # Hypothesized proportion

# Perform the test. By default, proportions_ztest estimates the standard
# error from the sample proportion; passing prop_var uses the null
# proportion instead, matching the textbook formula above.
z_stat, p_value = proportions_ztest(
    count=count,
    nobs=nobs,
    value=null_proportion,
    alternative='larger',     # Testing if proportion > 0.12
    prop_var=null_proportion  # Standard error under the null
)

# Calculate confidence interval
ci_low, ci_high = proportion_confint(
    count=count,
    nobs=nobs,
    alpha=0.05,
    method='normal'
)

print("One-Proportion Z-Test Results")
print("=" * 40)
print(f"Sample proportion: {count/nobs:.4f}")
print(f"Null hypothesis: p = {null_proportion}")
print(f"Alternative: p > {null_proportion}")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print()

alpha = 0.05
if p_value < alpha:
    print(f"Result: Reject H₀ at α = {alpha}")
    print("Conclusion: Significant evidence that conversion rate exceeds 12%")
else:
    print(f"Result: Fail to reject H₀ at α = {alpha}")
    print("Conclusion: Insufficient evidence that conversion rate exceeds 12%")
```
Output:

```
One-Proportion Z-Test Results
========================================
Sample proportion: 0.1500
Null hypothesis: p = 0.12
Alternative: p > 0.12
Z-statistic: 2.0643
P-value: 0.0195
95% CI: [0.1187, 0.1813]

Result: Reject H₀ at α = 0.05
Conclusion: Significant evidence that conversion rate exceeds 12%
```
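The 'normal' (Wald) interval used above can misbehave when the proportion sits near 0 or 1. statsmodels also implements the Wilson score interval, which has better coverage in those regimes; only the method argument changes:

```python
# Wilson score interval: better coverage than Wald for extreme proportions
ci_low_w, ci_high_w = proportion_confint(count=75, nobs=500, alpha=0.05, method='wilson')
print(f"95% Wilson CI: [{ci_low_w:.4f}, {ci_high_w:.4f}]")
```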
Manual Implementation from Scratch
Sometimes you need a lightweight solution without heavy dependencies, or you want complete control over the implementation:
```python
import warnings
from typing import Literal

from scipy.stats import norm


def one_proportion_ztest(
    successes: int,
    n: int,
    p0: float,
    alternative: Literal['two-sided', 'larger', 'smaller'] = 'two-sided',
    alpha: float = 0.05
) -> dict:
    """
    Perform a one-proportion z-test with comprehensive output.

    Parameters
    ----------
    successes : int - Number of successes observed
    n : int - Total sample size
    p0 : float - Hypothesized population proportion (0 < p0 < 1)
    alternative : str - Type of alternative hypothesis
    alpha : float - Significance level

    Returns
    -------
    dict with test results
    """
    # Input validation
    if not 0 < p0 < 1:
        raise ValueError("p0 must be between 0 and 1")
    if successes > n or successes < 0:
        raise ValueError("successes must be between 0 and n")

    # Check sample size assumptions
    if n * p0 < 10 or n * (1 - p0) < 10:
        warnings.warn(
            "Sample size may be too small for normal approximation. "
            "Consider using exact binomial test."
        )

    # Calculate statistics
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5
    z_stat = (p_hat - p0) / se

    # Calculate p-value based on alternative
    if alternative == 'two-sided':
        p_value = 2 * (1 - norm.cdf(abs(z_stat)))
    elif alternative == 'larger':
        p_value = 1 - norm.cdf(z_stat)
    else:  # smaller
        p_value = norm.cdf(z_stat)

    # Confidence interval (always two-sided)
    z_crit = norm.ppf(1 - alpha / 2)
    se_ci = (p_hat * (1 - p_hat) / n) ** 0.5  # Use p_hat for the CI
    ci_lower = p_hat - z_crit * se_ci
    ci_upper = p_hat + z_crit * se_ci

    return {
        'sample_proportion': p_hat,
        'null_proportion': p0,
        'z_statistic': z_stat,
        'p_value': p_value,
        'alpha': alpha,
        'reject_null': p_value < alpha,
        'confidence_interval': (max(0, ci_lower), min(1, ci_upper)),
        'alternative': alternative
    }


# Test it
results = one_proportion_ztest(75, 500, 0.12, alternative='larger')
for key, value in results.items():
    print(f"{key}: {value}")
```
Interpreting and Visualizing Results
Visualization helps communicate results to non-technical stakeholders. Here’s how to plot the test with rejection regions:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


def visualize_z_test(z_stat, alternative='two-sided', alpha=0.05):
    """Visualize the z-test with rejection regions."""
    fig, ax = plt.subplots(figsize=(10, 6))

    # Generate x values for the standard normal distribution
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x)

    # Plot the distribution
    ax.plot(x, y, 'b-', linewidth=2, label='Standard Normal')
    ax.fill_between(x, y, alpha=0.1, color='blue')

    # Determine critical values and shade rejection regions
    if alternative == 'two-sided':
        z_crit = norm.ppf(1 - alpha / 2)
        # Right tail
        x_reject_right = x[x >= z_crit]
        ax.fill_between(x_reject_right, norm.pdf(x_reject_right),
                        color='red', alpha=0.4, label=f'Rejection region (α={alpha})')
        # Left tail
        x_reject_left = x[x <= -z_crit]
        ax.fill_between(x_reject_left, norm.pdf(x_reject_left),
                        color='red', alpha=0.4)
        ax.axvline(-z_crit, color='red', linestyle='--', linewidth=1.5)
        ax.axvline(z_crit, color='red', linestyle='--', linewidth=1.5)
    elif alternative == 'larger':
        z_crit = norm.ppf(1 - alpha)
        x_reject = x[x >= z_crit]
        ax.fill_between(x_reject, norm.pdf(x_reject),
                        color='red', alpha=0.4, label=f'Rejection region (α={alpha})')
        ax.axvline(z_crit, color='red', linestyle='--', linewidth=1.5)
    else:  # smaller
        z_crit = norm.ppf(alpha)
        x_reject = x[x <= z_crit]
        ax.fill_between(x_reject, norm.pdf(x_reject),
                        color='red', alpha=0.4, label=f'Rejection region (α={alpha})')
        ax.axvline(z_crit, color='red', linestyle='--', linewidth=1.5)

    # Plot the observed z-statistic
    ax.axvline(z_stat, color='green', linewidth=2, label=f'Observed z = {z_stat:.3f}')
    ax.plot(z_stat, norm.pdf(z_stat), 'go', markersize=10)

    ax.set_xlabel('Z-score', fontsize=12)
    ax.set_ylabel('Probability Density', fontsize=12)
    ax.set_title('One-Proportion Z-Test Visualization', fontsize=14)
    ax.legend(loc='upper right')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('z_test_visualization.png', dpi=150)
    plt.show()


# Visualize our test result
visualize_z_test(2.0643, alternative='larger', alpha=0.05)
```
Practical Example: End-to-End Analysis
Let’s work through a complete real-world scenario:
"""
Scenario: Your product team launched a new onboarding flow.
Historical signup completion rate: 34%
After 2 weeks, you have data from 1,247 new users.
478 completed the signup process.
Question: Did the new flow improve signup completion?
"""
from statsmodels.stats.proportion import proportions_ztest, proportion_confint
import numpy as np
# Data
completed = 478
total_users = 1247
historical_rate = 0.34
# Step 1: State hypotheses
print("HYPOTHESIS TEST: New Onboarding Flow")
print("=" * 50)
print(f"H₀: p = {historical_rate} (no improvement)")
print(f"H₁: p > {historical_rate} (improvement)")
print(f"Significance level: α = 0.05")
print()
# Step 2: Check assumptions
print("ASSUMPTION CHECKS")
print("-" * 50)
np_check = total_users * historical_rate
nq_check = total_users * (1 - historical_rate)
print(f"n × p₀ = {np_check:.1f} {'✓' if np_check >= 10 else '✗'} (need ≥ 10)")
print(f"n × (1-p₀) = {nq_check:.1f} {'✓' if nq_check >= 10 else '✗'} (need ≥ 10)")
print()
# Step 3: Perform the test
z_stat, p_value = proportions_ztest(
count=completed,
nobs=total_users,
value=historical_rate,
alternative='larger'
)
ci_low, ci_high = proportion_confint(completed, total_users, alpha=0.05)
# Step 4: Results
print("RESULTS")
print("-" * 50)
print(f"Sample size: {total_users}")
print(f"Observed completions: {completed}")
print(f"Sample proportion: {completed/total_users:.4f} ({completed/total_users*100:.2f}%)")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.6f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print()
# Step 5: Decision and interpretation
print("CONCLUSION")
print("-" * 50)
alpha = 0.05
if p_value < alpha:
print(f"✓ Reject H₀ (p-value {p_value:.4f} < α {alpha})")
print()
lift = (completed/total_users - historical_rate) / historical_rate * 100
print(f"The new onboarding flow shows a statistically significant")
print(f"improvement in signup completion rate.")
print(f"Observed lift: {lift:.1f}% relative improvement")
else:
print(f"✗ Fail to reject H₀ (p-value {p_value:.4f} ≥ α {alpha})")
print("Insufficient evidence to conclude the new flow improved signups.")
Common Pitfalls and Best Practices
Sample size requirements matter. The rule of thumb (np ≥ 10 and n(1-p) ≥ 10) exists because the normal approximation breaks down with small samples. When your counts fall below that threshold, use the exact binomial test:
```python
from scipy.stats import binomtest  # requires SciPy >= 1.7

# For small samples, use the exact binomial test
# Example: 3 successes out of 20 trials, testing against p0 = 0.25
result = binomtest(3, 20, 0.25, alternative='less')
print(f"Exact binomial p-value: {result.pvalue:.4f}")
```
Multiple testing inflates false positives. If you run 20 tests at α = 0.05, you expect one false positive by chance. Apply corrections like Bonferroni when testing multiple proportions.
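Here's a minimal sketch using statsmodels' multipletests, with hypothetical p-values standing in for five separate proportion tests:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from several independent proportion tests
p_values = [0.003, 0.021, 0.047, 0.19, 0.62]

# Bonferroni: each p-value is effectively compared against alpha / number_of_tests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
print(reject)      # which nulls survive the correction
print(p_adjusted)  # p-values scaled by the number of tests (capped at 1)
```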
Effect size matters more than p-values. A statistically significant result with a tiny effect size may not be practically meaningful. Always report confidence intervals and consider the business impact of the observed difference.
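For proportions, Cohen's h is a common effect-size measure, and statsmodels provides it via proportion_effectsize. A quick sketch using the conversion example from earlier:

```python
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h for 15% observed vs. 12% hypothesized conversion
h = proportion_effectsize(0.15, 0.12)
print(f"Cohen's h: {h:.3f}")  # roughly 0.09, below Cohen's 'small' benchmark of 0.2
```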
Don’t confuse statistical significance with practical significance. A 0.5% improvement might be statistically significant with a large enough sample, but is it worth the engineering effort? Let business context guide your decisions.
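To make that concrete, here is a back-of-the-envelope sample-size calculation using the standard normal-approximation formula for a one-proportion test, with assumed values of 80% power and one-sided α = 0.05:

```python
from scipy.stats import norm

# Approximate n needed to detect a 0.5% absolute lift over a 12% baseline
# (normal-approximation formula; 80% power, one-sided alpha = 0.05)
p0, p1 = 0.12, 0.125
z_alpha, z_beta = norm.ppf(0.95), norm.ppf(0.80)
n = ((z_alpha * (p0 * (1 - p0)) ** 0.5 + z_beta * (p1 * (1 - p1)) ** 0.5)
     / (p1 - p0)) ** 2
print(f"Required n: {n:.0f}")  # on the order of tens of thousands of observations
```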