How to Calculate Power Analysis in Python

Key Insights

  • Power analysis answers a critical question before you collect data: how many participants do you actually need to detect a meaningful effect with confidence?
  • Python’s statsmodels library provides ready-to-use power analysis functions for t-tests, chi-square tests, and ANOVA—covering most common research scenarios.
  • Always perform power analysis before your study; post-hoc power analysis on completed studies is statistically meaningless and should be avoided.

Introduction to Statistical Power

Statistical power is the probability that your study will detect an effect when one truly exists. In formal terms, it’s the probability of correctly rejecting a false null hypothesis (avoiding a Type II error). A study with 80% power has an 80% chance of finding a statistically significant result if the true effect exists.

Four interconnected components determine statistical power:

  1. Effect size: The magnitude of the difference or relationship you’re trying to detect
  2. Sample size: The number of observations in your study
  3. Significance level (alpha): Your threshold for statistical significance, typically 0.05
  4. Power: The probability of detecting the effect, conventionally set at 0.80 or higher

These four values are mathematically linked. If you fix any three, you can calculate the fourth. This relationship is the foundation of power analysis.
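To see the linkage concretely, here is a rough back-of-the-envelope calculation using the normal approximation (a simplification; statsmodels uses the exact noncentral t distribution, so its answers differ slightly, and the `approx_power` helper below is our own sketch, not a library function):

```python
from scipy.stats import norm

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate two-sided power for a two-group comparison
    via the normal approximation (t-based results differ slightly)."""
    z_crit = norm.ppf(1 - alpha / 2)
    # Expected z-statistic under the alternative hypothesis
    z_effect = effect_size * (n_per_group / 2) ** 0.5
    return norm.cdf(z_effect - z_crit)

# Fixing effect size, sample size, and alpha determines power:
print(round(approx_power(0.5, 64), 3))  # close to 0.80
```

Holding any two of the inputs fixed and varying the third moves power in a predictable direction, which is exactly the tradeoff that power analysis formalizes.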

When and Why to Use Power Analysis

Power analysis serves two primary purposes, though only one is statistically valid.

A priori power analysis happens before data collection. You specify your desired power level (usually 0.80), your significance threshold (usually 0.05), and your expected effect size. The analysis then tells you how many participants you need. This is the legitimate and valuable use of power analysis.

Post-hoc power analysis attempts to calculate power after a study is complete using the observed effect size. This practice is statistically flawed and should be avoided. Post-hoc power is a direct mathematical transformation of the p-value—it provides no additional information. If your study found a non-significant result, post-hoc power will always be low. It’s circular reasoning dressed up as analysis.
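You can verify this circularity directly. For a two-sided z-test, "observed power" is a deterministic function of the p-value and alpha alone (the helper below is our own illustration, not a statsmodels function):

```python
from scipy.stats import norm

def posthoc_power_from_p(p, alpha=0.05):
    """'Observed power' for a two-sided z-test, computed from the
    p-value alone -- demonstrating it adds no new information."""
    z_obs = norm.ppf(1 - p / 2)           # |z| implied by the p-value
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

# A result at exactly p = 0.05 always has 'observed power' of 0.50,
# and non-significant p-values always imply low observed power:
print(round(posthoc_power_from_p(0.05), 2))
print(round(posthoc_power_from_p(0.30), 2))
```

No information about the study design, sample, or data enters the calculation beyond the p-value itself, which is the precise sense in which post-hoc power is circular.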

The consequences of skipping proper power analysis are severe. Underpowered studies waste resources and participant time while producing unreliable results that fail to replicate. They contribute to the replication crisis plaguing many scientific fields. Overpowered studies, while less common, waste resources by recruiting far more participants than necessary—an ethical concern when participant burden is high.

Setting Up Your Python Environment

The statsmodels library provides comprehensive power analysis tools through its stats.power module. Here’s what you need:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.power import (
    TTestIndPower,
    TTestPower,
    NormalIndPower,
    GofChisquarePower,
    FTestAnovaPower
)

# Create analysis objects for different test types
ttest_ind_analysis = TTestIndPower()
ttest_paired_analysis = TTestPower()
z_test_analysis = NormalIndPower()
chi_square_analysis = GofChisquarePower()
anova_analysis = FTestAnovaPower()

Install the required packages if you haven’t already:

pip install statsmodels numpy matplotlib

Power Analysis for Common Statistical Tests

Each test type in statsmodels follows the same interface pattern. You create an analysis object and call solve_power() to find any unknown parameter.

Independent Samples T-Test

The most common scenario: comparing means between two groups.

import numpy as np
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Calculate required sample size per group
sample_size = analysis.solve_power(
    effect_size=0.5,      # Cohen's d (medium effect)
    power=0.80,           # 80% power
    alpha=0.05,           # 5% significance level
    ratio=1.0,            # Equal group sizes
    alternative='two-sided'
)

print(f"Required sample size per group: {np.ceil(sample_size)}")
# Output: Required sample size per group: 64.0

Paired Samples T-Test

For within-subjects designs where the same participants are measured twice:

import numpy as np
from statsmodels.stats.power import TTestPower

paired_analysis = TTestPower()

# Paired designs typically need fewer participants
sample_size = paired_analysis.solve_power(
    effect_size=0.5,
    power=0.80,
    alpha=0.05,
    alternative='two-sided'
)

print(f"Required sample size for paired design: {np.ceil(sample_size)}")
# Output: Required sample size for paired design: 34.0

Chi-Square Tests

For categorical data and goodness-of-fit tests:

import numpy as np
from statsmodels.stats.power import GofChisquarePower

chi_analysis = GofChisquarePower()

# Effect size 'w' for chi-square (0.1=small, 0.3=medium, 0.5=large)
sample_size = chi_analysis.solve_power(
    effect_size=0.3,
    power=0.80,
    alpha=0.05,
    n_bins=4  # Number of categories
)

print(f"Required total sample size: {np.ceil(sample_size)}")
# Output: Required total sample size: 122.0

One-Way ANOVA

For comparing means across three or more groups:

import numpy as np
from statsmodels.stats.power import FTestAnovaPower

anova_analysis = FTestAnovaPower()

# Effect size 'f' for ANOVA (0.1=small, 0.25=medium, 0.4=large)
# Note: for FTestAnovaPower, solve_power() returns the TOTAL sample size
total_n = anova_analysis.solve_power(
    effect_size=0.25,
    power=0.80,
    alpha=0.05,
    k_groups=4  # Number of groups
)

print(f"Required total sample size: {np.ceil(total_n)}")
print(f"Required sample size per group: {np.ceil(total_n / 4)}")
# For f=0.25 with 4 groups, this works out to roughly 45 per group

Calculating Sample Size vs. Power

The solve_power() method can solve for any parameter by setting it to None. This flexibility lets you approach power analysis from different angles.

import numpy as np
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Scenario 1: Find required sample size (most common)
n = analysis.solve_power(
    effect_size=0.5,
    power=0.80,
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Required n per group: {np.ceil(n)}")

# Scenario 2: Find achievable power with fixed sample size
power = analysis.solve_power(
    effect_size=0.5,
    nobs1=50,             # Fixed at 50 per group
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Achievable power with n=50: {power:.3f}")

# Scenario 3: Find minimum detectable effect size
effect = analysis.solve_power(
    nobs1=100,
    power=0.80,
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Minimum detectable effect with n=100: {effect:.3f}")

Output:

Required n per group: 64.0
Achievable power with n=50: 0.697
Minimum detectable effect with n=100: 0.398

This last calculation is particularly useful when your sample size is constrained by budget or population availability. It tells you the smallest effect you can reliably detect.

Visualizing Power Curves

Power curves show how power changes with sample size across different effect sizes. They’re invaluable for communicating with stakeholders and making informed tradeoffs.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Define sample sizes and effect sizes to explore
sample_sizes = np.arange(10, 200, 5)
effect_sizes = [0.2, 0.5, 0.8]  # Small, medium, large
labels = ['Small (d=0.2)', 'Medium (d=0.5)', 'Large (d=0.8)']
colors = ['#e74c3c', '#3498db', '#2ecc71']

plt.figure(figsize=(10, 6))

for effect, label, color in zip(effect_sizes, labels, colors):
    powers = [analysis.solve_power(
        effect_size=effect,
        nobs1=n,
        alpha=0.05,
        ratio=1.0,
        alternative='two-sided'
    ) for n in sample_sizes]
    
    plt.plot(sample_sizes, powers, label=label, color=color, linewidth=2)

# Add reference line at 80% power
plt.axhline(y=0.80, color='gray', linestyle='--', alpha=0.7, label='80% Power')

plt.xlabel('Sample Size per Group', fontsize=12)
plt.ylabel('Statistical Power', fontsize=12)
plt.title('Power Curves for Independent Samples T-Test', fontsize=14)
plt.legend(loc='lower right')
plt.grid(True, alpha=0.3)
plt.ylim(0, 1)
plt.tight_layout()
plt.savefig('power_curves.png', dpi=150)
plt.show()

This visualization immediately shows that detecting small effects requires dramatically larger samples. With a medium effect size (d=0.5), you need about 64 participants per group for 80% power. For a small effect (d=0.2), you need nearly 400 per group.

You can also use the built-in plotting method for quick visualizations:

# Built-in power plot
fig = analysis.plot_power(
    dep_var='nobs',
    nobs=np.arange(10, 150, 5),
    effect_size=[0.2, 0.5, 0.8],
    alpha=0.05,
    alternative='two-sided'
)
plt.title('Power by Sample Size')
plt.show()

Practical Considerations and Best Practices

Choosing effect sizes: Cohen’s conventions (small=0.2, medium=0.5, large=0.8 for d) are starting points, not gospel. Better approaches include using effect sizes from prior research or pilot studies, or determining the smallest effect that would be practically meaningful in your domain. A drug that reduces blood pressure by 1 mmHg might be statistically detectable with enough participants, but clinically irrelevant.
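If you do have pilot data, Cohen's d can be computed directly from the two groups' means and a pooled standard deviation. A minimal sketch with hypothetical pilot measurements (keep in mind that effect sizes from small pilots are noisy, so treat the result as a rough guide rather than a precise input):

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) +
                  (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical pilot measurements (placeholder values)
pilot_treatment = [5.1, 6.2, 5.8, 6.5, 5.9, 6.1]
pilot_control = [4.8, 5.2, 5.5, 4.9, 5.3, 5.6]
print(f"Pilot effect size d = {cohens_d(pilot_treatment, pilot_control):.2f}")
```

The resulting d can be passed straight to solve_power() as the effect_size argument.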

Handling multiple comparisons: If you’re running multiple tests, each individual test must clear a stricter alpha threshold after correction. Account for this in your power analysis:

# Bonferroni correction for 5 comparisons
n_comparisons = 5
corrected_alpha = 0.05 / n_comparisons

sample_size = analysis.solve_power(
    effect_size=0.5,
    power=0.80,
    alpha=corrected_alpha,  # 0.01 instead of 0.05
    ratio=1.0,
    alternative='two-sided'
)
print(f"Required n with Bonferroni correction: {np.ceil(sample_size)}")
# Output: Required n with Bonferroni correction: 95.0

Unequal group sizes: When you can’t have equal groups, specify the ratio parameter. At a fixed total sample size, unequal groups have less power than equal ones, so you’ll need more total participants:

# 2:1 ratio (control group twice as large)
sample_size = analysis.solve_power(
    effect_size=0.5,
    power=0.80,
    alpha=0.05,
    ratio=2.0,  # nobs2 = 2 * nobs1
    alternative='two-sided'
)
print(f"Smaller group size: {np.ceil(sample_size)}")
print(f"Larger group size: {np.ceil(sample_size * 2)}")

When to consult a statistician: Power analysis for simple designs is straightforward. But for mixed models, complex factorial designs, clustered data, or survival analysis, the assumptions become more nuanced. If your study design doesn’t map cleanly onto these basic functions, get expert help before committing resources.
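For designs beyond the built-in functions, simulation-based power analysis is a common complementary approach: generate data under your assumed effect, run the planned test, and count how often it reaches significance. A minimal sketch for a two-group t-test (the same loop generalizes to more complex designs by swapping out the data-generation and analysis steps):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000):
    """Monte Carlo power estimate: simulate data under the assumed
    effect, run the planned test, count significant results."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0, 1, n_per_group)             # control group
        b = rng.normal(effect_size, 1, n_per_group)   # treatment group
        _, p = stats.ttest_ind(a, b)
        if p < alpha:
            hits += 1
    return hits / n_sims

print(f"Simulated power: {simulated_power(0.5, 64):.2f}")
```

For d=0.5 and n=64 per group, the estimate should land close to the analytical 0.80, which is a useful sanity check before adapting the simulation to a design that has no closed-form answer.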

Power analysis isn’t optional—it’s a fundamental part of responsible study design. Run it before you collect data, use realistic effect sizes, and build in a buffer for dropout and exclusions. Your future self, trying to interpret ambiguous results from an underpowered study, will thank you.
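The dropout buffer is a one-line adjustment: divide the required analyzable sample size by the expected retention rate (the 15% attrition figure below is an assumed placeholder; use a rate realistic for your field):

```python
import numpy as np

n_required = 64          # from the power analysis
dropout_rate = 0.15      # assumed 15% attrition (placeholder)
n_to_recruit = int(np.ceil(n_required / (1 - dropout_rate)))
print(f"Recruit {n_to_recruit} per group to end with {n_required}")
# 64 / 0.85 = 75.3, so recruit 76 per group
```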
