How to Perform Power Analysis in R

Key Insights

  • Power analysis should be conducted before data collection to determine the sample size needed to detect meaningful effects—post-hoc power analysis is largely uninformative and often misleading.
  • The pwr package in R provides straightforward functions for calculating power, sample size, or effect size for most common statistical tests, requiring you to specify three of the four key parameters.
  • Effect size estimation is the most challenging aspect of power analysis; use pilot data, published literature, or domain expertise rather than blindly applying Cohen’s conventions.

Introduction to Power Analysis

Statistical power is the probability that your study will detect an effect when one truly exists. More formally, it’s the probability of correctly rejecting a false null hypothesis—avoiding a Type II error (false negative). Most researchers target 80% power, meaning they accept a 20% chance of missing a real effect.
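
The 80% convention can be checked by simulation: draw many pairs of samples under a true effect and count how often a t-test rejects the null. A minimal base-R sketch, assuming a true difference of 0.5 SD and 64 observations per group (values chosen purely for illustration):

```r
# Empirical power of a two-sample t-test by simulation
# (true effect d = 0.5, n = 64 per group -- illustrative values)
set.seed(42)
n_sims <- 5000
rejections <- replicate(n_sims, {
  control   <- rnorm(64, mean = 0,   sd = 1)
  treatment <- rnorm(64, mean = 0.5, sd = 1)
  t.test(treatment, control)$p.value < 0.05
})
mean(rejections)  # empirical power, close to 0.80
```

The proportion of significant results across simulations approximates the analytical power that the pwr package computes directly.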

Power analysis involves four interconnected parameters:

  1. Effect size: The magnitude of the phenomenon you’re studying
  2. Sample size: The number of observations in your study
  3. Significance level (α): Your threshold for Type I errors (typically 0.05)
  4. Power (1-β): The probability of detecting a true effect

These parameters are mathematically linked. Specify any three, and you can solve for the fourth. In practice, you’ll most often fix your significance level and desired power, estimate the effect size, and solve for the required sample size.

When to Use Power Analysis

Power analysis comes in two flavors, but only one is genuinely useful.

A priori power analysis happens before data collection. You specify your desired power, significance level, and expected effect size to calculate the minimum sample size needed. This is the gold standard for study planning. It prevents two costly mistakes: collecting too few observations to detect real effects, or wasting resources on unnecessarily large samples.

Post-hoc power analysis attempts to calculate achieved power after a study concludes, typically when results aren’t significant. This practice is statistically problematic. Post-hoc power is a direct mathematical transformation of the p-value—it provides no additional information. A non-significant result with low post-hoc power doesn’t mean “we might have found something with more data.” It means you have insufficient evidence, full stop.
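
That p-value-to-power mapping can be made explicit. For a two-sample t-test with fixed group sizes, the observed p determines |t|, which determines the observed effect size, which determines "observed power." A sketch (the function name posthoc_power and the n = 50 examples are hypothetical):

```r
library(pwr)

# Post-hoc "observed power" computed from the p-value alone,
# for an equal-n two-sample t-test (n per group)
posthoc_power <- function(p, n) {
  df <- 2 * n - 2
  t_obs <- qt(1 - p / 2, df)      # |t| implied by the p-value
  d_obs <- t_obs * sqrt(2 / n)    # implied observed Cohen's d
  pwr.t.test(n = n, d = d_obs, sig.level = 0.05)$power
}

posthoc_power(0.05, 50)  # exactly at the threshold: roughly 0.50
posthoc_power(0.30, 50)  # larger p always maps to lower "observed power"
```

Because the function takes only p and n, post-hoc power cannot tell you anything the p-value didn't already.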

Focus your energy on a priori analysis. Calculate required sample sizes before committing resources, and design studies that can actually answer your research questions.

The pwr Package: Your Main Tool

The pwr package is the workhorse for power analysis in R. It covers most common statistical tests with a consistent interface.

# Install and load the package
install.packages("pwr")
library(pwr)

# Core functions available in pwr
# pwr.t.test()      - t-tests (one-sample, two-sample, paired)
# pwr.t2n.test()    - two-sample t-test with unequal n
# pwr.anova.test()  - one-way ANOVA
# pwr.r.test()      - correlation test
# pwr.chisq.test()  - chi-square test
# pwr.f2.test()     - general linear model
# pwr.p.test()      - proportion test (one-sample)
# pwr.2p.test()     - proportion test (two-sample)

Each function follows the same pattern: provide three of the four key parameters, leave one as NULL, and the function solves for the missing value.
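
To illustrate, the same function can solve for any of the key parameters depending on which one you omit (the values here are arbitrary, for illustration only):

```r
library(pwr)

# Omit n: solve for the required sample size
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80)

# Omit power: solve for the power achieved at a fixed n
pwr.t.test(n = 40, d = 0.5, sig.level = 0.05)

# Omit d: solve for the minimum detectable effect size
pwr.t.test(n = 40, sig.level = 0.05, power = 0.80)
```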

Power Analysis for Common Tests

Let’s work through practical examples for the tests you’ll use most frequently.

T-Test Power Analysis

Suppose you’re comparing a treatment group to a control group and expect a medium effect size (Cohen’s d = 0.5). You want 80% power at α = 0.05.

# Two-sample t-test: solving for sample size
result <- pwr.t.test(
  d = 0.5,           # expected effect size (Cohen's d)
  sig.level = 0.05,  # significance level
  power = 0.80,      # desired power
  type = "two.sample",
  alternative = "two.sided"
)

print(result)
# n = 63.76561 per group

# Always round up for sample size
ceiling(result$n)  # 64 per group, 128 total

You can also solve for power given a fixed sample size:

# What power do we achieve with 50 participants per group?
pwr.t.test(
  n = 50,
  d = 0.5,
  sig.level = 0.05,
  type = "two.sample"
)
# power = 0.6968951 (about 70%)

ANOVA Power Analysis

For one-way ANOVA, you need to specify the number of groups and the effect size (Cohen’s f, not d).

# One-way ANOVA with 4 groups, medium effect
pwr.anova.test(
  k = 4,             # number of groups
  f = 0.25,          # effect size (Cohen's f)
  sig.level = 0.05,
  power = 0.80
)
# n = 44.59927 per group

# Total sample: 4 * 45 = 180 participants
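
If you have pilot estimates of the group means, Cohen's f can be computed as the standard deviation of the group means divided by the common within-group SD. A sketch with made-up pilot numbers:

```r
# Cohen's f from pilot group means (hypothetical values)
group_means <- c(70, 72, 75, 78)   # pilot means for the 4 groups
within_sd   <- 12                  # common within-group SD

# f = SD of the group means (population form) / within-group SD
f <- sqrt(mean((group_means - mean(group_means))^2)) / within_sd
f  # about 0.25 here; plug this into pwr.anova.test()
```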

Correlation Test

Testing whether a correlation differs from zero:

# Power analysis for correlation
pwr.r.test(
  r = 0.3,           # expected correlation
  sig.level = 0.05,
  power = 0.80,
  alternative = "two.sided"
)
# n = 84.07364

# You need 85 paired observations

Chi-Square Test

For chi-square tests, you need Cohen’s w (effect size for proportions) and degrees of freedom.

# Chi-square test: 3x2 contingency table
# df = (rows - 1) * (cols - 1) = 2 * 1 = 2
pwr.chisq.test(
  w = 0.3,           # effect size (Cohen's w)
  df = 2,            # degrees of freedom
  sig.level = 0.05,
  power = 0.80
)
# N = 107.0068 total observations
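
Cohen's w can be computed directly from hypothesized cell proportions: w = sqrt(sum((p1 - p0)^2 / p0)), where p0 are the proportions under the null and p1 under the alternative. A sketch with hypothetical proportions:

```r
# Cohen's w from hypothesized proportions (made-up values)
p0 <- c(0.50, 0.30, 0.20)   # proportions under the null
p1 <- c(0.40, 0.35, 0.25)   # proportions under the alternative

w <- sqrt(sum((p1 - p0)^2 / p0))
w  # about 0.20 here; plug this into pwr.chisq.test()
```

The pwr package also provides ES.w1() for this same goodness-of-fit calculation (and ES.w2() for two-way tables).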

Understanding and Calculating Effect Sizes

Effect size estimation is where power analysis gets difficult. Cohen provided rough conventions, but these should be starting points, not defaults.

Test              Small   Medium   Large
t-test (d)        0.2     0.5      0.8
ANOVA (f)         0.1     0.25     0.4
Correlation (r)   0.1     0.3      0.5
Chi-square (w)    0.1     0.3      0.5

Better approaches for estimating effect sizes:

  1. Pilot data: Run a small preliminary study
  2. Published literature: Find effect sizes from similar studies
  3. Minimum meaningful effect: What’s the smallest effect that would matter practically?

# Calculate Cohen's d from group statistics
cohens_d <- function(mean1, mean2, sd1, sd2, n1, n2) {
  # Pooled standard deviation
  pooled_sd <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  # Cohen's d
  (mean1 - mean2) / pooled_sd
}

# Example: treatment group (M=75, SD=10, n=30) vs control (M=70, SD=12, n=30)
d <- cohens_d(75, 70, 10, 12, 30, 30)
print(d)  # 0.452 - between small and medium

# Convert between effect sizes
# Cohen's d to Cohen's f (for ANOVA)
d_to_f <- function(d) d / 2
d_to_f(0.5)  # 0.25

# Cohen's d to correlation r
d_to_r <- function(d) d / sqrt(d^2 + 4)
d_to_r(0.5)  # 0.243

Visualizing Power Curves

Power curves show how power changes across different sample sizes or effect sizes. They’re invaluable for understanding the trade-offs in your study design.

library(ggplot2)

# Generate power curve data
sample_sizes <- seq(10, 150, by = 5)
effect_sizes <- c(0.2, 0.5, 0.8)

power_data <- expand.grid(n = sample_sizes, d = effect_sizes)
power_data$power <- mapply(function(n, d) {
  pwr.t.test(n = n, d = d, sig.level = 0.05, type = "two.sample")$power
}, power_data$n, power_data$d)

power_data$effect <- factor(power_data$d, 
                            labels = c("Small (d=0.2)", 
                                      "Medium (d=0.5)", 
                                      "Large (d=0.8)"))

# Create the plot
ggplot(power_data, aes(x = n, y = power, color = effect)) +
  geom_line(linewidth = 1.2) +
  geom_hline(yintercept = 0.80, linetype = "dashed", color = "gray40") +
  annotate("text", x = 140, y = 0.82, label = "80% power", size = 3) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2)) +
  labs(
    x = "Sample Size (per group)",
    y = "Statistical Power",
    color = "Effect Size",
    title = "Power Curves for Two-Sample t-Test"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

This visualization immediately shows why detecting small effects requires substantially larger samples. With d = 0.2, you need roughly 400 participants per group for 80% power—versus only 26 per group for large effects.

# Quick base R alternative
plot(sample_sizes, 
     sapply(sample_sizes, function(n) 
       pwr.t.test(n = n, d = 0.5, sig.level = 0.05)$power),
     type = "l", lwd = 2,
     xlab = "Sample Size (per group)", 
     ylab = "Power",
     main = "Power Curve for Medium Effect (d = 0.5)")
abline(h = 0.80, lty = 2, col = "red")

Summary and Best Practices

Power analysis is a planning tool, not a post-hoc justification. Do it right, and you’ll design studies that can actually answer your questions.

Best practices:

  1. Conduct power analysis during study design, not after data collection
  2. Justify your effect size estimate with pilot data, literature, or practical significance arguments—don’t just use “medium” because it’s convenient
  3. Report your power analysis in publications: state the expected effect size, target power, and resulting sample size calculation
  4. Consider uncertainty: if your effect size estimate is rough, calculate sample sizes for a range of plausible values
  5. Account for attrition: if you expect 15% dropout, inflate your target sample accordingly
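
The attrition adjustment in point 5 is a one-liner: divide the required sample size by the expected retention rate and round up. A sketch assuming the 64-per-group figure from the t-test example and 15% dropout:

```r
# Inflate the target sample size to account for expected attrition
n_required <- 64       # from the a priori power analysis
dropout    <- 0.15     # expected attrition rate

n_recruit <- ceiling(n_required / (1 - dropout))
n_recruit  # recruit 76 per group to end with ~64 after dropout
```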

Common pitfalls to avoid:

  • Using post-hoc power to interpret non-significant results
  • Applying Cohen’s conventions without domain-specific justification
  • Forgetting that pwr functions return sample size per group for between-subjects designs
  • Ignoring the practical constraints of your research context

Power analysis forces you to think carefully about what you’re trying to detect and whether your study can realistically detect it. That clarity alone makes it worth the effort, even before you collect a single data point.
