Experiment Design for Data Scientists

Key Insights

Calculate sample size before running the experiment, not after
Randomization at the right unit (user vs session vs page view) determines validity
Pre-registration of hypotheses and metrics prevents p-hacking

Sample Size Calculation

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
sample_size = analysis.solve_power(
    effect_size=0.05,  # minimum detectable effect
    alpha=0.05,        # significance level
    power=0.80,        # desired power
    ratio=1.0          # equal group sizes
)

Randomization Units

Choose the unit that prevents contamination:

User-level: Best for most product experiments
Session-level: Only if experiences don’t persist
Cluster-level: When users interact (marketplaces, social networks)

Guard Rails

Set guardrail metrics before launch. If key business metrics degrade beyond thresholds, the experiment stops automatically regardless of the primary metric result.

Sample Size Calculation

Randomization Units

Guard Rails

Liked this? There's more.

Similar Articles