How to Perform Dunnett's Test in Python
Key Insights
- Dunnett’s test is specifically designed for comparing multiple treatment groups against a single control, making it more powerful than Tukey’s HSD when you only care about control comparisons.
- SciPy 1.11+ includes scipy.stats.dunnett(), which provides both p-values and confidence intervals with proper family-wise error rate control.
- Always verify your control group is correctly specified—a common mistake is accidentally comparing treatments to each other instead of to the control.
Introduction to Dunnett’s Test
When you run an experiment with multiple treatment groups and a control, you need a statistical test that answers a specific question: “Which treatments differ significantly from the control?” Dunnett’s test does exactly this, and nothing more.
Consider a clinical trial testing three drug dosages against a placebo. You don’t care whether the 10mg dose differs from the 20mg dose—you care whether each dose outperforms the placebo. Running all pairwise comparisons with Tukey’s HSD wastes statistical power on comparisons you don’t need.
Dunnett’s test was designed in 1955 for precisely this scenario. It’s the right choice when:
- You have one designated control group
- You want to compare each treatment to the control (not to each other)
- You need to control the family-wise error rate across these comparisons
Common applications include pharmaceutical dose-response studies, A/B testing with a baseline variant, agricultural experiments comparing fertilizers to untreated plots, and manufacturing quality control comparing process changes to standard procedures.
Statistical Background
Before running any statistical test, you should understand its assumptions and mechanics.
Assumptions:
- Independence: Observations within and between groups are independent
- Normality: Data in each group follows a normal distribution (or sample sizes are large enough for CLT)
- Homogeneity of variance: All groups have equal population variances
These are the same assumptions as one-way ANOVA, which you should run first to confirm an overall effect exists.
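The equal-variance assumption can be screened quickly with Levene's test from scipy.stats. A minimal sketch, using simulated data for illustration:

```python
import numpy as np
from scipy import stats

# Simulated groups drawn with equal population variances (illustrative only)
rng = np.random.default_rng(0)
groups = [rng.normal(loc=5.0, scale=2.0, size=25) for _ in range(4)]

# Levene's test: H0 is that all groups share the same variance.
# A small p-value (e.g. < 0.05) casts doubt on the assumption.
stat, p = stats.levene(*groups)
print(f"Levene's test: W = {stat:.3f}, p = {p:.4f}")
```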
Family-wise Error Rate Control:
When you make multiple comparisons, the probability of at least one false positive increases. With three treatment-vs-control comparisons at α = 0.05, your actual error rate could reach 1 - (0.95)³ ≈ 0.14 if you used uncorrected t-tests.
Dunnett’s test controls this by using a multivariate t-distribution that accounts for the correlation structure between comparisons. This makes it more powerful than Bonferroni correction while maintaining the desired error rate.
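The arithmetic behind that inflation is worth seeing once. A short sketch comparing the uncorrected family-wise error rate with the Bonferroni per-test level for several numbers of comparisons:

```python
# FWER for k independent comparisons at per-test level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 3, 5, 10):
    fwer = 1 - (1 - alpha) ** k
    # Bonferroni simply tests each comparison at alpha / k
    print(f"k = {k:2d}: uncorrected FWER = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {alpha / k:.4f}")
```

For k = 3 this reproduces the ≈ 0.14 figure above; Dunnett's test keeps the FWER at alpha without paying the full Bonferroni price in power.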
The Test Statistic:
For each treatment group i, the Dunnett statistic is:
t_i = (X̄_i - X̄_control) / (s_pooled * √(1/n_i + 1/n_control))
Where s_pooled is the pooled standard deviation from all groups. The critical values come from the Dunnett distribution, which depends on the number of treatment groups and total degrees of freedom.
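To make the formula concrete, here is a sketch computing s_pooled and the Dunnett statistic by hand for one treatment group (simulated data; note the critical value would still come from the Dunnett distribution, not the ordinary t table):

```python
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(loc=5.0, scale=2.0, size=25)
treatment = rng.normal(loc=6.5, scale=2.0, size=25)

# Pooled standard deviation: pool the within-group sums of squares
# and degrees of freedom across ALL groups (just two shown here)
groups = [control, treatment]
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = sum(len(g) - 1 for g in groups)
s_pooled = np.sqrt(ss_within / df_within)

# Dunnett statistic for this treatment vs. the control
t_i = (treatment.mean() - control.mean()) / (
    s_pooled * np.sqrt(1 / len(treatment) + 1 / len(control))
)
print(f"s_pooled = {s_pooled:.3f}, t = {t_i:.3f}")
```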
Setting Up Your Python Environment
You’ll need SciPy 1.11 or later for the built-in Dunnett’s test. For alternatives and additional functionality, scikit-posthocs is useful.
# Core requirements
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
# Optional: for alternative implementation
# pip install scikit-posthocs
import scikit_posthocs as sp
# Check your SciPy version (scipy.stats.dunnett was added in 1.11)
import scipy
print(f"SciPy version: {scipy.__version__}")
Let’s create a realistic dataset simulating a drug efficacy study. We have a placebo control and three dosage levels, measuring some continuous outcome like pain reduction score.
# Set seed for reproducibility
np.random.seed(42)
# Simulate drug efficacy study
# Control (placebo) and three dosage levels
n_per_group = 25
control = np.random.normal(loc=5.0, scale=2.0, size=n_per_group)
dose_10mg = np.random.normal(loc=6.5, scale=2.1, size=n_per_group)
dose_20mg = np.random.normal(loc=7.8, scale=1.9, size=n_per_group)
dose_40mg = np.random.normal(loc=8.2, scale=2.2, size=n_per_group)
# Create DataFrame for easier handling
df = pd.DataFrame({
'score': np.concatenate([control, dose_10mg, dose_20mg, dose_40mg]),
'group': ['Control']*n_per_group + ['10mg']*n_per_group +
['20mg']*n_per_group + ['40mg']*n_per_group
})
print(df.groupby('group')['score'].agg(['mean', 'std', 'count']))
Performing Dunnett’s Test with scipy.stats
The scipy.stats.dunnett() function takes the treatment samples as positional arguments and the control group through the keyword-only control argument. This matters: pass the wrong array as control and every comparison uses the wrong baseline.
# Run Dunnett's test
# IMPORTANT: treatment groups are positional; the control goes in the
# keyword-only `control` argument
result = stats.dunnett(dose_10mg, dose_20mg, dose_40mg, control=control)
print("Dunnett's Test Results")
print("=" * 50)
print(f"Test statistic(s): {result.statistic}")
print(f"P-value(s): {result.pvalue}")
The output gives you one statistic and p-value per treatment group. Let’s format this more clearly:
# Better formatted output
treatment_names = ['10mg vs Control', '20mg vs Control', '40mg vs Control']
print("\nDunnett's Test: Treatment vs Control Comparisons")
print("-" * 55)
print(f"{'Comparison':<25} {'Statistic':>12} {'P-value':>12}")
print("-" * 55)
for name, stat, pval in zip(treatment_names, result.statistic, result.pvalue):
sig = "***" if pval < 0.001 else "**" if pval < 0.01 else "*" if pval < 0.05 else ""
print(f"{name:<25} {stat:>12.4f} {pval:>12.4f} {sig}")
print("-" * 55)
print("Significance: * p<0.05, ** p<0.01, *** p<0.001")
You can also get confidence intervals for the mean differences:
# Get confidence intervals
ci = result.confidence_interval(confidence_level=0.95)
print("\n95% Confidence Intervals for Mean Differences")
print("-" * 55)
print(f"{'Comparison':<25} {'Lower':>12} {'Upper':>12}")
print("-" * 55)
for name, low, high in zip(treatment_names, ci.low, ci.high):
print(f"{name:<25} {low:>12.4f} {high:>12.4f}")
If the confidence interval excludes zero, the difference is statistically significant at your chosen alpha level.
Alternative: Using scikit-posthocs
The scikit-posthocs library offers an alternative implementation that works well with pandas DataFrames and provides a familiar interface if you’re already using it for other post-hoc tests.
import scikit_posthocs as sp
# Dunnett's test with DataFrame input
# Note: You must specify the control group name
dunnett_results = sp.posthoc_dunnett(
    df,
    val_col='score',
    group_col='group',
    ctrl_col_name='Control'
)
print("Dunnett's Test Results (scikit-posthocs)")
print(dunnett_results)
This returns a DataFrame of p-values, which integrates nicely into pandas-based workflows.
One advantage of scikit-posthocs is consistency—if you’re already using it for Tukey, Dunn, or other post-hoc tests, the API is familiar. However, for Dunnett’s test specifically, the SciPy implementation is more complete since it provides confidence intervals directly.
Visualizing Results
A good visualization communicates your findings faster than a table of numbers. Here’s how to create a comparison plot with significance markers:
def plot_dunnett_results(df, group_col, value_col, control_name, result):
"""Create a bar chart with Dunnett's test significance markers."""
# Calculate group statistics
group_stats = df.groupby(group_col)[value_col].agg(['mean', 'std', 'count'])
group_stats['se'] = group_stats['std'] / np.sqrt(group_stats['count'])
# Reorder so control is first
groups = [control_name] + [g for g in group_stats.index if g != control_name]
group_stats = group_stats.reindex(groups)
# Create plot
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(groups))
colors = ['#808080'] + ['#4C72B0'] * (len(groups) - 1) # Gray for control
bars = ax.bar(x, group_stats['mean'], yerr=group_stats['se'] * 1.96,
capsize=5, color=colors, edgecolor='black', linewidth=1.2)
# Add significance markers
control_mean = group_stats.loc[control_name, 'mean']
y_max = (group_stats['mean'] + group_stats['se'] * 1.96).max()
for i, (pval, group) in enumerate(zip(result.pvalue, groups[1:])):
if pval < 0.05:
marker = '***' if pval < 0.001 else '**' if pval < 0.01 else '*'
treatment_mean = group_stats.loc[group, 'mean']
y_pos = y_max + 0.5 + i * 0.8
# Draw bracket
ax.plot([0, 0, i+1, i+1],
[y_pos - 0.2, y_pos, y_pos, y_pos - 0.2],
'k-', linewidth=1)
ax.text((i+1) / 2, y_pos + 0.1, marker, ha='center', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.set_ylabel('Score (mean ± 95% CI)')
ax.set_xlabel('Treatment Group')
ax.set_title("Drug Efficacy Study: Dunnett's Test Results")
ax.axhline(y=control_mean, color='gray', linestyle='--', alpha=0.5, label='Control mean')
ax.legend()
plt.tight_layout()
return fig, ax
# Create the visualization
fig, ax = plot_dunnett_results(df, 'group', 'score', 'Control', result)
plt.savefig('dunnett_results.png', dpi=150, bbox_inches='tight')
plt.show()
Practical Considerations and Common Pitfalls
Choosing the Correct Control Group
The most common mistake is misspecifying the control. In scipy.stats.dunnett(), the control must be passed through the keyword-only control argument; the positional arguments are the treatment groups. Double-check by printing group means before running the test:
# Always verify your groups before testing
print("Group means (control should match your expectations):")
for name, data in [('Control', control), ('10mg', dose_10mg),
('20mg', dose_20mg), ('40mg', dose_40mg)]:
print(f" {name}: {data.mean():.2f}")
Handling Unequal Sample Sizes
Dunnett’s test handles unequal sample sizes, but power decreases when groups are imbalanced. If your control group is much smaller than treatment groups, consider whether your design is optimal:
# Check for balance
group_sizes = df.groupby('group').size()
print(f"Sample sizes: {group_sizes.to_dict()}")
if group_sizes.std() / group_sizes.mean() > 0.2:
print("Warning: Substantial imbalance detected. Consider implications for power.")
When Assumptions Are Violated
If normality is questionable, check with Shapiro-Wilk:
# Test normality for each group
for group in df['group'].unique():
data = df[df['group'] == group]['score']
stat, p = stats.shapiro(data)
print(f"{group}: Shapiro-Wilk p = {p:.4f}")
For non-normal data with similar distributions, consider the Kruskal-Wallis test followed by Dunn’s test with Bonferroni correction. For heterogeneous variances, Welch’s ANOVA followed by Games-Howell might be more appropriate, though you lose the specific control-comparison focus.
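A minimal sketch of the nonparametric route using only SciPy, with Mann-Whitney U plus a Bonferroni correction standing in for Dunn's test (simulated skewed data for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.exponential(scale=2.0, size=25)   # skewed, non-normal
treatments = {
    '10mg': rng.exponential(scale=2.6, size=25),
    '20mg': rng.exponential(scale=3.2, size=25),
}

# Omnibus nonparametric test first
h, p_kw = stats.kruskal(control, *treatments.values())
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p_kw:.4f}")

# Control comparisons at a Bonferroni-adjusted level
alpha = 0.05 / len(treatments)
for name, data in treatments.items():
    _, p = stats.mannwhitneyu(data, control, alternative='two-sided')
    verdict = 'significant' if p < alpha else 'not significant'
    print(f"{name} vs Control: p = {p:.4f} ({verdict} at adjusted alpha = {alpha:.3f})")
```

Unlike Dunnett's test, this correction ignores the correlation between the control comparisons, so it is somewhat conservative.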
Don’t Forget the Preliminary ANOVA
Running Dunnett’s test without first checking for an overall effect can make results harder to defend, so a one-way ANOVA is a sensible first step (Dunnett’s test controls its own family-wise error rate, but the omnibus test is a useful sanity check):
# One-way ANOVA first
f_stat, anova_p = stats.f_oneway(control, dose_10mg, dose_20mg, dose_40mg)
print(f"ANOVA: F = {f_stat:.2f}, p = {anova_p:.4f}")
if anova_p >= 0.05:
print("No significant overall effect. Post-hoc tests may not be appropriate.")
Dunnett’s test is a focused, powerful tool when your research question involves comparing treatments to a control. Use it instead of broader post-hoc tests when that’s genuinely your question—you’ll get more statistical power and clearer answers.