How to Perform Dunnett's Test in Python
Key Insights
- Dunnett’s test is specifically designed for comparing multiple treatment groups against a single control, making it more powerful than Tukey’s HSD when you only care about control comparisons.
- SciPy 1.11+ includes scipy.stats.dunnett(), which provides both p-values and confidence intervals with proper family-wise error rate control.
- Always verify your control group is correctly specified—a common mistake is accidentally comparing treatments to each other instead of to the control.
Introduction to Dunnett’s Test
When you run an experiment with multiple treatment groups and a control, you need a statistical test that answers a specific question: “Which treatments differ significantly from the control?” Dunnett’s test does exactly this, and nothing more.
Consider a clinical trial testing three drug dosages against a placebo. You don’t care whether the 10mg dose differs from the 20mg dose—you care whether each dose outperforms the placebo. Running all pairwise comparisons with Tukey’s HSD wastes statistical power on comparisons you don’t need.
Dunnett’s test was designed in 1955 for precisely this scenario. It’s the right choice when:
- You have one designated control group
- You want to compare each treatment to the control (not to each other)
- You need to control the family-wise error rate across these comparisons
Common applications include pharmaceutical dose-response studies, A/B testing with a baseline variant, agricultural experiments comparing fertilizers to untreated plots, and manufacturing quality control comparing process changes to standard procedures.
Statistical Background
Before running any statistical test, you should understand its assumptions and mechanics.
Assumptions:
- Independence: Observations within and between groups are independent
- Normality: Data in each group follows a normal distribution (or sample sizes are large enough for CLT)
- Homogeneity of variance: All groups have equal population variances
These are the same assumptions as one-way ANOVA, which you should run first to confirm an overall effect exists.
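The equal-variance assumption can be screened quickly with Levene's test from scipy.stats. A minimal sketch, using simulated data for illustration:

```python
import numpy as np
from scipy import stats

# Simulated groups drawn with equal population variances (illustrative only)
rng = np.random.default_rng(0)
groups = [rng.normal(loc=5.0, scale=2.0, size=25) for _ in range(4)]

# Levene's test: H0 is that all groups share the same variance.
# A small p-value (e.g. < 0.05) casts doubt on the assumption.
stat, p = stats.levene(*groups)
print(f"Levene's test: W = {stat:.3f}, p = {p:.4f}")
```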
Family-wise Error Rate Control:
When you make multiple comparisons, the probability of at least one false positive increases. With three treatment-vs-control comparisons at α = 0.05, your actual error rate could reach 1 - (0.95)³ ≈ 0.14 if you used uncorrected t-tests.
Dunnett’s test controls this by using a multivariate t-distribution that accounts for the correlation structure between comparisons. This makes it more powerful than Bonferroni correction while maintaining the desired error rate.
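The arithmetic behind that inflation is worth seeing once. A short sketch comparing the uncorrected family-wise error rate with the Bonferroni per-test level for several numbers of comparisons:

```python
# FWER for k independent comparisons at per-test level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 3, 5, 10):
    fwer = 1 - (1 - alpha) ** k
    # Bonferroni simply tests each comparison at alpha / k
    print(f"k = {k:2d}: uncorrected FWER = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {alpha / k:.4f}")
```

For k = 3 this reproduces the ≈ 0.14 figure above; Dunnett's test keeps the FWER at alpha without paying the full Bonferroni price in power.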
The Test Statistic:
For each treatment group i, the Dunnett statistic is:
t_i = (X̄_i - X̄_control) / (s_pooled * √(1/n_i + 1/n_control))
Where s_pooled is the pooled standard deviation from all groups. The critical values come from the Dunnett distribution, which depends on the number of treatment groups and total degrees of freedom.
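To make the formula concrete, here is a sketch computing s_pooled and the Dunnett statistic by hand for one treatment group (simulated data; note the critical value would still come from the Dunnett distribution, not the ordinary t table):

```python
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(loc=5.0, scale=2.0, size=25)
treatment = rng.normal(loc=6.5, scale=2.0, size=25)

# Pooled standard deviation: pool the within-group sums of squares
# and degrees of freedom across ALL groups (just two shown here)
groups = [control, treatment]
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = sum(len(g) - 1 for g in groups)
s_pooled = np.sqrt(ss_within / df_within)

# Dunnett statistic for this treatment vs. the control
t_i = (treatment.mean() - control.mean()) / (
    s_pooled * np.sqrt(1 / len(treatment) + 1 / len(control))
)
print(f"s_pooled = {s_pooled:.3f}, t = {t_i:.3f}")
```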
Setting Up Your Python Environment
You’ll need SciPy 1.11 or later for the built-in Dunnett’s test. For alternatives and additional functionality, scikit-posthocs is useful.
# Core requirements
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
# Optional: for alternative implementation
# pip install scikit-posthocs
import scikit_posthocs as sp
# Check your SciPy version (scipy.stats.dunnett was added in 1.11)
import scipy
print(f"SciPy version: {scipy.__version__}")
Let’s create a realistic dataset simulating a drug efficacy study. We have a placebo control and three dosage levels, measuring some continuous outcome like pain reduction score.
# Set seed for reproducibility
np.random.seed(42)
# Simulate drug efficacy study
# Control (placebo) and three dosage levels
n_per_group = 25
control = np.random.normal(loc=5.0, scale=2.0, size=n_per_group)
dose_10mg = np.random.normal(loc=6.5, scale=2.1, size=n_per_group)
dose_20mg = np.random.normal(loc=7.8, scale=1.9, size=n_per_group)
dose_40mg = np.random.normal(loc=8.2, scale=2.2, size=n_per_group)
# Create DataFrame for easier handling
df = pd.DataFrame({
'score': np.concatenate([control, dose_10mg, dose_20mg, dose_40mg]),
'group': ['Control']*n_per_group + ['10mg']*n_per_group +
['20mg']*n_per_group + ['40mg']*n_per_group
})
print(df.groupby('group')['score'].agg(['mean', 'std', 'count']))
Performing Dunnett’s Test with scipy.stats
The scipy.stats.dunnett() function takes the treatment samples as positional arguments and the control group through the keyword-only control argument. This matters: pass the wrong array as control and every comparison uses the wrong baseline.
# Run Dunnett's test
# IMPORTANT: treatment groups are positional; the control goes in the
# keyword-only `control` argument
result = stats.dunnett(dose_10mg, dose_20mg, dose_40mg, control=control)
print("Dunnett's Test Results")
print("=" * 50)
print(f"Test statistic(s): {result.statistic}")
print(f"P-value(s): {result.pvalue}")
The output gives you one statistic and p-value per treatment group. Let’s format this more clearly:
# Better formatted output
treatment_names = ['10mg vs Control', '20mg vs Control', '40mg vs Control']
print("\nDunnett's Test: Treatment vs Control Comparisons")
print("-" * 55)
print(f"{'Comparison':<25} {'Statistic':>12} {'P-value':>12}")
print("-" * 55)
for name, stat, pval in zip(treatment_names, result.statistic, result.pvalue):
sig = "***" if pval < 0.001 else "**" if pval < 0.01 else "*" if pval < 0.05 else ""
print(f"{name:<25} {stat:>12.4f} {pval:>12.4f} {sig}")
print("-" * 55)
print("Significance: * p<0.05, ** p<0.01, *** p<0.001")
You can also get confidence intervals for the mean differences:
# Get confidence intervals
ci = result.confidence_interval(confidence_level=0.95)
print("\n95% Confidence Intervals for Mean Differences")
print("-" * 55)
print(f"{'Comparison':<25} {'Lower':>12} {'Upper':>12}")
print("-" * 55)
for name, low, high in zip(treatment_names, ci.low, ci.high):
print(f"{name:<25} {low:>12.4f} {high:>12.4f}")
If the confidence interval excludes zero, the difference is statistically significant at your chosen alpha level.
Alternative: Using scikit-posthocs
The scikit-posthocs library offers an alternative implementation that works well with pandas DataFrames and provides a familiar interface if you’re already using it for other post-hoc tests.
import scikit_posthocs as sp
# Dunnett's test with DataFrame input
# Note: You must specify the control group name
dunnett_results = sp.posthoc_dunnett(
    df,
    val_col='score',
    group_col='group',
    ctrl_col_name='Control'
)
print("Dunnett's Test Results (scikit-posthocs)")
print(dunnett_results)
This returns a DataFrame of p-values, which integrates nicely into pandas-based workflows.
One advantage of scikit-posthocs is consistency—if you’re already using it for Tukey, Dunn, or other post-hoc tests, the API is familiar. However, for Dunnett’s test specifically, the SciPy implementation is more complete since it provides confidence intervals directly.
Visualizing Results
A good visualization communicates your findings faster than a table of numbers. Here’s how to create a comparison plot with significance markers:
def plot_dunnett_results(df, group_col, value_col, control_name, result):
"""Create a bar chart with Dunnett's test significance markers."""
# Calculate group statistics
group_stats = df.groupby(group_col)[value_col].agg(['mean', 'std', 'count'])
group_stats['se'] = group_stats['std'] / np.sqrt(group_stats['count'])
# Reorder so control is first
groups = [control_name] + [g for g in group_stats.index if g != control_name]
group_stats = group_stats.reindex(groups)
# Create plot
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(groups))
colors = ['#808080'] + ['#4C72B0'] * (len(groups) - 1) # Gray for control
bars = ax.bar(x, group_stats['mean'], yerr=group_stats['se'] * 1.96,
capsize=5, color=colors, edgecolor='black', linewidth=1.2)
# Add significance markers
control_mean = group_stats.loc[control_name, 'mean']
y_max = (group_stats['mean'] + group_stats['se'] * 1.96).max()
for i, (pval, group) in enumerate(zip(result.pvalue, groups[1:])):
if pval < 0.05:
marker = '***' if pval < 0.001 else '**' if pval < 0.01 else '*'
treatment_mean = group_stats.loc[group, 'mean']
y_pos = y_max + 0.5 + i * 0.8
# Draw bracket
ax.plot([0, 0, i+1, i+1],
[y_pos - 0.2, y_pos, y_pos, y_pos - 0.2],
'k-', linewidth=1)
ax.text((i+1) / 2, y_pos + 0.1, marker, ha='center', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.set_ylabel('Score (mean ± 95% CI)')
ax.set_xlabel('Treatment Group')
ax.set_title("Drug Efficacy Study: Dunnett's Test Results")
ax.axhline(y=control_mean, color='gray', linestyle='--', alpha=0.5, label='Control mean')
ax.legend()
plt.tight_layout()
return fig, ax
# Create the visualization
fig, ax = plot_dunnett_results(df, 'group', 'score', 'Control', result)
plt.savefig('dunnett_results.png', dpi=150, bbox_inches='tight')
plt.show()
Practical Considerations and Common Pitfalls
Choosing the Correct Control Group
The most common mistake is misspecifying the control. In scipy.stats.dunnett(), the control must be passed through the keyword-only control argument; the positional arguments are the treatment groups. Double-check by printing group means before running the test:
# Always verify your groups before testing
print("Group means (control should match your expectations):")
for name, data in [('Control', control), ('10mg', dose_10mg),
('20mg', dose_20mg), ('40mg', dose_40mg)]:
print(f" {name}: {data.mean():.2f}")
Handling Unequal Sample Sizes
Dunnett’s test handles unequal sample sizes, but power decreases when groups are imbalanced. If your control group is much smaller than treatment groups, consider whether your design is optimal:
# Check for balance
group_sizes = df.groupby('group').size()
print(f"Sample sizes: {group_sizes.to_dict()}")
if group_sizes.std() / group_sizes.mean() > 0.2:
print("Warning: Substantial imbalance detected. Consider implications for power.")
When Assumptions Are Violated
If normality is questionable, check with Shapiro-Wilk:
# Test normality for each group
for group in df['group'].unique():
data = df[df['group'] == group]['score']
stat, p = stats.shapiro(data)
print(f"{group}: Shapiro-Wilk p = {p:.4f}")
For non-normal data with similar distributions, consider the Kruskal-Wallis test followed by Dunn’s test with Bonferroni correction. For heterogeneous variances, Welch’s ANOVA followed by Games-Howell might be more appropriate, though you lose the specific control-comparison focus.
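A minimal sketch of the nonparametric route using only SciPy, with Mann-Whitney U plus a Bonferroni correction standing in for Dunn's test (simulated skewed data for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.exponential(scale=2.0, size=25)   # skewed, non-normal
treatments = {
    '10mg': rng.exponential(scale=2.6, size=25),
    '20mg': rng.exponential(scale=3.2, size=25),
}

# Omnibus nonparametric test first
h, p_kw = stats.kruskal(control, *treatments.values())
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p_kw:.4f}")

# Control comparisons at a Bonferroni-adjusted level
alpha = 0.05 / len(treatments)
for name, data in treatments.items():
    _, p = stats.mannwhitneyu(data, control, alternative='two-sided')
    verdict = 'significant' if p < alpha else 'not significant'
    print(f"{name} vs Control: p = {p:.4f} ({verdict} at adjusted alpha = {alpha:.3f})")
```

Unlike Dunnett's test, this correction ignores the correlation between the control comparisons, so it is somewhat conservative.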
Don’t Forget the Preliminary ANOVA
Running Dunnett’s test without first checking for an overall effect can make results harder to defend, so a one-way ANOVA is a sensible first step (Dunnett’s test controls its own family-wise error rate, but the omnibus test is a useful sanity check):
# One-way ANOVA first
f_stat, anova_p = stats.f_oneway(control, dose_10mg, dose_20mg, dose_40mg)
print(f"ANOVA: F = {f_stat:.2f}, p = {anova_p:.4f}")
if anova_p >= 0.05:
print("No significant overall effect. Post-hoc tests may not be appropriate.")
Dunnett’s test is a focused, powerful tool when your research question involves comparing treatments to a control. Use it instead of broader post-hoc tests when that’s genuinely your question—you’ll get more statistical power and clearer answers.