How to Perform a One-Way ANOVA in Python
Key Insights
- One-way ANOVA determines whether three or more group means are statistically different, but it won’t tell you which groups differ—you need post-hoc tests for that.
- Always check your assumptions (normality and homogeneity of variance) before running ANOVA; violations can invalidate your results or push you toward alternative tests.
- Python’s scipy.stats provides quick ANOVA calculations, while statsmodels offers richer output including effect sizes and integrated post-hoc analysis.
Introduction to One-Way ANOVA
One-way Analysis of Variance (ANOVA) answers a straightforward question: do the means of three or more independent groups differ significantly? While a t-test compares two groups, ANOVA extends this logic to multiple groups without inflating your Type I error rate through repeated pairwise comparisons.
Consider a practical scenario: you’re an education researcher comparing test scores across three teaching methods—traditional lecture, flipped classroom, and project-based learning. Running three separate t-tests (lecture vs. flipped, lecture vs. project-based, flipped vs. project-based) would increase your chance of a false positive. ANOVA handles this in a single test.
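The inflation is easy to quantify: with three independent tests at α = 0.05, the probability of at least one false positive is 1 − (1 − α)³. A quick back-of-the-envelope check:

```python
# Family-wise error rate for three independent pairwise tests at alpha = 0.05
alpha = 0.05
n_tests = 3
fwer = 1 - (1 - alpha) ** n_tests
print(f"Family-wise error rate: {fwer:.3f}")  # 0.143, nearly triple the nominal 5%
```

The 14.3% figure assumes the three tests are independent; in practice pairwise comparisons on shared groups are correlated and inflate somewhat less, but the direction of the problem is the same.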
The test works by comparing variance between groups to variance within groups. If the between-group variance is substantially larger than the within-group variance, you have evidence that at least one group mean differs from the others.
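This ratio can be computed by hand to see exactly what the test does under the hood. A minimal sketch with three small made-up groups (the values are illustrative only):

```python
import numpy as np
from scipy import stats

# Three illustrative groups (values invented for this sketch)
groups = [np.array([70., 72., 75.]),
          np.array([78., 80., 79.]),
          np.array([84., 82., 86.])]

all_vals = np.concatenate(groups)
grand_mean = all_vals.mean()
k = len(groups)                      # number of groups
n_total = len(all_vals)              # total observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
f_scipy, _ = stats.f_oneway(*groups)
print(f_manual, f_scipy)  # the two F values agree
```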
Before running ANOVA, you must satisfy three key assumptions:
- Independence: Observations are independent of each other
- Normality: The dependent variable is approximately normally distributed within each group
- Homogeneity of variance: The variance is roughly equal across all groups
Violating these assumptions doesn’t necessarily doom your analysis, but it does affect how you interpret results and may push you toward alternative tests.
Setting Up Your Environment
You’ll need several libraries for a complete ANOVA workflow: SciPy and statsmodels for the statistics, pandas and NumPy for data handling, and Matplotlib and Seaborn for visualization. Install them if you haven’t already:
pip install scipy statsmodels pandas numpy matplotlib seaborn
Now let’s set up our environment and create a realistic dataset:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Set random seed for reproducibility
np.random.seed(42)
# Create sample dataset: test scores across three teaching methods
n_per_group = 30
traditional = np.random.normal(loc=72, scale=10, size=n_per_group)
flipped = np.random.normal(loc=78, scale=12, size=n_per_group)
project_based = np.random.normal(loc=82, scale=11, size=n_per_group)
# Combine into a DataFrame
df = pd.DataFrame({
    'score': np.concatenate([traditional, flipped, project_based]),
    'method': (['Traditional'] * n_per_group +
               ['Flipped'] * n_per_group +
               ['Project-Based'] * n_per_group)
})
print(df.head())
print(f"\nTotal observations: {len(df)}")
This creates a dataset with 90 students—30 per teaching method—with intentionally different population means to demonstrate the ANOVA workflow.
Preparing and Exploring Your Data
Before any statistical test, explore your data. Calculate descriptive statistics and visualize the distributions:
# Descriptive statistics by group
desc_stats = df.groupby('method')['score'].agg([
    'count', 'mean', 'std', 'min', 'max'
]).round(2)
print("Descriptive Statistics by Teaching Method:")
print(desc_stats)
# Calculate 95% confidence intervals
def ci_95(x):
    mean = x.mean()
    se = x.std() / np.sqrt(len(x))
    return mean - 1.96 * se, mean + 1.96 * se

for method in df['method'].unique():
    group_data = df[df['method'] == method]['score']
    ci = ci_95(group_data)
    print(f"{method}: 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
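One caveat on the 1.96 multiplier used above: it is the large-sample normal approximation. With 30 observations per group, the t critical value (about 2.045 at 29 degrees of freedom) gives a slightly wider, more accurate interval. A self-contained sketch using simulated stand-in data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group = rng.normal(72, 10, 30)  # simulated stand-in for one group's scores

mean = group.mean()
sem = stats.sem(group)                       # standard error of the mean
t_crit = stats.t.ppf(0.975, len(group) - 1)  # ~2.045 for 29 df, vs 1.96
lo, hi = mean - t_crit * sem, mean + t_crit * sem

# scipy can produce the same interval in one call
lo2, hi2 = stats.t.interval(0.95, len(group) - 1, loc=mean, scale=sem)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```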
Visualize the distributions with a boxplot:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Boxplot
sns.boxplot(x='method', y='score', data=df, ax=axes[0], palette='Set2')
axes[0].set_title('Test Scores by Teaching Method')
axes[0].set_xlabel('Teaching Method')
axes[0].set_ylabel('Test Score')
# Violin plot for distribution shape
sns.violinplot(x='method', y='score', data=df, ax=axes[1], palette='Set2')
axes[1].set_title('Score Distribution by Teaching Method')
axes[1].set_xlabel('Teaching Method')
axes[1].set_ylabel('Test Score')
plt.tight_layout()
plt.savefig('anova_exploration.png', dpi=150)
plt.show()
Look for obvious differences in central tendency and spread. Overlapping distributions don’t preclude significant differences—that’s what the statistical test determines.
Checking ANOVA Assumptions
Never skip assumption checking. Here’s how to test both normality and homogeneity of variance:
# Test normality with Shapiro-Wilk for each group
print("Shapiro-Wilk Normality Test (H0: data is normally distributed):")
print("-" * 60)
for method in df['method'].unique():
    group_data = df[df['method'] == method]['score']
    stat, p_value = stats.shapiro(group_data)
    normality = "Normal" if p_value > 0.05 else "Non-normal"
    print(f"{method}: W = {stat:.4f}, p = {p_value:.4f} -> {normality}")
# Test homogeneity of variance with Levene's test
traditional_scores = df[df['method'] == 'Traditional']['score']
flipped_scores = df[df['method'] == 'Flipped']['score']
project_scores = df[df['method'] == 'Project-Based']['score']
levene_stat, levene_p = stats.levene(
    traditional_scores,
    flipped_scores,
    project_scores
)
print(f"\nLevene's Test for Homogeneity of Variance:")
print(f"Statistic = {levene_stat:.4f}, p = {levene_p:.4f}")
if levene_p > 0.05:
    print("Variances are homogeneous (assumption met)")
else:
    print("Variances are NOT homogeneous (consider Welch's ANOVA)")
Interpretation guidelines:
- Shapiro-Wilk: p > 0.05 suggests normality. ANOVA is robust to mild violations, especially with larger samples (n > 30 per group).
- Levene’s test: p > 0.05 suggests equal variances. If violated, use Welch’s ANOVA instead.
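When Levene’s test fails, a heteroscedasticity-robust alternative is available directly in SciPy (version 1.7 or later) as the Alexander-Govern test, which, like Welch’s ANOVA, compares means without assuming equal variances. A sketch on simulated groups with deliberately unequal spreads:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated groups with very different standard deviations
g1 = rng.normal(72, 5, 30)
g2 = rng.normal(78, 15, 30)
g3 = rng.normal(82, 25, 30)

# Alexander-Govern test: compares group means without the
# equal-variance assumption that standard ANOVA requires
res = stats.alexandergovern(g1, g2, g3)
print(f"statistic = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```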
Performing the One-Way ANOVA
With assumptions checked, run the ANOVA using two approaches:
Quick Method: scipy.stats
# Scipy's f_oneway - quick and simple
f_stat, p_value = stats.f_oneway(
    traditional_scores,
    flipped_scores,
    project_scores
)
print("One-Way ANOVA Results (scipy.stats.f_oneway)")
print("=" * 50)
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.6f}")
alpha = 0.05
if p_value < alpha:
    print(f"\nResult: Reject H0 at α = {alpha}")
    print("At least one group mean is significantly different.")
else:
    print(f"\nResult: Fail to reject H0 at α = {alpha}")
    print("No significant difference between group means.")
Detailed Method: statsmodels
For publication-quality output with effect sizes:
# Statsmodels approach - more detailed output
model = ols('score ~ C(method)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print("\nOne-Way ANOVA Table (statsmodels)")
print("=" * 60)
print(anova_table.round(4))
# Calculate effect size (eta-squared)
ss_between = anova_table['sum_sq']['C(method)']
ss_total = anova_table['sum_sq'].sum()
eta_squared = ss_between / ss_total
print(f"\nEffect Size (η²): {eta_squared:.4f}")
# Interpret effect size
if eta_squared < 0.01:
    effect = "negligible"
elif eta_squared < 0.06:
    effect = "small"
elif eta_squared < 0.14:
    effect = "medium"
else:
    effect = "large"
print(f"Interpretation: {effect} effect")
The F-statistic is the ratio of between-group variance to within-group variance. A larger F indicates greater differences between groups relative to differences within groups. The p-value tells you the probability of observing such an F-statistic if the null hypothesis (all means are equal) were true.
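You can verify that p-value yourself: under the null hypothesis the statistic follows an F distribution with k − 1 and N − k degrees of freedom, and the p-value is that distribution’s survival function evaluated at the observed F. A self-contained check on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
groups = [rng.normal(m, 10, 30) for m in (72, 78, 82)]

f_stat, p_value = stats.f_oneway(*groups)

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations
# p-value = P(F >= f_stat) under H0: the F-distribution survival function
p_manual = stats.f.sf(f_stat, k - 1, n - k)
print(p_value, p_manual)  # the two values agree
```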
Post-Hoc Analysis with Tukey’s HSD
A significant ANOVA result tells you that at least one group differs, but not which groups. Tukey’s Honestly Significant Difference (HSD) test performs all pairwise comparisons while controlling the family-wise error rate:
# Tukey's HSD post-hoc test
tukey_results = pairwise_tukeyhsd(
    endog=df['score'],
    groups=df['method'],
    alpha=0.05
)
print("Tukey's HSD Post-Hoc Test")
print("=" * 70)
print(tukey_results)
# Visualize pairwise comparisons
fig, ax = plt.subplots(figsize=(8, 5))
tukey_results.plot_simultaneous(ax=ax)
ax.set_title("Tukey's HSD: 95% Confidence Intervals")
ax.set_xlabel('Mean Score Difference')
plt.tight_layout()
plt.savefig('tukey_hsd_results.png', dpi=150)
plt.show()
# Extract specific comparisons
print("\nDetailed Pairwise Comparisons:")
print("-" * 70)
for row in tukey_results.summary().data[1:]:
    group1, group2, meandiff, p_adj, lower, upper, reject = row
    sig = "***" if reject else ""
    print(f"{group1} vs {group2}: diff = {meandiff:.2f}, "
          f"p = {p_adj:.4f}, 95% CI = [{lower:.2f}, {upper:.2f}] {sig}")
The reject column indicates whether each pairwise comparison is statistically significant. Confidence intervals that don’t cross zero indicate significant differences.
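For downstream filtering or reporting, it can be handy to turn the summary table into a pandas DataFrame; the first row of summary().data is the header, and the rest are one row per pairwise comparison. A self-contained sketch on simulated groups:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(m, 10, 30) for m in (72, 78, 82)])
groups = np.repeat(['Traditional', 'Flipped', 'Project-Based'], 30)

res = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)

# summary().data holds a header row followed by the comparison rows
table = res.summary().data
tidy = pd.DataFrame(table[1:], columns=table[0])
print(tidy)  # three rows, one per pairwise comparison
```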
Conclusion and Best Practices
Here’s the complete ANOVA workflow:
- Explore data: Calculate descriptive statistics and visualize distributions
- Check assumptions: Test normality (Shapiro-Wilk) and homogeneity of variance (Levene’s)
- Run ANOVA: Use scipy.stats.f_oneway() for quick results or statsmodels for detailed output
- Post-hoc testing: If significant, use Tukey’s HSD to identify which groups differ
- Report effect size: Always include eta-squared alongside p-values
When assumptions fail, consider these alternatives:
- Non-normal data: Use the Kruskal-Wallis test (scipy.stats.kruskal())
- Unequal variances: Use Welch’s ANOVA (via the pingouin library) or the closely related Alexander-Govern test (scipy.stats.alexandergovern())
- Repeated measures: Use repeated measures ANOVA via statsmodels or pingouin
# Kruskal-Wallis alternative for non-normal data
h_stat, kw_p = stats.kruskal(
    traditional_scores,
    flipped_scores,
    project_scores
)
print(f"Kruskal-Wallis: H = {h_stat:.4f}, p = {kw_p:.4f}")
Report your results completely: “A one-way ANOVA revealed a statistically significant difference in test scores across teaching methods, F(2, 87) = 7.23, p = .001, η² = .14. Tukey’s HSD post-hoc tests indicated that project-based learning (M = 82.1, SD = 11.2) produced significantly higher scores than traditional instruction (M = 72.3, SD = 10.1), p = .002.”
ANOVA is a workhorse of statistical analysis. Master this workflow, and you’ll handle the majority of group comparison scenarios you encounter in practice.