How to Use scipy.stats.f_oneway in Python

Key Insights

  • scipy.stats.f_oneway performs one-way ANOVA to test whether three or more groups have the same population mean, returning an F-statistic and p-value that tell you if differences exist but not where they are.
  • Always validate ANOVA assumptions (normality and equal variances) before trusting your results—use Shapiro-Wilk and Levene’s tests, and fall back to Kruskal-Wallis when assumptions fail.
  • A significant ANOVA result is just the beginning; you need post-hoc tests like Tukey’s HSD to identify which specific groups differ from each other.

Introduction to One-Way ANOVA

One-way ANOVA (Analysis of Variance) answers a simple question: do three or more groups have different means? While a t-test compares two groups, ANOVA scales to any number of groups without inflating your Type I error rate.

Consider these scenarios where one-way ANOVA shines:

  • Comparing conversion rates across four marketing campaigns
  • Testing whether three manufacturing processes produce different defect rates
  • Evaluating student performance across multiple teaching methods

The scipy.stats.f_oneway function is Python’s standard tool for this analysis. It’s fast, requires minimal setup, and integrates seamlessly with NumPy arrays. Unlike more complex statistical packages, it does one thing well: compute the F-statistic and p-value for your group comparison.
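
The Type I error inflation mentioned above is easy to quantify: with k groups, pairwise t-tests require k(k-1)/2 comparisons, and the chance of at least one false positive compounds. A minimal sketch (assuming independent tests for simplicity):

```python
# With k groups, pairwise t-tests require k*(k-1)/2 comparisons.
# If each test uses alpha = 0.05, the familywise error rate
# (probability of at least one false positive) grows quickly:
alpha = 0.05
for k in [2, 3, 4, 5]:
    m = k * (k - 1) // 2                 # number of pairwise tests
    fwer = 1 - (1 - alpha) ** m          # assumes independent tests
    print(f"{k} groups -> {m} t-tests, familywise error ~ {fwer:.3f}")
```

By five groups, ten pairwise tests push the familywise error rate past 40%, which is why a single ANOVA is preferred.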

Function Syntax and Parameters

The function signature is straightforward:

from scipy.stats import f_oneway

result = f_oneway(sample1, sample2, sample3, ...)

Parameters:

  • sample1, sample2, ...: Array-like objects containing the sample data for each group. You can pass any number of groups (minimum two, but use a t-test for two groups).

Returns:

  • statistic: The computed F-statistic
  • pvalue: The p-value for the test

The function accepts arrays of different lengths, which is useful for unbalanced designs. Here’s the basic structure:

import numpy as np
from scipy.stats import f_oneway

# Three groups with different sample sizes
group_a = np.array([23, 25, 28, 22, 24])
group_b = np.array([30, 32, 29, 31, 33, 28])
group_c = np.array([18, 20, 22, 19])

result = f_oneway(group_a, group_b, group_c)

print(f"F-statistic: {result.statistic:.4f}")
print(f"P-value: {result.pvalue:.4f}")
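
As a sanity check on the two-group note above: with exactly two groups, the ANOVA F-statistic is the square of the pooled t-statistic, and the p-values coincide. A quick sketch:

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

a = np.array([23, 25, 28, 22, 24])
b = np.array([30, 32, 29, 31, 33, 28])

f_res = f_oneway(a, b)
t_res = ttest_ind(a, b)  # pooled (equal-variance) t-test

# With two groups, F = t^2 and the p-values agree
print(np.isclose(f_res.statistic, t_res.statistic ** 2))  # True
print(np.isclose(f_res.pvalue, t_res.pvalue))             # True
```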

Basic Usage Example

Let’s walk through a complete example comparing test scores across three different study methods:

import numpy as np
from scipy.stats import f_oneway

# Test scores for students using different study methods
traditional = np.array([72, 75, 78, 71, 73, 76, 74, 77])
flashcards = np.array([78, 82, 80, 85, 79, 83, 81, 84])
practice_tests = np.array([85, 88, 90, 87, 86, 89, 91, 88])

# Run one-way ANOVA
f_stat, p_value = f_oneway(traditional, flashcards, practice_tests)

print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.6f}")

# Interpret results
alpha = 0.05
if p_value < alpha:
    print(f"\nReject null hypothesis (p < {alpha})")
    print("At least one study method produces significantly different scores.")
else:
    print(f"\nFail to reject null hypothesis (p >= {alpha})")
    print("No significant difference between study methods.")

Output:

F-statistic: 68.3750
P-value: 0.000000

Reject null hypothesis (p < 0.05)
At least one study method produces significantly different scores.

Interpreting the results:

The F-statistic (68.38) represents the ratio of between-group variance to within-group variance. A larger F-statistic indicates greater differences between groups relative to the variation within groups.

The p-value tells you the probability of observing this F-statistic (or more extreme) if the null hypothesis were true. Here, it’s essentially zero, providing strong evidence that the study methods produce different outcomes.
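
To see where that ratio comes from, the F-statistic can be rebuilt by hand from the between-group and within-group sums of squares:

```python
import numpy as np
from scipy.stats import f_oneway

groups = [
    np.array([72, 75, 78, 71, 73, 76, 74, 77]),  # traditional
    np.array([78, 82, 80, 85, 79, 83, 81, 84]),  # flashcards
    np.array([85, 88, 90, 87, 86, 89, 91, 88]),  # practice tests
]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_data) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
f_scipy = f_oneway(*groups).statistic

print(f"Manual F: {f_manual:.4f}")  # matches f_oneway
print(f"scipy F:  {f_scipy:.4f}")
```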

Assumptions and Data Validation

ANOVA results are only valid when three assumptions hold. Ignoring these can lead to false conclusions.

1. Independence: Observations must be independent. This is a study design issue—you can’t test for it statistically.

2. Normality: Each group should be approximately normally distributed.

3. Homogeneity of variances: Groups should have similar variances (homoscedasticity).

Here’s how to test assumptions 2 and 3:

import numpy as np
from scipy.stats import f_oneway, shapiro, levene

# Sample data
group_a = np.array([72, 75, 78, 71, 73, 76, 74, 77, 70, 79])
group_b = np.array([78, 82, 80, 85, 79, 83, 81, 84, 77, 86])
group_c = np.array([85, 88, 90, 87, 86, 89, 91, 88, 84, 92])

# Test normality with Shapiro-Wilk (H0: data is normally distributed)
print("Normality Tests (Shapiro-Wilk):")
print("-" * 40)
for name, group in [("Group A", group_a), ("Group B", group_b), ("Group C", group_c)]:
    stat, p = shapiro(group)
    normality = "Normal" if p > 0.05 else "NOT Normal"
    print(f"{name}: W={stat:.4f}, p={p:.4f} -> {normality}")

# Test homogeneity of variances with Levene's test
# (H0: all groups have equal variances)
print("\nHomogeneity of Variances (Levene's Test):")
print("-" * 40)
lev_stat, lev_p = levene(group_a, group_b, group_c)
equal_var = "Equal variances" if lev_p > 0.05 else "Unequal variances"
print(f"W={lev_stat:.4f}, p={lev_p:.4f} -> {equal_var}")

# Only proceed with ANOVA if assumptions are met
if lev_p > 0.05:
    f_stat, p_value = f_oneway(group_a, group_b, group_c)
    print(f"\nANOVA Results: F={f_stat:.4f}, p={p_value:.6f}")
else:
    print("\nConsider using Kruskal-Wallis test instead.")

Real-World Application

Let’s apply ANOVA to a practical scenario: comparing customer satisfaction scores across three product versions during A/B/C testing.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f_oneway, levene

# Simulated satisfaction scores (1-10 scale) from A/B/C test
np.random.seed(42)
version_a = np.random.normal(6.5, 1.2, 50)  # Control
version_b = np.random.normal(7.2, 1.1, 48)  # New UI
version_c = np.random.normal(7.8, 1.3, 52)  # New UI + Features

# Clip to valid range
version_a = np.clip(version_a, 1, 10)
version_b = np.clip(version_b, 1, 10)
version_c = np.clip(version_c, 1, 10)

# Check assumptions
lev_stat, lev_p = levene(version_a, version_b, version_c)
print(f"Levene's Test: p={lev_p:.4f}")

# Run ANOVA
f_stat, p_value = f_oneway(version_a, version_b, version_c)

print(f"\nA/B/C Test Results")
print("=" * 40)
print(f"Version A (Control):     Mean = {version_a.mean():.2f}, n = {len(version_a)}")
print(f"Version B (New UI):      Mean = {version_b.mean():.2f}, n = {len(version_b)}")
print(f"Version C (UI+Features): Mean = {version_c.mean():.2f}, n = {len(version_c)}")
print(f"\nF-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.6f}")

# Visualize with boxplot
fig, ax = plt.subplots(figsize=(8, 6))
data = [version_a, version_b, version_c]
labels = ['Version A\n(Control)', 'Version B\n(New UI)', 'Version C\n(UI+Features)']

bp = ax.boxplot(data, labels=labels, patch_artist=True)
colors = ['#ff9999', '#99ccff', '#99ff99']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

ax.set_ylabel('Satisfaction Score')
ax.set_title(f'A/B/C Test Results (F={f_stat:.2f}, p={p_value:.4f})')
ax.axhline(y=np.concatenate(data).mean(), color='red',
           linestyle='--', alpha=0.5, label='Grand Mean')
ax.legend()
plt.tight_layout()
plt.savefig('abc_test_results.png', dpi=150)
plt.show()

Post-Hoc Analysis

A significant ANOVA tells you that differences exist, but not where. Post-hoc tests identify which specific pairs of groups differ. Tukey’s Honestly Significant Difference (HSD) is the standard choice.

import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Sample data
traditional = np.array([72, 75, 78, 71, 73, 76, 74, 77, 70, 79])
flashcards = np.array([78, 82, 80, 85, 79, 83, 81, 84, 77, 86])
practice_tests = np.array([85, 88, 90, 87, 86, 89, 91, 88, 84, 92])

# First, confirm ANOVA is significant
f_stat, p_value = f_oneway(traditional, flashcards, practice_tests)
print(f"ANOVA: F={f_stat:.4f}, p={p_value:.6f}")

if p_value < 0.05:
    print("\nSignificant result - proceeding with Tukey's HSD\n")
    
    # Perform Tukey's HSD
    result = tukey_hsd(traditional, flashcards, practice_tests)
    print(result)
    
    # Access specific comparisons
    print("\nPairwise Comparisons:")
    print("-" * 50)
    groups = ['Traditional', 'Flashcards', 'Practice Tests']
    
    for i in range(3):
        for j in range(i + 1, 3):
            p = result.pvalue[i, j]
            sig = "*" if p < 0.05 else ""
            print(f"{groups[i]} vs {groups[j]}: p={p:.6f} {sig}")

The output shows which specific pairs have significantly different means, letting you make precise claims about your data.
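
Beyond p-values, the tukey_hsd result also exposes confidence intervals for each pairwise mean difference via a confidence_interval method (available in recent SciPy releases; the low/high attribute names are as documented there). A sketch:

```python
import numpy as np
from scipy.stats import tukey_hsd

traditional = np.array([72, 75, 78, 71, 73, 76, 74, 77, 70, 79])
flashcards = np.array([78, 82, 80, 85, 79, 83, 81, 84, 77, 86])
practice_tests = np.array([85, 88, 90, 87, 86, 89, 91, 88, 84, 92])

result = tukey_hsd(traditional, flashcards, practice_tests)
ci = result.confidence_interval(confidence_level=0.95)

groups = ['Traditional', 'Flashcards', 'Practice Tests']
for i in range(3):
    for j in range(i + 1, 3):
        # Mean difference (group i - group j) with its 95% CI
        print(f"{groups[i]} - {groups[j]}: "
              f"CI = [{ci.low[i, j]:.2f}, {ci.high[i, j]:.2f}]")
```

An interval that excludes zero corresponds to a significant pairwise difference, and its width tells you how precisely the difference is estimated.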

Common Pitfalls and Best Practices

Unequal sample sizes: f_oneway handles unbalanced designs, but severely unequal groups reduce statistical power. Aim for similar sample sizes when possible.

Handling NaN values: The function doesn’t automatically handle missing data. Clean your data first:

import numpy as np
from scipy.stats import f_oneway

# Data with NaN values
group_a = np.array([1, 2, np.nan, 4, 5])
group_b = np.array([2, 3, 4, np.nan, 6])

# Remove NaN before analysis
group_a_clean = group_a[~np.isnan(group_a)]
group_b_clean = group_b[~np.isnan(group_b)]

f_stat, p_value = f_oneway(group_a_clean, group_b_clean)

When assumptions fail: Use the Kruskal-Wallis test, a non-parametric alternative:

from scipy.stats import kruskal

# Non-parametric alternative when normality/equal variance assumptions fail
h_stat, p_value = kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis: H={h_stat:.4f}, p={p_value:.4f}")

Effect size matters: Statistical significance doesn’t imply practical significance. Calculate eta-squared to understand effect magnitude:

def eta_squared(groups):
    """Calculate eta-squared effect size for one-way ANOVA."""
    all_data = np.concatenate(groups)
    grand_mean = all_data.mean()
    
    ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)
    ss_total = sum((x - grand_mean)**2 for x in all_data)
    
    return ss_between / ss_total

groups = [traditional, flashcards, practice_tests]
eta_sq = eta_squared(groups)
print(f"Eta-squared: {eta_sq:.4f}")
# Interpretation: 0.01 = small, 0.06 = medium, 0.14 = large

Final recommendations:

  1. Always visualize your data before running statistical tests
  2. Check assumptions—don’t skip this step
  3. Report effect sizes alongside p-values
  4. Use post-hoc tests when ANOVA is significant
  5. Consider the Alexander-Govern test (scipy.stats.alexandergovern) when variances are unequal but normality holds
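
For point 5, scipy.stats.alexandergovern takes the same groups-as-arguments form as f_oneway and returns a result with statistic and pvalue attributes (assuming SciPy 1.7+, where the function was introduced). A minimal sketch with deliberately unequal spreads:

```python
import numpy as np
from scipy.stats import alexandergovern

# Simulated groups with clearly different variances
rng = np.random.default_rng(0)
g1 = rng.normal(10, 1.0, 30)
g2 = rng.normal(12, 3.0, 30)
g3 = rng.normal(14, 5.0, 30)

res = alexandergovern(g1, g2, g3)
print(f"Statistic: {res.statistic:.4f}")
print(f"P-value: {res.pvalue:.6f}")
```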

One-way ANOVA with scipy.stats.f_oneway is a powerful tool when used correctly. Validate your assumptions, interpret results in context, and follow up with appropriate post-hoc analyses to extract meaningful insights from your group comparisons.
