How to Perform the Bartlett Test in Python
Key Insights
- Bartlett’s test checks whether multiple groups have equal variances (homoscedasticity), a critical assumption for ANOVA and other parametric tests—but it’s highly sensitive to non-normal data, so verify normality first.
- Python’s scipy.stats.bartlett() function makes implementation straightforward: pass your group arrays and interpret the p-value against your significance level (typically 0.05).
- When your data isn’t normally distributed, skip Bartlett’s test entirely and use Levene’s test instead—it’s more robust and often the safer default choice in practice.
Introduction to Bartlett’s Test
Bartlett’s test answers a simple but critical question: do multiple groups in your data have the same variance? This property—called homoscedasticity or homogeneity of variances—is a fundamental assumption for many statistical tests, most notably one-way ANOVA.
Named after Maurice Bartlett who developed it in 1937, the test compares the variances of two or more independent samples and determines whether the differences between them are statistically significant. The null hypothesis states that all group variances are equal. A significant result (low p-value) indicates that at least one group has a different variance.
You’ll typically reach for Bartlett’s test when preparing to run ANOVA, comparing experimental conditions, or validating assumptions before regression analysis. It’s particularly useful when you have three or more groups to compare—though it works with just two as well.
Assumptions and Prerequisites
Before running Bartlett’s test, understand what it requires:
Normality: Bartlett’s test assumes your data follows a normal distribution within each group. This isn’t a minor footnote—the test is notoriously sensitive to departures from normality. Even moderate skewness or kurtosis can inflate your Type I error rate, leading to false positives.
Independence: Observations must be independent both within and across groups. No repeated measures, no paired data.
Continuous data: The test works with continuous numerical variables, not categorical or ordinal data.
Here’s the critical practical advice: always check normality before using Bartlett’s test. If your data isn’t normal, the test results are unreliable regardless of what the p-value says.
```python
from scipy import stats
import numpy as np

# Check normality with Shapiro-Wilk test for each group
group1 = np.array([23, 25, 28, 22, 27, 26, 24, 29, 25, 23])
group2 = np.array([31, 33, 29, 35, 32, 34, 30, 36, 33, 31])
group3 = np.array([18, 22, 20, 19, 21, 23, 17, 20, 22, 19])

for i, group in enumerate([group1, group2, group3], 1):
    stat, p = stats.shapiro(group)
    print(f"Group {i}: Shapiro-Wilk p-value = {p:.4f}")
```
If any group fails the normality test (p < 0.05), consider using Levene’s test instead.
Performing Bartlett’s Test with SciPy
The scipy.stats.bartlett() function provides a clean, straightforward implementation. It accepts any number of sample arrays and returns the test statistic and p-value.
```python
from scipy import stats
import numpy as np

# Sample data: test scores from three different teaching methods
method_a = np.array([85, 88, 90, 82, 87, 89, 84, 86, 88, 85])
method_b = np.array([78, 82, 75, 80, 77, 83, 79, 81, 76, 80])
method_c = np.array([92, 88, 95, 90, 87, 93, 89, 91, 94, 90])

# Perform Bartlett's test
statistic, p_value = stats.bartlett(method_a, method_b, method_c)

print(f"Bartlett's test statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret the result
alpha = 0.05
if p_value > alpha:
    print("Fail to reject null hypothesis: variances are equal")
else:
    print("Reject null hypothesis: variances are NOT equal")
```
The test statistic follows a chi-squared distribution with k-1 degrees of freedom, where k is the number of groups. A larger statistic indicates greater differences between group variances.
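For reference, the statistic itself can be written out. With k groups of sizes n_i, N = Σn_i total observations, sample variances s_i², and pooled variance s_p², Bartlett's statistic is:

```latex
\chi^2 = \frac{(N-k)\ln s_p^2 \;-\; \sum_{i=1}^{k}(n_i-1)\ln s_i^2}
              {1 + \dfrac{1}{3(k-1)}\left(\sum_{i=1}^{k}\dfrac{1}{n_i-1} - \dfrac{1}{N-k}\right)},
\qquad
s_p^2 = \frac{1}{N-k}\sum_{i=1}^{k}(n_i-1)\,s_i^2
```

The numerator compares the log of the pooled variance against the logs of the individual variances (it is zero when all s_i² are equal), and the denominator is a small-sample correction factor.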
Interpretation is straightforward:
- p-value > 0.05: No significant difference in variances (homoscedasticity assumption holds)
- p-value ≤ 0.05: Significant difference in variances (assumption violated)
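To see what a rejection looks like, here is a quick sketch with synthetic data where one group's standard deviation is deliberately three times larger than the others (the seed, means, and sample sizes are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two groups with standard deviation 1, one with standard deviation 3
narrow_1 = rng.normal(loc=50, scale=1, size=30)
narrow_2 = rng.normal(loc=50, scale=1, size=30)
wide = rng.normal(loc=50, scale=3, size=30)

statistic, p_value = stats.bartlett(narrow_1, narrow_2, wide)
print(f"statistic = {statistic:.4f}, p-value = {p_value:.6f}")
```

With a roughly 9x variance ratio and 30 observations per group, the p-value lands far below 0.05, so the test correctly flags heteroscedasticity.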
Complete Worked Example with Real Data
Let’s work through a realistic scenario. Imagine you’re analyzing crop yields from three different fertilizer treatments and need to verify variance equality before running ANOVA.
```python
import numpy as np
import pandas as pd
from scipy import stats

# Crop yield data (kg per plot) for three fertilizer types
fertilizer_a = np.array([45.2, 47.8, 44.1, 46.5, 48.2, 45.9, 47.1, 44.8, 46.2, 45.5,
                         47.3, 44.6, 46.8, 45.1, 47.5])
fertilizer_b = np.array([52.1, 54.3, 51.8, 53.2, 55.1, 52.7, 54.8, 51.5, 53.9, 52.4,
                         54.1, 51.9, 53.5, 52.8, 54.6])
fertilizer_c = np.array([48.5, 51.2, 47.8, 50.1, 52.3, 49.2, 51.8, 48.1, 50.5, 49.8,
                         51.1, 48.9, 50.8, 49.5, 51.5])

# Create a summary DataFrame
summary_data = {
    'Fertilizer': ['A', 'B', 'C'],
    'Mean': [fertilizer_a.mean(), fertilizer_b.mean(), fertilizer_c.mean()],
    'Variance': [fertilizer_a.var(ddof=1), fertilizer_b.var(ddof=1), fertilizer_c.var(ddof=1)],
    'Std Dev': [fertilizer_a.std(ddof=1), fertilizer_b.std(ddof=1), fertilizer_c.std(ddof=1)],
    'N': [len(fertilizer_a), len(fertilizer_b), len(fertilizer_c)]
}
summary_df = pd.DataFrame(summary_data)
print("Group Statistics:")
print(summary_df.to_string(index=False))
print()

# Step 1: Check normality for each group
print("Normality Check (Shapiro-Wilk):")
groups = {'A': fertilizer_a, 'B': fertilizer_b, 'C': fertilizer_c}
all_normal = True
for name, data in groups.items():
    stat, p = stats.shapiro(data)
    status = "Normal" if p > 0.05 else "Non-normal"
    if p <= 0.05:
        all_normal = False
    print(f"  Fertilizer {name}: p = {p:.4f} ({status})")
print()

# Step 2: Perform Bartlett's test (only if normality assumption holds)
if all_normal:
    statistic, p_value = stats.bartlett(fertilizer_a, fertilizer_b, fertilizer_c)
    print("Bartlett's Test Results:")
    print(f"  Test statistic: {statistic:.4f}")
    print(f"  P-value: {p_value:.4f}")
    print(f"  Degrees of freedom: {len(groups) - 1}")
    print()
    alpha = 0.05
    if p_value > alpha:
        print(f"Conclusion: p-value ({p_value:.4f}) > α ({alpha})")
        print("The variances are homogeneous. Proceed with ANOVA.")
    else:
        print(f"Conclusion: p-value ({p_value:.4f}) ≤ α ({alpha})")
        print("The variances are NOT homogeneous. Consider Welch's ANOVA.")
else:
    print("Warning: Normality assumption violated. Use Levene's test instead.")
```
This example demonstrates the proper workflow: summarize your data, check normality, then run Bartlett’s test only if the normality assumption holds.
Visualizing Variance Differences
Statistical tests tell you whether differences are significant, but visualization helps you understand the practical magnitude of those differences. Box plots and violin plots excel at displaying variance comparisons.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Prepare data
fertilizer_a = np.array([45.2, 47.8, 44.1, 46.5, 48.2, 45.9, 47.1, 44.8, 46.2, 45.5,
                         47.3, 44.6, 46.8, 45.1, 47.5])
fertilizer_b = np.array([52.1, 54.3, 51.8, 53.2, 55.1, 52.7, 54.8, 51.5, 53.9, 52.4,
                         54.1, 51.9, 53.5, 52.8, 54.6])
fertilizer_c = np.array([48.5, 51.2, 47.8, 50.1, 52.3, 49.2, 51.8, 48.1, 50.5, 49.8,
                         51.1, 48.9, 50.8, 49.5, 51.5])

# Combine into long format for seaborn
df = pd.DataFrame({
    'Yield': np.concatenate([fertilizer_a, fertilizer_b, fertilizer_c]),
    'Fertilizer': ['A']*15 + ['B']*15 + ['C']*15
})

# Perform Bartlett's test
stat, p_value = stats.bartlett(fertilizer_a, fertilizer_b, fertilizer_c)

# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Box plot
sns.boxplot(x='Fertilizer', y='Yield', data=df, ax=axes[0], palette='Set2')
axes[0].set_title('Box Plot: Yield by Fertilizer Type')
axes[0].set_ylabel('Yield (kg)')

# Violin plot with individual points
sns.violinplot(x='Fertilizer', y='Yield', data=df, ax=axes[1], palette='Set2', alpha=0.7)
sns.stripplot(x='Fertilizer', y='Yield', data=df, ax=axes[1], color='black', alpha=0.5, size=4)
axes[1].set_title('Violin Plot: Yield Distribution by Fertilizer')
axes[1].set_ylabel('Yield (kg)')

# Add Bartlett's test result as annotation
fig.suptitle(f"Bartlett's Test: statistic = {stat:.3f}, p-value = {p_value:.4f}",
             fontsize=11, y=1.02)

plt.tight_layout()
plt.savefig('variance_comparison.png', dpi=150, bbox_inches='tight')
plt.show()
```
The box plot shows the interquartile range and outliers, while the violin plot reveals the full distribution shape. Similar widths across groups suggest equal variances; dramatically different spreads indicate heteroscedasticity.
Bartlett’s Test Before ANOVA
The most common use case for Bartlett’s test is validating ANOVA assumptions. Here’s a complete workflow that checks variance equality before proceeding with the appropriate analysis:
```python
import numpy as np
from scipy import stats

def anova_with_variance_check(groups, group_names, alpha=0.05):
    """
    Perform ANOVA with a preliminary variance check.
    Uses Bartlett's test if data is normal, Levene's otherwise.
    """
    # Check normality for all groups
    normality_results = []
    for name, data in zip(group_names, groups):
        _, p = stats.shapiro(data)
        normality_results.append(p > alpha)
    all_normal = all(normality_results)

    # Choose appropriate variance test
    if all_normal:
        var_stat, var_p = stats.bartlett(*groups)
        var_test = "Bartlett's"
    else:
        var_stat, var_p = stats.levene(*groups)
        var_test = "Levene's"
    print(f"Variance Test ({var_test}): p = {var_p:.4f}")

    # Choose appropriate ANOVA
    if var_p > alpha:
        # Variances are equal: use standard one-way ANOVA
        f_stat, anova_p = stats.f_oneway(*groups)
        anova_type = "One-way ANOVA"
    else:
        # Variances are unequal: use Welch's ANOVA
        # Note: scipy doesn't have Welch's ANOVA directly,
        # but we can use the Alexander-Govern test or the pingouin library
        f_stat, anova_p = stats.f_oneway(*groups)  # Simplified for example
        anova_type = "One-way ANOVA (consider Welch's ANOVA)"
        print("Warning: Unequal variances detected. Results may be unreliable.")

    print(f"{anova_type}: F = {f_stat:.4f}, p = {anova_p:.4f}")
    return {'variance_test': var_test, 'variance_p': var_p,
            'anova_f': f_stat, 'anova_p': anova_p}

# Example usage
group1 = np.array([23, 25, 28, 22, 27, 26, 24, 29, 25, 23])
group2 = np.array([31, 33, 29, 35, 32, 34, 30, 36, 33, 31])
group3 = np.array([28, 30, 27, 29, 31, 28, 30, 29, 27, 30])

results = anova_with_variance_check(
    [group1, group2, group3],
    ['Control', 'Treatment A', 'Treatment B']
)
```
This pattern ensures you’re not blindly running ANOVA without checking its assumptions first.
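Since SciPy has no built-in Welch's ANOVA (the third-party pingouin library provides one), here is a minimal sketch of the standard Welch formula coded directly with scipy.stats; the function name is my own, and for production work a maintained implementation is preferable:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA for groups with unequal variances.

    A minimal sketch of the standard Welch formula: groups are weighted
    by n_i / s_i^2, and the second degree of freedom is estimated from
    the weight imbalance.
    """
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                                  # precision weights
    grand_mean = np.sum(w * means) / np.sum(w)         # weighted grand mean

    # Between-group mean square, using the weighted grand mean
    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)

    # Correction term driven by how unbalanced the weights are
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    denominator = 1 + 2 * (k - 2) * tmp / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, p_value

group1 = np.array([23, 25, 28, 22, 27, 26, 24, 29, 25, 23])
group2 = np.array([31, 33, 29, 35, 32, 34, 30, 36, 33, 31])
f_stat, p_value = welch_anova(group1, group2)
print(f"Welch's ANOVA: F = {f_stat:.4f}, p = {p_value:.6f}")
```

With exactly two groups, Welch's ANOVA reduces to the squared Welch t-test, which gives an easy sanity check against stats.ttest_ind(..., equal_var=False).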
Alternatives and Limitations
Bartlett’s test has a significant weakness: its sensitivity to non-normality. When your data deviates from normal distribution, consider these alternatives:
Levene’s Test: The go-to alternative when normality is questionable. It tests variance equality using deviations from group means (or medians) and is far more robust to non-normal data.
```python
from scipy import stats

# Levene's test with median centering (more robust)
stat, p = stats.levene(group1, group2, group3, center='median')
print(f"Levene's test: statistic = {stat:.4f}, p = {p:.4f}")
```
Brown-Forsythe Test: A variant of Levene’s test that uses deviations from the median rather than the mean. It’s essentially stats.levene() with center='median'.
Fligner-Killeen Test: A non-parametric alternative that’s robust to non-normality and works well with smaller samples.
```python
from scipy import stats

# Fligner-Killeen test
stat, p = stats.fligner(group1, group2, group3)
print(f"Fligner-Killeen test: statistic = {stat:.4f}, p = {p:.4f}")
```
My practical recommendation: default to Levene’s test with center='median' unless you’ve confirmed your data is normally distributed. The cost of using a slightly less powerful test is far lower than the cost of invalid results from a violated assumption. Bartlett’s test is powerful when its assumptions hold, but those assumptions rarely hold perfectly in real-world data.
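That default is easy to encode. Here is a hypothetical pick_variance_test helper (the name and the 0.05 threshold are illustrative choices, not a library API) that runs Bartlett's test only when every group passes Shapiro-Wilk and otherwise falls back to Levene's with median centering:

```python
import numpy as np
from scipy import stats

def pick_variance_test(*groups, alpha=0.05):
    """Illustrative helper: use Bartlett's test only when every group
    passes Shapiro-Wilk; otherwise fall back to Levene's (median)."""
    if all(stats.shapiro(g).pvalue > alpha for g in groups):
        name, result = "Bartlett", stats.bartlett(*groups)
    else:
        name, result = "Levene (median)", stats.levene(*groups, center='median')
    return name, result.statistic, result.pvalue

# Strongly skewed (exponential) samples should trigger the Levene fallback
rng = np.random.default_rng(1)
skewed = [rng.exponential(scale=s, size=40) for s in (1.0, 1.0, 2.5)]
name, stat, p = pick_variance_test(*skewed)
print(f"{name}: statistic = {stat:.4f}, p = {p:.4f}")
```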