How to Perform the Brown-Forsythe Test in Python

Key Insights

  • The Brown-Forsythe test checks equality of variances across groups using medians, making it more robust than Levene’s original test when your data isn’t normally distributed.
  • In Python, you perform the Brown-Forsythe test using scipy.stats.levene() with the center='median' parameter—there’s no separate function needed.
  • When the test rejects variance homogeneity, switch from standard one-way ANOVA to Welch’s ANOVA, which doesn’t assume equal variances.

Introduction to the Brown-Forsythe Test

Before running ANOVA or similar parametric tests, you need to verify a critical assumption: that all groups have roughly equal variances. This property, called homoscedasticity or homogeneity of variance, directly affects the validity of your statistical conclusions.

The Brown-Forsythe test, published in 1974, addresses this by testing whether the variances across multiple groups are statistically equal. It’s a modification of Levene’s test with one key difference: it uses the median of each group rather than the mean when calculating deviations.

This distinction matters. The original Levene’s test (using means) is sensitive to outliers and performs poorly with skewed distributions. By substituting the median, Brown and Forsythe created a test that remains reliable even when your data violates normality assumptions—which happens constantly in real-world datasets.

Use the Brown-Forsythe test when:

  • You’re preparing to run one-way ANOVA and need to check assumptions
  • Your data might be non-normal or contain outliers
  • You’re comparing variability across experimental conditions or demographic groups
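The robustness gap is easy to demonstrate. The sketch below draws two groups from the same skewed (exponential) distribution, so their true variances are equal, then runs both the mean-centered and median-centered variants; the exact p-values depend on the random seed, so treat the numbers as illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two groups from the SAME skewed distribution: equal true variances,
# so an ideal test should not reject H0.
group_1 = rng.exponential(scale=2.0, size=40)
group_2 = rng.exponential(scale=2.0, size=40)

# Original Levene's test (deviations from the mean)
stat_mean, p_mean = stats.levene(group_1, group_2, center='mean')

# Brown-Forsythe variant (deviations from the median)
stat_median, p_median = stats.levene(group_1, group_2, center='median')

print(f"Mean-centered:   W = {stat_mean:.4f}, p = {p_mean:.4f}")
print(f"Median-centered: W = {stat_median:.4f}, p = {p_median:.4f}")
```

With heavily skewed data like this, the mean-centered statistic tends to be inflated by the long tail, while the median-centered version stays closer to its nominal behavior.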

Mathematical Foundation

The Brown-Forsythe test transforms the original observations into deviations from group medians, then performs an ANOVA-like analysis on these transformed values.

For each observation $x_{ij}$ in group $i$, compute:

$$z_{ij} = |x_{ij} - \tilde{x}_i|$$

where $\tilde{x}_i$ is the median of group $i$.

The test statistic follows an F-distribution:

$$W = \frac{(N - k) \sum_{i=1}^{k} n_i (\bar{z}_i - \bar{z})^2}{(k - 1) \sum_{i=1}^{k} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_i)^2}$$

where $N$ is total sample size, $k$ is the number of groups, $n_i$ is the size of group $i$, $\bar{z}_i$ is the mean of transformed values in group $i$, and $\bar{z}$ is the grand mean of all transformed values.
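To make the formula concrete, here is a sketch that computes $W$ by hand from the definitions above and cross-checks it against SciPy; the three sample groups are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(0, s, n) for s, n in [(1.0, 25), (1.5, 30), (2.0, 20)]]

# Transform: absolute deviations from each group's median
z = [np.abs(g - np.median(g)) for g in groups]

N = sum(len(g) for g in groups)                # total sample size
k = len(groups)                                 # number of groups
n_i = np.array([len(zi) for zi in z])           # group sizes
z_bar_i = np.array([zi.mean() for zi in z])     # group means of z
z_bar = np.concatenate(z).mean()                # grand mean of z

numerator = (N - k) * np.sum(n_i * (z_bar_i - z_bar) ** 2)
denominator = (k - 1) * sum(np.sum((zi - zbi) ** 2)
                            for zi, zbi in zip(z, z_bar_i))
W = numerator / denominator

# Cross-check against SciPy's implementation
W_scipy, _ = stats.levene(*groups, center='median')
print(f"Manual W: {W:.6f}  |  SciPy W: {W_scipy:.6f}")
```

The two values agree to floating-point precision, since `stats.levene(center='median')` implements exactly this computation.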

The hypotheses are straightforward:

  • H₀: All group variances are equal ($\sigma_1^2 = \sigma_2^2 = … = \sigma_k^2$)
  • H₁: At least one group variance differs

A small p-value (typically < 0.05) means you reject the null hypothesis—the variances are significantly different, and you shouldn’t proceed with standard ANOVA.

Performing the Test with SciPy

SciPy doesn’t have a dedicated Brown-Forsythe function. Instead, you use scipy.stats.levene() with the center parameter set to 'median'.

import numpy as np
from scipy import stats

# Generate three sample groups with different characteristics
np.random.seed(42)
group_a = np.random.normal(loc=50, scale=5, size=30)   # SD = 5
group_b = np.random.normal(loc=52, scale=5, size=35)   # SD = 5
group_c = np.random.normal(loc=48, scale=5, size=28)   # SD = 5

# Perform Brown-Forsythe test
statistic, p_value = stats.levene(group_a, group_b, group_c, center='median')

print(f"Brown-Forsythe Test Results")
print(f"Test statistic (W): {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret the result
alpha = 0.05
if p_value > alpha:
    print(f"\nResult: Fail to reject H₀ (p={p_value:.4f} > {alpha})")
    print("Variances are homogeneous. Proceed with standard ANOVA.")
else:
    print(f"\nResult: Reject H₀ (p={p_value:.4f} < {alpha})")
    print("Variances are NOT homogeneous. Consider Welch's ANOVA.")

Output:

Brown-Forsythe Test Results
Test statistic (W): 0.1089
P-value: 0.8970

Result: Fail to reject H₀ (p=0.8970 > 0.05)
Variances are homogeneous. Proceed with standard ANOVA.

Since all three groups were generated with the same standard deviation (5), the test correctly identifies homogeneous variances.

Now let’s see what happens with unequal variances:

# Groups with clearly different variances
group_x = np.random.normal(loc=50, scale=3, size=30)   # SD = 3
group_y = np.random.normal(loc=50, scale=10, size=30)  # SD = 10
group_z = np.random.normal(loc=50, scale=15, size=30)  # SD = 15

statistic, p_value = stats.levene(group_x, group_y, group_z, center='median')
print(f"Test statistic: {statistic:.4f}")
print(f"P-value: {p_value:.6f}")

Output:

Test statistic: 14.2531
P-value: 0.000004

The tiny p-value correctly flags the variance heterogeneity.

Real-World Example: Comparing Multiple Groups

Let’s work through a realistic scenario. Suppose you’re analyzing student test scores across four different teaching methods to determine which approach is most effective.

import pandas as pd
import numpy as np
from scipy import stats

# Simulate test scores from four teaching methods
np.random.seed(123)

data = {
    'method': ['Traditional'] * 45 + ['Flipped'] * 42 + ['Online'] * 38 + ['Hybrid'] * 40,
    'score': np.concatenate([
        np.random.normal(72, 8, 45),    # Traditional: moderate variance
        np.random.normal(75, 7, 42),    # Flipped: similar variance
        np.random.normal(70, 15, 38),   # Online: HIGH variance
        np.random.normal(74, 9, 40)     # Hybrid: moderate variance
    ])
}

df = pd.DataFrame(data)

# Quick summary statistics
summary = df.groupby('method')['score'].agg(['mean', 'std', 'count'])
print("Group Summary Statistics:")
print(summary.round(2))
print()

# Extract groups for the test
groups = [group['score'].values for name, group in df.groupby('method')]

# Run Brown-Forsythe test
stat, p_val = stats.levene(*groups, center='median')

print(f"Brown-Forsythe Test")
print(f"W-statistic: {stat:.4f}")
print(f"P-value: {p_val:.4f}")
print()

# Decision
alpha = 0.05
if p_val < alpha:
    print(f"⚠️  Significant variance heterogeneity detected (p < {alpha})")
    print("   Standard one-way ANOVA assumptions violated.")
    print("   Recommendation: Use Welch's ANOVA instead.")
else:
    print(f"✓  Variances are homogeneous (p >= {alpha})")
    print("   Safe to proceed with standard one-way ANOVA.")

Output:

Group Summary Statistics:
             mean    std  count
method                         
Flipped     75.12   6.89     42
Hybrid      73.47   8.54     40
Online      68.92  14.23     38
Traditional 71.89   7.93     45

Brown-Forsythe Test
W-statistic: 5.8234
P-value: 0.0009

⚠️  Significant variance heterogeneity detected (p < 0.05)
   Standard one-way ANOVA assumptions violated.
   Recommendation: Use Welch's ANOVA instead.

The Online group’s standard deviation (14.23) is roughly double the others. The Brown-Forsythe test catches this, warning us that standard ANOVA would be inappropriate.

Visualizing Variance Differences

Statistical tests give you numbers, but visualization helps you understand the data structure and communicate findings to stakeholders.

import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Box plot showing spread differences
sns.boxplot(data=df, x='method', y='score', ax=axes[0], palette='Set2')
axes[0].set_title('Score Distribution by Teaching Method')
axes[0].set_xlabel('Teaching Method')
axes[0].set_ylabel('Test Score')

# Add variance annotations
for i, method in enumerate(summary.index):
    std = summary.loc[method, 'std']
    axes[0].annotate(f'SD={std:.1f}', 
                     xy=(i, summary.loc[method, 'mean'] + 20),
                     ha='center', fontsize=9, color='darkred')

# Strip plot for individual points
sns.stripplot(data=df, x='method', y='score', ax=axes[1], 
              palette='Set2', alpha=0.6, jitter=True)
axes[1].set_title('Individual Scores by Teaching Method')
axes[1].set_xlabel('Teaching Method')
axes[1].set_ylabel('Test Score')

# Add test result annotation
fig.suptitle(f'Brown-Forsythe Test: W={stat:.2f}, p={p_val:.4f}', 
             fontsize=11, style='italic')

plt.tight_layout()
plt.savefig('variance_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

The box plot immediately reveals the Online group’s wider spread. The strip plot shows the actual data points, making the variance difference even more apparent.

Handling Test Results in Your Analysis Pipeline

In practice, you’ll want to automate the decision process. Here’s a robust workflow that tests for homogeneity and selects the appropriate ANOVA variant:

from scipy import stats
import numpy as np

def compare_groups(groups, group_names=None, alpha=0.05):
    """
    Complete workflow: test variance homogeneity, then run appropriate ANOVA.
    
    Parameters:
    -----------
    groups : list of array-like
        Data for each group
    group_names : list of str, optional
        Names for each group
    alpha : float
        Significance level
    
    Returns:
    --------
    dict : Results including test choice and statistics
    """
    if group_names is None:
        group_names = [f"Group_{i+1}" for i in range(len(groups))]
    
    results = {'alpha': alpha, 'n_groups': len(groups)}
    
    # Step 1: Brown-Forsythe test for variance homogeneity
    bf_stat, bf_pval = stats.levene(*groups, center='median')
    results['brown_forsythe'] = {'statistic': bf_stat, 'p_value': bf_pval}
    
    variances_equal = bf_pval >= alpha
    results['variances_equal'] = variances_equal
    
    # Step 2: Choose and run appropriate ANOVA
    if variances_equal:
        # Standard one-way ANOVA
        anova_stat, anova_pval = stats.f_oneway(*groups)
        results['test_used'] = 'One-way ANOVA'
        results['anova'] = {'statistic': anova_stat, 'p_value': anova_pval}
    else:
        # Variances differ: fall back to a test that doesn't assume equal
        # variances. SciPy has no direct Welch's ANOVA function, but the
        # closely related Alexander-Govern test fills the same role
        # (pingouin.welch_anova is an option if you need Welch's exactly).
        from scipy.stats import alexandergovern
        welch_result = alexandergovern(*groups)
        results['test_used'] = "Welch's ANOVA (Alexander-Govern)"
        results['anova'] = {'statistic': welch_result.statistic, 
                           'p_value': welch_result.pvalue}
    
    # Step 3: Interpret results
    anova_significant = results['anova']['p_value'] < alpha
    results['groups_differ'] = anova_significant
    
    return results

def print_results(results):
    """Pretty print the analysis results."""
    print("=" * 50)
    print("VARIANCE HOMOGENEITY & GROUP COMPARISON ANALYSIS")
    print("=" * 50)
    
    print(f"\n1. Brown-Forsythe Test (Variance Homogeneity)")
    print(f"   W-statistic: {results['brown_forsythe']['statistic']:.4f}")
    print(f"   P-value: {results['brown_forsythe']['p_value']:.4f}")
    print(f"   Variances equal: {'Yes' if results['variances_equal'] else 'No'}")
    
    print(f"\n2. {results['test_used']}")
    print(f"   Test statistic: {results['anova']['statistic']:.4f}")
    print(f"   P-value: {results['anova']['p_value']:.4f}")
    print(f"   Groups significantly different: {'Yes' if results['groups_differ'] else 'No'}")
    
    print("\n" + "=" * 50)

# Example usage
np.random.seed(42)
control = np.random.normal(100, 15, 50)
treatment_a = np.random.normal(110, 15, 48)
treatment_b = np.random.normal(105, 25, 52)  # Higher variance

results = compare_groups(
    [control, treatment_a, treatment_b],
    ['Control', 'Treatment A', 'Treatment B']
)
print_results(results)

Output:

==================================================
VARIANCE HOMOGENEITY & GROUP COMPARISON ANALYSIS
==================================================

1. Brown-Forsythe Test (Variance Homogeneity)
   W-statistic: 4.8921
   P-value: 0.0086
   Variances equal: No

2. Welch's ANOVA (Alexander-Govern)
   Test statistic: 7.2145
   P-value: 0.0271
   Groups significantly different: Yes

==================================================

The pipeline automatically detected unequal variances and switched to Welch’s ANOVA, giving you valid results without manual intervention.

Summary and Best Practices

The Brown-Forsythe test is your go-to method for checking variance homogeneity before ANOVA. Here’s what to remember:

Do:

  • Always check variance homogeneity before running standard ANOVA
  • Use center='median' in scipy.stats.levene() for the Brown-Forsythe variant
  • Visualize your data—box plots reveal variance differences intuitively
  • Have a fallback plan (Welch’s ANOVA) when homogeneity fails

Don’t:

  • Ignore the test result and run standard ANOVA anyway
  • Use arbitrary p-value thresholds without considering your domain
  • Rely solely on the test with very small samples (n < 10 per group)

Sample size considerations:

  • The test has low power with small samples—you might miss real variance differences
  • With very large samples, trivial variance differences become “significant”
  • Use effect sizes and practical significance alongside p-values
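As a lightweight complement to the p-value, you can report the max-to-min standard-deviation ratio across groups. The `sd_ratio` helper below is illustrative (not a SciPy function), and the cutoff of roughly 2 is a common rule of thumb rather than a hard rule:

```python
import numpy as np

def sd_ratio(*groups):
    """Max-to-min sample standard deviation ratio: a simple
    practical-significance check to pair with the test's p-value."""
    sds = [np.std(g, ddof=1) for g in groups]
    return max(sds) / min(sds)

rng = np.random.default_rng(7)
a = rng.normal(0, 5, 200)
b = rng.normal(0, 6, 200)   # only modestly more spread

ratio = sd_ratio(a, b)
print(f"SD ratio: {ratio:.2f}")
# Ratios below ~2 are often treated as practically negligible, even when
# a large-sample test returns a "significant" p-value.
```

Pairing this ratio with the Brown-Forsythe p-value helps separate statistically detectable differences from practically meaningful ones.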

Quick reference:

from scipy import stats

# Brown-Forsythe test
stat, pval = stats.levene(group1, group2, group3, center='median')

# If p >= 0.05: use stats.f_oneway()
# If p < 0.05: use stats.alexandergovern() or pingouin.welch_anova()

The Brown-Forsythe test takes seconds to run and can save you from drawing invalid conclusions. Make it a standard part of your analysis workflow.
