How to Perform the Brown-Forsythe Test in Python
Key Insights
- The Brown-Forsythe test checks equality of variances across groups using medians, making it more robust than Levene’s original test when your data isn’t normally distributed.
- In Python, you perform the Brown-Forsythe test using scipy.stats.levene() with the center='median' parameter; there's no separate function needed.
- When the test rejects variance homogeneity, switch from standard one-way ANOVA to Welch's ANOVA, which doesn't assume equal variances.
Introduction to the Brown-Forsythe Test
Before running ANOVA or similar parametric tests, you need to verify a critical assumption: that all groups have roughly equal variances. This property, called homoscedasticity or homogeneity of variance, directly affects the validity of your statistical conclusions.
The Brown-Forsythe test, published in 1974, addresses this by testing whether the variances across multiple groups are statistically equal. It’s a modification of Levene’s test with one key difference: it uses the median of each group rather than the mean when calculating deviations.
This distinction matters. The original Levene’s test (using means) is sensitive to outliers and performs poorly with skewed distributions. By substituting the median, Brown and Forsythe created a test that remains reliable even when your data violates normality assumptions—which happens constantly in real-world datasets.
Use the Brown-Forsythe test when:
- You’re preparing to run one-way ANOVA and need to check assumptions
- Your data might be non-normal or contain outliers
- You’re comparing variability across experimental conditions or demographic groups
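To see why the centering choice matters, here is a minimal sketch (the data is simulated purely for illustration) that runs both the mean-centered original Levene's test and the median-centered Brown-Forsythe variant on groups where one contains a single extreme outlier:

```python
import numpy as np
from scipy import stats

# Two groups with the same underlying spread, but group_2 has one outlier
rng = np.random.default_rng(0)
group_1 = rng.normal(50, 5, 40)
group_2 = np.append(rng.normal(50, 5, 39), 95.0)  # one extreme value

# Original Levene's test (mean-centered) vs Brown-Forsythe (median-centered)
stat_mean, p_mean = stats.levene(group_1, group_2, center='mean')
stat_median, p_median = stats.levene(group_1, group_2, center='median')

print(f"Mean-centered:   W={stat_mean:.3f}, p={p_mean:.4f}")
print(f"Median-centered: W={stat_median:.3f}, p={p_median:.4f}")
```

Because the outlier drags the group mean toward itself, the mean-centered deviations are distorted in a way the median-centered ones are not; comparing the two p-values on your own data is a quick sanity check of how much outliers are driving the result.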
Mathematical Foundation
The Brown-Forsythe test transforms the original observations into deviations from group medians, then performs an ANOVA-like analysis on these transformed values.
For each observation $x_{ij}$ in group $i$, compute:
$$z_{ij} = |x_{ij} - \tilde{x}_i|$$
where $\tilde{x}_i$ is the median of group $i$.
The test statistic follows an F-distribution:
$$W = \frac{(N - k) \sum_{i=1}^{k} n_i (\bar{z}_i - \bar{z})^2}{(k - 1) \sum_{i=1}^{k} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_i)^2}$$
where $N$ is total sample size, $k$ is the number of groups, $n_i$ is the size of group $i$, $\bar{z}_i$ is the mean of transformed values in group $i$, and $\bar{z}$ is the grand mean of all transformed values.
The hypotheses are straightforward:
- H₀: All group variances are equal ($\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_k^2$)
- H₁: At least one group variance differs
A small p-value (typically < 0.05) means you reject the null hypothesis—the variances are significantly different, and you shouldn’t proceed with standard ANOVA.
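To connect the formula to code, here is a short sketch (the three small groups are invented for illustration) that computes $W$ directly from the definitions above and checks it against SciPy:

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three small groups
groups = [np.array([4.1, 5.0, 6.2, 5.5]),
          np.array([3.9, 7.1, 6.6, 8.0, 7.4]),
          np.array([5.2, 5.9, 4.8])]

# z_ij = |x_ij - median_i|
z = [np.abs(g - np.median(g)) for g in groups]

N = sum(len(g) for g in groups)                 # total sample size
k = len(groups)                                 # number of groups
n_i = np.array([len(g) for g in groups])        # group sizes
z_bar_i = np.array([zi.mean() for zi in z])     # group means of z
z_bar = np.concatenate(z).mean()                # grand mean of z

numerator = (N - k) * np.sum(n_i * (z_bar_i - z_bar) ** 2)
denominator = (k - 1) * sum(((zi - zb) ** 2).sum()
                            for zi, zb in zip(z, z_bar_i))
W = numerator / denominator

# Should agree with SciPy's median-centered Levene implementation
W_scipy, _ = stats.levene(*groups, center='median')
print(W, W_scipy)
```

The two statistics should agree to floating-point precision, which confirms that the median-centered `levene()` call is exactly the Brown-Forsythe procedure.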
Performing the Test with SciPy
SciPy doesn’t have a dedicated Brown-Forsythe function. Instead, you use scipy.stats.levene() with the center parameter set to 'median'.
import numpy as np
from scipy import stats
# Generate three sample groups with different characteristics
np.random.seed(42)
group_a = np.random.normal(loc=50, scale=5, size=30) # SD = 5
group_b = np.random.normal(loc=52, scale=5, size=35) # SD = 5
group_c = np.random.normal(loc=48, scale=5, size=28) # SD = 5
# Perform Brown-Forsythe test
statistic, p_value = stats.levene(group_a, group_b, group_c, center='median')
print(f"Brown-Forsythe Test Results")
print(f"Test statistic (W): {statistic:.4f}")
print(f"P-value: {p_value:.4f}")
# Interpret the result
alpha = 0.05
if p_value > alpha:
    print(f"\nResult: Fail to reject H₀ (p={p_value:.4f} > {alpha})")
    print("Variances are homogeneous. Proceed with standard ANOVA.")
else:
    print(f"\nResult: Reject H₀ (p={p_value:.4f} < {alpha})")
    print("Variances are NOT homogeneous. Consider Welch's ANOVA.")
Output:
Brown-Forsythe Test Results
Test statistic (W): 0.1089
P-value: 0.8970
Result: Fail to reject H₀ (p=0.8970 > 0.05)
Variances are homogeneous. Proceed with standard ANOVA.
Since all three groups were generated with the same standard deviation (5), the test correctly identifies homogeneous variances.
Now let’s see what happens with unequal variances:
# Groups with clearly different variances
group_x = np.random.normal(loc=50, scale=3, size=30) # SD = 3
group_y = np.random.normal(loc=50, scale=10, size=30) # SD = 10
group_z = np.random.normal(loc=50, scale=15, size=30) # SD = 15
statistic, p_value = stats.levene(group_x, group_y, group_z, center='median')
print(f"Test statistic: {statistic:.4f}")
print(f"P-value: {p_value:.6f}")
Output:
Test statistic: 14.2531
P-value: 0.000004
The tiny p-value correctly flags the variance heterogeneity.
Real-World Example: Comparing Multiple Groups
Let’s work through a realistic scenario. Suppose you’re analyzing student test scores across four different teaching methods to determine which approach is most effective.
import pandas as pd
import numpy as np
from scipy import stats
# Simulate test scores from four teaching methods
np.random.seed(123)
data = {
'method': ['Traditional'] * 45 + ['Flipped'] * 42 + ['Online'] * 38 + ['Hybrid'] * 40,
'score': np.concatenate([
np.random.normal(72, 8, 45), # Traditional: moderate variance
np.random.normal(75, 7, 42), # Flipped: similar variance
np.random.normal(70, 15, 38), # Online: HIGH variance
np.random.normal(74, 9, 40) # Hybrid: moderate variance
])
}
df = pd.DataFrame(data)
# Quick summary statistics
summary = df.groupby('method')['score'].agg(['mean', 'std', 'count'])
print("Group Summary Statistics:")
print(summary.round(2))
print()
# Extract groups for the test
groups = [group['score'].values for name, group in df.groupby('method')]
# Run Brown-Forsythe test
stat, p_val = stats.levene(*groups, center='median')
print(f"Brown-Forsythe Test")
print(f"W-statistic: {stat:.4f}")
print(f"P-value: {p_val:.4f}")
print()
# Decision
alpha = 0.05
if p_val < alpha:
    print(f"⚠️ Significant variance heterogeneity detected (p < {alpha})")
    print("   Standard one-way ANOVA assumptions violated.")
    print("   Recommendation: Use Welch's ANOVA instead.")
else:
    print(f"✓ Variances are homogeneous (p >= {alpha})")
    print("   Safe to proceed with standard one-way ANOVA.")
Output:
Group Summary Statistics:
mean std count
method
Flipped 75.12 6.89 42
Hybrid 73.47 8.54 40
Online 68.92 14.23 38
Traditional 71.89 7.93 45
Brown-Forsythe Test
W-statistic: 5.8234
P-value: 0.0009
⚠️ Significant variance heterogeneity detected (p < 0.05)
Standard one-way ANOVA assumptions violated.
Recommendation: Use Welch's ANOVA instead.
The Online group’s standard deviation (14.23) is roughly double the others. The Brown-Forsythe test catches this, warning us that standard ANOVA would be inappropriate.
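Having rejected homogeneity, one way to proceed is sketched below. SciPy has no dedicated Welch's ANOVA function; scipy.stats.alexandergovern (SciPy 1.7+) is a closely related test that also drops the equal-variance assumption, and pingouin's welch_anova provides Welch's ANOVA itself. The groups here are regenerated so the snippet is self-contained, mirroring the teaching-methods scenario:

```python
import numpy as np
from scipy import stats

# Self-contained data with one high-variance group, echoing the example above
rng = np.random.default_rng(123)
traditional = rng.normal(72, 8, 45)
online = rng.normal(70, 15, 38)   # high variance
hybrid = rng.normal(74, 9, 40)

# Confirm heterogeneity, then fall back to a variance-robust comparison
bf_stat, bf_p = stats.levene(traditional, online, hybrid, center='median')
ag = stats.alexandergovern(traditional, online, hybrid)

print(f"Brown-Forsythe:   W={bf_stat:.3f}, p={bf_p:.4f}")
print(f"Alexander-Govern: A={ag.statistic:.3f}, p={ag.pvalue:.4f}")
```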
Visualizing Variance Differences
Statistical tests give you numbers, but visualization helps you understand the data structure and communicate findings to stakeholders.
import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Box plot showing spread differences
sns.boxplot(data=df, x='method', y='score', ax=axes[0], palette='Set2')
axes[0].set_title('Score Distribution by Teaching Method')
axes[0].set_xlabel('Teaching Method')
axes[0].set_ylabel('Test Score')
# Add variance annotations
for i, method in enumerate(summary.index):
    std = summary.loc[method, 'std']
    axes[0].annotate(f'SD={std:.1f}',
                     xy=(i, summary.loc[method, 'mean'] + 20),
                     ha='center', fontsize=9, color='darkred')
# Strip plot for individual points
sns.stripplot(data=df, x='method', y='score', ax=axes[1],
palette='Set2', alpha=0.6, jitter=True)
axes[1].set_title('Individual Scores by Teaching Method')
axes[1].set_xlabel('Teaching Method')
axes[1].set_ylabel('Test Score')
# Add test result annotation
fig.suptitle(f'Brown-Forsythe Test: W={stat:.2f}, p={p_val:.4f}',
fontsize=11, style='italic')
plt.tight_layout()
plt.savefig('variance_comparison.png', dpi=150, bbox_inches='tight')
plt.show()
The box plot immediately reveals the Online group’s wider spread. The strip plot shows the actual data points, making the variance difference even more apparent.
Handling Test Results in Your Analysis Pipeline
In practice, you’ll want to automate the decision process. Here’s a robust workflow that tests for homogeneity and selects the appropriate ANOVA variant:
from scipy import stats
import numpy as np
def compare_groups(groups, group_names=None, alpha=0.05):
    """
    Complete workflow: test variance homogeneity, then run appropriate ANOVA.

    Parameters
    ----------
    groups : list of array-like
        Data for each group
    group_names : list of str, optional
        Names for each group
    alpha : float
        Significance level

    Returns
    -------
    dict : Results including test choice and statistics
    """
    if group_names is None:
        group_names = [f"Group_{i+1}" for i in range(len(groups))]
    results = {'alpha': alpha, 'n_groups': len(groups),
               'group_names': group_names}

    # Step 1: Brown-Forsythe test for variance homogeneity
    bf_stat, bf_pval = stats.levene(*groups, center='median')
    results['brown_forsythe'] = {'statistic': bf_stat, 'p_value': bf_pval}
    variances_equal = bf_pval >= alpha
    results['variances_equal'] = variances_equal

    # Step 2: Choose and run appropriate ANOVA
    if variances_equal:
        # Standard one-way ANOVA
        anova_stat, anova_pval = stats.f_oneway(*groups)
        results['test_used'] = 'One-way ANOVA'
        results['anova'] = {'statistic': anova_stat, 'p_value': anova_pval}
    else:
        # SciPy has no dedicated Welch's ANOVA function. The Alexander-Govern
        # test is a closely related alternative that also drops the
        # equal-variance assumption; pingouin.welch_anova provides
        # Welch's ANOVA proper if you prefer it.
        from scipy.stats import alexandergovern
        welch_result = alexandergovern(*groups)
        results['test_used'] = "Welch's ANOVA (Alexander-Govern)"
        results['anova'] = {'statistic': welch_result.statistic,
                            'p_value': welch_result.pvalue}

    # Step 3: Interpret results
    anova_significant = results['anova']['p_value'] < alpha
    results['groups_differ'] = anova_significant
    return results
def print_results(results):
    """Pretty print the analysis results."""
    print("=" * 50)
    print("VARIANCE HOMOGENEITY & GROUP COMPARISON ANALYSIS")
    print("=" * 50)
    print("\n1. Brown-Forsythe Test (Variance Homogeneity)")
    print(f"   W-statistic: {results['brown_forsythe']['statistic']:.4f}")
    print(f"   P-value: {results['brown_forsythe']['p_value']:.4f}")
    print(f"   Variances equal: {'Yes' if results['variances_equal'] else 'No'}")
    print(f"\n2. {results['test_used']}")
    print(f"   Test statistic: {results['anova']['statistic']:.4f}")
    print(f"   P-value: {results['anova']['p_value']:.4f}")
    print(f"   Groups significantly different: {'Yes' if results['groups_differ'] else 'No'}")
    print("\n" + "=" * 50)
# Example usage
np.random.seed(42)
control = np.random.normal(100, 15, 50)
treatment_a = np.random.normal(110, 15, 48)
treatment_b = np.random.normal(105, 25, 52) # Higher variance
results = compare_groups(
[control, treatment_a, treatment_b],
['Control', 'Treatment A', 'Treatment B']
)
print_results(results)
Output:
==================================================
VARIANCE HOMOGENEITY & GROUP COMPARISON ANALYSIS
==================================================
1. Brown-Forsythe Test (Variance Homogeneity)
W-statistic: 4.8921
P-value: 0.0086
Variances equal: No
2. Welch's ANOVA (Alexander-Govern)
Test statistic: 7.2145
P-value: 0.0271
Groups significantly different: Yes
==================================================
The pipeline automatically detected unequal variances and switched to Welch’s ANOVA, giving you valid results without manual intervention.
Summary and Best Practices
The Brown-Forsythe test is your go-to method for checking variance homogeneity before ANOVA. Here’s what to remember:
Do:
- Always check variance homogeneity before running standard ANOVA
- Use center='median' in scipy.stats.levene() for the Brown-Forsythe variant
- Visualize your data: box plots reveal variance differences intuitively
- Have a fallback plan (Welch’s ANOVA) when homogeneity fails
Don’t:
- Ignore the test result and run standard ANOVA anyway
- Use arbitrary p-value thresholds without considering your domain
- Rely solely on the test with very small samples (n < 10 per group)
Sample size considerations:
- The test has low power with small samples—you might miss real variance differences
- With very large samples, trivial variance differences become “significant”
- Use effect sizes and practical significance alongside p-values
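As a practical complement to the p-value, a common rule of thumb (the exact cutoff varies by source) compares the largest to the smallest group standard deviation; ratios under roughly 2 are often treated as benign. A minimal helper, with hypothetical groups for illustration:

```python
import numpy as np

def max_sd_ratio(groups):
    """Ratio of the largest to the smallest group standard deviation."""
    sds = [np.std(g, ddof=1) for g in groups]
    return max(sds) / min(sds)

# Example: three hypothetical groups with true SDs of 3, 10, and 15
rng = np.random.default_rng(7)
groups = [rng.normal(0, 3, 30), rng.normal(0, 10, 30), rng.normal(0, 15, 30)]
print(f"Max/min SD ratio: {max_sd_ratio(groups):.2f}")
```

Reporting this ratio alongside the Brown-Forsythe p-value helps distinguish a statistically significant but trivial difference (large samples) from a practically large one.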
Quick reference:
from scipy import stats
# Brown-Forsythe test
stat, pval = stats.levene(group1, group2, group3, center='median')
# If p >= 0.05: use stats.f_oneway()
# If p < 0.05: use stats.alexandergovern() or pingouin.welch_anova()
The Brown-Forsythe test takes seconds to run and can save you from drawing invalid conclusions. Make it a standard part of your analysis workflow.