Levene's Test in R: Step-by-Step Guide
Key Insights
- Levene’s test checks whether groups have equal variances, a critical assumption for ANOVA and t-tests—use the median-centered version (Brown-Forsythe) for robustness against non-normal data
- A significant result (p < 0.05) indicates unequal variances, signaling you should use Welch’s ANOVA or other robust alternatives instead of standard ANOVA
- Don’t rely solely on the p-value; combine Levene’s test with visual inspection of distributions and consider practical significance of variance differences
Introduction to Levene’s Test
Levene’s test answers a fundamental question in statistical analysis: do your groups have equal variances? This assumption, called homogeneity of variance or homoscedasticity, underpins many common statistical tests including ANOVA, t-tests, and linear regression.
When you run a one-way ANOVA comparing treatment groups, you’re assuming the variability within each group is roughly the same. If one group has wildly different spread than others, your F-test results become unreliable. Levene’s test gives you a formal way to check this assumption before proceeding.
Why Levene’s over Bartlett’s test? Bartlett’s test is highly sensitive to departures from normality—if your data isn’t perfectly normal, Bartlett’s will often reject the null hypothesis even when variances are actually equal. Levene’s test is more robust, making it the practical choice for real-world data that rarely follows textbook distributions.
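This difference is easy to demonstrate with a quick simulation (not from the original article): two groups drawn from the same heavy-tailed t distribution, so the true variances really are equal.

```r
library(car)

# Simulated heavy-tailed data: both groups come from t(df = 3),
# so the null hypothesis of equal variances is true
set.seed(123)
x <- c(rt(50, df = 3), rt(50, df = 3))
g <- factor(rep(c("A", "B"), each = 50))

# Bartlett's test assumes normality and tends to over-reject
# when the data have heavy tails
bartlett.test(x, g)

# Levene's test (median-centered by default in car) is more robust here
leveneTest(x, g)
```

With heavy-tailed data like this, Bartlett's test rejects far more often than its nominal 5% rate across repeated simulations, while Levene's stays closer to it.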
Prerequisites and Setup
You’ll need the car package for the standard implementation. The lawstat package offers an alternative with additional options. Install and load them:
# Install packages (run once)
install.packages("car")
install.packages("lawstat")
# Load libraries
library(car)
library(lawstat)
Your data should be in one of two formats: a numeric vector with a corresponding grouping factor, or a data frame with a response column and a group-membership column. Either way, leveneTest() needs the data in long format; if your data is wide (one column per group), reshape it to long first.
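If your data starts out wide, base R's stack() is a quick way to get it into long format. A minimal sketch with made-up data (the column names here are hypothetical):

```r
library(car)

# Hypothetical wide data: one column per group
set.seed(1)
wide <- data.frame(
  ctrl = rnorm(10, mean = 5),
  trt  = rnorm(10, mean = 6)
)

# stack() returns long format: a 'values' column plus an 'ind' grouping factor
long <- stack(wide)
head(long)

# Now the formula interface works
leveneTest(values ~ ind, data = long)
```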
Minimum sample sizes matter. While Levene’s test technically works with small samples, you need at least 3-4 observations per group for meaningful results. With very small samples, the test has low power and may miss real variance differences.
Understanding the Test Mechanics
Levene’s test transforms the problem of comparing variances into a problem of comparing means—something we already know how to do with ANOVA.
Here’s the logic: instead of directly comparing variances, Levene’s test calculates how far each observation deviates from its group center. These deviations become the new data points. If variances are equal across groups, the average deviation should be similar across groups. An ANOVA on these deviations tests exactly that.
The “center” can be defined three ways:
- Mean: The original Levene’s test. Uses deviations from group means.
- Median: Called the Brown-Forsythe test. More robust to outliers and skewed distributions.
- Trimmed mean: A compromise—trims extreme values before calculating the center.
The test produces an F-statistic and p-value. A small p-value (typically < 0.05) means you reject the null hypothesis of equal variances. The groups have significantly different spreads.
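You can verify these mechanics by hand: compute each observation's absolute deviation from its group median, then run an ordinary one-way ANOVA on those deviations. This reproduces car::leveneTest() with its default median center.

```r
library(car)
data(PlantGrowth)

# Absolute deviation of each observation from its group median
pg <- transform(PlantGrowth,
                dev = abs(weight - ave(weight, group, FUN = median)))

# An ANOVA on the deviations is the Brown-Forsythe version of Levene's test
anova(lm(dev ~ group, data = pg))

# Same F value and p-value as the packaged version
leveneTest(weight ~ group, data = PlantGrowth)
```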
Basic Implementation with the car Package
Let’s start with the PlantGrowth dataset, which contains plant weights across three treatment conditions:
# Load the data
data(PlantGrowth)
# Examine structure
str(PlantGrowth)
# 'data.frame': 30 obs. of 2 variables:
# $ weight: num 4.17 5.58 5.18 6.11 4.5 ...
# $ group : Factor w/ 3 levels "ctrl","trt1","trt2": 1 1 1 1 1 ...
# Check group sizes
table(PlantGrowth$group)
# ctrl trt1 trt2
# 10 10 10
Now run Levene’s test using the formula interface:
# Run Levene's test
leveneTest(weight ~ group, data = PlantGrowth)
The output looks like this:
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 1.1192 0.3412
27
Breaking down the output:
- Df: Degrees of freedom. The first row (2) is between-groups df (k-1 where k=3 groups). The second row (27) is within-groups df (N-k where N=30).
- F value: The test statistic (1.1192). Larger values suggest greater variance differences.
- Pr(>F): The p-value (0.3412). Since this exceeds 0.05, we fail to reject the null hypothesis. The variances are not significantly different.
This result means the homogeneity assumption holds for these data. You can proceed with standard ANOVA.
Comparing Center Options
The default in car::leveneTest() is center = "median". Let’s see how the choice affects results:
# Compare all three center options
leveneTest(weight ~ group, data = PlantGrowth, center = "mean")
leveneTest(weight ~ group, data = PlantGrowth, center = "median")
leveneTest(weight ~ group, data = PlantGrowth, center = "trimmed")
For the PlantGrowth data, results are similar because the data is reasonably symmetric. The differences become stark with skewed data:
# Create skewed data with different variances
set.seed(42)
skewed_data <- data.frame(
value = c(
rexp(30, rate = 1), # Group A: exponential, var ≈ 1
rexp(30, rate = 0.5), # Group B: exponential, var ≈ 4
rexp(30, rate = 0.25) # Group C: exponential, var ≈ 16
),
group = factor(rep(c("A", "B", "C"), each = 30))
)
# Compare centers with skewed data
cat("Mean-centered:\n")
leveneTest(value ~ group, data = skewed_data, center = "mean")
cat("\nMedian-centered (Brown-Forsythe):\n")
leveneTest(value ~ group, data = skewed_data, center = "median")
Mean-centered:
Levene's Test for Homogeneity of Variance (center = mean)
Df F value Pr(>F)
group 2 7.2841 0.001194 **
87
Median-centered (Brown-Forsythe):
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 5.8924 0.004012 **
87
Both detect the variance difference, but notice the F-value is lower with median centering. With skewed data containing outliers, the mean-centered version can be overly sensitive, flagging variance differences that are actually driven by a few extreme values rather than genuine heteroscedasticity.
Recommendation: Use center = "median" (the default) unless you have strong reasons to believe your data is symmetric and outlier-free.
Practical Application: Pre-ANOVA Workflow
Here’s a complete workflow for checking assumptions before running ANOVA:
# Load required packages
library(car)
library(ggplot2)
# Example: comparing fuel efficiency across cylinder counts
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
# Step 1: Visualize distributions
ggplot(mtcars, aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.5) +
labs(title = "MPG by Cylinder Count",
x = "Cylinders", y = "Miles per Gallon") +
theme_minimal() +
theme(legend.position = "none")
# Step 2: Check normality within groups (Shapiro-Wilk)
by(mtcars$mpg, mtcars$cyl, shapiro.test)
# Step 3: Check homogeneity of variance (Levene's test)
levene_result <- leveneTest(mpg ~ cyl, data = mtcars)
print(levene_result)
# Step 4: Choose appropriate ANOVA based on results
if (levene_result$`Pr(>F)`[1] < 0.05) {
# Variances unequal: use Welch's ANOVA
cat("\nVariances unequal. Using Welch's ANOVA:\n")
welch_result <- oneway.test(mpg ~ cyl, data = mtcars, var.equal = FALSE)
print(welch_result)
} else {
# Variances equal: use standard ANOVA
cat("\nVariances equal. Using standard ANOVA:\n")
anova_result <- aov(mpg ~ cyl, data = mtcars)
print(summary(anova_result))
}
Running this produces:
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 0.9355 0.4037
29
Variances equal. Using standard ANOVA:
Df Sum Sq Mean Sq F value Pr(>F)
cyl 2 824.8 412.4 39.7 4.98e-09 ***
Residuals 29 301.3 10.4
When Levene’s test is significant (unequal variances), you have several options:
# Option 1: Welch's ANOVA (recommended)
oneway.test(mpg ~ cyl, data = mtcars, var.equal = FALSE)
# Option 2: Kruskal-Wallis (non-parametric alternative)
kruskal.test(mpg ~ cyl, data = mtcars)
# Option 3: Transform the response variable
mtcars$log_mpg <- log(mtcars$mpg)
leveneTest(log_mpg ~ cyl, data = mtcars)
Common Pitfalls and Best Practices
Sample size imbalance: Levene’s test becomes less reliable with very unequal group sizes. If one group has 100 observations and another has 10, interpret results cautiously.
Over-reliance on p-values: A non-significant Levene’s test doesn’t prove variances are equal—it means you lack evidence to reject equality. With small samples, you might miss real differences. With very large samples, trivial differences become significant.
Multiple testing: If you’re running Levene’s test across many variables, you’re inflating your Type I error rate. Consider adjusting your significance threshold.
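One simple way to adjust is p.adjust(). A sketch with hypothetical p-values from Levene's tests run on four different variables:

```r
# Hypothetical p-values from Levene's tests on four variables
pvals <- c(0.012, 0.034, 0.245, 0.003)

# Holm correction controls the family-wise error rate
p.adjust(pvals, method = "holm")
# 0.036 0.068 0.245 0.012
```

After correction, only the smallest p-value remains below 0.05 at this family-wise threshold.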
Ignoring visual inspection: Always plot your data. Sometimes variances look obviously different, but the test isn’t significant (low power), or the test is significant but the practical difference is negligible.
| Scenario | Recommendation |
|---|---|
| p > 0.10, visually similar | Use standard ANOVA |
| p < 0.05, visually different | Use Welch’s ANOVA |
| p < 0.05, visually similar | Check sample sizes; consider Welch’s anyway |
| p > 0.10, visually different | Increase sample size or use Welch’s cautiously |
Final advice: When in doubt, use Welch’s ANOVA. It performs nearly as well as standard ANOVA when variances are equal and much better when they’re not. The cost of using it unnecessarily is minimal; the cost of using standard ANOVA with unequal variances can be substantial.