How to Perform the Bartlett Test in R
Key Insights
- The Bartlett test checks whether multiple groups have equal variances (homoscedasticity), a critical assumption for valid ANOVA results—but it’s highly sensitive to non-normality, so verify your data is normally distributed first.
- Use the formula interface `bartlett.test(value ~ group, data = df)` for clean, readable code that integrates well with R's statistical workflow.
- When Bartlett's test rejects equal variances, don't abandon your analysis: switch to Welch's ANOVA with `oneway.test()`, or consider Levene's test if normality is questionable.
Introduction to the Bartlett Test
The Bartlett test is a statistical procedure that tests whether multiple samples have equal variances. This property—called homogeneity of variances or homoscedasticity—is a fundamental assumption of one-way ANOVA and many other parametric tests.
When you run an ANOVA, you’re comparing means across groups. The math behind ANOVA pools variance estimates from all groups, which only makes sense if those variances are roughly equal. If one group has dramatically higher variance than others, your F-statistic becomes unreliable, and your p-values can’t be trusted.
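The effect of heteroscedasticity on the F-test is easy to see in a quick simulation. The sketch below (our own illustration, not part of the original analysis) draws three groups with identical means but unequal variances, with the noisiest groups also the smallest, and checks how often the classic F-test falsely rejects:

```r
# Simulation sketch: equal means, unequal variances, unequal group sizes.
# A well-behaved 5%-level test should reject about 5% of the time;
# here the classic F-test rejects noticeably more often.
set.seed(1)
g <- factor(rep(1:3, times = c(10, 10, 40)))
p_classic <- replicate(2000, {
  y <- c(rnorm(10, sd = 3), rnorm(10, sd = 3), rnorm(40, sd = 1))
  summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]  # p-value of the group effect
})
mean(p_classic < 0.05)  # empirical false-positive rate, well above 0.05
```

The direction of the bias depends on the design: when the high-variance groups are the small ones, as here, the pooled error underestimates the true noise and the test becomes liberal.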
The Bartlett test gives you a formal way to check this assumption before proceeding. It’s named after Maurice Bartlett, who developed it in 1937. The test is powerful and widely used, but it comes with an important caveat: it’s extremely sensitive to departures from normality. This sensitivity is both a strength and a weakness, depending on your data.
Assumptions and When to Use
Before running the Bartlett test, understand what it requires:
Normality: Each group’s data must be approximately normally distributed. This is non-negotiable. The Bartlett test is so sensitive to non-normality that violations will inflate your Type I error rate, causing you to reject equal variances even when they’re actually equal.
Continuous data: The test works with continuous measurements. It’s not appropriate for count data or ordinal scales.
Independent samples: Observations within and between groups must be independent. No repeated measures or paired designs.
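The normality requirement is worth demonstrating. In this sketch (our own illustration), three groups are drawn from a heavy-tailed t-distribution with exactly equal variances, yet Bartlett's test rejects far more often than the nominal 5%:

```r
# Simulation sketch: heavy-tailed data (t with 3 df), three groups of 30,
# variances truly equal across groups. Bartlett's rejection rate inflates.
set.seed(42)
g <- factor(rep(1:3, each = 30))
p_bart <- replicate(2000, bartlett.test(rt(90, df = 3), g)$p.value)
mean(p_bart < 0.05)  # empirical Type I error rate, noticeably above 0.05
```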
When to choose Bartlett over Levene’s test: Use Bartlett when you’re confident your data is normally distributed. It’s more powerful than Levene’s test under normality—meaning it’s better at detecting true differences in variance. However, if you have any doubts about normality, Levene’s test is the safer choice because it’s robust to non-normal distributions.
A practical workflow: first test for normality using Shapiro-Wilk tests on each group. If normality holds, use Bartlett. If not, use Levene’s test instead.
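That workflow can be wrapped in a small helper. This is only a sketch: `choose_variance_test()` is our own name, and it falls back to base R's `fligner.test()` (the Fligner-Killeen test, another robust option) so it doesn't depend on any extra package:

```r
# Hypothetical helper: pick a variance test based on per-group normality
choose_variance_test <- function(x, g, alpha = 0.05) {
  p_norm <- tapply(x, g, function(v) shapiro.test(v)$p.value)
  if (all(p_norm > alpha)) {
    bartlett.test(x, g)   # normality plausible: Bartlett is more powerful
  } else {
    fligner.test(x, g)    # robust base-R fallback (or car::leveneTest)
  }
}
choose_variance_test(PlantGrowth$weight, PlantGrowth$group)
```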
Basic Syntax and Implementation
R provides the Bartlett test through the built-in bartlett.test() function in the stats package. No additional packages required.
The function accepts two main interfaces:
# Formula interface (recommended)
bartlett.test(response ~ group, data = your_data)
# List interface
bartlett.test(list(group1, group2, group3))
The formula interface is cleaner and integrates better with data frames. Let’s see it in action with R’s built-in PlantGrowth dataset:
# Load and examine the data
data(PlantGrowth)
str(PlantGrowth)
'data.frame': 30 obs. of 2 variables:
$ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
$ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
This dataset contains plant weights across three treatment groups: control, treatment 1, and treatment 2. Perfect for demonstrating variance homogeneity testing.
# Run the Bartlett test
bartlett.test(weight ~ group, data = PlantGrowth)
Bartlett test of homogeneity of variances
data: weight by group
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
That’s all it takes. One line of code gives you everything you need to assess variance homogeneity.
Interpreting the Results
The output contains three key components:
Bartlett’s K-squared (test statistic): This measures how much the group variances deviate from a common variance. Higher values indicate greater heterogeneity. Under the null hypothesis of equal variances, this statistic follows a chi-squared distribution.
Degrees of freedom (df): Equal to k-1, where k is the number of groups. With three groups in PlantGrowth, we have 2 degrees of freedom.
p-value: The probability of observing a test statistic this extreme if variances are truly equal.
The null hypothesis states that all group variances are equal. The alternative hypothesis states that at least one group has a different variance.
Decision rule: If p-value < α (typically 0.05), reject the null hypothesis and conclude variances are unequal. If p-value ≥ α, fail to reject—variances can be considered equal for practical purposes.
In our PlantGrowth example, p = 0.2371 > 0.05, so we fail to reject the null hypothesis. The variances are homogeneous, and we can proceed with standard ANOVA.
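For readers who want to see where K-squared comes from, the statistic can be computed by hand. The sketch below implements the standard formula, K² = [(N−k)·ln s_p² − Σ(nᵢ−1)·ln sᵢ²] / C, where s_p² is the pooled variance and C is a correction factor; the variable names are our own:

```r
# Computing Bartlett's K-squared by hand for PlantGrowth
x   <- split(PlantGrowth$weight, PlantGrowth$group)
n   <- sapply(x, length)               # group sizes
v   <- sapply(x, var)                  # group sample variances
N   <- sum(n); k <- length(x)
sp2 <- sum((n - 1) * v) / (N - k)      # pooled variance
C   <- 1 + (sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
K2  <- ((N - k) * log(sp2) - sum((n - 1) * log(v))) / C
K2                                     # should match bartlett.test()'s K-squared
pchisq(K2, df = k - 1, lower.tail = FALSE)  # should match its p-value
```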
You can extract individual components programmatically:
# Store the result
bt_result <- bartlett.test(weight ~ group, data = PlantGrowth)
# Extract components
bt_result$statistic # K-squared value
bt_result$parameter # degrees of freedom
bt_result$p.value # p-value
# Programmatic decision
if (bt_result$p.value >= 0.05) {
message("Variances are homogeneous. Proceed with standard ANOVA.")
} else {
message("Variances are heterogeneous. Consider Welch's ANOVA.")
}
Variances are homogeneous. Proceed with standard ANOVA.
Practical Example with Real Data
Let’s walk through a complete analysis workflow. We’ll use the iris dataset to compare sepal widths across species, demonstrating the full process from assumption checking to ANOVA.
# Load data
data(iris)
# Step 1: Visual inspection of distributions
boxplot(Sepal.Width ~ Species, data = iris,
main = "Sepal Width by Species",
ylab = "Sepal Width (cm)")
# Step 2: Check normality for each group
by(iris$Sepal.Width, iris$Species, shapiro.test)
iris$Species: setosa
Shapiro-Wilk normality test
data: dd[x, ]
W = 0.97172, p-value = 0.2715
------------------------------------------------------------
iris$Species: versicolor
Shapiro-Wilk normality test
data: dd[x, ]
W = 0.97413, p-value = 0.338
------------------------------------------------------------
iris$Species: virginica
Shapiro-Wilk normality test
data: dd[x, ]
W = 0.96739, p-value = 0.1809
All three species pass the normality test (p > 0.05), so Bartlett’s test is appropriate.
# Step 3: Test variance homogeneity
bartlett_result <- bartlett.test(Sepal.Width ~ Species, data = iris)
print(bartlett_result)
Bartlett test of homogeneity of variances
data: Sepal.Width by Species
Bartlett's K-squared = 2.0911, df = 2, p-value = 0.3515
# Step 4: Proceed with ANOVA (variances are homogeneous)
if (bartlett_result$p.value >= 0.05) {
anova_result <- aov(Sepal.Width ~ Species, data = iris)
print(summary(anova_result))
} else {
welch_result <- oneway.test(Sepal.Width ~ Species, data = iris)
print(welch_result)
}
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 11.35 5.672 49.16 <2e-16 ***
Residuals 147 16.96 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The workflow is clean: check normality, test variances, then choose the appropriate ANOVA method. This approach protects you from invalid statistical conclusions.
Handling Test Failures and Alternatives
When Bartlett’s test indicates unequal variances (p < 0.05), you have several options.
Option 1: Use Welch’s ANOVA
Welch’s ANOVA doesn’t assume equal variances. It’s the most straightforward solution:
# Welch's ANOVA (does not assume equal variances)
oneway.test(Sepal.Width ~ Species, data = iris, var.equal = FALSE)
One-way analysis of means (not assuming equal variances)
data: Sepal.Width and Species
F = 45.012, df = 2.00, 97.45, p-value = 1.433e-14
Option 2: Use Levene’s test when normality is questionable
If you suspect non-normality is causing Bartlett’s test to fail, switch to Levene’s test:
# Install car package if needed
# install.packages("car")
library(car)
# Levene's test (robust to non-normality)
leveneTest(Sepal.Width ~ Species, data = iris)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 0.5902 0.5555
147
Levene’s test uses deviations from the group median by default, making it robust against non-normal distributions. You can also use center = "mean" for the original formulation, though the median version is generally preferred.
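Under the hood, this median-centered variant (also called the Brown-Forsythe test) is just a one-way ANOVA on absolute deviations from each group's median. The base-R sketch below should reproduce `leveneTest()`'s F statistic without requiring the car package:

```r
# Brown-Forsythe / Levene by hand: one-way ANOVA on |y - group median|
dev <- abs(iris$Sepal.Width - ave(iris$Sepal.Width, iris$Species, FUN = median))
anova(lm(dev ~ iris$Species))  # F and p should match leveneTest(center = "median")
```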
Option 3: Transform your data
Sometimes a log or square root transformation stabilizes variances:
# Log transformation example
iris$log_width <- log(iris$Sepal.Width)
bartlett.test(log_width ~ Species, data = iris)
Only use transformations if they make theoretical sense for your data. Don’t transform just to pass a test.
Option 4: Use non-parametric alternatives
If assumptions are severely violated, consider the Kruskal-Wallis test:
kruskal.test(Sepal.Width ~ Species, data = iris)
This test compares distributions without assuming normality or equal variances, though it’s less powerful than ANOVA when assumptions are met.
Conclusion
The Bartlett test is a valuable tool for validating ANOVA assumptions, but use it wisely. Always check normality first—running Bartlett on non-normal data produces unreliable results. When normality holds, Bartlett is more powerful than alternatives. When it doesn’t, switch to Levene’s test.
Build assumption checking into your standard workflow. A few extra lines of code can save you from publishing invalid results. When variances are unequal, don’t panic—Welch’s ANOVA handles heteroscedasticity gracefully and is increasingly recommended as the default approach regardless of Bartlett test results.
The key is understanding what each test tells you and making informed decisions based on the complete picture of your data.