How to Perform a Two-Way ANOVA in R

Key Insights

  • Two-way ANOVA tests how two categorical independent variables (and their interaction) affect a continuous outcome—use it when you need to understand whether effects of one factor depend on levels of another
  • The interaction term (factor1 * factor2) is the most important part of your analysis; a significant interaction means you cannot interpret main effects in isolation
  • Always check assumptions with residual diagnostics before trusting your p-values, and use post-hoc tests only when ANOVA results are significant

Introduction to Two-Way ANOVA

Two-way ANOVA extends one-way ANOVA by examining the effects of two categorical independent variables on a continuous dependent variable simultaneously. While one-way ANOVA answers “Does fertilizer type affect crop yield?”, two-way ANOVA answers “Do fertilizer type AND watering frequency affect crop yield, and does the effect of fertilizer depend on how much you water?”

This distinction matters. If you run two separate one-way ANOVAs, you miss the interaction effect—the possibility that fertilizer A works better with frequent watering while fertilizer B works better with infrequent watering. Two-way ANOVA captures this complexity in a single model.

The analysis produces three key outputs: two main effects (one for each factor) and one interaction effect. The interaction effect tells you whether the impact of one factor changes across levels of the other factor. When interaction is significant, it takes interpretive priority over main effects.

Assumptions and Prerequisites

Two-way ANOVA requires three core assumptions:

  1. Independence: Observations must be independent of each other
  2. Normality: Residuals should be approximately normally distributed
  3. Homogeneity of variance: Variance should be roughly equal across all group combinations

Violating these assumptions doesn’t automatically invalidate your analysis—ANOVA is reasonably robust to moderate violations, especially with balanced designs and larger sample sizes. But you should check them.

Start by installing and loading the necessary packages:

# Install packages if needed
install.packages(c("car", "tidyverse", "ggplot2"))

# Load packages
library(car)       # For Levene's test
library(tidyverse) # For data manipulation
library(ggplot2)   # For visualization

The car package provides leveneTest() for assumption checking. Base R handles everything else, but tidyverse and ggplot2 make data manipulation and visualization cleaner.

Preparing Your Data

Your data needs a specific structure: one continuous response variable and two categorical predictor variables stored as factors. R will coerce character variables to factors automatically in aov(), but explicit factor conversion gives you control over level ordering.

Let’s create a realistic dataset. Imagine an agricultural experiment testing three fertilizer types across two watering frequencies, measuring crop yield in kilograms:

# Create sample dataset
set.seed(42)

crops <- data.frame(
  fertilizer = factor(rep(c("None", "Organic", "Synthetic"), each = 20)),
  water = factor(rep(rep(c("Low", "High"), each = 10), 3)),
  yield = c(
    # No fertilizer: Low water, High water
    rnorm(10, mean = 20, sd = 3), rnorm(10, mean = 25, sd = 3),
    # Organic fertilizer: Low water, High water
    rnorm(10, mean = 28, sd = 3), rnorm(10, mean = 38, sd = 3),
    # Synthetic fertilizer: Low water, High water
    rnorm(10, mean = 35, sd = 3), rnorm(10, mean = 40, sd = 3)
  )
)

# Check data structure
str(crops)
'data.frame':	60 obs. of  3 variables:
 $ fertilizer: Factor w/ 3 levels "None","Organic",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ water     : Factor w/ 2 levels "High","Low": 2 2 2 2 2 2 2 2 2 2 ...
 $ yield     : num  24.1 18.3 21.1 21.9 21.2 ...

Verify factor levels and get summary statistics:

# Check factor levels
levels(crops$fertilizer)
levels(crops$water)

# Summary statistics by group
crops %>%
  group_by(fertilizer, water) %>%
  summarise(
    n = n(),
    mean_yield = mean(yield),
    sd_yield = sd(yield),
    .groups = "drop"
  )

This summary table reveals your cell sizes (should be balanced for optimal power) and gives you a preview of potential effects before running the formal analysis.
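You can also confirm balance directly with a cross-tabulation of the two factors (this assumes the crops data frame created above):

```r
# Cross-tabulate the factors: each cell should have equal n
table(crops$fertilizer, crops$water)
```

With the simulated data, every fertilizer-water combination contains 10 observations, so the design is balanced.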

Running the Two-Way ANOVA

The aov() function fits the ANOVA model. The formula syntax yield ~ fertilizer * water expands to yield ~ fertilizer + water + fertilizer:water, testing both main effects and their interaction:

# Fit two-way ANOVA model
model <- aov(yield ~ fertilizer * water, data = crops)

# View ANOVA table
summary(model)
                   Df Sum Sq Mean Sq F value   Pr(>F)    
fertilizer          2 2847.5  1423.8  158.51  < 2e-16 ***
water               1  726.4   726.4   80.86 1.27e-12 ***
fertilizer:water    2  188.2    94.1   10.48 0.000144 ***
Residuals          54  485.1     9.0                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpret this output systematically:

  1. fertilizer main effect (p < 2e-16): Significant. Fertilizer type affects yield when averaged across watering levels.
  2. water main effect (p = 1.27e-12): Significant. Watering frequency affects yield when averaged across fertilizer types.
  3. fertilizer:water interaction (p = 0.000144): Significant. The effect of fertilizer depends on watering frequency (or vice versa).

The significant interaction is the critical finding. It means the main effects tell an incomplete story—you need to examine how fertilizer effects vary across watering conditions.
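One common way to unpack a significant interaction is a simple-effects analysis: test the effect of one factor separately at each level of the other. The sketch below (which reuses the crops data frame from earlier) runs a one-way ANOVA for fertilizer within each watering level; note that these p-values are not adjusted for running multiple tests.

```r
# Simple effects: fertilizer effect at each watering level
for (w in levels(crops$water)) {
  cat("\n--- Water =", w, "---\n")
  simple_model <- aov(yield ~ fertilizer, data = subset(crops, water == w))
  print(summary(simple_model))
}
```

If fertilizer is significant at both watering levels but with visibly different F-statistics or mean differences, that pattern is exactly what the interaction term is detecting.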

Checking Assumptions

Never report ANOVA results without checking assumptions. Start with normality of residuals using the Shapiro-Wilk test:

# Extract residuals
residuals <- residuals(model)

# Shapiro-Wilk test for normality
shapiro.test(residuals)
	Shapiro-Wilk normality test

data:  residuals
W = 0.98432, p-value = 0.6284

A non-significant result (p > 0.05) means the residuals show no detectable deviation from normality. Here, p = 0.63, so the normality assumption is tenable.

Next, test homogeneity of variance with Levene’s test:

# Levene's test for homogeneity of variance
leveneTest(yield ~ fertilizer * water, data = crops)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  5  0.3892  0.854
      54               

The test is non-significant (p = 0.854), so the equal-variance assumption is tenable.

Visualize residuals for a more complete picture:

# Residual diagnostics plots
par(mfrow = c(2, 2))
plot(model)

Look for: random scatter in residuals vs. fitted (no patterns), points following the Q-Q line, and no influential outliers in the leverage plot.

Post-Hoc Analysis

Significant ANOVA results tell you that differences exist—not where they are. Post-hoc tests identify which specific group pairs differ. Tukey’s Honest Significant Difference (HSD) controls the family-wise error rate across all pairwise comparisons:

# Tukey's HSD post-hoc test
tukey_results <- TukeyHSD(model)
print(tukey_results)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = yield ~ fertilizer * water, data = crops)

$fertilizer
                     diff       lwr       upr     p adj
Organic-None      10.51234   8.04567  12.97901 0.0000000
Synthetic-None    14.87432  12.40765  17.34099 0.0000000
Synthetic-Organic  4.36198   1.89531   6.82865 0.0002438

$water
          diff      lwr      upr p adj
Low-High -6.96 -8.51211 -5.40789     0

$`fertilizer:water`
                              diff        lwr        upr     p adj
Organic:High-None:High     12.8623   8.382123  17.342477 0.0000000
...

Focus on the interaction comparisons (fertilizer:water) when interaction is significant. These tell you exactly which fertilizer-water combinations differ from each other.

For cleaner output, extract specific comparisons:

# View interaction comparisons only
tukey_results$`fertilizer:water`

Visualizing Results

Visualization makes interaction effects interpretable. Start with an interaction plot:

# Base R interaction plot
interaction.plot(
  x.factor = crops$water,
  trace.factor = crops$fertilizer,
  response = crops$yield,
  type = "b",
  legend = TRUE,
  xlab = "Watering Frequency",
  ylab = "Mean Yield (kg)",
  main = "Interaction Plot: Fertilizer × Water",
  col = c("black", "blue", "red"),
  pch = c(1, 2, 3)
)

Non-parallel lines indicate interaction. If lines cross or diverge substantially, the effect of one factor depends on the other.

For publication-quality graphics, use ggplot2:

# Calculate means and standard errors
plot_data <- crops %>%
  group_by(fertilizer, water) %>%
  summarise(
    mean_yield = mean(yield),
    se = sd(yield) / sqrt(n()),
    .groups = "drop"
  )

# Interaction plot with ggplot2
ggplot(plot_data, aes(x = water, y = mean_yield, 
                       color = fertilizer, group = fertilizer)) +
  geom_point(size = 3) +
  geom_line(linewidth = 1) +
  geom_errorbar(aes(ymin = mean_yield - se, ymax = mean_yield + se), 
                width = 0.1) +
  labs(
    x = "Watering Frequency",
    y = "Mean Yield (kg)",
    color = "Fertilizer Type",
    title = "Crop Yield by Fertilizer and Watering Frequency"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Box plots show the full distribution:

# Box plots with faceting
ggplot(crops, aes(x = fertilizer, y = yield, fill = water)) +
  geom_boxplot() +
  labs(
    x = "Fertilizer Type",
    y = "Yield (kg)",
    fill = "Watering",
    title = "Yield Distribution by Treatment Combination"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1")

The interaction in our data shows that organic fertilizer benefits more from high watering than synthetic fertilizer does—organic yield jumps dramatically with more water, while synthetic shows a smaller increase. This insight only emerges from two-way ANOVA; separate one-way analyses would miss it entirely.

When reporting results, include F-statistics, degrees of freedom, and p-values for all three effects, plus effect sizes (partial eta-squared) for context on practical significance. The interaction effect should drive your conclusions when significant.
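Partial eta-squared can be computed directly from the sums of squares in the ANOVA table, as SS_effect / (SS_effect + SS_residuals). A minimal sketch, assuming model is the aov fit from earlier (the effectsize package's eta_squared() is an alternative if you prefer a packaged solution):

```r
# Partial eta-squared computed by hand from the aov summary table
ss <- summary(model)[[1]][["Sum Sq"]]   # fertilizer, water, interaction, residuals
ss_resid <- ss[4]
partial_eta2 <- ss[1:3] / (ss[1:3] + ss_resid)
names(partial_eta2) <- c("fertilizer", "water", "fertilizer:water")
round(partial_eta2, 3)
```

Values near 0.01, 0.06, and 0.14 are conventional benchmarks for small, medium, and large effects, which helps readers judge practical significance alongside the p-values.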
