How to Perform an ANCOVA in R
Key Insights
- ANCOVA combines ANOVA and regression to compare group means while statistically controlling for continuous covariates, increasing statistical power and reducing error variance.
- The homogeneity of regression slopes assumption is critical and often overlooked—always test for an interaction between your covariate and grouping variable before proceeding.
- Use the emmeans package for post-hoc comparisons because it calculates adjusted means that account for covariate effects, giving you accurate group comparisons.
Introduction to ANCOVA
Analysis of Covariance (ANCOVA) is a statistical technique that blends ANOVA with linear regression. It allows you to compare group means on a dependent variable while controlling for one or more continuous covariates that might influence the outcome.
Consider a practical scenario: you’re comparing the effectiveness of three teaching methods on student test scores. Students enter the study with different baseline abilities, which will affect their final scores regardless of which teaching method they receive. ANCOVA lets you statistically control for these baseline differences, giving you a cleaner comparison of the teaching methods themselves.
Use ANCOVA over standard ANOVA when:
- You have a continuous covariate that correlates with your outcome variable
- You want to reduce error variance and increase statistical power
- You need to control for pre-existing differences between groups (common in quasi-experimental designs)
- You’re analyzing pre-test/post-test designs where baseline scores matter
The key insight is that ANCOVA adjusts group means based on the covariate, allowing you to ask: “What would the group means look like if all participants had the same covariate value?”
ANCOVA Assumptions
Before running ANCOVA, you must verify several assumptions. Violating these can lead to incorrect conclusions.
Linearity: The relationship between the covariate and dependent variable must be linear within each group. Non-linear relationships require transformation or different analytical approaches.
Homogeneity of regression slopes: This is the most critical and unique assumption for ANCOVA. The regression slopes relating the covariate to the outcome must be equal across groups. If slopes differ significantly, the covariate’s effect varies by group, and standard ANCOVA is inappropriate.
Normality of residuals: The residuals from the model should be approximately normally distributed. ANCOVA is reasonably robust to minor violations with larger samples.
Homogeneity of variances: The variance of residuals should be similar across groups. This is the same assumption as in standard ANOVA.
Independence: Observations must be independent of each other. This is a design issue, not something you can test post-hoc.
Preparing Your Data
Let’s work with a simulated dataset that represents a realistic research scenario. We’ll examine how three different exercise programs affect weight loss while controlling for participants’ initial fitness levels.
# Set seed for reproducibility
set.seed(42)
# Create simulated dataset
n_per_group <- 30
exercise_data <- data.frame(
  participant_id = 1:(n_per_group * 3),
  program = factor(rep(c("Cardio", "Strength", "Combined"), each = n_per_group)),
  initial_fitness = c(
    rnorm(n_per_group, mean = 50, sd = 10),
    rnorm(n_per_group, mean = 52, sd = 10),
    rnorm(n_per_group, mean = 48, sd = 10)
  )
)
# Generate weight loss with program effects and fitness covariate effect
exercise_data$weight_loss <- with(exercise_data,
  5 +                                          # baseline
  0.15 * initial_fitness +                     # covariate effect
  ifelse(program == "Cardio", 2,
         ifelse(program == "Strength", 1, 4)) + # program effects
  rnorm(n_per_group * 3, mean = 0, sd = 2)     # random error
)
# Check data structure
str(exercise_data)
'data.frame': 90 obs. of 4 variables:
$ participant_id : int 1 2 3 4 5 6 7 8 9 10 ...
$ program : Factor w/ 3 levels "Cardio","Combined",..: 1 1 1 1 1 1 1 1 1 1 ...
$ initial_fitness: num 63.7 44.4 53.6 56.3 54.1 ...
$ weight_loss : num 17.2 14.4 13.6 15.8 15.5 ...
# Verify factor levels
levels(exercise_data$program)
# Quick summary by group
aggregate(weight_loss ~ program, data = exercise_data,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))
Ensure your grouping variable is a factor with appropriate levels. If it’s stored as character or numeric, convert it explicitly with factor().
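As a quick illustration (using a toy data frame, not the exercise dataset), an explicit conversion looks like this:

```r
# Toy example: convert a character grouping variable to a factor
# with an explicit level order (the first level becomes the reference)
toy <- data.frame(program = c("Strength", "Cardio", "Combined", "Cardio"))
toy$program <- factor(toy$program, levels = c("Cardio", "Strength", "Combined"))
levels(toy$program)  # "Cardio" "Strength" "Combined"
```

Setting levels explicitly also controls which group serves as the reference category in model output.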
Checking Assumptions
Testing assumptions before analysis prevents you from drawing invalid conclusions.
# Load required packages
library(car)
library(ggplot2)
# 1. Check linearity with scatter plots
ggplot(exercise_data, aes(x = initial_fitness, y = weight_loss, color = program)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Linearity Check: Covariate vs Outcome by Group",
       x = "Initial Fitness Score",
       y = "Weight Loss (lbs)") +
  theme_minimal()
The scatter plot should show roughly parallel regression lines. If lines cross or diverge dramatically, you have a homogeneity of slopes problem.
# 2. Test homogeneity of regression slopes (critical assumption)
# Fit model with interaction term
interaction_model <- lm(weight_loss ~ initial_fitness * program, data = exercise_data)
Anova(interaction_model, type = "III")
Anova Table (Type III tests)
Response: weight_loss
Sum Sq Df F value Pr(>F)
(Intercept) 53.42 1 13.5765 0.0004034 ***
initial_fitness 449.32 1 114.2026 < 2.2e-16 ***
program 4.15 2 0.5271 0.5924133
initial_fitness:program 2.89 2 0.3676 0.6936892
Residuals 330.42 84
The interaction term (initial_fitness:program) is not significant (p = 0.69), confirming homogeneity of slopes. If this were significant (p < 0.05), you’d need to consider alternatives like Johnson-Neyman analysis or separate regressions per group.
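Had the interaction been significant, one straightforward fallback is fitting the covariate regression separately within each group. A base-R sketch, reusing the exercise_data object from above:

```r
# Group-specific slopes: fit weight_loss ~ initial_fitness within each program
slopes <- sapply(split(exercise_data, exercise_data$program), function(d) {
  coef(lm(weight_loss ~ initial_fitness, data = d))[["initial_fitness"]]
})
slopes  # one slope per program; large differences would explain the interaction
```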
# 3. Fit the ANCOVA model (without interaction) for residual diagnostics
ancova_model <- lm(weight_loss ~ initial_fitness + program, data = exercise_data)
# 4. Test normality of residuals
shapiro.test(residuals(ancova_model))
Shapiro-Wilk normality test
data: residuals(ancova_model)
W = 0.98587, p-value = 0.4297
# 5. Test homogeneity of variances
leveneTest(weight_loss ~ program, data = exercise_data)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  0.2841 0.7534
      87
Both tests are non-significant, indicating assumptions are met. For the Shapiro-Wilk test, p > 0.05 suggests normality. For Levene’s test, p > 0.05 suggests equal variances.
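Visual checks are a useful complement to these formal tests; base R's plot() method on the fitted model gives quick diagnostics:

```r
# Residual diagnostics for the ANCOVA model fitted above
par(mfrow = c(1, 2))
plot(ancova_model, which = 1)  # residuals vs fitted: look for curvature or funneling
plot(ancova_model, which = 2)  # normal Q-Q plot: points near the line support normality
par(mfrow = c(1, 1))
```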
Running the ANCOVA Model
With assumptions verified, fit the ANCOVA model. You can use either aov() or lm(). Note that aov() reports sequential (Type I) sums of squares, so enter the covariate before the grouping factor; lm() combined with car's Anova(type = "III") gives order-independent tests and more flexibility for diagnostics.
# Fit ANCOVA using aov()
ancova_aov <- aov(weight_loss ~ initial_fitness + program, data = exercise_data)
summary(ancova_aov)
Df Sum Sq Mean Sq F value Pr(>F)
initial_fitness 1 454.77 454.77 117.99 < 2e-16 ***
program 2 127.16 63.58 16.50 8.3e-07 ***
Residuals 86 331.47 3.85
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Alternative: using lm() with Type III sums of squares
ancova_lm <- lm(weight_loss ~ initial_fitness + program, data = exercise_data)
Anova(ancova_lm, type = "III")
Anova Table (Type III tests)
Response: weight_loss
Sum Sq Df F value Pr(>F)
(Intercept) 57.38 1 14.8895 0.0002233 ***
initial_fitness 454.77 1 117.9907 < 2.2e-16 ***
program 127.16 2 16.4969 8.298e-07 ***
Residuals 331.47 86
Interpreting the output: The covariate (initial_fitness) is highly significant (p < 0.001), confirming it explains substantial variance in weight loss. More importantly, the program effect is also significant (p < 0.001), indicating that after controlling for initial fitness, the exercise programs produce different weight loss outcomes.
# Get model coefficients for deeper understanding
summary(ancova_lm)
The coefficients show the adjusted effect of each program relative to the reference category (alphabetically first: Cardio). The initial_fitness coefficient tells you how much weight loss increases for each unit increase in fitness score.
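If another baseline is more meaningful than the alphabetical default, relevel() changes the reference category. Refitting into a copy keeps the original model intact:

```r
# Refit with "Strength" as the reference level (on a copy, so ancova_lm is unchanged)
d2 <- exercise_data
d2$program <- relevel(d2$program, ref = "Strength")
ancova_strength_ref <- lm(weight_loss ~ initial_fitness + program, data = d2)
coef(ancova_strength_ref)  # program coefficients now contrast against Strength
```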
Post-Hoc Analysis and Adjusted Means
A significant overall effect tells you groups differ, but not which specific groups differ from each other. Use emmeans for proper post-hoc comparisons that account for the covariate.
library(emmeans)
# Calculate estimated marginal means (adjusted means)
emm <- emmeans(ancova_lm, ~ program)
emm
program emmean SE df lower.CL upper.CL
Cardio 14.5 0.359 86 13.8 15.2
Combined 16.4 0.358 86 15.7 17.1
Strength 13.5 0.359 86 12.8 14.2
Confidence level used: 0.95
These are the adjusted means—what each group’s mean weight loss would be if all participants had the same initial fitness level (the grand mean of the covariate).
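You can reproduce these adjusted means by hand with predict(), evaluating the fitted model at the covariate's grand mean for each group:

```r
# Adjusted means by hand: predict each group's outcome at the grand mean fitness
newdat <- data.frame(
  program = factor(levels(exercise_data$program),
                   levels = levels(exercise_data$program)),
  initial_fitness = mean(exercise_data$initial_fitness)
)
cbind(newdat, adjusted_mean = predict(ancova_lm, newdata = newdat))
```

The results should match the emmean column above, which is a good sanity check on what "adjusted mean" actually means.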
# Pairwise comparisons with Tukey adjustment
pairs(emm, adjust = "tukey")
contrast estimate SE df t.ratio p.value
Cardio - Combined -1.88 0.508 86 -3.709 0.0010
Cardio - Strength 1.02 0.508 86 2.016 0.1131
Combined - Strength 2.91 0.507 86 5.736 <.0001
P value adjustment: tukey method for comparing a family of 3 estimates
The Combined program produces significantly more weight loss than both Cardio (p = 0.001) and Strength (p < 0.001). Cardio and Strength don’t differ significantly from each other (p = 0.11).
# Effect sizes (Cohen's d equivalent)
eff_size(emm, sigma = sigma(ancova_lm), edf = df.residual(ancova_lm))
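eff_size() reports standardized pairwise differences; another common summary is partial eta squared, computed directly from the Type III sums of squares (the values below come from the Anova table shown earlier):

```r
# Partial eta squared for the program effect: SS_effect / (SS_effect + SS_residual)
ss <- Anova(ancova_lm, type = "III")[["Sum Sq"]]
# row order: (Intercept), initial_fitness, program, Residuals
partial_eta_sq <- ss[3] / (ss[3] + ss[4])
partial_eta_sq  # roughly 0.28 for the simulated data above
```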
Reporting and Visualization
Create a publication-ready visualization showing both the raw data and adjusted group means.
# Get adjusted means as a data frame
emm_df <- as.data.frame(emm)
# Calculate grand mean of covariate for plotting
grand_mean_fitness <- mean(exercise_data$initial_fitness)
# Create visualization
ggplot(exercise_data, aes(x = initial_fitness, y = weight_loss, color = program)) +
  geom_point(alpha = 0.6, size = 2) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  geom_vline(xintercept = grand_mean_fitness, linetype = "dotted", color = "gray40") +
  geom_point(data = emm_df,
             aes(x = grand_mean_fitness, y = emmean),
             size = 4, shape = 18) +
  geom_errorbar(data = emm_df,
                aes(x = grand_mean_fitness, y = emmean,
                    ymin = lower.CL, ymax = upper.CL),
                width = 1, linewidth = 1) +  # linewidth replaces the deprecated size argument
  labs(title = "Weight Loss by Exercise Program",
       subtitle = "Diamonds show adjusted means (±95% CI) at mean initial fitness",
       x = "Initial Fitness Score",
       y = "Weight Loss (lbs)",
       color = "Program") +
  theme_minimal() +
  theme(legend.position = "bottom")
Reporting in text: “An ANCOVA was conducted to compare weight loss across three exercise programs while controlling for initial fitness levels. After adjusting for initial fitness, there was a statistically significant difference between programs, F(2, 86) = 16.50, p < .001. Post-hoc comparisons using Tukey’s HSD indicated that the Combined program (M = 16.4 lbs, SE = 0.36) produced significantly greater weight loss than both the Cardio program (M = 14.5 lbs, SE = 0.36, p = .001) and the Strength program (M = 13.5 lbs, SE = 0.36, p < .001).”
ANCOVA is a powerful technique when used appropriately. Always check assumptions—especially homogeneity of slopes—and report adjusted means rather than raw means when describing group differences. The emmeans package is your best friend for accurate post-hoc comparisons that properly account for covariate effects.