How to Perform an ANCOVA in R
Key Insights
- ANCOVA combines ANOVA and regression to compare group means while statistically controlling for continuous covariates, increasing statistical power and reducing error variance.
- The homogeneity of regression slopes assumption is critical and often overlooked—always test for an interaction between your covariate and grouping variable before proceeding.
- Use the emmeans package for post-hoc comparisons because it calculates adjusted means that account for covariate effects, giving you accurate group comparisons.
Introduction to ANCOVA
Analysis of Covariance (ANCOVA) is a statistical technique that blends ANOVA with linear regression. It allows you to compare group means on a dependent variable while controlling for one or more continuous covariates that might influence the outcome.
Consider a practical scenario: you’re comparing the effectiveness of three teaching methods on student test scores. Students enter the study with different baseline abilities, which will affect their final scores regardless of which teaching method they receive. ANCOVA lets you statistically control for these baseline differences, giving you a cleaner comparison of the teaching methods themselves.
Use ANCOVA over standard ANOVA when:
- You have a continuous covariate that correlates with your outcome variable
- You want to reduce error variance and increase statistical power
- You need to control for pre-existing differences between groups (common in quasi-experimental designs)
- You’re analyzing pre-test/post-test designs where baseline scores matter
The key insight is that ANCOVA adjusts group means based on the covariate, allowing you to ask: “What would the group means look like if all participants had the same covariate value?”
ANCOVA Assumptions
Before running ANCOVA, you must verify several assumptions. Violating these can lead to incorrect conclusions.
Linearity: The relationship between the covariate and dependent variable must be linear within each group. Non-linear relationships require transformation or different analytical approaches.
Homogeneity of regression slopes: This is the most critical and unique assumption for ANCOVA. The regression slopes relating the covariate to the outcome must be equal across groups. If slopes differ significantly, the covariate’s effect varies by group, and standard ANCOVA is inappropriate.
Normality of residuals: The residuals from the model should be approximately normally distributed. ANCOVA is reasonably robust to minor violations with larger samples.
Homogeneity of variances: The variance of residuals should be similar across groups. This is the same assumption as in standard ANOVA.
Independence: Observations must be independent of each other. This is a design issue, not something you can test post-hoc.
Preparing Your Data
Let’s work with a simulated dataset that represents a realistic research scenario. We’ll examine how three different exercise programs affect weight loss while controlling for participants’ initial fitness levels.
# Set seed for reproducibility
set.seed(42)
# Create simulated dataset
n_per_group <- 30
exercise_data <- data.frame(
  participant_id = 1:(n_per_group * 3),
  program = factor(rep(c("Cardio", "Strength", "Combined"), each = n_per_group)),
  initial_fitness = c(
    rnorm(n_per_group, mean = 50, sd = 10),
    rnorm(n_per_group, mean = 52, sd = 10),
    rnorm(n_per_group, mean = 48, sd = 10)
  )
)
# Generate weight loss with program effects and fitness covariate effect
exercise_data$weight_loss <- with(exercise_data,
  5 +                                          # baseline
  0.15 * initial_fitness +                     # covariate effect
  ifelse(program == "Cardio", 2,
         ifelse(program == "Strength", 1, 4)) + # program effects
  rnorm(n_per_group * 3, mean = 0, sd = 2)     # random error
)
# Check data structure
str(exercise_data)
'data.frame': 90 obs. of 4 variables:
$ participant_id : int 1 2 3 4 5 6 7 8 9 10 ...
$ program : Factor w/ 3 levels "Cardio","Combined",..: 1 1 1 1 1 1 1 1 1 1 ...
$ initial_fitness: num 63.7 44.4 53.6 56.3 54.1 ...
$ weight_loss : num 17.2 14.4 13.6 15.8 15.5 ...
# Verify factor levels
levels(exercise_data$program)
# Quick summary by group
aggregate(weight_loss ~ program, data = exercise_data,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))
Ensure your grouping variable is a factor with appropriate levels. If it’s stored as character or numeric, convert it explicitly with factor().
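As a quick illustration (using a toy data frame, not the exercise dataset), an explicit conversion looks like this:

```r
# Toy example: convert a character grouping variable to a factor
# with an explicit level order (the first level becomes the reference)
toy <- data.frame(program = c("Strength", "Cardio", "Combined", "Cardio"))
toy$program <- factor(toy$program, levels = c("Cardio", "Strength", "Combined"))
levels(toy$program)  # "Cardio" "Strength" "Combined"
```

Setting levels explicitly also controls which group serves as the reference category in model output.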
Checking Assumptions
Testing assumptions before analysis prevents you from drawing invalid conclusions.
# Load required packages
library(car)
library(ggplot2)
# 1. Check linearity with scatter plots
ggplot(exercise_data, aes(x = initial_fitness, y = weight_loss, color = program)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Linearity Check: Covariate vs Outcome by Group",
       x = "Initial Fitness Score",
       y = "Weight Loss (lbs)") +
  theme_minimal()
The scatter plot should show roughly parallel regression lines. If lines cross or diverge dramatically, you have a homogeneity of slopes problem.
# 2. Test homogeneity of regression slopes (critical assumption)
# Fit model with interaction term
interaction_model <- lm(weight_loss ~ initial_fitness * program, data = exercise_data)
Anova(interaction_model, type = "III")
Anova Table (Type III tests)
Response: weight_loss
Sum Sq Df F value Pr(>F)
(Intercept) 53.42 1 13.5765 0.0004034 ***
initial_fitness 449.32 1 114.2026 < 2.2e-16 ***
program 4.15 2 0.5271 0.5924133
initial_fitness:program 2.89 2 0.3676 0.6936892
Residuals 330.42 84
The interaction term (initial_fitness:program) is not significant (p = 0.69), confirming homogeneity of slopes. If this were significant (p < 0.05), you’d need to consider alternatives like Johnson-Neyman analysis or separate regressions per group.
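Had the interaction been significant, one straightforward fallback is fitting the covariate regression separately within each group. A base-R sketch, reusing the exercise_data object from above:

```r
# Group-specific slopes: fit weight_loss ~ initial_fitness within each program
slopes <- sapply(split(exercise_data, exercise_data$program), function(d) {
  coef(lm(weight_loss ~ initial_fitness, data = d))[["initial_fitness"]]
})
slopes  # one slope per program; large differences would explain the interaction
```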
# 3. Fit the ANCOVA model (without interaction) for residual diagnostics
ancova_model <- lm(weight_loss ~ initial_fitness + program, data = exercise_data)
# 4. Test normality of residuals
shapiro.test(residuals(ancova_model))
Shapiro-Wilk normality test
data: residuals(ancova_model)
W = 0.98587, p-value = 0.4297
# 5. Test homogeneity of variances
leveneTest(weight_loss ~ program, data = exercise_data)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  0.2841 0.7534
      87
Both tests are non-significant, indicating assumptions are met. For the Shapiro-Wilk test, p > 0.05 suggests normality. For Levene’s test, p > 0.05 suggests equal variances.
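Visual checks are a useful complement to these formal tests; base R's plot() method on the fitted model gives quick diagnostics:

```r
# Residual diagnostics for the ANCOVA model fitted above
par(mfrow = c(1, 2))
plot(ancova_model, which = 1)  # residuals vs fitted: look for curvature or funneling
plot(ancova_model, which = 2)  # normal Q-Q plot: points near the line support normality
par(mfrow = c(1, 1))
```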
Running the ANCOVA Model
With assumptions verified, fit the ANCOVA model. You can use either aov() or lm(). Note that aov() reports sequential (Type I) sums of squares, so enter the covariate before the grouping factor; lm() combined with car's Anova(type = "III") gives order-independent tests and more flexibility for diagnostics.
# Fit ANCOVA using aov()
ancova_aov <- aov(weight_loss ~ initial_fitness + program, data = exercise_data)
summary(ancova_aov)
Df Sum Sq Mean Sq F value Pr(>F)
initial_fitness 1 454.77 454.77 117.99 < 2e-16 ***
program 2 127.16 63.58 16.50 8.3e-07 ***
Residuals 86 331.47 3.85
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Alternative: using lm() with Type III sums of squares
ancova_lm <- lm(weight_loss ~ initial_fitness + program, data = exercise_data)
Anova(ancova_lm, type = "III")
Anova Table (Type III tests)
Response: weight_loss
Sum Sq Df F value Pr(>F)
(Intercept) 57.38 1 14.8895 0.0002233 ***
initial_fitness 454.77 1 117.9907 < 2.2e-16 ***
program 127.16 2 16.4969 8.298e-07 ***
Residuals 331.47 86
Interpreting the output: The covariate (initial_fitness) is highly significant (p < 0.001), confirming it explains substantial variance in weight loss. More importantly, the program effect is also significant (p < 0.001), indicating that after controlling for initial fitness, the exercise programs produce different weight loss outcomes.
# Get model coefficients for deeper understanding
summary(ancova_lm)
The coefficients show the adjusted effect of each program relative to the reference category (alphabetically first: Cardio). The initial_fitness coefficient tells you how much weight loss increases for each unit increase in fitness score.
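If another baseline is more meaningful than the alphabetical default, relevel() changes the reference category. Refitting into a copy keeps the original model intact:

```r
# Refit with "Strength" as the reference level (on a copy, so ancova_lm is unchanged)
d2 <- exercise_data
d2$program <- relevel(d2$program, ref = "Strength")
ancova_strength_ref <- lm(weight_loss ~ initial_fitness + program, data = d2)
coef(ancova_strength_ref)  # program coefficients now contrast against Strength
```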
Post-Hoc Analysis and Adjusted Means
A significant overall effect tells you groups differ, but not which specific groups differ from each other. Use emmeans for proper post-hoc comparisons that account for the covariate.
library(emmeans)
# Calculate estimated marginal means (adjusted means)
emm <- emmeans(ancova_lm, ~ program)
emm
program emmean SE df lower.CL upper.CL
Cardio 14.5 0.359 86 13.8 15.2
Combined 16.4 0.358 86 15.7 17.1
Strength 13.5 0.359 86 12.8 14.2
Confidence level used: 0.95
These are the adjusted means—what each group’s mean weight loss would be if all participants had the same initial fitness level (the grand mean of the covariate).
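You can reproduce these adjusted means by hand with predict(), evaluating the fitted model at the covariate's grand mean for each group:

```r
# Adjusted means by hand: predict each group's outcome at the grand mean fitness
newdat <- data.frame(
  program = factor(levels(exercise_data$program),
                   levels = levels(exercise_data$program)),
  initial_fitness = mean(exercise_data$initial_fitness)
)
cbind(newdat, adjusted_mean = predict(ancova_lm, newdata = newdat))
```

The results should match the emmean column above, which is a good sanity check on what "adjusted mean" actually means.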
# Pairwise comparisons with Tukey adjustment
pairs(emm, adjust = "tukey")
contrast estimate SE df t.ratio p.value
Cardio - Combined -1.88 0.508 86 -3.709 0.0010
Cardio - Strength 1.02 0.508 86 2.016 0.1131
Combined - Strength 2.91 0.507 86 5.736 <.0001
P value adjustment: tukey method for comparing a family of 3 estimates
The Combined program produces significantly more weight loss than both Cardio (p = 0.001) and Strength (p < 0.001). Cardio and Strength don’t differ significantly from each other (p = 0.11).
# Effect sizes (Cohen's d equivalent)
eff_size(emm, sigma = sigma(ancova_lm), edf = df.residual(ancova_lm))
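eff_size() reports standardized pairwise differences; another common summary is partial eta squared, computed directly from the Type III sums of squares (the values below come from the Anova table shown earlier):

```r
# Partial eta squared for the program effect: SS_effect / (SS_effect + SS_residual)
ss <- Anova(ancova_lm, type = "III")[["Sum Sq"]]
# row order: (Intercept), initial_fitness, program, Residuals
partial_eta_sq <- ss[3] / (ss[3] + ss[4])
partial_eta_sq  # roughly 0.28 for the simulated data above
```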
Reporting and Visualization
Create a publication-ready visualization showing both the raw data and adjusted group means.
# Get adjusted means as a data frame
emm_df <- as.data.frame(emm)
# Calculate grand mean of covariate for plotting
grand_mean_fitness <- mean(exercise_data$initial_fitness)
# Create visualization
ggplot(exercise_data, aes(x = initial_fitness, y = weight_loss, color = program)) +
  geom_point(alpha = 0.6, size = 2) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  geom_vline(xintercept = grand_mean_fitness, linetype = "dotted", color = "gray40") +
  geom_point(data = emm_df,
             aes(x = grand_mean_fitness, y = emmean),
             size = 4, shape = 18) +
  geom_errorbar(data = emm_df,
                aes(x = grand_mean_fitness, y = emmean,
                    ymin = lower.CL, ymax = upper.CL),
                width = 1, linewidth = 1) +  # linewidth replaces the deprecated size argument
  labs(title = "Weight Loss by Exercise Program",
       subtitle = "Diamonds show adjusted means (±95% CI) at mean initial fitness",
       x = "Initial Fitness Score",
       y = "Weight Loss (lbs)",
       color = "Program") +
  theme_minimal() +
  theme(legend.position = "bottom")
Reporting in text: “An ANCOVA was conducted to compare weight loss across three exercise programs while controlling for initial fitness levels. After adjusting for initial fitness, there was a statistically significant difference between programs, F(2, 86) = 16.50, p < .001. Post-hoc comparisons using Tukey’s HSD indicated that the Combined program (M = 16.4 lbs, SE = 0.36) produced significantly greater weight loss than both the Cardio program (M = 14.5 lbs, SE = 0.36, p = .001) and the Strength program (M = 13.5 lbs, SE = 0.36, p < .001).”
ANCOVA is a powerful technique when used appropriately. Always check assumptions—especially homogeneity of slopes—and report adjusted means rather than raw means when describing group differences. The emmeans package is your best friend for accurate post-hoc comparisons that properly account for covariate effects.