How to Perform an ANCOVA in Python

Key Insights

  • ANCOVA extends ANOVA by statistically controlling for continuous covariates, giving you more precise group comparisons and reducing error variance that would otherwise obscure real treatment effects.
  • Always verify assumptions before running ANCOVA—homogeneity of regression slopes is particularly critical and often overlooked; if slopes differ across groups, your results will be misleading.
  • Use statsmodels for full control and detailed diagnostics, or pingouin for quick exploratory analysis; both produce valid results, but understanding the underlying model matters more than the tool you choose.

Introduction to ANCOVA

Analysis of Covariance (ANCOVA) combines ANOVA with regression to compare group means while controlling for one or more continuous variables called covariates. This technique solves a common problem: you want to compare groups, but there’s a confounding variable that affects your outcome.

Consider a practical example. You’re evaluating three different training programs on employee productivity. However, employees entered these programs with varying levels of prior experience. Without accounting for experience, you can’t tell whether productivity differences stem from the training itself or from pre-existing skill levels. ANCOVA handles this by statistically adjusting group means based on the covariate.

Use ANCOVA when you have:

  • A categorical independent variable (groups to compare)
  • A continuous dependent variable (your outcome)
  • One or more continuous covariates that correlate with the outcome

The key advantage over standard ANOVA is increased statistical power. By removing variance explained by the covariate, ANCOVA reduces error variance and makes it easier to detect genuine group differences.
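To see this concretely, here is a small simulation (with made-up effect sizes of my choosing) comparing the group-effect p-value from a plain ANOVA against the ANCOVA p-value once a strong covariate is partialled out:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Simulate modest group effects plus a strong covariate (assumed values)
rng = np.random.default_rng(0)
n = 30
df = pd.DataFrame({
    'group': np.repeat(['A', 'B', 'C'], n),
    'covariate': rng.normal(0, 1, 3 * n),
})
df['outcome'] = (
    df['group'].map({'A': 0.0, 'B': 0.8, 'C': 1.2})
    + 3.0 * df['covariate']          # large covariate effect
    + rng.normal(0, 1, 3 * n)        # noise
)

# ANOVA leaves the covariate's variance in the error term;
# ANCOVA removes it before testing the group effect
anova_p = anova_lm(ols('outcome ~ C(group)', df).fit()).loc['C(group)', 'PR(>F)']
ancova_p = anova_lm(ols('outcome ~ covariate + C(group)', df).fit()).loc['C(group)', 'PR(>F)']
print(f"ANOVA p = {anova_p:.4f},  ANCOVA p = {ancova_p:.4f}")
```

Because the covariate accounts for most of the outcome's variance here, the ANCOVA p-value for the group effect comes out far smaller than the plain ANOVA's.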

Assumptions of ANCOVA

ANCOVA carries stricter assumptions than ANOVA. Violating these can invalidate your results entirely.

Linearity: The relationship between the covariate and dependent variable must be linear within each group. Non-linear relationships require transformation or different methods.

Homogeneity of regression slopes: This is the critical assumption. The regression slopes relating the covariate to the outcome must be equal across all groups. If one training program benefits experienced employees more than another, ANCOVA results become uninterpretable.

Normality of residuals: Model residuals should be approximately normally distributed. Minor violations are tolerable with larger samples.

Homogeneity of variance: Residual variance should be similar across groups (homoscedasticity).

Independence of covariate and treatment: The covariate should not be affected by the treatment. Measure covariates before treatment assignment when possible.

Here’s how to check these assumptions visually:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create sample data
np.random.seed(42)
n_per_group = 30

data = pd.DataFrame({
    'group': np.repeat(['A', 'B', 'C'], n_per_group),
    'experience': np.concatenate([
        np.random.normal(5, 2, n_per_group),
        np.random.normal(5, 2, n_per_group),
        np.random.normal(5, 2, n_per_group)
    ]),
})

# Create outcome with group effects and covariate relationship
data['productivity'] = (
    2 * data['experience'] +  # covariate effect
    np.where(data['group'] == 'A', 0, 
             np.where(data['group'] == 'B', 3, 6)) +  # group effects
    np.random.normal(0, 2, len(data))  # error
)

# Check homogeneity of regression slopes
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Plot 1: Regression lines by group
for group in data['group'].unique():
    subset = data[data['group'] == group]
    axes[0].scatter(subset['experience'], subset['productivity'], 
                    label=group, alpha=0.7)
    z = np.polyfit(subset['experience'], subset['productivity'], 1)
    p = np.poly1d(z)
    x_line = np.linspace(subset['experience'].min(), 
                         subset['experience'].max(), 100)
    axes[0].plot(x_line, p(x_line), linestyle='--')

axes[0].set_xlabel('Experience (covariate)')
axes[0].set_ylabel('Productivity (outcome)')
axes[0].set_title('Homogeneity of Regression Slopes Check')
axes[0].legend()

# Fit model for residual diagnostics
model = ols('productivity ~ experience + C(group)', data=data).fit()

# Plot 2: Residuals vs fitted for homoscedasticity
axes[1].scatter(model.fittedvalues, model.resid, alpha=0.7)
axes[1].axhline(y=0, color='r', linestyle='--')
axes[1].set_xlabel('Fitted Values')
axes[1].set_ylabel('Residuals')
axes[1].set_title('Residuals vs Fitted (Homoscedasticity Check)')

plt.tight_layout()
plt.show()

# Normality check with Q-Q plot
sm.qqplot(model.resid, line='45', fit=True)
plt.title('Q-Q Plot of Residuals')
plt.show()

Parallel lines in the first plot indicate homogeneous slopes. Random scatter around zero in the residual plot suggests homoscedasticity. Points following the diagonal in the Q-Q plot confirm normality.
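The visual check can be backed up with a formal test: refit the model with a group-by-covariate interaction and inspect the interaction term. A significant interaction p-value signals unequal slopes. This sketch regenerates data like the example above (same distributions, slightly simplified construction):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Simulate data like the example above: equal true slopes across groups
np.random.seed(42)
n = 30
data = pd.DataFrame({
    'group': np.repeat(['A', 'B', 'C'], n),
    'experience': np.random.normal(5, 2, 3 * n),
})
data['productivity'] = (
    2 * data['experience']
    + data['group'].map({'A': 0, 'B': 3, 'C': 6})
    + np.random.normal(0, 2, len(data))
)

# The interaction term lets each group have its own slope;
# a significant interaction means the homogeneity assumption fails
slopes_model = ols('productivity ~ C(group) * experience', data=data).fit()
slopes_table = anova_lm(slopes_model, typ=3)
p_interaction = slopes_table.loc['C(group):experience', 'PR(>F)']
print(f"Slope homogeneity test p-value: {p_interaction:.4f}")
```

A non-significant interaction is consistent with homogeneous slopes, so you can proceed with the main-effects ANCOVA model.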

Setting Up Your Environment and Data

Install the required packages:

pip install pandas numpy statsmodels scipy pingouin matplotlib seaborn

Let’s work with a realistic dataset—test scores from students taught with three different methods, controlling for their baseline aptitude:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
import pingouin as pg

# Create realistic educational dataset
np.random.seed(123)
n = 40  # students per method

teaching_data = pd.DataFrame({
    'method': np.repeat(['Traditional', 'Blended', 'Online'], n),
    'aptitude': np.concatenate([
        np.random.normal(100, 15, n),
        np.random.normal(100, 15, n),
        np.random.normal(100, 15, n)
    ])
})

# Simulate test scores with method effects
base_effects = {'Traditional': 70, 'Blended': 75, 'Online': 72}
teaching_data['test_score'] = (
    teaching_data['method'].map(base_effects) +
    0.3 * (teaching_data['aptitude'] - 100) +  # aptitude effect
    np.random.normal(0, 8, len(teaching_data))
)

# Basic EDA
print("Dataset Shape:", teaching_data.shape)
print("\nDescriptive Statistics by Group:")
print(teaching_data.groupby('method').agg({
    'test_score': ['mean', 'std', 'count'],
    'aptitude': ['mean', 'std']
}).round(2))

print("\nCovariate-Outcome Correlation:")
print(f"r = {teaching_data['test_score'].corr(teaching_data['aptitude']):.3f}")

This exploratory analysis confirms the covariate correlates with the outcome—a prerequisite for ANCOVA to be useful.

Performing ANCOVA with Statsmodels

Statsmodels provides full control over the analysis. The key is specifying the correct model formula and using Type III sums of squares for unbalanced designs:

from statsmodels.stats.anova import anova_lm

# Fit the ANCOVA model
# C(method) treats 'method' as categorical
# aptitude is the covariate
ancova_model = ols('test_score ~ aptitude + C(method)', 
                   data=teaching_data).fit()

# Get Type III ANCOVA table (preferred for unbalanced designs)
ancova_table = anova_lm(ancova_model, typ=3)
print("ANCOVA Results (Type III SS):")
print(ancova_table.round(4))

# Model summary for detailed coefficients
print("\nModel Coefficients:")
print(ancova_model.summary().tables[1])

The output shows:

  • aptitude: Tests whether the covariate significantly predicts the outcome
  • C(method): Tests whether group means differ after controlling for aptitude
  • Residual: Unexplained variance

A significant p-value for C(method) indicates that teaching methods differ in effectiveness beyond what aptitude explains.

Type III sums of squares are only meaningful under sum-to-zero contrasts; statsmodels' default treatment coding can distort the main-effect tests. Specify sum coding explicitly:

# Explicit Type III with sum-to-zero coding
model_type3 = ols('test_score ~ C(method, Sum) + aptitude', 
                  data=teaching_data).fit()

anova_type3 = sm.stats.anova_lm(model_type3, typ=3)
print("Type III ANCOVA with Sum Coding:")
print(anova_type3.round(4))

Alternative: ANCOVA with Pingouin

Pingouin offers a cleaner interface for quick analysis:

import pingouin as pg

# One-liner ANCOVA
ancova_results = pg.ancova(data=teaching_data, 
                           dv='test_score', 
                           covar='aptitude', 
                           between='method')

print("Pingouin ANCOVA Results:")
print(ancova_results.round(4))

Pingouin automatically calculates partial eta-squared (effect size), making interpretation straightforward. Values of 0.01, 0.06, and 0.14 represent small, medium, and large effects respectively.
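Those benchmarks can be wrapped in a small convenience helper (`interpret_partial_eta_sq` is a hypothetical name of my own, not part of pingouin):

```python
def interpret_partial_eta_sq(value):
    """Map a partial eta-squared value to Cohen-style size labels."""
    # Benchmarks: 0.01 small, 0.06 medium, 0.14 large
    if value >= 0.14:
        return 'large'
    if value >= 0.06:
        return 'medium'
    if value >= 0.01:
        return 'small'
    return 'negligible'

print(interpret_partial_eta_sq(0.08))  # prints "medium"
```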

The trade-off: Pingouin is convenient but offers less flexibility for complex models or custom contrasts.

Interpreting Results and Post-Hoc Analysis

Understanding ANCOVA output requires examining adjusted means—group means recalculated as if all groups had the same covariate value:

# Calculate adjusted means
grand_mean_aptitude = teaching_data['aptitude'].mean()

# Get model predictions at the grand mean of the covariate
adjusted_means = {}
for method in teaching_data['method'].unique():
    # Predict at grand mean aptitude for each group
    pred_data = pd.DataFrame({
        'method': [method],
        'aptitude': [grand_mean_aptitude]
    })
    adjusted_means[method] = ancova_model.predict(pred_data)[0]

print("Adjusted Means (at mean aptitude):")
for method, adj_mean in adjusted_means.items():
    raw_mean = teaching_data[teaching_data['method'] == method]['test_score'].mean()
    print(f"{method}: Adjusted = {adj_mean:.2f}, Raw = {raw_mean:.2f}")

# Calculate partial eta-squared manually
ss_method = ancova_table.loc['C(method)', 'sum_sq']
ss_residual = ancova_table.loc['Residual', 'sum_sq']
partial_eta_sq = ss_method / (ss_method + ss_residual)
print(f"\nPartial η² for method: {partial_eta_sq:.4f}")

When the main effect is significant, conduct post-hoc pairwise comparisons:

# pg.pairwise_tests does not accept a covariate argument, so first
# remove the covariate's contribution from the outcome, then compare.
# (Do NOT use model residuals here: residuals also strip out the group
# effects you are trying to detect.)
slope = ancova_model.params['aptitude']
teaching_data['adjusted_score'] = (
    teaching_data['test_score']
    - slope * (teaching_data['aptitude'] - teaching_data['aptitude'].mean())
)

# Post-hoc pairwise comparisons on covariate-adjusted scores
posthoc = pg.pairwise_tests(data=teaching_data,
                            dv='adjusted_score',
                            between='method',
                            padjust='bonferroni')

print("\nPost-Hoc Pairwise Comparisons (Bonferroni corrected):")
print(posthoc[['A', 'B', 'T', 'p-unc', 'p-corr']].round(4))

# Alternative: Tukey HSD on the same adjusted scores
from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey = pairwise_tukeyhsd(teaching_data['adjusted_score'], 
                          teaching_data['method'])
print("\nTukey HSD on Adjusted Scores:")
print(tukey)

Conclusion

ANCOVA is a powerful technique when you need to compare groups while accounting for continuous confounders. The workflow is straightforward:

  1. Verify assumptions, especially homogeneity of regression slopes
  2. Fit the model with your covariate and grouping variable
  3. Examine the ANCOVA table for significant effects
  4. Calculate adjusted means and effect sizes
  5. Run post-hoc tests if the main effect is significant

Choose statsmodels when you need detailed diagnostics, custom contrasts, or integration with other regression analyses. Use pingouin for quick exploratory work or when you want automatic effect size calculations.

When ANCOVA assumptions fail, consider alternatives: if slopes differ across groups, interaction models or Johnson-Neyman analysis may be appropriate. For non-normal data or small samples, robust regression or permutation-based approaches offer protection against violations. For repeated measures with covariates, mixed-effects models provide the flexibility ANCOVA lacks.
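When slopes do differ, the interaction model itself becomes the analysis: instead of one adjusted group comparison, report a separate covariate slope per group. A minimal sketch, using simulated data with deliberately unequal slopes (all values assumed for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated data where the covariate slope differs by group
rng = np.random.default_rng(7)
n = 40
df = pd.DataFrame({
    'group': np.repeat(['A', 'B'], n),
    'x': rng.normal(0, 1, 2 * n),
})
df['y'] = df['group'].map({'A': 1.0, 'B': 3.0}) * df['x'] + rng.normal(0, 1, 2 * n)

# Fit the interaction model and recover each group's slope
m = ols('y ~ C(group) * x', data=df).fit()
slope_a = m.params['x']
slope_b = m.params['x'] + m.params['C(group)[T.B]:x']
print(f"Slope in A: {slope_a:.2f}, slope in B: {slope_b:.2f}")
```

From here, a Johnson-Neyman analysis identifies the range of covariate values over which the group difference is significant, rather than forcing a single adjusted comparison.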
