How to Perform a MANOVA in Python
Key Insights
- MANOVA extends ANOVA to multiple dependent variables simultaneously, controlling Type I error inflation while detecting multivariate group differences that univariate tests might miss.
- The four test statistics (Wilks’ Lambda, Pillai’s Trace, Hotelling-Lawley, Roy’s Greatest Root) have different strengths: Pillai’s Trace is most robust to assumption violations, while Wilks’ Lambda remains the most commonly reported.
- Always follow significant MANOVA results with univariate ANOVAs and post-hoc tests to identify which specific variables and group comparisons drive the overall effect.
Introduction to MANOVA
Multivariate Analysis of Variance (MANOVA) answers a question that single-variable ANOVA cannot: do groups differ across multiple outcome variables considered together? When you have two or more dependent variables that are conceptually related, running separate ANOVAs inflates your Type I error rate and ignores correlations between outcomes.
Consider a clinical trial measuring both blood pressure and cholesterol levels across treatment groups. These outcomes are correlated—patients with high blood pressure often have elevated cholesterol. MANOVA accounts for this relationship, providing a single omnibus test of whether treatment groups differ on the combined outcome profile.
Real-world applications include educational research (comparing teaching methods on multiple test scores), psychology (treatment effects on anxiety, depression, and stress simultaneously), and manufacturing (quality control across multiple product dimensions).
MANOVA Assumptions
MANOVA requires several assumptions, and violations affect your results’ validity differently:
Multivariate Normality: Each group’s dependent variables should follow a multivariate normal distribution. Moderate violations are tolerable with balanced designs and sample sizes above 20 per group. Severe skewness or outliers warrant data transformation or non-parametric alternatives.
Homogeneity of Covariance Matrices: The variance-covariance matrices should be equal across groups. Box’s M test checks this assumption, but it’s notoriously sensitive to non-normality. With equal group sizes, MANOVA is reasonably robust to violations.
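Neither SciPy nor statsmodels exposes Box's M directly (the third-party `pingouin` package offers `pingouin.box_m`), but the textbook formula is short enough to compute by hand. The following is a minimal NumPy sketch of the chi-square approximation — the function name, toy data, and implementation are mine, not a library API:

```python
import numpy as np
from scipy import stats

def box_m(groups):
    """Box's M test for equality of covariance matrices.

    groups: list of (n_i, p) arrays, one array per group.
    Returns (M, chi-square approximation, degrees of freedom, p-value).
    """
    g = len(groups)
    p = groups[0].shape[1]
    ns = np.array([x.shape[0] for x in groups])
    covs = [np.cov(x, rowvar=False) for x in groups]  # per-group S_i
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (ns.sum() - g)

    # M = (N - g) ln|S_pooled| - sum (n_i - 1) ln|S_i|
    M = (ns.sum() - g) * np.log(np.linalg.det(pooled)) \
        - sum((n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))

    # Chi-square approximation (adequate for moderate sample sizes)
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))) \
        * (np.sum(1 / (ns - 1)) - 1 / (ns.sum() - g))
    chi2 = M * (1 - c)
    dof = p * (p + 1) * (g - 1) / 2
    return M, chi2, dof, stats.chi2.sf(chi2, dof)

# Toy check: three groups drawn from the SAME covariance matrix
rng = np.random.default_rng(0)
groups = [rng.multivariate_normal([0, 0, 0], np.eye(3), size=40) for _ in range(3)]
M, chi2, dof, pval = box_m(groups)
print(f"Box's M = {M:.3f}, chi2({dof:.0f}) = {chi2:.3f}, p = {pval:.4f}")
```

Because the groups share a covariance matrix here, the p-value will usually be non-significant; remember the caveat above about this test's sensitivity to non-normality.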
Independence: Observations must be independent. This is a design issue—violated independence (e.g., repeated measures, clustered data) requires different analytical approaches entirely.
Absence of Multicollinearity: Dependent variables should be correlated but not redundant. Correlations above 0.90 suggest you might combine variables or drop one. Perfectly correlated variables cause computational problems.
Linear Relationships: The dependent variables should have linear relationships with each other within each group.
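The linearity assumption is easiest to eyeball with a scatter-plot matrix, which pandas ships as `pd.plotting.scatter_matrix`. A quick sketch on a synthetic stand-in for one group's outcomes (column names match the dataset built later in this article; the numbers are arbitrary):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt

# Synthetic stand-in for one group's dependent variables
rng = np.random.default_rng(42)
group = pd.DataFrame({
    'systolic_bp': rng.normal(135, 12, 40),
    'diastolic_bp': rng.normal(88, 8, 40),
    'resting_hr': rng.normal(75, 10, 40),
})

# Pairwise scatter plots: look for roughly elliptical, straight-line clouds
axes = pd.plotting.scatter_matrix(group, figsize=(7, 7), diagonal='kde')
plt.savefig('linearity_check.png', dpi=100)
```

Curved or fan-shaped clouds suggest a transformation before running MANOVA; repeat the plot for each group, since linearity should hold within groups.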
Setting Up Your Environment
You’ll need several Python libraries for a complete MANOVA workflow:
# Core libraries
import numpy as np
import pandas as pd
from scipy import stats
# Statistical modeling
import statsmodels.api as sm
from statsmodels.multivariate.manova import MANOVA
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.anova import anova_lm
# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Set random seed for reproducibility
np.random.seed(42)
Let’s create a realistic dataset simulating a study comparing three exercise programs on cardiovascular health outcomes:
# Simulate clinical trial data
n_per_group = 40
# Control group
control = pd.DataFrame({
    'program': 'Control',
    'systolic_bp': np.random.normal(135, 12, n_per_group),
    'diastolic_bp': np.random.normal(88, 8, n_per_group),
    'resting_hr': np.random.normal(75, 10, n_per_group)
})
# Moderate exercise program
moderate = pd.DataFrame({
    'program': 'Moderate',
    'systolic_bp': np.random.normal(128, 11, n_per_group),
    'diastolic_bp': np.random.normal(82, 7, n_per_group),
    'resting_hr': np.random.normal(68, 9, n_per_group)
})
# Intensive exercise program
intensive = pd.DataFrame({
    'program': 'Intensive',
    'systolic_bp': np.random.normal(122, 10, n_per_group),
    'diastolic_bp': np.random.normal(78, 7, n_per_group),
    'resting_hr': np.random.normal(62, 8, n_per_group)
})
# Combine datasets
df = pd.concat([control, moderate, intensive], ignore_index=True)
print(df.groupby('program')[['systolic_bp', 'diastolic_bp', 'resting_hr']].mean().round(2))
Performing MANOVA with Statsmodels
The statsmodels library provides a straightforward interface for MANOVA through the MANOVA class. The formula syntax mirrors R’s conventions:
# Define the MANOVA model
# Dependent variables on the left, separated by +
# Independent variable(s) on the right after ~
manova = MANOVA.from_formula(
    'systolic_bp + diastolic_bp + resting_hr ~ program',
    data=df
)
# Run the test and get results
results = manova.mv_test()
print(results)
This produces output containing all four multivariate test statistics:
# Access specific test results programmatically
print("\n=== Detailed MANOVA Results ===")
print(results.results['program']['stat'])
The output table includes:
- Wilks’ Lambda: Ranges from 0 to 1; smaller values indicate larger group differences
- Pillai’s Trace: Ranges from 0 to s, where s = min(number of dependent variables, number of groups − 1); larger values indicate larger differences
- Hotelling-Lawley Trace: No upper bound; larger values indicate larger differences
- Roy’s Greatest Root: Based on the largest eigenvalue; most powerful when differences exist on one dimension
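All four statistics are functions of the eigenvalues λᵢ of E⁻¹H, where H is the between-groups and E the within-groups sum-of-squares-and-cross-products (SSCP) matrix. A NumPy sketch on toy data makes the connection concrete (the data and variable names here are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 3 groups x 30 observations x 2 dependent variables
groups = [rng.normal(loc=mu, scale=1.0, size=(30, 2))
          for mu in ([0, 0], [1, 0.5], [2, 1])]

X = np.vstack(groups)
grand_mean = X.mean(axis=0)

# H: between-groups SSCP; E: within-groups SSCP
H = sum(len(g) * np.outer(g.mean(0) - grand_mean, g.mean(0) - grand_mean)
        for g in groups)
E = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)

# Eigenvalues of E^-1 H (real and non-negative in this setting)
eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real

wilks = np.prod(1 / (1 + eigvals))        # = det(E) / det(H + E)
pillai = np.sum(eigvals / (1 + eigvals))
hotelling = np.sum(eigvals)               # trace of E^-1 H
roy = eigvals.max()                       # largest eigenvalue

print(f"Wilks={wilks:.4f} Pillai={pillai:.4f} "
      f"Hotelling-Lawley={hotelling:.4f} Roy={roy:.4f}")
```

This is why the four tests usually agree in clear-cut cases: they summarize the same eigenvalues, just with different weightings.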
Interpreting MANOVA Results
Reading MANOVA output requires understanding what each statistic tells you:
# Extract and display results in a cleaner format
stat_table = results.results['program']['stat']
# Pull the row once: backslash escapes inside f-string expressions are a
# SyntaxError before Python 3.12
wilks = stat_table.loc["Wilks' lambda"]
print("Test Statistic Interpretation Guide:")
print("=" * 50)
print(f"Wilks' Lambda: {wilks['Value']:.4f}")
print("  - Closer to 0 = stronger effect")
print(f"  - F({wilks['Num DF']:.0f}, {wilks['Den DF']:.0f}) = {wilks['F Value']:.2f}")
print(f"  - p-value: {wilks['Pr > F']:.6f}")
Which test statistic should you report? Here’s practical guidance:
- Pillai’s Trace: Most robust to violations of assumptions. Use when you have unequal group sizes, suspect non-normality, or have small samples. It’s the safest default choice.
- Wilks’ Lambda: Most commonly reported in literature. Use with balanced designs and when assumptions are reasonably met.
- Hotelling-Lawley Trace: More powerful than Pillai’s when group differences concentrate on one linear combination of dependent variables.
- Roy’s Greatest Root: Most powerful when differences exist primarily on one dimension, but also most sensitive to assumption violations. Avoid with unequal covariance matrices.
A significant MANOVA (typically p < 0.05) indicates that at least one group differs from others on the combined dependent variables. It doesn’t tell you which groups differ or on which variables.
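To quantify how large the omnibus effect is, a commonly used multivariate effect size is η²_mult = 1 − Λ^(1/s), where Λ is Wilks' Lambda and s = min(number of dependent variables, number of groups − 1). A small helper sketching that textbook formula (this is not a statsmodels API, and the Λ value below is a hypothetical placeholder):

```python
def multivariate_eta_squared(wilks_lambda, n_dvs, n_groups):
    """Multivariate eta-squared from Wilks' Lambda: 1 - Lambda**(1/s)."""
    s = min(n_dvs, n_groups - 1)
    return 1 - wilks_lambda ** (1 / s)

# Hypothetical example: Lambda = 0.70 with 3 DVs and 3 groups (s = 2)
eta2 = multivariate_eta_squared(0.70, n_dvs=3, n_groups=3)
print(round(eta2, 3))
```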
Post-Hoc Analysis
When MANOVA yields significance, follow up with univariate ANOVAs to identify which dependent variables contribute to the effect:
# Run univariate ANOVAs for each dependent variable
dependent_vars = ['systolic_bp', 'diastolic_bp', 'resting_hr']
print("=== Follow-up Univariate ANOVAs ===\n")
for dv in dependent_vars:
    model = ols(f'{dv} ~ C(program)', data=df).fit()
    anova_table = anova_lm(model, typ=2)
    f_val = anova_table.loc['C(program)', 'F']
    p_val = anova_table.loc['C(program)', 'PR(>F)']
    # Calculate eta-squared effect size
    ss_between = anova_table.loc['C(program)', 'sum_sq']
    ss_total = anova_table['sum_sq'].sum()
    eta_sq = ss_between / ss_total
    print(f"{dv}:")
    print(f"  F = {f_val:.2f}, p = {p_val:.6f}, η² = {eta_sq:.3f}")
    print()
For significant ANOVAs, conduct pairwise comparisons:
# Tukey HSD post-hoc tests
print("=== Tukey HSD Pairwise Comparisons ===\n")
for dv in dependent_vars:
    tukey = pairwise_tukeyhsd(df[dv], df['program'], alpha=0.05)
    print(f"\n{dv}:")
    print(tukey.summary())
Apply Bonferroni correction when running multiple univariate tests. With three dependent variables, use α = 0.05/3 ≈ 0.017 for each ANOVA.
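Bonferroni is simple but conservative; Holm's step-down procedure controls the same family-wise error rate with more power. statsmodels provides both through `multipletests` (the p-values below are hypothetical placeholders, not results from this article's dataset):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from three univariate follow-up ANOVAs
raw_p = [0.0004, 0.021, 0.048]

reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method='holm')
for p, pa, r in zip(raw_p, p_adj, reject):
    print(f"raw p = {p:.4f} -> Holm-adjusted p = {pa:.4f}, reject H0: {r}")
```

Note that under plain Bonferroni (α = 0.017 per test) the third p-value would not survive, while Holm rejects all three — the extra power matters near the threshold.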
Complete Workflow Example
Here’s a full reproducible analysis pipeline:
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import seaborn as sns
import matplotlib.pyplot as plt
# Generate data
np.random.seed(42)
n = 40
df = pd.concat([
    pd.DataFrame({
        'program': 'Control',
        'systolic_bp': np.random.normal(135, 12, n),
        'diastolic_bp': np.random.normal(88, 8, n),
        'resting_hr': np.random.normal(75, 10, n)
    }),
    pd.DataFrame({
        'program': 'Moderate',
        'systolic_bp': np.random.normal(128, 11, n),
        'diastolic_bp': np.random.normal(82, 7, n),
        'resting_hr': np.random.normal(68, 9, n)
    }),
    pd.DataFrame({
        'program': 'Intensive',
        'systolic_bp': np.random.normal(122, 10, n),
        'diastolic_bp': np.random.normal(78, 7, n),
        'resting_hr': np.random.normal(62, 8, n)
    })
], ignore_index=True)
# Step 1: Check assumptions
print("=== ASSUMPTION CHECKS ===\n")
# Check for multicollinearity
corr_matrix = df[['systolic_bp', 'diastolic_bp', 'resting_hr']].corr()
print("Correlation matrix (check for r > 0.90):")
print(corr_matrix.round(3))
# Shapiro-Wilk test for normality within groups
print("\nShapiro-Wilk normality tests (p > 0.05 suggests normality):")
for prog in df['program'].unique():
    subset = df[df['program'] == prog]
    for var in ['systolic_bp', 'diastolic_bp', 'resting_hr']:
        stat, p = stats.shapiro(subset[var])
        status = "OK" if p > 0.05 else "VIOLATION"
        print(f"  {prog} - {var}: p = {p:.4f} [{status}]")
# Step 2: Run MANOVA
print("\n=== MANOVA RESULTS ===\n")
manova = MANOVA.from_formula(
    'systolic_bp + diastolic_bp + resting_hr ~ program',
    data=df
)
results = manova.mv_test()
print(results)
# Step 3: Follow-up analyses
print("\n=== UNIVARIATE FOLLOW-UPS ===\n")
alpha_corrected = 0.05 / 3 # Bonferroni correction
for dv in ['systolic_bp', 'diastolic_bp', 'resting_hr']:
    model = ols(f'{dv} ~ C(program)', data=df).fit()
    anova_table = anova_lm(model, typ=2)
    p_val = anova_table.loc['C(program)', 'PR(>F)']
    sig = "***" if p_val < alpha_corrected else ""
    print(f"{dv}: p = {p_val:.6f} {sig}")
    if p_val < alpha_corrected:
        tukey = pairwise_tukeyhsd(df[dv], df['program'])
        print(tukey.summary())
    print()
# Step 4: Visualize results
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
dvs = ['systolic_bp', 'diastolic_bp', 'resting_hr']
titles = ['Systolic BP', 'Diastolic BP', 'Resting Heart Rate']
for ax, dv, title in zip(axes, dvs, titles):
    sns.boxplot(data=df, x='program', y=dv, ax=ax,
                order=['Control', 'Moderate', 'Intensive'])
    ax.set_title(title)
    ax.set_xlabel('Exercise Program')
plt.tight_layout()
plt.savefig('manova_results.png', dpi=150, bbox_inches='tight')
plt.show()
This workflow gives you a complete, defensible MANOVA analysis. The key is treating MANOVA as the gatekeeper—only proceed to univariate tests if the omnibus test is significant. This protects against inflated false positive rates while still allowing you to pinpoint specific effects.
When reporting results, include the multivariate test statistic (typically Pillai’s Trace or Wilks’ Lambda), approximate F-value, degrees of freedom, and p-value. Follow with univariate results and effect sizes for each dependent variable, then specific pairwise comparisons for significant effects.
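A reporting sentence can be assembled programmatically from those pieces. The dictionary below stands in for one row of the statsmodels stat table; every number in it is a hypothetical placeholder, not a result from the simulated dataset:

```python
# Placeholder values standing in for the Pillai's Trace row of the MANOVA output
pillai = {'Value': 0.31, 'Num DF': 6, 'Den DF': 232, 'F Value': 7.10, 'Pr > F': 0.0001}

report = (f"A one-way MANOVA showed a significant effect of program on the "
          f"combined outcomes, Pillai's Trace = {pillai['Value']:.2f}, "
          f"F({pillai['Num DF']:.0f}, {pillai['Den DF']:.0f}) = "
          f"{pillai['F Value']:.2f}, p < .001.")
print(report)
```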