How to Perform the Mood's Median Test in Python

Key Insights

  • Mood’s Median Test is a robust nonparametric alternative when your data violates normality assumptions, but it’s less powerful than Kruskal-Wallis—use it specifically when you care about median differences and have outlier-prone data.
  • The test works by constructing a contingency table based on how many observations in each group fall above or below the grand median, then applying a chi-square test to that table.
  • SciPy’s median_test() function handles the heavy lifting, but understanding the ties parameter is critical: by default (ties='below'), values equal to the grand median are counted as below it, and switching to 'above' or 'ignore' can noticeably change your results.

Introduction to Mood’s Median Test

Mood’s Median Test answers a straightforward question: do two or more groups have the same median? It’s a nonparametric test, meaning it doesn’t assume your data follows a normal distribution. This makes it particularly useful for skewed data, ordinal measurements, or datasets with outliers that would wreak havoc on parametric alternatives.

Here’s the decision framework I use:

  • Use Mood’s Median Test when you specifically care about medians, have outliers, or want a test that’s robust to distribution shape differences between groups.
  • Use Kruskal-Wallis when you want more statistical power and can assume similar distribution shapes across groups (it tests stochastic dominance, not strictly medians).
  • Use Mann-Whitney U for two-group comparisons when you want the power advantage over Mood’s test.

The key assumptions are simple: independent samples and continuous data (though it works reasonably well with ordinal data). Unlike Kruskal-Wallis, Mood’s test doesn’t require groups to have similarly shaped distributions—it genuinely tests median equality.

Setting Up Your Environment

You’ll need the standard scientific Python stack. Here’s the setup:

import numpy as np
import pandas as pd
from scipy import stats

# Set random seed for reproducibility
np.random.seed(42)

# Create sample dataset: three treatment groups with different medians
group_a = np.random.exponential(scale=10, size=30) + 5
group_b = np.random.exponential(scale=10, size=35) + 12
group_c = np.random.exponential(scale=10, size=28) + 8

# Combine into a DataFrame for easier manipulation
df = pd.DataFrame({
    'value': np.concatenate([group_a, group_b, group_c]),
    'group': ['A'] * 30 + ['B'] * 35 + ['C'] * 28
})

print(f"Group A median: {np.median(group_a):.2f}")
print(f"Group B median: {np.median(group_b):.2f}")
print(f"Group C median: {np.median(group_c):.2f}")

This creates three groups with intentionally different medians using exponential distributions—exactly the kind of skewed data where Mood’s test shines.
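To see why the median matters for data like this, note how far the mean and median of a skewed sample drift apart. A minimal sketch with simulated exponential data (the sample size of 1000 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=10, size=1000) + 5

# The long right tail drags the mean upward; the median stays put.
print(f"Mean:   {skewed.mean():.2f}")
print(f"Median: {np.median(skewed):.2f}")
```

For an exponential distribution the mean sits well above the median, so summarizing these groups by their means would overstate "typical" values.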

Understanding the Test Mechanics

Before blindly calling a function, you should understand what’s happening under the hood. Mood’s Median Test follows these steps:

  1. Calculate the grand median of all observations combined
  2. Count how many observations in each group fall above and below this grand median
  3. Construct a contingency table from these counts
  4. Apply a chi-square test to determine if the distribution of above/below counts differs across groups

Here’s a manual implementation that exposes the mechanics:

def manual_moods_median_test(*groups):
    """
    Manual implementation of Mood's Median Test to show the mechanics.
    """
    # Combine all groups and calculate grand median
    all_data = np.concatenate(groups)
    grand_median = np.median(all_data)
    
    print(f"Grand median: {grand_median:.2f}")
    print(f"Total observations: {len(all_data)}")
    print()
    
    # Build contingency table
    contingency_table = []
    for i, group in enumerate(groups):
        above = np.sum(group > grand_median)
        below = np.sum(group < grand_median)
        at_median = np.sum(group == grand_median)
        # Note: values equal to the grand median are excluded from the
        # table here, matching SciPy's ties='ignore' option (SciPy's
        # default, ties='below', counts them as below instead).
        contingency_table.append([above, below])
        print(f"Group {i+1}: {above} above, {below} below, {at_median} at median")
    
    contingency_table = np.array(contingency_table).T
    print(f"\nContingency table:\n{contingency_table}")
    
    # Perform chi-square test on contingency table
    chi2, p_value = stats.chi2_contingency(contingency_table)[:2]
    
    return chi2, p_value, grand_median, contingency_table

stat, p, med, table = manual_moods_median_test(group_a, group_b, group_c)
print(f"\nChi-square statistic: {stat:.4f}")
print(f"P-value: {p:.4f}")

The contingency table has two rows (above median, below median) and as many columns as you have groups. If the groups have the same median, you’d expect roughly equal proportions above and below in each group. The chi-square test quantifies how much the observed counts deviate from this expectation.
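To make "roughly equal proportions" concrete, you can pull the expected counts out of the same chi-square machinery. A sketch on a made-up 2x3 table (the counts are illustrative, not from the groups above):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 table: row 0 = counts above the grand median,
# row 1 = counts below; one column per group (illustrative numbers).
observed = np.array([[20, 10, 14],
                     [10, 25, 14]])

# chi2_contingency also returns the expected counts under the null:
# each group splits above/below in the same proportion as the pooled data.
chi2, p, dof, expected = stats.chi2_contingency(observed)

print("Expected counts under the null:")
print(np.round(expected, 2))
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, dof = {dof}")
```

The larger the gap between `observed` and `expected`, the larger the chi-square statistic and the smaller the p-value.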

Performing the Test with SciPy

Now let’s use the production-ready implementation. SciPy’s median_test() function returns four values:

# Two-group comparison
stat, p, med, table = stats.median_test(group_a, group_b)

print("Two-group comparison (A vs B):")
print(f"Test statistic: {stat:.4f}")
print(f"P-value: {p:.4f}")
print(f"Grand median: {med:.4f}")
print(f"Contingency table:\n{table}")
print()

# Multi-group comparison
stat, p, med, table = stats.median_test(group_a, group_b, group_c)

print("Three-group comparison (A vs B vs C):")
print(f"Test statistic: {stat:.4f}")
print(f"P-value: {p:.4f}")
print(f"Grand median: {med:.4f}")
print(f"Contingency table:\n{table}")

Interpreting the output:

  • stat: The chi-square test statistic. Larger values indicate greater deviation from the null hypothesis.
  • p: The p-value. Below your significance threshold (typically 0.05), you reject the null hypothesis that all groups have the same median.
  • med: The grand median used for the test.
  • table: The contingency table showing counts above (row 0) and below (row 1) the grand median for each group.
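On recent SciPy releases (1.9 and later, if I recall the version correctly), median_test() also returns a result object with named attributes, which reads more clearly than positional unpacking while still unpacking as a 4-tuple for older call sites. A sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=40)
b = rng.normal(loc=1.0, scale=1.0, size=40)

res = stats.median_test(a, b)

# Named attributes instead of positional indices:
print(f"statistic: {res.statistic:.4f}")
print(f"p-value:   {res.pvalue:.4f}")
print(f"median:    {res.median:.4f}")
print(f"table:\n{res.table}")
```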

Practical Example: Real-World Dataset

Let’s work through a realistic scenario. Suppose you’re analyzing response times across three different server configurations:

# Simulating server response times (milliseconds)
# Configuration A: baseline
# Configuration B: with caching
# Configuration C: with caching + load balancing

np.random.seed(123)

# Response times often follow log-normal or exponential distributions
config_a = np.random.lognormal(mean=4.5, sigma=0.8, size=50)
config_b = np.random.lognormal(mean=4.2, sigma=0.9, size=55)
config_c = np.random.lognormal(mean=4.0, sigma=0.7, size=48)

# Add some outliers (server hiccups)
config_a = np.append(config_a, [500, 600, 450])
config_b = np.append(config_b, [480, 520])

# Create DataFrame for analysis
response_df = pd.DataFrame({
    'response_time': np.concatenate([config_a, config_b, config_c]),
    'configuration': ['Baseline'] * len(config_a) + 
                     ['Cached'] * len(config_b) + 
                     ['Cached+LB'] * len(config_c)
})

# Descriptive statistics
print("Response Time Statistics by Configuration:")
print(response_df.groupby('configuration')['response_time'].agg([
    'count', 'mean', 'median', 'std'
]).round(2))
print()

# Notice how mean is heavily influenced by outliers, but median is stable

# Perform Mood's Median Test
stat, p, grand_med, table = stats.median_test(config_a, config_b, config_c)

print(f"Mood's Median Test Results:")
print(f"Grand median: {grand_med:.2f} ms")
print(f"Test statistic: {stat:.4f}")
print(f"P-value: {p:.4f}")
print()

# Interpretation
alpha = 0.05
if p < alpha:
    print(f"Result: Reject null hypothesis (p={p:.4f} < {alpha})")
    print("At least one configuration has a significantly different median response time.")
else:
    print(f"Result: Fail to reject null hypothesis (p={p:.4f} >= {alpha})")
    print("No significant difference in median response times detected.")

This example demonstrates why Mood’s test is valuable: the outliers (server hiccups causing 450-600 ms responses) would heavily skew a t-test or ANOVA, but the median-based test remains robust.
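A boxplot makes the median-versus-outlier story visible before you even run the test. A minimal sketch with matplotlib, regenerating data similar to the above (the filename response_times.png is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(123)
configs = {
    "Baseline": np.append(rng.lognormal(4.5, 0.8, 50), [500, 600, 450]),
    "Cached": np.append(rng.lognormal(4.2, 0.9, 55), [480, 520]),
    "Cached+LB": rng.lognormal(4.0, 0.7, 48),
}

# The box midline is the median; outliers show up as isolated fliers,
# which is exactly the contrast Mood's test is robust to.
fig, ax = plt.subplots()
ax.boxplot(list(configs.values()))
ax.set_xticklabels(list(configs.keys()))
ax.set_ylabel("Response time (ms)")
fig.savefig("response_times.png")
```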

Handling Edge Cases and Best Practices

The ties parameter is crucial and often overlooked. It controls how observations exactly equal to the grand median are handled:

# Create data with ties at the median
group_x = np.array([1, 2, 3, 4, 5, 5, 5, 6, 7, 8])
group_y = np.array([3, 4, 5, 5, 5, 5, 6, 7, 8, 9])

grand_med = np.median(np.concatenate([group_x, group_y]))
print(f"Grand median: {grand_med}")
print(f"Ties at median - Group X: {np.sum(group_x == grand_med)}")
print(f"Ties at median - Group Y: {np.sum(group_y == grand_med)}")
print()

# Default: ties='below' - values equal to median counted as below
stat, p, med, table = stats.median_test(group_x, group_y, ties='below')
print(f"ties='below': stat={stat:.4f}, p={p:.4f}")
print(f"Table:\n{table}\n")

# Alternative: ties='above' - values equal to median counted as above
stat, p, med, table = stats.median_test(group_x, group_y, ties='above')
print(f"ties='above': stat={stat:.4f}, p={p:.4f}")
print(f"Table:\n{table}\n")

# Alternative: ties='ignore' - exclude values equal to median
stat, p, med, table = stats.median_test(group_x, group_y, ties='ignore')
print(f"ties='ignore': stat={stat:.4f}, p={p:.4f}")
print(f"Table:\n{table}")

Best practices for Mood’s Median Test:

  1. Sample size: Aim for at least 10 observations per group. With very small samples, the chi-square approximation becomes unreliable.

  2. Ties handling: If you have many ties at the median, use ties='ignore' to be conservative, or report results with multiple tie-handling methods.

  3. Effect size: The test only tells you if medians differ, not by how much. Always report the actual group medians alongside your p-value.

  4. Multiple comparisons: If you reject the null with multiple groups, you’ll need post-hoc pairwise tests (with correction) to identify which specific groups differ.

# Post-hoc pairwise comparisons with Bonferroni correction
from itertools import combinations

groups = {'A': group_a, 'B': group_b, 'C': group_c}
pairs = list(combinations(groups.keys(), 2))
n_comparisons = len(pairs)
bonferroni_alpha = 0.05 / n_comparisons

print(f"Pairwise comparisons (Bonferroni-corrected alpha: {bonferroni_alpha:.4f}):")
for g1, g2 in pairs:
    stat, p, _, _ = stats.median_test(groups[g1], groups[g2])
    sig = "***" if p < bonferroni_alpha else ""
    print(f"{g1} vs {g2}: p={p:.4f} {sig}")
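To attach an effect size to a significant result (point 3 above), a percentile-bootstrap confidence interval for the difference in medians works well. A sketch under the same simulated-data assumptions; median_diff_ci is a helper defined here, not part of SciPy:

```python
import numpy as np

rng = np.random.default_rng(42)
group_a = rng.exponential(scale=10, size=30) + 5
group_b = rng.exponential(scale=10, size=35) + 12

def median_diff_ci(x, y, n_boot=5000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for median(x) - median(y)."""
    rng = rng if rng is not None else np.random.default_rng()
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed
        bx = rng.choice(x, size=len(x), replace=True)
        by = rng.choice(y, size=len(y), replace=True)
        diffs[i] = np.median(bx) - np.median(by)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return np.median(x) - np.median(y), (lo, hi)

diff, (lo, hi) = median_diff_ci(group_a, group_b, rng=rng)
print(f"Median difference (A - B): {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval alongside the p-value tells readers not just that the medians differ, but by roughly how much.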

Conclusion

Mood’s Median Test occupies a specific niche: use it when you genuinely care about median differences, have outlier-prone or heavily skewed data, and want robustness over statistical power. It’s not the default choice for comparing groups—Kruskal-Wallis usually wins on power—but it’s the right tool when medians are your focus.

Quick reference for scipy.stats.median_test():

  • *args: the sample arrays to compare (required)
  • ties: how values equal to the grand median are counted: 'below' (default), 'above', or 'ignore'
  • correction: apply Yates’ continuity correction when there are exactly two samples (default: True)
  • lambda_: power-divergence statistic parameter (default gives Pearson’s chi-square)

Remember: always visualize your data first, report actual medians alongside p-values, and consider the practical significance of any differences you find.
