How to Perform the Kruskal-Wallis Test in Python
Key Insights
- The Kruskal-Wallis test is your go-to method when comparing three or more groups with non-normal data or ordinal measurements—it ranks all observations and compares median ranks rather than means.
- A significant Kruskal-Wallis result only tells you that differences exist somewhere; you must follow up with Dunn’s test or similar post-hoc analysis to identify which specific groups differ.
- Always visualize your data with box plots before and after testing—statistical significance means nothing without understanding the practical magnitude of group differences.
Introduction to the Kruskal-Wallis Test
The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. When your data violates normality assumptions or you’re working with ordinal scales (like survey ratings), this test becomes essential.
Use Kruskal-Wallis when you need to compare three or more independent groups and at least one of these conditions applies:
- Your data isn’t normally distributed
- Sample sizes are small (making normality hard to verify)
- You’re working with ordinal data (rankings, Likert scales)
- You have significant outliers you can’t justify removing
The test works by ranking all observations across all groups, then comparing whether the mean ranks differ significantly between groups. Strictly speaking, it tests the null hypothesis that all groups come from the same distribution; when the group distributions share a similar shape, this amounts to testing whether the population medians are equal.
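To make the ranking mechanics concrete, here is a small sketch (the scores are made up, chosen with no ties so no tie correction is needed) that computes H by hand from the rank sums and checks it against scipy.stats.kruskal:

```python
import numpy as np
from scipy import stats

# Made-up scores for three small groups (all values distinct, so no ties)
a = [6.5, 7.1, 8.0, 7.7]
b = [5.2, 5.9, 6.1]
c = [8.3, 8.9, 9.4, 9.0]

ranks = stats.rankdata(np.concatenate([a, b, c]))  # rank 1 = smallest value
n = ranks.size
per_group = np.split(ranks, np.cumsum([len(a), len(b)]))  # ranks back in group order

# H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with R_i the rank sum of group i
h_manual = 12 / (n * (n + 1)) * sum(r.sum() ** 2 / r.size for r in per_group) - 3 * (n + 1)
h_scipy, _ = stats.kruskal(a, b, c)
print(f"manual H = {h_manual:.4f}, scipy H = {h_scipy:.4f}")  # both 8.9091
```

Because the test only ever sees ranks, any monotonic transformation of the raw scores (log, square root) leaves H unchanged.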
Three key assumptions must hold:
- Independence: Observations within and between groups are independent
- Ordinal scale: Data must be at least ordinal (rankable)
- Similar shapes: Group distributions should have similar shapes (though not necessarily normal)
That third assumption often gets overlooked. If distributions have vastly different shapes, a significant result might reflect shape differences rather than location differences.
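One lightweight way to screen for that assumption before testing is to compare each group's skewness and interquartile range; large mismatches suggest the shapes differ. This is a sketch on simulated data (the group names and distributions are illustrative, not from the dataset below):

```python
import numpy as np
from scipy import stats

# Simulated groups: two roughly symmetric, one deliberately right-skewed
rng = np.random.default_rng(0)
groups = {
    'A': rng.normal(50, 10, 40),
    'B': rng.exponential(scale=10, size=40) + 40,  # right-skewed
    'C': rng.normal(55, 10, 40),
}
for name, values in groups.items():
    iqr = np.subtract(*np.percentile(values, [75, 25]))
    print(f"{name}: skew = {stats.skew(values):+.2f}, IQR = {iqr:.2f}")
```

Numeric summaries like these complement, rather than replace, the visual check with box and violin plots.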
Prerequisites and Setup
You’ll need a few standard libraries. SciPy handles the main test, while scikit-posthocs provides post-hoc comparisons:
import numpy as np
import pandas as pd
from scipy import stats
import scikit_posthocs as sp
import matplotlib.pyplot as plt
import seaborn as sns
# Set random seed for reproducibility
np.random.seed(42)
# Create sample dataset: treatment effectiveness scores (0-100)
# Three treatment groups with intentionally non-normal distributions
treatment_a = np.concatenate([
    np.random.normal(65, 10, 25),
    np.random.normal(85, 5, 5)  # Some high responders
])
treatment_b = np.random.exponential(scale=15, size=30) + 40
treatment_c = np.random.normal(75, 8, 35)
# Clip to realistic range
treatment_a = np.clip(treatment_a, 0, 100)
treatment_b = np.clip(treatment_b, 0, 100)
treatment_c = np.clip(treatment_c, 0, 100)
Install scikit-posthocs if you haven’t already:
pip install scikit-posthocs
Performing the Test with SciPy
The scipy.stats.kruskal() function makes execution straightforward. Pass your groups as separate arrays:
# Perform Kruskal-Wallis test
h_statistic, p_value = stats.kruskal(treatment_a, treatment_b, treatment_c)
print(f"Kruskal-Wallis H-statistic: {h_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
# Interpretation
alpha = 0.05
if p_value < alpha:
    print(f"\nResult: Reject null hypothesis (p < {alpha})")
    print("At least one group differs significantly from the others.")
else:
    print(f"\nResult: Fail to reject null hypothesis (p >= {alpha})")
    print("No significant difference detected between groups.")
The H-statistic approximately follows a chi-square distribution with k-1 degrees of freedom (where k is the number of groups); the approximation is generally considered reliable once each group has at least five observations. Larger H values indicate greater differences between group ranks.
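Because H is referred to a chi-square distribution, you can recover the p-value directly from the statistic. The H value below is an illustrative number, not one computed from the sample data above:

```python
from scipy import stats

h = 8.91  # illustrative H-statistic
k = 3     # number of groups
p = stats.chi2.sf(h, df=k - 1)  # survival function = 1 - CDF
print(f"p = {p:.4f}")  # p = 0.0116
```

This is exactly what stats.kruskal does internally after computing H from the ranks (plus a tie correction when ties are present).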
A common mistake: treating a significant p-value as proof that all groups differ. It only means at least one group is different—you don’t know which ones yet.
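Nor does a significant p-value say anything about practical magnitude. A common convention for quantifying it (an assumption on my part, not something SciPy reports) is the epsilon-squared or eta-squared effect size derived from H; the H and sample-size values below are illustrative:

```python
def kruskal_effect_size(h_statistic, n_total, n_groups):
    """Effect sizes for Kruskal-Wallis, using the commonly cited conventions
    (assumed here): epsilon^2 = H / (n - 1), eta^2 = (H - k + 1) / (n - k)."""
    epsilon_sq = h_statistic / (n_total - 1)
    eta_sq = (h_statistic - n_groups + 1) / (n_total - n_groups)
    return epsilon_sq, eta_sq

# Illustrative numbers: H = 8.91 from 95 observations in 3 groups
eps, eta = kruskal_effect_size(h_statistic=8.91, n_total=95, n_groups=3)
print(f"epsilon^2 = {eps:.3f}, eta^2 = {eta:.3f}")
```

Values near 0.01, 0.06, and 0.14 are often read as small, medium, and large, though those cutoffs are rules of thumb rather than fixed thresholds.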
Preparing Real-World Data
Real data rarely comes in separate arrays. Here’s how to handle typical DataFrame structures:
# Simulate loading data from CSV
data = pd.DataFrame({
    'patient_id': range(1, 96),
    'treatment': ['A'] * 30 + ['B'] * 30 + ['C'] * 35,
    'effectiveness_score': np.concatenate([treatment_a, treatment_b, treatment_c])
})
# Preview the data
print(data.groupby('treatment')['effectiveness_score'].describe())
# Method 1: Extract arrays using groupby
groups = [group['effectiveness_score'].values
          for name, group in data.groupby('treatment')]
h_stat, p_val = stats.kruskal(*groups)
print(f"\nH-statistic: {h_stat:.4f}, p-value: {p_val:.4f}")
# Method 2: Direct extraction by condition
group_a = data[data['treatment'] == 'A']['effectiveness_score']
group_b = data[data['treatment'] == 'B']['effectiveness_score']
group_c = data[data['treatment'] == 'C']['effectiveness_score']
h_stat, p_val = stats.kruskal(group_a, group_b, group_c)
Unequal group sizes aren’t a problem—Kruskal-Wallis handles them naturally. However, extremely unbalanced designs (like 10 vs 10 vs 200) can affect power and interpretation.
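Missing values are another routine wrinkle in real data: with the default settings, NaNs in a group will typically propagate into the result as nan, so drop them before testing. A minimal sketch with a hypothetical frame (the column names and scores are made up):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical frame with one missing score
df = pd.DataFrame({
    'treatment': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'score': [61.0, 58.5, np.nan, 72.3, 70.1, 69.8, 55.0, 57.2, 54.1],
})

clean = df.dropna(subset=['score'])  # drop rows with missing scores
groups = [g['score'].to_numpy() for _, g in clean.groupby('treatment')]
h, p = stats.kruskal(*groups)
print(f"n = {len(clean)}, H = {h:.4f}, p = {p:.4f}")
```

Dropping rows rather than filling them keeps the ranks honest; imputed values would receive ranks they never earned.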
Post-Hoc Analysis with Dunn’s Test
A significant Kruskal-Wallis result demands follow-up testing. Dunn’s test performs pairwise comparisons while controlling for multiple testing:
# Perform Dunn's test with Bonferroni correction
dunn_results = sp.posthoc_dunn(
    data,
    val_col='effectiveness_score',
    group_col='treatment',
    p_adjust='bonferroni'
)
print("Dunn's Test Results (Bonferroni-corrected p-values):")
print(dunn_results.round(4))
# Identify significant pairs
alpha = 0.05
significant_pairs = []
for i in range(len(dunn_results.columns)):
    for j in range(i + 1, len(dunn_results.columns)):
        group1 = dunn_results.columns[i]
        group2 = dunn_results.columns[j]
        p = dunn_results.iloc[i, j]
        if p < alpha:
            significant_pairs.append((group1, group2, p))
print(f"\nSignificant pairwise differences (p < {alpha}):")
for g1, g2, p in significant_pairs:
    print(f"  {g1} vs {g2}: p = {p:.4f}")
The Bonferroni correction is conservative—it reduces false positives but may miss real differences. Alternatives include:
- Holm: Less conservative, still controls family-wise error rate
- Benjamini-Hochberg: Controls false discovery rate, more powerful
# Compare different correction methods
for method in ['bonferroni', 'holm', 'fdr_bh']:
    results = sp.posthoc_dunn(
        data, val_col='effectiveness_score',
        group_col='treatment', p_adjust=method
    )
    print(f"\n{method.upper()} correction:")
    print(results.round(4))
Visualizing Results
Numbers without visuals tell an incomplete story. Box plots reveal distribution characteristics that summary statistics hide:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Box plot
sns.boxplot(data=data, x='treatment', y='effectiveness_score', ax=axes[0])
axes[0].set_title('Treatment Effectiveness by Group')
axes[0].set_xlabel('Treatment')
axes[0].set_ylabel('Effectiveness Score')
# Add median annotations
medians = data.groupby('treatment')['effectiveness_score'].median()
for i, treatment in enumerate(['A', 'B', 'C']):
    axes[0].annotate(f'Mdn: {medians[treatment]:.1f}',
                     xy=(i, medians[treatment]),
                     xytext=(i + 0.2, medians[treatment] + 3),
                     fontsize=9)
# Violin plot for distribution shape
sns.violinplot(data=data, x='treatment', y='effectiveness_score', ax=axes[1])
axes[1].set_title('Distribution Shape by Treatment')
axes[1].set_xlabel('Treatment')
axes[1].set_ylabel('Effectiveness Score')
plt.tight_layout()
plt.savefig('kruskal_wallis_visualization.png', dpi=150)
plt.show()
For publication-ready figures, add significance annotations:
def add_significance_bracket(ax, x1, x2, y, p_value, height=2):
    """Add significance bracket between two x positions."""
    # Determine significance stars
    if p_value < 0.001:
        sig_text = '***'
    elif p_value < 0.01:
        sig_text = '**'
    elif p_value < 0.05:
        sig_text = '*'
    else:
        sig_text = 'ns'
    ax.plot([x1, x1, x2, x2], [y, y + height, y + height, y], 'k-', lw=1.5)
    ax.text((x1 + x2) / 2, y + height, sig_text, ha='center', va='bottom', fontsize=12)
# Create annotated plot
fig, ax = plt.subplots(figsize=(8, 6))
sns.boxplot(data=data, x='treatment', y='effectiveness_score', ax=ax)
# Add significance brackets for significant pairs
y_max = data['effectiveness_score'].max()
add_significance_bracket(ax, 0, 1, y_max + 5, dunn_results.loc['A', 'B'])
add_significance_bracket(ax, 1, 2, y_max + 12, dunn_results.loc['B', 'C'])
ax.set_ylim(top=y_max + 25)
ax.set_title('Treatment Effectiveness with Statistical Comparisons')
plt.tight_layout()
plt.show()
Complete Workflow Example
Here’s a production-ready script combining everything. This analyzes customer satisfaction scores across three regional offices:
"""
Complete Kruskal-Wallis Analysis Pipeline
Analyzes customer satisfaction across regional offices
"""
import numpy as np
import pandas as pd
from scipy import stats
import scikit_posthocs as sp
import matplotlib.pyplot as plt
import seaborn as sns
def run_kruskal_wallis_analysis(data, value_col, group_col, alpha=0.05):
    """
    Complete Kruskal-Wallis analysis with post-hoc tests and visualization.

    Parameters
    ----------
    data : pd.DataFrame
        Input data with value and group columns
    value_col : str
        Name of the column containing measurements
    group_col : str
        Name of the column containing group labels
    alpha : float
        Significance level (default 0.05)

    Returns
    -------
    dict
        Results containing statistics, p-values, and post-hoc comparisons
    """
    results = {}

    # Step 1: Descriptive statistics
    print("=" * 60)
    print("DESCRIPTIVE STATISTICS")
    print("=" * 60)
    desc_stats = data.groupby(group_col)[value_col].agg([
        'count', 'median', 'mean', 'std', 'min', 'max'
    ]).round(2)
    print(desc_stats)
    results['descriptive'] = desc_stats

    # Step 2: Check normality within each group (Shapiro-Wilk test)
    print("\n" + "=" * 60)
    print("NORMALITY TESTS (Shapiro-Wilk)")
    print("=" * 60)
    for group_name, group_data in data.groupby(group_col):
        stat, p = stats.shapiro(group_data[value_col])
        normality = "Normal" if p >= alpha else "Non-normal"
        print(f"{group_name}: W={stat:.4f}, p={p:.4f} ({normality})")

    # Step 3: Kruskal-Wallis test
    print("\n" + "=" * 60)
    print("KRUSKAL-WALLIS TEST")
    print("=" * 60)
    groups = [group[value_col].values for _, group in data.groupby(group_col)]
    h_stat, p_value = stats.kruskal(*groups)
    print(f"H-statistic: {h_stat:.4f}")
    print(f"P-value: {p_value:.6f}")
    print(f"Degrees of freedom: {len(groups) - 1}")
    results['h_statistic'] = h_stat
    results['p_value'] = p_value

    if p_value < alpha:
        print(f"\n>>> SIGNIFICANT (p < {alpha}): Groups differ significantly")

        # Step 4: Post-hoc analysis
        print("\n" + "=" * 60)
        print("POST-HOC ANALYSIS (Dunn's Test, Bonferroni)")
        print("=" * 60)
        posthoc = sp.posthoc_dunn(
            data, val_col=value_col, group_col=group_col, p_adjust='bonferroni'
        )
        print(posthoc.round(4))
        results['posthoc'] = posthoc

        # Identify significant pairs
        print("\nSignificant pairwise comparisons:")
        group_names = posthoc.columns.tolist()
        for i, g1 in enumerate(group_names):
            for g2 in group_names[i + 1:]:
                p = posthoc.loc[g1, g2]
                if p < alpha:
                    print(f"  {g1} vs {g2}: p = {p:.4f} *")
    else:
        print(f"\n>>> NOT SIGNIFICANT (p >= {alpha}): No evidence of group differences")
        results['posthoc'] = None

    # Step 5: Visualization
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    sns.boxplot(data=data, x=group_col, y=value_col, ax=axes[0], palette='Set2')
    axes[0].set_title(f'Distribution by {group_col}\n(Kruskal-Wallis p={p_value:.4f})')
    sns.violinplot(data=data, x=group_col, y=value_col, ax=axes[1], palette='Set2')
    axes[1].set_title(f'Density by {group_col}')
    plt.tight_layout()
    plt.savefig('kruskal_wallis_results.png', dpi=150, bbox_inches='tight')
    plt.show()

    return results
# Generate sample data: Customer satisfaction by region
np.random.seed(123)
satisfaction_data = pd.DataFrame({
    'region': ['North'] * 45 + ['South'] * 50 + ['West'] * 40,
    'satisfaction': np.concatenate([
        np.random.choice([3, 4, 4, 5, 5, 5], 45),  # North: higher satisfaction
        np.random.choice([2, 3, 3, 4, 4], 50),     # South: moderate
        np.random.choice([1, 2, 2, 3, 3, 4], 40)   # West: lower satisfaction
    ])
})
# Run the analysis
results = run_kruskal_wallis_analysis(
    satisfaction_data,
    value_col='satisfaction',
    group_col='region'
)
This script produces comprehensive output: descriptive statistics, normality checks, the main test, post-hoc comparisons, and publication-quality visualizations. Adapt the run_kruskal_wallis_analysis function for your specific datasets—it handles the complete workflow while remaining readable and maintainable.