How to Perform the Kruskal-Wallis Test in R

Key Insights

  • The Kruskal-Wallis test is your go-to method when comparing three or more independent groups and your data violates ANOVA’s normality assumption—it ranks values instead of using raw measurements.
  • A significant Kruskal-Wallis result only tells you that differences exist somewhere; you must follow up with Dunn’s test to identify which specific groups differ from each other.
  • Always report effect size (epsilon-squared) alongside your p-value—statistical significance without practical significance misleads your readers and stakeholders.

Introduction to the Kruskal-Wallis Test

The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. When your data doesn’t meet the normality assumption required by ANOVA, or when you’re working with ordinal data, this test becomes essential.

The test works by ranking all observations across groups, then comparing mean ranks between groups. If groups come from identical distributions, their mean ranks should be similar. Large differences in mean ranks suggest the groups differ systematically.

The null hypothesis states that all groups come from populations with identical distributions. Rejecting this hypothesis means at least one group’s distribution differs from the others—but it doesn’t tell you which groups differ. That’s where post-hoc analysis comes in.
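To make the ranking mechanism concrete, here is a toy sketch with hypothetical scores; the test statistic is built from exactly these kinds of per-group mean ranks:

```r
# Toy illustration of the ranking step (hypothetical scores)
scores <- c(3, 5, 4, 8, 9, 7, 2, 6, 1)
group  <- rep(c("A", "B", "C"), each = 3)

# Rank all observations together, then average the ranks within each group
ranks <- rank(scores)
tapply(ranks, group, mean)
#   A   B   C
#   4   8   3
```

Group B's mean rank is clearly higher than the others, which is the kind of separation the test statistic measures.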

Use the Kruskal-Wallis test when:

  • You’re comparing three or more independent groups
  • Your dependent variable is ordinal or continuous
  • Data is non-normal or you have small sample sizes
  • You have outliers that would distort parametric tests

Assumptions and Prerequisites

The Kruskal-Wallis test has fewer assumptions than ANOVA, but it still has requirements you must verify.

Independence: Observations must be independent both within and between groups. This is a study design issue, not something you can test statistically.

Ordinal or continuous dependent variable: The test requires at least an ordinal measurement scale; nominal categories cannot be meaningfully ranked.

Similar distribution shapes: While the test doesn’t require normality, groups should have similarly shaped distributions (even if shifted). If distributions have different shapes, a significant result might reflect shape differences rather than location differences.

Here’s how to check whether your data meets parametric assumptions and whether the Kruskal-Wallis test is appropriate:

# Load required libraries
library(ggplot2)

# Using the built-in PlantGrowth dataset
data("PlantGrowth")
str(PlantGrowth)

# Check normality within each group using Shapiro-Wilk test
by(PlantGrowth$weight, PlantGrowth$group, shapiro.test)

# Visual inspection with histograms
ggplot(PlantGrowth, aes(x = weight)) +
  geom_histogram(bins = 8, fill = "steelblue", color = "white") +
  facet_wrap(~group, ncol = 1) +
  theme_minimal() +
  labs(title = "Distribution of Weight by Treatment Group",
       x = "Dried Weight", y = "Count")

# Q-Q plots for each group
ggplot(PlantGrowth, aes(sample = weight)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  facet_wrap(~group) +
  theme_minimal() +
  labs(title = "Q-Q Plots by Group")

If Shapiro-Wilk tests show p-values below 0.05 for any group, or if Q-Q plots show substantial deviation from the diagonal line, consider the Kruskal-Wallis test over ANOVA.

Preparing Your Data

The Kruskal-Wallis test requires data in long format: one column for the dependent variable and one column for the grouping factor. Each row represents a single observation.

# Example: Creating a dataset from scratch
set.seed(42)

# Simulating customer satisfaction scores (1-10) across three store locations
satisfaction_data <- data.frame(
  score = c(
    sample(4:7, 25, replace = TRUE),   # Store A: moderate scores
    sample(6:9, 25, replace = TRUE),   # Store B: higher scores
    sample(3:6, 25, replace = TRUE)    # Store C: lower scores
  ),
  store = factor(rep(c("Store_A", "Store_B", "Store_C"), each = 25))
)

# Verify structure
head(satisfaction_data)
str(satisfaction_data)

# Summary statistics by group
aggregate(score ~ store, data = satisfaction_data, 
          FUN = function(x) c(median = median(x), 
                              mean = mean(x), 
                              sd = sd(x)))

If your data is in wide format (groups as separate columns), reshape it first:

# Converting wide to long format
library(tidyr)

wide_data <- data.frame(
  Store_A = sample(4:7, 25, replace = TRUE),
  Store_B = sample(6:9, 25, replace = TRUE),
  Store_C = sample(3:6, 25, replace = TRUE)
)

long_data <- pivot_longer(wide_data, 
                          cols = everything(),
                          names_to = "store",
                          values_to = "score")

Running the Kruskal-Wallis Test

The kruskal.test() function is part of base R—no additional packages required. The syntax follows R’s standard formula interface.

# Using the PlantGrowth dataset
kruskal_result <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kruskal_result)

Output:

	Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842

Interpreting the output:

  • Kruskal-Wallis chi-squared (H statistic): 7.9882. Larger values indicate greater differences between the groups’ mean ranks.
  • df (degrees of freedom): Number of groups minus 1. With 3 groups, df = 2.
  • p-value: 0.01842. Below 0.05, so we reject the null hypothesis.
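The chi-squared value can be reproduced by hand from per-group rank sums. Here is a small sketch on hypothetical, tie-free data (when ties are present, kruskal.test() applies an additional correction):

```r
# Recompute H from rank sums: H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
x <- c(1.1, 2.3, 3.8, 2.9, 4.4, 5.6, 0.7, 1.9, 3.1)  # hypothetical values
g <- factor(rep(c("a", "b", "c"), each = 3))

N  <- length(x)
Rs <- tapply(rank(x), g, sum)    # rank sum per group
ns <- tapply(x, g, length)       # group sizes
H  <- 12 / (N * (N + 1)) * sum(Rs^2 / ns) - 3 * (N + 1)

# Matches the statistic from kruskal.test() because there are no ties
all.equal(unname(H), unname(kruskal.test(x, g)$statistic))
```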

The significant result tells us the groups differ, but not which specific groups. Let’s also run this on our simulated satisfaction data:

# Running on satisfaction data
kruskal.test(score ~ store, data = satisfaction_data)

# Alternative syntax using vectors
# kruskal.test(list(store_a_scores, store_b_scores, store_c_scores))

Post-Hoc Analysis with Dunn’s Test

A significant Kruskal-Wallis result demands follow-up testing. Dunn’s test performs pairwise comparisons using the same pooled ranks as the Kruskal-Wallis test, with p-values adjusted for the multiple comparisons involved.

The FSA package provides a clean implementation:

# Install if needed: install.packages("FSA")
library(FSA)

# Dunn's test with Bonferroni correction
dunn_result <- dunnTest(weight ~ group, data = PlantGrowth, method = "bonferroni")
print(dunn_result)

Output shows pairwise comparisons with adjusted p-values:

   Comparison          Z   P.unadj      P.adj
1 ctrl - trt1  0.1944862 0.8458368 1.00000000
2 ctrl - trt2 -2.0493465 0.0404437 0.12133116
3 trt1 - trt2 -2.2438327 0.0248484 0.07454527

Choosing adjustment methods:

# Holm method (less conservative than Bonferroni)
dunnTest(weight ~ group, data = PlantGrowth, method = "holm")

# Benjamini-Hochberg (controls false discovery rate)
dunnTest(weight ~ group, data = PlantGrowth, method = "bh")

Bonferroni is the most conservative—use it when false positives are costly. Holm provides more power while maintaining family-wise error rate. Benjamini-Hochberg is appropriate for exploratory analyses.
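If you would rather stay in base R, pairwise.wilcox.test() is a related option. Note that it is not identical to Dunn’s test: each pair is re-ranked separately rather than using the pooled Kruskal-Wallis ranks, so the p-values will differ somewhat.

```r
# Base-R pairwise comparisons with Holm correction (related to, but not
# the same as, Dunn's test)
pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                     p.adjust.method = "holm")
```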

An alternative package, dunn.test, provides similar functionality:

# install.packages("dunn.test")
library(dunn.test)

dunn.test(PlantGrowth$weight, PlantGrowth$group, method = "bonferroni")

Visualizing Results

Effective visualization communicates your findings better than tables alone. Boxplots show medians and spread; violin plots reveal distribution shapes.

library(ggplot2)

# Basic boxplot with individual points
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.5, size = 2) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  labs(title = "Plant Weight by Treatment Group",
       subtitle = "Kruskal-Wallis p = 0.018",
       x = "Treatment Group",
       y = "Dried Weight (g)") +
  theme(legend.position = "none")

For publication-ready figures with significance annotations, use ggpubr:

# install.packages("ggpubr")
library(ggpubr)

# Boxplot with automatic Kruskal-Wallis and pairwise comparisons
ggboxplot(PlantGrowth, x = "group", y = "weight",
          fill = "group", palette = "Set2",
          add = "jitter", add.params = list(alpha = 0.5)) +
  stat_compare_means(method = "kruskal.test", label.y = 7) +
  stat_compare_means(comparisons = list(c("ctrl", "trt1"), 
                                         c("ctrl", "trt2"), 
                                         c("trt1", "trt2")),
                     method = "wilcox.test",
                     label = "p.signif") +
  theme(legend.position = "none")

Violin plots work well for larger samples:

ggplot(satisfaction_data, aes(x = store, y = score, fill = store)) +
  geom_violin(alpha = 0.7, trim = FALSE) +
  geom_boxplot(width = 0.15, fill = "white", alpha = 0.8) +
  scale_fill_brewer(palette = "Pastel1") +
  theme_minimal() +
  labs(title = "Customer Satisfaction by Store Location",
       x = "Store", y = "Satisfaction Score (1-10)") +
  theme(legend.position = "none")

Conclusion and Best Practices

Reporting guidelines: Always report the H statistic, degrees of freedom, p-value, and sample sizes. Example: “A Kruskal-Wallis test revealed significant differences in plant weight across treatment groups, H(2) = 7.99, p = .018, n = 30.”
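To keep reports consistent, you can wrap the formatting in a small helper; report_kruskal() below is a hypothetical name for illustration, not a library function:

```r
# Hypothetical helper that formats an htest result in the style above
report_kruskal <- function(result, n) {
  sprintf("H(%d) = %.2f, p = %.3f, n = %d",
          as.integer(result$parameter),  # degrees of freedom
          unname(result$statistic),      # H statistic
          result$p.value,
          as.integer(n))
}

kw <- kruskal.test(weight ~ group, data = PlantGrowth)
report_kruskal(kw, nrow(PlantGrowth))
# "H(2) = 7.99, p = 0.018, n = 30"
```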

Effect size matters: Calculate epsilon-squared to quantify practical significance:

# Epsilon-squared effect size
kruskal_result <- kruskal.test(weight ~ group, data = PlantGrowth)
H <- kruskal_result$statistic
n <- nrow(PlantGrowth)
epsilon_squared <- H / (n - 1)
print(paste("Epsilon-squared:", round(epsilon_squared, 3)))
# Interpretation: 0.01 = small, 0.06 = medium, 0.14 = large

Common pitfalls to avoid:

  1. Don’t use Kruskal-Wallis for paired or repeated measures—use Friedman’s test instead.
  2. Don’t skip post-hoc tests after significant results.
  3. Don’t interpret a non-significant result as “no difference”—you may lack statistical power.
  4. Don’t ignore distribution shapes—wildly different shapes make interpretation problematic.

The Kruskal-Wallis test is robust and widely applicable, but it’s not a magic solution for bad data. Verify assumptions, visualize distributions, and report effect sizes alongside p-values. Your analysis will be more credible and your conclusions more defensible.
