How to Perform the Friedman Test in R
Key Insights
- The Friedman test is your go-to non-parametric alternative when repeated measures ANOVA assumptions fail—use it for ordinal data, small samples, or when normality is questionable.
- Always structure your data correctly before testing: the friedman.test() function accepts both matrix format and formula interface, but getting the format wrong produces misleading results.
- A significant Friedman test only tells you differences exist somewhere; you must run post-hoc pairwise comparisons to identify which specific groups differ from each other.
Introduction to the Friedman Test
The Friedman test is a non-parametric statistical test designed for comparing three or more related groups. Think of it as the non-parametric cousin of repeated measures ANOVA. When you have the same subjects measured under multiple conditions—or matched groups evaluated across treatments—and your data violates normality assumptions, the Friedman test becomes your primary analytical tool.
Named after economist Milton Friedman (yes, that Milton Friedman), this test works by ranking observations within each block (subject) and then analyzing whether the rank sums differ significantly across conditions. This ranking approach makes it robust against outliers and non-normal distributions.
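To make the ranking step concrete, here is a minimal sketch that computes the statistic by hand and compares it with friedman.test(). The `scores` matrix is hypothetical data invented for illustration, and the hand formula is the standard untied form, so it assumes no tied values within a subject.

```r
# Hypothetical scores: 5 subjects (rows) x 3 conditions (columns)
scores <- matrix(
  c(7, 5, 3,
    8, 6, 4,
    6, 4, 5,
    9, 7, 5,
    7, 8, 4),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("cond_a", "cond_b", "cond_c"))
)

n <- nrow(scores)  # number of blocks (subjects)
k <- ncol(scores)  # number of conditions

# Rank the observations within each subject, then sum ranks per condition
ranks <- t(apply(scores, 1, rank))
rank_sums <- colSums(ranks)

# Friedman statistic (no-ties form) and its chi-squared p-value
chi_sq <- 12 / (n * k * (k + 1)) * sum(rank_sums^2) - 3 * n * (k + 1)
p_value <- pchisq(chi_sq, df = k - 1, lower.tail = FALSE)

# Compare with the built-in test
builtin <- friedman.test(scores)
all.equal(unname(builtin$statistic), chi_sq)
```

Because only the within-subject ranks enter the calculation, replacing a subject's largest score with an even larger one leaves the statistic unchanged.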
Use the Friedman test when:
- You have ordinal data (Likert scales, rankings, ratings)
- Your continuous data is heavily skewed or contains outliers
- Sample sizes are too small to rely on the Central Limit Theorem
- You’re comparing three or more related measurements
Common applications include taste tests where judges rate multiple products, medical studies tracking patients across treatment phases, and usability studies comparing interface designs with the same participants.
Assumptions and Requirements
Before running the Friedman test, verify your data meets these requirements:
Repeated measures or matched groups design. Each subject (or matched set) must be measured under all conditions. This creates the “blocks” that the test uses for within-subject ranking.
Ordinal or continuous dependent variable. The outcome must be at least ordinal—you need to meaningfully rank observations. Nominal categories won’t work.
At least three related groups. With only two groups, use the Wilcoxon signed-rank test instead. The Friedman test requires three or more conditions to compare.
Random sampling. Subjects should be randomly selected from the population of interest. This assumption affects generalizability, not the test mechanics.
Independence between blocks. While measurements within a subject are related (that’s the point), different subjects should be independent of each other.
The Friedman test does not assume normality, homogeneity of variance, or sphericity—making it considerably more flexible than repeated measures ANOVA.
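Several of these requirements can be checked mechanically before testing. The sketch below is one way to do it, assuming long-format data; the `ratings` data frame and its column names are hypothetical.

```r
# Hypothetical long-format data: 4 subjects, 3 conditions
ratings <- data.frame(
  subject   = rep(1:4, times = 3),
  condition = rep(c("a", "b", "c"), each = 4),
  score     = c(3, 4, 2, 5, 4, 5, 3, 5, 1, 2, 2, 3)
)

# Every subject-by-condition cell should hold exactly one observation
cell_counts <- table(ratings$subject, ratings$condition)
is_complete <- all(cell_counts == 1)

# The test needs at least three conditions and no missing scores
n_conditions <- ncol(cell_counts)
has_missing  <- anyNA(ratings$score)

# Stop early if the design does not fit the test
stopifnot(is_complete, n_conditions >= 3, !has_missing)
```

Random sampling and independence between subjects cannot be verified from the data alone; they depend on how the study was run.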
Preparing Your Data in R
Data structure matters. The friedman.test() function accepts two input formats: a matrix or a formula with data frame. Getting this wrong is the most common source of errors.
Matrix format: Rows represent subjects (blocks), columns represent conditions (groups). Each cell contains the measurement for that subject under that condition.
Formula format: Requires long-format data with columns for the response variable, grouping variable, and blocking variable.
Let’s create sample data representing patient pain scores (0-10 scale) measured under three treatment conditions:
# Create sample data: 12 patients, 3 treatments
set.seed(42)
# Wide format (matrix-ready)
pain_wide <- data.frame(
  patient_id = 1:12,
  placebo = c(7, 8, 6, 9, 7, 8, 6, 7, 8, 9, 7, 8),
  drug_a = c(5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6),
  drug_b = c(3, 4, 3, 5, 4, 3, 4, 3, 4, 5, 3, 4)
)
# Convert to long format for formula interface
pain_long <- data.frame(
  patient_id = rep(1:12, times = 3),
  treatment = factor(rep(c("placebo", "drug_a", "drug_b"), each = 12)),
  pain_score = c(pain_wide$placebo, pain_wide$drug_a, pain_wide$drug_b)
)
# Verify structure
head(pain_long)
  patient_id treatment pain_score
1          1   placebo          7
2          2   placebo          8
3          3   placebo          6
4          4   placebo          9
5          5   placebo          7
6          6   placebo          8
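Typing out each column by hand gets tedious as the number of conditions grows. One possible base R alternative uses stack(); the small `wide` data frame below is a hypothetical three-patient subset used only for illustration.

```r
# Hypothetical wide-format subset: one row per patient
wide <- data.frame(
  patient_id = 1:3,
  placebo    = c(7, 8, 6),
  drug_a     = c(5, 6, 4),
  drug_b     = c(3, 4, 3)
)

measure_cols <- c("placebo", "drug_a", "drug_b")

# stack() concatenates the columns and records each value's source column
stacked <- stack(wide[, measure_cols])

# Rebuild the long-format frame, repeating patient IDs once per condition
long <- data.frame(
  patient_id = rep(wide$patient_id, times = length(measure_cols)),
  treatment  = stacked$ind,
  pain_score = stacked$values
)

head(long)
```

Note that stack() keeps the factor levels in column order, whereas factor() on a character vector sorts them alphabetically. The level order changes how tables and plots are laid out, not the test result.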
Running the Friedman Test
Base R includes friedman.test() in the stats package—no additional installations required.
Matrix input approach:
# Extract just the measurement columns as a matrix
pain_matrix <- as.matrix(pain_wide[, c("placebo", "drug_a", "drug_b")])
# Run the test
friedman_result <- friedman.test(pain_matrix)
print(friedman_result)
Friedman rank sum test
data: pain_matrix
Friedman chi-squared = 24, df = 2, p-value = 6.144e-06
Formula interface approach:
# Using long-format data
friedman_result2 <- friedman.test(
  pain_score ~ treatment | patient_id,
  data = pain_long
)
print(friedman_result2)
The formula reads as: “pain_score varies by treatment, blocked by patient_id.”
Interpreting the output:
- Friedman chi-squared: The test statistic (24). Larger values indicate greater differences between the groups' rank sums. In this data every patient ranks the treatments in the same order (placebo highest, drug_b lowest), so the statistic reaches its maximum of n(k - 1) = 12 × 2 = 24.
- df: Degrees of freedom, calculated as (number of groups - 1). Here, 3 - 1 = 2.
- p-value: The probability of observing a statistic at least this large if no true differences exist. At 6.144e-06, we have strong evidence against the null hypothesis.
With p < 0.05, we reject the null hypothesis that all treatments produce equivalent pain scores. But this only tells us that differences exist somewhere—not which specific treatments differ.
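A quick way to confirm the data is structured correctly is to run the test through more than one interface and check that the results agree. The sketch below rebuilds the pain data so it runs on its own, and also uses the three-vector form friedman.test(y, groups, blocks).

```r
# Pain scores for 12 patients under three treatments
placebo <- c(7, 8, 6, 9, 7, 8, 6, 7, 8, 9, 7, 8)
drug_a  <- c(5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6)
drug_b  <- c(3, 4, 3, 5, 4, 3, 4, 3, 4, 5, 3, 4)

pain_matrix <- cbind(placebo, drug_a, drug_b)
pain_long <- data.frame(
  patient_id = rep(1:12, times = 3),
  treatment  = factor(rep(c("placebo", "drug_a", "drug_b"), each = 12)),
  pain_score = c(placebo, drug_a, drug_b)
)

# Three equivalent ways to call the test
res_matrix  <- friedman.test(pain_matrix)
res_formula <- friedman.test(pain_score ~ treatment | patient_id,
                             data = pain_long)
res_vectors <- friedman.test(pain_long$pain_score,
                             pain_long$treatment,
                             pain_long$patient_id)

# All three should report the same statistic and p-value
stats <- c(res_matrix$statistic, res_formula$statistic, res_vectors$statistic)
same_statistic <- isTRUE(all.equal(max(stats), min(stats)))
same_p_value   <- isTRUE(all.equal(res_matrix$p.value, res_formula$p.value))
c(same_statistic = same_statistic, same_p_value = same_p_value)
```

If the interfaces disagree, the usual culprit is a long-format frame whose subject or treatment columns do not line up with the matrix rows and columns.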
Post-Hoc Pairwise Comparisons
A significant Friedman test demands follow-up analysis. You have several options for pairwise comparisons.
Option 1: Pairwise Wilcoxon signed-rank tests
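This approach uses the Wilcoxon test for each pair, with p-value adjustment for multiple comparisons. With paired = TRUE, observations are matched by position within each group, so the data must list patients in the same order for every treatment. When the paired differences contain ties, as they do here, R cannot compute exact p-values and warns that it is falling back on a normal approximation, so the header and p-values you see may differ slightly from those shown.
pairwise.wilcox.test(
  pain_long$pain_score,
  pain_long$treatment,
  p.adjust.method = "bonferroni",
  paired = TRUE
)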
Pairwise comparisons using Wilcoxon signed rank exact test
data: pain_long$pain_score and pain_long$treatment
drug_a drug_b
drug_b 0.0059 -
placebo 0.0059 0.0029
P value adjustment method: bonferroni
All pairwise comparisons are significant. Drug B produces lower pain scores than Drug A, and both drugs outperform placebo.
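Because the paired test matches observations by position, it is worth sorting the data explicitly before running it. The sketch below shuffles the rows to mimic data arriving in arbitrary order and then restores the alignment. Passing exact = FALSE is a choice made here to request the normal approximation directly, since the tied differences rule out exact p-values.

```r
# Rebuild the long-format pain data so the example runs on its own
pain_long <- data.frame(
  patient_id = rep(1:12, times = 3),
  treatment  = factor(rep(c("placebo", "drug_a", "drug_b"), each = 12)),
  pain_score = c(
    7, 8, 6, 9, 7, 8, 6, 7, 8, 9, 7, 8,  # placebo
    5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6,  # drug_a
    3, 4, 3, 5, 4, 3, 4, 3, 4, 5, 3, 4   # drug_b
  )
)

# Shuffle the rows to mimic data that arrives in arbitrary order
set.seed(1)
shuffled <- pain_long[sample(nrow(pain_long)), ]

# Restore alignment: within each treatment, patients appear in the same order
aligned <- shuffled[order(shuffled$treatment, shuffled$patient_id), ]

posthoc <- pairwise.wilcox.test(
  aligned$pain_score,
  aligned$treatment,
  p.adjust.method = "bonferroni",
  paired = TRUE,
  exact = FALSE  # passed through to wilcox.test()
)

# TRUE if every adjusted pairwise p-value falls below 0.05
all_significant <- all(posthoc$p.value < 0.05, na.rm = TRUE)
all_significant
```

Running the same call on `shuffled` would pair scores from different patients without raising any error, which is why the sort matters.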
Option 2: Nemenyi test (recommended)
The Nemenyi test is specifically designed for post-hoc comparisons following a Friedman test. Install the PMCMRplus package:
# Install if needed
# install.packages("PMCMRplus")
library(PMCMRplus)
# Run Nemenyi test
nemenyi_result <- frdAllPairsNemenyiTest(pain_matrix)
print(nemenyi_result)
Pairwise comparisons using Nemenyi-Wilcoxon-Wilcox all-pairs test for a two-way balanced complete block design
data: pain_matrix
placebo drug_a
drug_a 0.0074 -
drug_b 2.6e-05 0.0074
P value adjustment method: single-step
The Nemenyi test confirms all three treatments differ significantly from each other.
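For readers who want to see what the test does, here is a rough base R sketch of the textbook Nemenyi procedure: compare differences in mean ranks against the studentized range distribution. This is an illustration of the general method under that assumption, not a reproduction of the PMCMRplus source, so treat any mismatch with the package output as a prompt to consult the package documentation.

```r
# Pain scores for 12 patients under three treatments
pain_matrix <- cbind(
  placebo = c(7, 8, 6, 9, 7, 8, 6, 7, 8, 9, 7, 8),
  drug_a  = c(5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6),
  drug_b  = c(3, 4, 3, 5, 4, 3, 4, 3, 4, 5, 3, 4)
)

n <- nrow(pain_matrix)  # blocks (patients)
k <- ncol(pain_matrix)  # treatments

# Mean within-patient rank of each treatment
mean_ranks <- colMeans(t(apply(pain_matrix, 1, rank)))

# Scale for a mean-rank difference on the studentized range scale
se <- sqrt(k * (k + 1) / (12 * n))

# All unordered pairs of treatments
pairs <- combn(colnames(pain_matrix), 2)
q_stat <- abs(mean_ranks[pairs[1, ]] - mean_ranks[pairs[2, ]]) / se

# Upper-tail probability from the studentized range with infinite df
p_vals <- ptukey(q_stat, nmeans = k, df = Inf, lower.tail = FALSE)
names(p_vals) <- paste(pairs[1, ], pairs[2, ], sep = " vs ")
p_vals
```

The single-step adjustment is built into the studentized range distribution, so no separate p.adjust() call is needed.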
Visualizing Results
Visualization helps communicate findings effectively. Here’s how to create publication-ready graphics with ggplot2:
library(ggplot2)
library(dplyr)
# Calculate summary statistics
pain_summary <- pain_long %>%
  group_by(treatment) %>%
  summarise(
    median = median(pain_score),
    q1 = quantile(pain_score, 0.25),
    q3 = quantile(pain_score, 0.75),
    .groups = "drop"
  )
# Create boxplot with individual points
ggplot(pain_long, aes(x = treatment, y = pain_score, fill = treatment)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.1, alpha = 0.5, size = 2) +
  labs(
    title = "Pain Scores by Treatment Condition",
    subtitle = "Friedman test: χ² = 24.00, p < 0.001",
    x = "Treatment",
    y = "Pain Score (0-10)"
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(legend.position = "none")
For repeated measures data, a line plot showing individual trajectories can be more informative:
ggplot(pain_long, aes(x = treatment, y = pain_score, group = patient_id)) +
  geom_line(alpha = 0.3, color = "gray50") +
  geom_point(aes(color = treatment), size = 3) +
  stat_summary(
    aes(group = 1),
    fun = median,
    geom = "line",
    linewidth = 1.5,
    color = "black"
  ) +
  labs(
    title = "Individual Patient Pain Trajectories",
    x = "Treatment",
    y = "Pain Score"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
Complete Worked Example
Let’s run through a complete analysis with a realistic scenario: wine competition judges rating four wines.
# Complete reproducible analysis
# Scenario: 8 judges rate 4 wines on a 1-100 scale
library(dplyr)    # group_by(), summarise(), %>%
library(ggplot2)  # plotting in Step 5
set.seed(123)
# Create the dataset
wine_data <- data.frame(
  judge = rep(1:8, times = 4),
  wine = factor(rep(c("Cabernet", "Merlot", "Pinot", "Syrah"), each = 8)),
  rating = c(
    # Cabernet ratings
    78, 82, 75, 80, 77, 83, 79, 81,
    # Merlot ratings
    72, 75, 70, 74, 71, 76, 73, 74,
    # Pinot Noir ratings
    85, 88, 82, 87, 84, 89, 86, 88,
    # Syrah ratings
    74, 77, 72, 76, 73, 78, 75, 76
  )
)
# Step 1: Explore the data
cat("=== Descriptive Statistics ===\n")
wine_data %>%
  group_by(wine) %>%
  summarise(
    n = n(),
    median = median(rating),
    IQR = IQR(rating),
    .groups = "drop"
  ) %>%
  print()
# Step 2: Run Friedman test
cat("\n=== Friedman Test ===\n")
friedman_wine <- friedman.test(
  rating ~ wine | judge,
  data = wine_data
)
print(friedman_wine)
# Step 3: Effect size (Kendall's W = chi-squared / (n * (k - 1)))
n_judges <- 8
n_wines <- 4
kendall_w <- friedman_wine$statistic / (n_judges * (n_wines - 1))
cat("\nKendall's W (effect size):", round(kendall_w, 3), "\n")
# Step 4: Post-hoc comparisons (if significant)
# paired = TRUE matches ratings by position, so judges must appear
# in the same order within every wine (true for wine_data as built above)
if (friedman_wine$p.value < 0.05) {
  cat("\n=== Post-Hoc Pairwise Comparisons ===\n")
  posthoc <- pairwise.wilcox.test(
    wine_data$rating,
    wine_data$wine,
    p.adjust.method = "holm",
    paired = TRUE
  )
  print(posthoc)
}
# Step 5: Visualization
ggplot(wine_data, aes(x = reorder(wine, rating, FUN = median),
                      y = rating, fill = wine)) +
  geom_boxplot(alpha = 0.8) +
  geom_jitter(width = 0.1, size = 2, alpha = 0.6) +
  coord_flip() +
  labs(
    title = "Wine Competition Ratings by Variety",
    subtitle = sprintf("Friedman χ² = %.2f, p < 0.001, Kendall's W = %.2f",
                       friedman_wine$statistic, kendall_w),
    x = NULL,
    y = "Judge Rating (1-100)"
  ) +
  scale_fill_viridis_d(option = "plasma", begin = 0.2, end = 0.8) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
Interpretation: The Friedman test reveals significant differences in wine ratings (χ² = 24, df = 3, p < 0.001). Every judge ranks the four wines in the same order (Pinot Noir, then Cabernet, then Syrah, then Merlot), so Kendall’s W equals 1, which indicates perfect agreement among judges about the ranking. With only eight judges, however, the Holm-adjusted pairwise Wilcoxon tests have limited power, so read the printed p-value for each pair before claiming that two specific wines differ.
This workflow—descriptive statistics, omnibus test, effect size, post-hoc comparisons, and visualization—provides a complete analytical framework you can adapt to any repeated measures non-parametric analysis.
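As a cross-check on the effect size step, Kendall's W can also be computed directly from the rank sums. The sketch below assumes no tied ratings within a judge, uses the standard formula W = 12S / (n²(k³ - k)) where S is the sum of squared deviations of the rank sums from their mean, and compares the result with the chi-squared shortcut used in the worked example.

```r
# Wine ratings as a judges-by-wines matrix (same values as the worked example)
ratings <- cbind(
  Cabernet = c(78, 82, 75, 80, 77, 83, 79, 81),
  Merlot   = c(72, 75, 70, 74, 71, 76, 73, 74),
  Pinot    = c(85, 88, 82, 87, 84, 89, 86, 88),
  Syrah    = c(74, 77, 72, 76, 73, 78, 75, 76)
)

n <- nrow(ratings)  # judges
k <- ncol(ratings)  # wines

# Rank wines within each judge, then total the ranks per wine
rank_sums <- colSums(t(apply(ratings, 1, rank)))

# Direct formula: squared deviations of rank sums from their mean
S <- sum((rank_sums - mean(rank_sums))^2)
w_direct <- 12 * S / (n^2 * (k^3 - k))

# Shortcut via the Friedman statistic
w_from_chisq <- unname(friedman.test(ratings)$statistic) / (n * (k - 1))

all.equal(w_direct, w_from_chisq)
```

W ranges from 0 (no agreement between judges) to 1 (identical rankings from every judge), which makes it easier to compare across studies than the raw chi-squared value.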