How to Perform Tukey's HSD Test in R

Key Insights

Tukey’s HSD test controls the family-wise error rate when comparing all pairs of group means, making it the go-to post-hoc test after a significant one-way ANOVA
Base R’s TukeyHSD() function handles most use cases, but agricolae::HSD.test() provides compact letter displays that are essential for publication-ready figures
Always verify ANOVA assumptions (normality, homogeneity of variance) before running Tukey’s test—violations can lead to inflated Type I error rates

Introduction to Tukey’s HSD Test

When your ANOVA returns a significant p-value, you know that at least one group differs from the others. But which ones? Running multiple t-tests introduces a serious problem: each test carries a 5% chance of false positives, and those errors compound quickly. With just 5 groups, you’d run 10 pairwise comparisons, inflating your actual error rate to nearly 40%.

Tukey’s Honestly Significant Difference (HSD) test solves this by controlling the family-wise error rate. It uses the studentized range distribution to calculate a single critical value that applies to all pairwise comparisons simultaneously. The result: you maintain your desired alpha level (typically 0.05) across all comparisons, not per comparison.

Use Tukey’s HSD when you want to compare all possible pairs of group means after a significant one-way ANOVA. If you’re only interested in specific planned comparisons, other methods like Dunnett’s test (comparing treatments to a control) may be more appropriate and more powerful.

Prerequisites and Data Setup

Base R includes everything you need for basic Tukey’s HSD analysis. However, two additional packages extend functionality significantly:

# Base R provides TukeyHSD() - no installation needed

# For compact letter displays and additional features
install.packages("agricolae")

# For generalized linear hypothesis testing
install.packages("multcomp")

# For visualization
install.packages("ggplot2")

# Load packages
library(agricolae)
library(multcomp)
library(ggplot2)

Let’s create a realistic dataset: crop yields (in tons per hectare) from a randomized experiment testing four fertilizer treatments.

# Set seed for reproducibility
set.seed(42)

# Create experimental data
fertilizer_data <- data.frame(
  treatment = factor(rep(c("Control", "Nitrogen", "Phosphorus", "NPK"), each = 8)),
  yield = c(
    rnorm(8, mean = 4.2, sd = 0.5),   # Control
    rnorm(8, mean = 5.1, sd = 0.6),   # Nitrogen
    rnorm(8, mean = 4.8, sd = 0.5),   # Phosphorus
    rnorm(8, mean = 6.3, sd = 0.7)    # NPK (combined)
  )
)

# Verify structure
str(fertilizer_data)
head(fertilizer_data)

# Quick summary by group
aggregate(yield ~ treatment, data = fertilizer_data, 
          FUN = function(x) c(mean = mean(x), sd = sd(x)))

Running ANOVA First

Tukey’s HSD is a post-hoc test—it only makes sense after ANOVA indicates significant differences exist. Running it without first confirming ANOVA significance is statistically inappropriate and wastes your time.

# Fit one-way ANOVA model
aov_model <- aov(yield ~ treatment, data = fertilizer_data)

# View ANOVA table
summary(aov_model)

Output:

            Df Sum Sq Mean Sq F value   Pr(>F)    
treatment    3  21.89   7.296   21.77 1.07e-07 ***
Residuals   28   9.38   0.335

Before proceeding, verify ANOVA assumptions:

# Check normality of residuals
shapiro.test(residuals(aov_model))

# Check homogeneity of variance
bartlett.test(yield ~ treatment, data = fertilizer_data)

# Visual diagnostics
par(mfrow = c(2, 2))
plot(aov_model)
par(mfrow = c(1, 1))

If Shapiro-Wilk p > 0.05 and Bartlett’s test p > 0.05, assumptions are reasonably met. With our significant ANOVA result (p < 0.001), we proceed to Tukey’s test.

Performing Tukey’s HSD Test

The TukeyHSD() function takes an aov object and returns pairwise comparisons:

# Run Tukey's HSD test
tukey_result <- TukeyHSD(aov_model, conf.level = 0.95)

# View results
print(tukey_result)

Output:

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = yield ~ treatment, data = fertilizer_data)

$treatment
                         diff        lwr       upr     p adj
Nitrogen-Control    0.9284937  0.2678498 1.5891376 0.0032767
NPK-Control         2.0855498  1.4249059 2.7461937 0.0000000
Phosphorus-Control  0.5765498 -0.0840941 1.2371937 0.1029505
NPK-Nitrogen        1.1570561  0.4964122 1.8177000 0.0002698
Phosphorus-Nitrogen -0.3519439 -1.0125878 0.3087000 0.4846986
Phosphorus-NPK     -1.5090000 -2.1696439 -0.8483561 0.0000042

Understanding the output columns:

diff: Mean difference between groups (row minus column)
lwr/upr: Lower and upper bounds of the 95% confidence interval
p adj: Adjusted p-value controlling family-wise error rate

Interpretation: NPK fertilizer significantly outperforms all other treatments. Nitrogen beats the control, but phosphorus alone doesn’t differ significantly from the control. The confidence intervals that don’t contain zero correspond to significant differences.

Visualizing Results

Base R provides a quick visualization method:

# Built-in plot
plot(tukey_result, las = 1, col = "steelblue")

This creates a confidence interval plot where intervals not crossing zero indicate significant differences. For publication-quality figures, use ggplot2:

# Extract Tukey results into a data frame
tukey_df <- as.data.frame(tukey_result$treatment)
tukey_df$comparison <- rownames(tukey_df)

# Create publication-ready plot
ggplot(tukey_df, aes(x = reorder(comparison, diff), y = diff)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  geom_errorbar(aes(ymin = lwr, ymax = upr), width = 0.2, linewidth = 0.8) +
  geom_point(aes(color = `p adj` < 0.05), size = 3) +
  scale_color_manual(values = c("TRUE" = "firebrick", "FALSE" = "gray40"),
                     labels = c("Not Significant", "Significant"),
                     name = "p < 0.05") +
  coord_flip() +
  labs(
    x = "Pairwise Comparison",
    y = "Difference in Mean Yield (tons/ha)",
    title = "Tukey's HSD: Fertilizer Treatment Comparisons"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

Alternative Packages and Methods

agricolae: Compact Letter Display

The agricolae package produces grouped letters—groups sharing a letter don’t differ significantly. This format is standard in agricultural and biological research:

# HSD.test from agricolae
hsd_agricolae <- HSD.test(aov_model, "treatment", group = TRUE)

# View grouped results
print(hsd_agricolae$groups)

Output:

           yield groups
NPK        6.298      a
Nitrogen   5.141      b
Phosphorus 4.789     bc
Control    4.213      c

NPK (group “a”) differs from all others. Nitrogen (group “b”) overlaps with phosphorus (“bc”), meaning they don’t differ significantly. This format works perfectly for annotating bar charts:

# Create summary for plotting
group_summary <- aggregate(yield ~ treatment, data = fertilizer_data, 
                           FUN = function(x) c(mean = mean(x), se = sd(x)/sqrt(length(x))))
group_summary <- do.call(data.frame, group_summary)
names(group_summary) <- c("treatment", "mean", "se")

# Add letter groups
group_summary$letters <- c("c", "b", "bc", "a")  # Match order

# Bar plot with significance letters
ggplot(group_summary, aes(x = treatment, y = mean)) +
  geom_col(fill = "steelblue", alpha = 0.8) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  geom_text(aes(y = mean + se + 0.2, label = letters), size = 5) +
  labs(x = "Treatment", y = "Yield (tons/ha)") +
  theme_minimal()

multcomp: Maximum Flexibility

The multcomp package handles complex designs and custom contrasts:

# General linear hypothesis approach
glht_result <- glht(aov_model, linfct = mcp(treatment = "Tukey"))
summary(glht_result)

# Compact letter display
cld(glht_result)

Use multcomp when you need custom contrast matrices or when working with generalized linear models.

Common Pitfalls and Best Practices

Handling Unbalanced Designs

Tukey’s HSD assumes equal sample sizes. With unbalanced data, use Tukey-Kramer (the default in R’s TukeyHSD() actually implements this adjustment automatically). For severely unbalanced designs, consider Games-Howell:

# Games-Howell via rstatix package
library(rstatix)
games_howell_test(fertilizer_data, yield ~ treatment)

Choosing the Right Post-Hoc Test

Tukey’s HSD: All pairwise comparisons, equal importance
Dunnett’s: Compare treatments to a single control
Bonferroni: Conservative, works with any comparison set
Scheffé: Most conservative, allows complex contrasts

Reporting Results

For academic papers, report the test statistic, degrees of freedom, and adjusted p-values:

“Post-hoc analysis using Tukey’s HSD revealed that NPK fertilizer produced significantly higher yields than the control (mean difference = 2.09 tons/ha, 95% CI [1.42, 2.75], p < 0.001), nitrogen alone (mean difference = 1.16 tons/ha, p < 0.001), and phosphorus alone (mean difference = 1.51 tons/ha, p < 0.001).”

Always report effect sizes alongside p-values. The mean differences and confidence intervals from Tukey’s output serve this purpose directly.

Final Recommendations

Start with TukeyHSD() for quick analysis and interpretation. Switch to agricolae::HSD.test() when you need compact letter displays for figures. Reserve multcomp::glht() for complex experimental designs or when you need compatibility with mixed models. Whatever method you choose, always verify your ANOVA assumptions first—no post-hoc test can rescue conclusions drawn from violated assumptions.

Introduction to Tukey’s HSD Test

Prerequisites and Data Setup

Running ANOVA First

Performing Tukey’s HSD Test

Visualizing Results

Alternative Packages and Methods

agricolae: Compact Letter Display

multcomp: Maximum Flexibility

Common Pitfalls and Best Practices

Liked this? There's more.

Similar Articles