How to Calculate Kendall's Tau in R

Key Insights

  • Kendall’s tau is your best choice for ordinal data, small samples, or when outliers plague your dataset—it’s more robust than Pearson and often more interpretable than Spearman.
  • R’s built-in cor() and cor.test() functions calculate tau-b by default, which correctly handles tied ranks that occur frequently in real-world ordinal data.
  • The interpretation differs from Pearson: a tau of 0.5 doesn’t mean the same thing as a Pearson r of 0.5—tau values are typically lower for the same underlying relationship.

Introduction to Kendall’s Tau

Kendall’s tau measures the ordinal association between two variables. Unlike Pearson’s correlation, which assumes linear relationships and normal distributions, Kendall’s tau asks a simpler question: do the rankings of one variable tend to agree with the rankings of another?

You should reach for Kendall’s tau when:

  • Your data is ordinal (like survey responses on a 1-5 scale)
  • Your sample size is small (under 30 observations)
  • Outliers are present and you can’t justify removing them
  • The relationship isn’t linear but is monotonic
  • You want a correlation measure with a clear probabilistic interpretation

Three variants exist. Tau-a ignores ties entirely—rarely useful with real data. Tau-b adjusts for ties and is appropriate when both variables have the same number of categories (square tables). Tau-c (also called Stuart’s tau-c) handles rectangular tables where variables have different numbers of categories. R’s base functions use tau-b, which works well for most continuous or ordinal data.

The Math Behind Kendall’s Tau

The intuition is straightforward. Take every possible pair of observations and classify them as concordant or discordant. A pair is concordant if the observation with the higher value on X also has the higher value on Y. It’s discordant if the rankings disagree.

For observations (x₁, y₁) and (x₂, y₂):

  • Concordant: (x₂ - x₁) and (y₂ - y₁) have the same sign
  • Discordant: (x₂ - x₁) and (y₂ - y₁) have opposite signs
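
The pair-by-pair classification is easy to sketch in base R. Here's a hedged illustration on a tiny made-up dataset, using combn() to enumerate every pair:

```r
# Toy data: four observations, no ties
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

# Every pair of observation indices (i < j)
pairs <- combn(length(x), 2)

# Sign of the product of differences: +1 concordant, -1 discordant, 0 tied
signs <- apply(pairs, 2, function(p) {
  sign((x[p[2]] - x[p[1]]) * (y[p[2]] - y[p[1]]))
})

concordant <- sum(signs == 1)   # 4
discordant <- sum(signs == -1)  # 2

# With no ties, (C - D) divided by the total pair count matches R's built-in tau
(concordant - discordant) / choose(length(x), 2)  # 0.3333333
cor(x, y, method = "kendall")                     # 0.3333333
```

With no ties, tau-a and tau-b coincide, which is why the manual count agrees with cor().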

The basic formula for tau-a is:

τ = (C - D) / [n(n-1)/2]

Where C is concordant pairs, D is discordant pairs, and n(n-1)/2 is the total number of pairs.

Tau-b adds a correction for ties:

τ_b = (C - D) / √[(n₀ - n₁)(n₀ - n₂)]

Where n₀ = n(n-1)/2, n₁ is the number of tied pairs in X (summing tᵢ(tᵢ-1)/2 over each group of tᵢ tied values), and n₂ is the corresponding count for Y.
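
You can verify the tie correction by hand. The sketch below uses small invented vectors with ties in both variables, computes C, D, n₁, and n₂ explicitly, and compares the result against cor():

```r
# Small invented dataset with ties in both X and Y
x <- c(1, 2, 2, 3, 3)
y <- c(1, 1, 2, 2, 3)
n  <- length(x)
n0 <- n * (n - 1) / 2   # total number of pairs

# Classify every pair: +1 concordant, -1 discordant, 0 tied
idx <- combn(n, 2)
s <- apply(idx, 2, function(p) sign((x[p[2]] - x[p[1]]) * (y[p[2]] - y[p[1]])))
C <- sum(s == 1)
D <- sum(s == -1)

# Tied pairs within one variable: sum of t(t-1)/2 over each tied group
tied_pairs <- function(v) sum(choose(table(v), 2))
n1 <- tied_pairs(x)
n2 <- tied_pairs(y)

tau_b <- (C - D) / sqrt((n0 - n1) * (n0 - n2))
tau_b                          # 0.75
cor(x, y, method = "kendall")  # 0.75 -- identical
```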

The probabilistic interpretation makes tau intuitive: it represents the difference between the probability that two observations are concordant versus discordant. A tau of 0.4 means concordant pairs are 40 percentage points more likely than discordant pairs.

Basic Calculation with cor() Function

R makes this trivially easy. The cor() function handles Kendall’s tau with a single argument change:

# Sample data: student study hours and exam scores
study_hours <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
exam_scores <- c(55, 62, 58, 72, 75, 80, 78, 88, 92, 95)

# Calculate Kendall's tau
tau <- cor(study_hours, exam_scores, method = "kendall")
print(tau)
# [1] 0.9111111

This strong positive tau (0.91) indicates that students who study more consistently score higher. Compare this to Pearson and Spearman:

# Compare all three correlation methods
correlations <- c(
  pearson = cor(study_hours, exam_scores, method = "pearson"),
  spearman = cor(study_hours, exam_scores, method = "spearman"),
  kendall = cor(study_hours, exam_scores, method = "kendall")
)
print(round(correlations, 3))
#  pearson spearman  kendall 
#    0.976    0.976    0.911

Notice that Kendall’s tau is lower than both Pearson and Spearman for the same data. This is normal—for the same monotonic relationship, tau typically comes out around two-thirds the magnitude of Spearman’s rho. Don’t compare tau values directly to Pearson r values; they measure different things.

Hypothesis Testing with cor.test()

Calculating tau is only half the job. You need to know if the correlation is statistically significant:

# Hypothesis test for Kendall's tau
test_result <- cor.test(study_hours, exam_scores, method = "kendall")
print(test_result)

Output:

	Kendall's rank correlation tau

data:  study_hours and exam_scores
T = 43, p-value = 2.976e-05
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.9111111 

Key elements to interpret:

  • tau estimate (0.911): Strong positive ordinal association
  • T statistic (43): The number of concordant pairs; R reports this exact-test statistic because the sample is small and contains no ties
  • p-value (2.976e-05): Highly significant—we reject the null hypothesis that tau equals zero

By default, cor.test() computes an exact p-value when there are fewer than 50 observations and no ties; with ties or larger samples it switches to a normal approximation and reports a z statistic instead of T. You can force the exact calculation with exact = TRUE, though it is only valid without ties and becomes computationally expensive as n grows.
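
To see the two methods side by side, you can request each p-value explicitly. A quick sketch using the study-hours data from earlier (no ties, so the exact test is available):

```r
study_hours <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
exam_scores <- c(55, 62, 58, 72, 75, 80, 78, 88, 92, 95)

# Exact p-value (valid here: no ties) vs. the normal approximation
p_exact  <- cor.test(study_hours, exam_scores,
                     method = "kendall", exact = TRUE)$p.value
p_approx <- cor.test(study_hours, exam_scores,
                     method = "kendall", exact = FALSE)$p.value

c(exact = p_exact, approx = p_approx)  # both far below 0.001
```

The two p-values differ slightly but lead to the same conclusion here; the gap matters most near conventional significance thresholds.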

# One-sided test: is tau significantly greater than zero?
test_one_sided <- cor.test(study_hours, exam_scores, 
                            method = "kendall", 
                            alternative = "greater")
print(test_one_sided$p.value)
# [1] 1.488095e-05

Note that cor.test() with Kendall’s method doesn’t provide confidence intervals—this is a limitation of the base R implementation. For confidence intervals, you’ll need bootstrap methods or the Kendall package.
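
A percentile bootstrap is one way to get an interval with base R alone. This is a rough sketch—the seed and the 2,000 replicates are arbitrary choices:

```r
# Percentile bootstrap CI for Kendall's tau (base R only)
set.seed(123)
study_hours <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
exam_scores <- c(55, 62, 58, 72, 75, 80, 78, 88, 92, 95)

boot_tau <- replicate(2000, {
  i <- sample(length(study_hours), replace = TRUE)  # resample rows
  cor(study_hours[i], exam_scores[i], method = "kendall")
})

quantile(boot_tau, c(0.025, 0.975))  # approximate 95% CI
```

Expect a wide interval at n = 10; bootstrap intervals for rank statistics are rough at small sample sizes.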

Handling Ties and Choosing Tau Variants

Ties are common in ordinal data. When two observations share the same rank on either variable, they form neither a concordant nor discordant pair—they’re tied. Tau-b handles this gracefully:

# Data with ties (common in survey responses)
satisfaction <- c(3, 4, 4, 5, 3, 4, 5, 5, 4, 3)  # 1-5 scale
loyalty <- c(2, 4, 3, 5, 2, 4, 5, 4, 4, 3)       # 1-5 scale

# Count the ties
table(satisfaction)
# satisfaction
# 3 4 5 
# 3 4 3

# Kendall's tau-b (R's default) handles ties
tau_with_ties <- cor(satisfaction, loyalty, method = "kendall")
print(tau_with_ties)
# [1] 0.8413753

The full cor.test() output shows the tie-adjusted calculation (expect a warning that the exact p-value cannot be computed with ties—this is normal, and R falls back on the normal approximation):

# Detailed test output shows the calculation
test_ties <- cor.test(satisfaction, loyalty, method = "kendall")
print(test_ties)

When you have a lot of ties, tau values will be lower than you might expect from eyeballing the data. This is correct behavior—ties genuinely contain less information about the relationship.

For rectangular contingency tables (where X has different categories than Y), consider tau-c from the DescTools package, which we’ll cover shortly.

Working with Real Data: Practical Example

Let’s work through a realistic scenario. Suppose you’re analyzing whether customer service ratings predict repeat purchase behavior:

# Simulate realistic survey data
set.seed(42)
n <- 50

# Service rating (1-5 ordinal scale)
service_rating <- sample(1:5, n, replace = TRUE, 
                         prob = c(0.05, 0.15, 0.30, 0.35, 0.15))

# Repeat purchases (count, related to service but noisy)
repeat_purchases <- round(service_rating * 1.5 + rnorm(n, 0, 2))
repeat_purchases <- pmax(0, repeat_purchases)  # No negative purchases

# Create data frame
customer_data <- data.frame(
  service_rating = service_rating,
  repeat_purchases = repeat_purchases
)

# Examine the data
head(customer_data, 10)
summary(customer_data)

Before calculating correlations, visualize the relationship:

# Scatter plot with jitter for ordinal data
library(ggplot2)

ggplot(customer_data, aes(x = service_rating, y = repeat_purchases)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6, size = 3) +
  geom_smooth(method = "lm", se = TRUE, color = "steelblue") +
  labs(
    x = "Service Rating (1-5)",
    y = "Repeat Purchases",
    title = "Service Quality vs. Customer Loyalty"
  ) +
  scale_x_continuous(breaks = 1:5) +
  theme_minimal()

Now calculate and interpret Kendall’s tau:

# Full analysis
tau_test <- cor.test(customer_data$service_rating, 
                     customer_data$repeat_purchases, 
                     method = "kendall")

# Extract and format results
cat("Kendall's Tau Analysis\n")
cat("----------------------\n")
cat(sprintf("Tau estimate: %.3f\n", tau_test$estimate))
cat(sprintf("Z-statistic: %.3f\n", tau_test$statistic))
cat(sprintf("P-value: %.4f\n", tau_test$p.value))
cat(sprintf("Sample size: %d\n", nrow(customer_data)))

Interpretation: A tau of approximately 0.45 with p < 0.001 suggests a moderate, statistically significant positive association. Customers who rate service higher tend to make more repeat purchases. The relationship isn’t deterministic, but the ordinal pattern is clear.

Alternative Packages and Edge Cases

Base R covers most needs, but specialized packages offer additional features:

# Install if needed: install.packages(c("Kendall", "DescTools"))
library(Kendall)
library(DescTools)

# Sample data for comparison
x <- c(1, 2, 3, 4, 5, 5, 6, 7, 8, 9)
y <- c(2, 1, 4, 3, 6, 5, 7, 9, 8, 10)

# Base R tau-b
base_tau <- cor(x, y, method = "kendall")

# Kendall package (also reports the score variance and a p-value)
kendall_result <- Kendall(x, y)

# DescTools tau-b and tau-c
tau_b <- KendallTauB(x, y, conf.level = 0.95)
tau_c <- StuartTauC(x, y, conf.level = 0.95)

# Compare results
cat("Base R tau-b:", round(base_tau, 4), "\n")
cat("Kendall package:", round(kendall_result$tau, 4), "\n")
cat("DescTools tau-b:", round(tau_b[1], 4), "\n")
cat("DescTools tau-c:", round(tau_c[1], 4), "\n")

The Kendall package provides standard errors and two-sided p-values with slightly different computational approaches. DescTools gives you confidence intervals directly—useful for reporting.

Handle missing data explicitly:

# Data with NA values
x_missing <- c(1, 2, NA, 4, 5, 6, 7, NA, 9, 10)
y_missing <- c(2, NA, 4, 3, 6, 5, 7, 9, 8, 10)

# Default behavior: returns NA
cor(x_missing, y_missing, method = "kendall")
# [1] NA

# Complete cases (for two vectors, both options below remove the same rows)
cor(x_missing, y_missing, method = "kendall", use = "pairwise.complete.obs")
# [1] 0.9047619

cor(x_missing, y_missing, method = "kendall", use = "complete.obs")
# [1] 0.9047619

With just two vectors, "pairwise.complete.obs" and "complete.obs" give identical results, since there is only one pair of variables to consider. The distinction matters for correlation matrices: "pairwise.complete.obs" uses the maximum available data for each pair of columns, while "complete.obs" first drops every row containing any NA, so all correlations come from the same subset of observations.
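
With a data frame of several columns, the same use argument applies and you get a full matrix back. A small sketch with made-up values:

```r
# Kendall correlation matrix with missing values, pairwise deletion
df <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, 1, 4, NA, 6, 5),
  c = c(1, 3, 2, 5, 4, 6)
)

# Each pair of columns uses every row where both values are observed
tau_mat <- cor(df, method = "kendall", use = "pairwise.complete.obs")
round(tau_mat, 3)
```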

Kendall’s tau is computationally expensive—O(n²) for naive implementations. For datasets over 10,000 observations, consider the pcaPP package which implements an O(n log n) algorithm, or simply use Spearman’s rho as a faster alternative with similar robustness properties.
