How to Perform the Kolmogorov-Smirnov Test in R
Key Insights
- The Kolmogorov-Smirnov test compares distributions by measuring the maximum vertical distance between cumulative distribution functions, making it useful for both normality testing and comparing two empirical samples.
- Always use the Lilliefors correction when testing normality with estimated parameters—the standard K-S test produces inflated p-values when you estimate mean and standard deviation from the same data.
- The K-S test struggles with tied values in discrete data; pass the exact = FALSE argument to fall back to asymptotic p-values when working with rounded or categorical-like continuous data (the ties warning itself may still appear).
Introduction to the Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (K-S) test is a nonparametric test that compares probability distributions. Unlike tests that focus on specific moments like mean or variance, the K-S test examines the entire shape of distributions by comparing their cumulative distribution functions (CDFs).
The test comes in two variants. The one-sample K-S test compares your data against a theoretical distribution—testing whether your sample could have been drawn from a normal, uniform, exponential, or any other specified distribution. The two-sample K-S test compares two empirical samples to determine if they come from the same underlying distribution.
When should you reach for the K-S test instead of alternatives like Shapiro-Wilk or Anderson-Darling? The K-S test shines when you need distribution-free comparisons and when you’re comparing two samples rather than testing against a theoretical distribution. It’s also useful when you care about the entire distribution shape, not just normality. However, for pure normality testing with a single sample, Shapiro-Wilk typically has more statistical power.
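To make the power comparison concrete, here is a small simulation sketch (base R only; rejection counts will vary with the seed). For clearly non-normal data, Shapiro-Wilk should reject at least as often as a one-sample K-S test with estimated parameters:

```r
# Rough power sketch: how often does each test reject normality
# for clearly non-normal (exponential) data?
set.seed(1)
n_sims <- 200
sw_reject <- 0
ks_reject <- 0
for (i in seq_len(n_sims)) {
  x <- rexp(50, rate = 1)  # clearly non-normal sample
  if (shapiro.test(x)$p.value < 0.05) sw_reject <- sw_reject + 1
  # K-S with parameters estimated from the data (illustration only;
  # see the Limitations section on why this inflates p-values)
  if (ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value < 0.05) {
    ks_reject <- ks_reject + 1
  }
}
cat("Shapiro-Wilk rejections:", sw_reject, "/", n_sims, "\n")
cat("K-S rejections:         ", ks_reject, "/", n_sims, "\n")
```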
Prerequisites and Setup
The K-S test lives in R’s built-in stats package, so you don’t need to install anything for basic functionality. For visualization and the Lilliefors correction, you’ll want a few additional packages.
# Core functionality - already loaded with base R
# library(stats)
# For visualization
library(ggplot2)
# For Lilliefors test (K-S with estimated parameters)
library(nortest)
# Generate sample data for examples
set.seed(42)
normal_data <- rnorm(100, mean = 50, sd = 10)
skewed_data <- rexp(100, rate = 0.1)
comparison_data <- rnorm(100, mean = 52, sd = 10)
One-Sample K-S Test: Testing Against a Known Distribution
The one-sample test answers this question: “Could my data have come from this specific distribution?” The test calculates the D statistic—the maximum absolute difference between your sample’s empirical CDF and the theoretical CDF.
# Test if data follows a standard normal distribution
result <- ks.test(normal_data, "pnorm", mean = 50, sd = 10)
print(result)
Exact one-sample Kolmogorov-Smirnov test
data: normal_data
D = 0.063385, p-value = 0.8037
alternative hypothesis: two-sided
The output gives you two key values. The D statistic (0.063) represents the maximum vertical distance between the empirical and theoretical CDFs. Smaller values indicate better fit. The p-value (0.80) tells you the probability of observing a D statistic this extreme if the data truly came from the specified distribution. A high p-value means you cannot reject the null hypothesis—the data is consistent with the theoretical distribution.
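To see exactly what D measures, you can recompute it by hand from the sorted sample. This sketch (self-contained, regenerating the same normal_data as above) checks both the top and the bottom of each ECDF step, which is how the one-sample statistic is defined:

```r
# Recompute the one-sample D statistic by hand and compare with ks.test().
set.seed(42)
normal_data <- rnorm(100, mean = 50, sd = 10)

x <- sort(normal_data)
n <- length(x)
theo <- pnorm(x, mean = 50, sd = 10)  # theoretical CDF at each observation
# The ECDF jumps from (i-1)/n to i/n at the i-th sorted value, so the
# largest gap occurs at the top or bottom of one of those steps.
d_manual <- max(pmax(seq_len(n) / n - theo, theo - (seq_len(n) - 1) / n))

d_ks <- unname(ks.test(normal_data, "pnorm", mean = 50, sd = 10)$statistic)
cat("Manual D:", round(d_manual, 6), " ks.test D:", round(d_ks, 6), "\n")
```

The two values agree, confirming that D is nothing more exotic than the biggest vertical gap between the step-function ECDF and the smooth theoretical CDF.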
You can test against any distribution R knows about:
# Test against uniform distribution
uniform_sample <- runif(100, min = 0, max = 1)
ks.test(uniform_sample, "punif", min = 0, max = 1)
# Test against exponential distribution
exp_sample <- rexp(100, rate = 2)
ks.test(exp_sample, "pexp", rate = 2)
# Test skewed data against normal - should reject
# (note: estimating the parameters from the data inflates the p-value;
# see the Limitations section)
ks.test(skewed_data, "pnorm", mean = mean(skewed_data), sd = sd(skewed_data))
Two-Sample K-S Test: Comparing Two Datasets
The two-sample variant compares two empirical distributions directly. This is invaluable for A/B testing, before/after comparisons, or validating that two data sources follow the same distribution.
# Compare two samples
two_sample_result <- ks.test(normal_data, comparison_data)
print(two_sample_result)
Asymptotic two-sample Kolmogorov-Smirnov test
data: normal_data and comparison_data
D = 0.13, p-value = 0.3521
alternative hypothesis: two-sided
The interpretation changes slightly here. A high p-value (0.35) indicates insufficient evidence to conclude the distributions differ. The samples could plausibly come from the same underlying distribution.
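For contrast, here is a sketch with samples drawn from clearly different distributions, where the test should reject (sample names are illustrative):

```r
# Two samples with a large location shift: the K-S test should reject.
set.seed(7)
sample_a <- rnorm(100, mean = 50, sd = 10)
sample_b <- rnorm(100, mean = 65, sd = 10)  # shifted by 1.5 SDs
shift_result <- ks.test(sample_a, sample_b)
cat("D =", round(unname(shift_result$statistic), 3),
    ", p =", format(shift_result$p.value, digits = 3), "\n")
```

With a shift this large, D lands near the theoretical maximum CDF gap for a 1.5-SD offset, and the p-value is far below any conventional threshold.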
Here’s a practical A/B testing scenario:
# Simulated response times (milliseconds) from two server configurations
set.seed(123)
server_a_times <- rgamma(150, shape = 2, rate = 0.01) # Original config
server_b_times <- rgamma(150, shape = 2.5, rate = 0.012) # New config
ab_result <- ks.test(server_a_times, server_b_times)
print(ab_result)
# Calculate practical metrics alongside
cat("\nServer A - Mean:", round(mean(server_a_times), 1), "ms\n")
cat("Server B - Mean:", round(mean(server_b_times), 1), "ms\n")
cat("Distribution difference detected:", ab_result$p.value < 0.05, "\n")
Visualizing K-S Test Results
Numbers tell part of the story; visualization tells the rest. Plotting empirical CDFs makes the D statistic intuitive—it’s literally the largest vertical gap between the curves.
# Create ECDF plot comparing two samples
library(ggplot2)
# Combine data for plotting
plot_data <- data.frame(
value = c(normal_data, comparison_data),
group = rep(c("Sample A", "Sample B"), each = 100)
)
# Calculate D statistic location for annotation
ecdf_a <- ecdf(normal_data)
ecdf_b <- ecdf(comparison_data)
all_values <- sort(unique(c(normal_data, comparison_data)))
differences <- abs(ecdf_a(all_values) - ecdf_b(all_values))
max_diff_idx <- which.max(differences)
max_diff_x <- all_values[max_diff_idx]
max_diff_y1 <- ecdf_a(max_diff_x)
max_diff_y2 <- ecdf_b(max_diff_x)
# Create the plot
ggplot(plot_data, aes(x = value, color = group)) +
stat_ecdf(linewidth = 1) +
geom_segment(
aes(x = max_diff_x, xend = max_diff_x,
y = max_diff_y1, yend = max_diff_y2),
color = "red", linewidth = 1.5, linetype = "dashed"
) +
annotate("text", x = max_diff_x + 3, y = (max_diff_y1 + max_diff_y2) / 2,
label = paste("D =", round(max(differences), 3)),
color = "red", fontface = "bold") +
labs(
title = "Empirical CDFs with K-S D Statistic",
x = "Value",
y = "Cumulative Probability",
color = "Sample"
) +
theme_minimal() +
theme(legend.position = "bottom")
For one-sample tests, overlay the theoretical CDF:
# One-sample visualization
ggplot(data.frame(x = normal_data), aes(x = x)) +
stat_ecdf(aes(color = "Empirical"), linewidth = 1) +
stat_function(
fun = pnorm,
args = list(mean = 50, sd = 10),
aes(color = "Theoretical Normal"),
linewidth = 1
) +
labs(
title = "Sample vs. Theoretical Normal Distribution",
x = "Value",
y = "Cumulative Probability",
color = "Distribution"
) +
theme_minimal()
Limitations and Common Pitfalls
The K-S test has several gotchas that trip up practitioners.
Parameter estimation bias: When you estimate distribution parameters from your data (like using mean(x) and sd(x) for a normality test), the standard K-S test produces inflated p-values: the test becomes overly conservative and rarely rejects non-normality. Use the Lilliefors test instead:
library(nortest)
# Wrong approach - inflated p-values
wrong_result <- ks.test(normal_data, "pnorm",
mean = mean(normal_data),
sd = sd(normal_data))
# Correct approach - Lilliefors test
correct_result <- lillie.test(normal_data)
cat("Standard K-S p-value:", wrong_result$p.value, "\n")
cat("Lilliefors p-value:", correct_result$p.value, "\n")
Ties in data: The K-S test assumes continuous distributions. Tied values (duplicates) violate this assumption:
# Data with ties triggers a warning
discrete_like <- round(rnorm(100, 50, 10))
ks.test(discrete_like, "pnorm", mean = 50, sd = 10)
# Force asymptotic p-values (the ties warning may still appear;
# wrap the call in suppressWarnings() if needed)
ks.test(discrete_like, "pnorm", mean = 50, sd = 10, exact = FALSE)
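Another common workaround, a heuristic rather than anything mandated by the ks.test() documentation, is to break ties with jitter no larger than the rounding grain. This barely moves the ECDF but removes the duplicates:

```r
# Heuristic: break ties with small uniform jitter before testing.
# Values were rounded to integers, so the rounding error is at most 0.5;
# jitter of +/- 0.5 stays within that grain.
set.seed(42)
discrete_like <- round(rnorm(100, 50, 10))
jittered <- discrete_like + runif(length(discrete_like), -0.5, 0.5)
anyDuplicated(jittered)  # 0: no ties remain
ks.test(jittered, "pnorm", mean = 50, sd = 10)
```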
Sample size sensitivity: With large samples, the test detects trivial differences. With small samples, it lacks power to detect real differences. Always pair statistical significance with effect size considerations.
# Large sample detects tiny difference
set.seed(42)
large_a <- rnorm(10000, mean = 0, sd = 1)
large_b <- rnorm(10000, mean = 0.05, sd = 1) # Barely different
ks.test(large_a, large_b) # Likely significant despite trivial difference
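The flip side is low power at small sample sizes. A sketch, assuming 15 observations per group: even a half standard deviation shift, a real effect by most standards, often fails to reach significance:

```r
# Small samples: a real difference can go undetected (low power).
set.seed(42)
small_a <- rnorm(15, mean = 0, sd = 1)
small_b <- rnorm(15, mean = 0.5, sd = 1)  # half-SD shift, a genuine effect
small_result <- ks.test(small_a, small_b)
cat("p-value with n = 15 per group:",
    format(small_result$p.value, digits = 3), "\n")
```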
Practical Example: End-to-End Workflow
Let’s work through a complete analysis. You’re comparing API response times before and after a performance optimization.
# Complete K-S test workflow for response time analysis
library(ggplot2)
library(nortest)
# Simulated response time data (milliseconds)
set.seed(2024)
before_optimization <- c(
rlnorm(200, meanlog = 5, sdlog = 0.5),
rlnorm(20, meanlog = 6, sdlog = 0.3) # Some slow requests
)
after_optimization <- rlnorm(220, meanlog = 4.8, sdlog = 0.4)
# Step 1: Exploratory summary
cat("=== Response Time Summary ===\n")
cat("Before - Mean:", round(mean(before_optimization), 1), "ms,",
"Median:", round(median(before_optimization), 1), "ms\n")
cat("After - Mean:", round(mean(after_optimization), 1), "ms,",
"Median:", round(median(after_optimization), 1), "ms\n\n")
# Step 2: Test if distributions are normal (they probably aren't)
cat("=== Normality Tests (Lilliefors) ===\n")
before_normal <- lillie.test(before_optimization)
after_normal <- lillie.test(after_optimization)
cat("Before optimization - p-value:", format(before_normal$p.value, digits = 3), "\n")
cat("After optimization - p-value:", format(after_normal$p.value, digits = 3), "\n\n")
# Step 3: Two-sample K-S test
cat("=== Two-Sample K-S Test ===\n")
ks_result <- ks.test(before_optimization, after_optimization)
print(ks_result)
# Step 4: Interpretation
cat("\n=== Interpretation ===\n")
if (ks_result$p.value < 0.05) {
cat("The distributions differ significantly (p =",
format(ks_result$p.value, digits = 3), ")\n")
cat("D statistic:", round(ks_result$statistic, 3), "\n")
cat("The optimization changed the response time distribution.\n")
} else {
cat("No significant difference detected between distributions.\n")
}
# Step 5: Visualization
plot_df <- data.frame(
time = c(before_optimization, after_optimization),
period = rep(c("Before", "After"), c(length(before_optimization),
length(after_optimization)))
)
ggplot(plot_df, aes(x = time, color = period)) +
stat_ecdf(linewidth = 1.2) +
scale_x_log10() +
labs(
title = "API Response Time Distribution: Before vs. After Optimization",
subtitle = paste("K-S Test: D =", round(ks_result$statistic, 3),
", p =", format(ks_result$p.value, digits = 3)),
x = "Response Time (ms, log scale)",
y = "Cumulative Probability",
color = "Period"
) +
theme_minimal() +
theme(legend.position = "bottom")
This workflow gives you a complete picture: summary statistics for context, normality checks to justify using nonparametric methods, the K-S test result, and a visualization that makes the distributional shift immediately apparent.
The K-S test won’t tell you everything about your distributions, but it provides a rigorous, assumption-light method for detecting distributional differences. Combine it with visualizations and domain knowledge for actionable insights.