How to Perform the Kolmogorov-Smirnov Test in Python

Key Insights

  • The Kolmogorov-Smirnov test measures the maximum distance between cumulative distribution functions, making it a versatile non-parametric test for comparing distributions without assuming any particular shape.
  • Use the one-sample KS test to check if your data follows a theoretical distribution (normal, exponential, etc.), and the two-sample variant to determine if two datasets come from the same underlying distribution.
  • The KS test becomes overly sensitive with large sample sizes—even trivial deviations will produce significant p-values, so always pair statistical significance with practical significance and visualization.

Introduction to the Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov (KS) test is a non-parametric statistical test that compares distributions by measuring the maximum vertical distance between their cumulative distribution functions (CDFs). This “maximum distance” approach makes it fundamentally different from tests that compare moments or use likelihood ratios.

The test comes in two flavors. The one-sample KS test compares your data against a theoretical distribution—you’re asking “does my data follow a normal distribution?” The two-sample KS test compares two empirical datasets—you’re asking “do these two samples come from the same distribution?”

When should you reach for the KS test instead of alternatives like Shapiro-Wilk or Anderson-Darling? The KS test shines when you need to compare against any continuous distribution (not just normality), when you want a simple visual interpretation, or when you’re comparing two samples directly. It’s also distribution-free, meaning it makes no assumptions about the underlying distribution shape.

Prerequisites and Setup

You’ll need three libraries for this tutorial. SciPy provides the statistical tests, NumPy handles numerical operations, and Matplotlib creates visualizations.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Generate sample data for examples
normal_data = np.random.normal(loc=0, scale=1, size=500)
uniform_data = np.random.uniform(low=-2, high=2, size=500)
skewed_data = np.random.exponential(scale=2, size=500)

This gives us three distinct datasets to work with: normally distributed data, uniformly distributed data, and exponentially distributed (right-skewed) data.

One-Sample KS Test: Testing Against a Known Distribution

The one-sample KS test answers a straightforward question: does your data follow a specific theoretical distribution? You provide your data and a reference distribution, and the test quantifies how well they match.

# Test if normal_data follows a standard normal distribution
statistic, p_value = stats.kstest(normal_data, 'norm')

print(f"KS Statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

Output:

KS Statistic: 0.0284
P-value: 0.8271

The KS statistic (often called D) represents the maximum absolute difference between the empirical CDF of your data and the theoretical CDF. Values range from 0 to 1, where smaller values indicate better agreement.
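To make this concrete, here is a minimal sketch (synthetic data, illustrative variable names) that computes D by hand from the empirical CDF and checks it against stats.kstest. Because the ECDF jumps at each data point, the gap must be checked both just after a step (D+) and just before it (D-):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=200))
n = len(x)
cdf = stats.norm.cdf(x)  # theoretical CDF evaluated at each data point

# D+ is the gap just after each ECDF step, D- the gap just before it
d_plus = np.max(np.arange(1, n + 1) / n - cdf)
d_minus = np.max(cdf - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

d_scipy = stats.kstest(x, 'norm').statistic
print(d_manual, d_scipy)  # the two values agree
```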

The p-value tells you the probability of observing a KS statistic at least this extreme if your data truly came from the specified distribution. With p = 0.8271, we have no evidence to reject the null hypothesis: the data is consistent with a standard normal distribution.

Now let’s test the same data against a uniform distribution:

# Test if normal_data follows a uniform distribution on [-3, 3]
# (for 'uniform', args are (loc, scale), so loc=-3, scale=6)
statistic, p_value = stats.kstest(normal_data, 'uniform', args=(-3, 6))

print(f"KS Statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

Output:

KS Statistic: 0.1293
P-value: 0.0000

The much larger D statistic and near-zero p-value correctly indicate that our normal data doesn’t follow a uniform distribution.

For distributions with parameters, you must specify them explicitly:

# Test against normal with specific mean and standard deviation
statistic, p_value = stats.kstest(
    normal_data, 
    'norm', 
    args=(0, 1)  # (mean, std)
)

# Test against exponential distribution
statistic_exp, p_value_exp = stats.kstest(
    skewed_data, 
    'expon', 
    args=(0, 2)  # (loc, scale)
)

print(f"Exponential test - D: {statistic_exp:.4f}, p: {p_value_exp:.4f}")

Two-Sample KS Test: Comparing Two Datasets

The two-sample variant determines whether two datasets come from the same underlying distribution—without specifying what that distribution is. This makes it invaluable for A/B testing, comparing experimental groups, or validating that training and test sets have similar distributions.

# Generate two samples from similar distributions
sample_a = np.random.normal(loc=10, scale=2, size=300)
sample_b = np.random.normal(loc=10, scale=2, size=350)

# Two-sample KS test
statistic, p_value = stats.ks_2samp(sample_a, sample_b)

print(f"KS Statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

Output:

KS Statistic: 0.0590
P-value: 0.6521

The high p-value indicates no significant difference between the distributions—exactly what we’d expect since both samples came from identical normal distributions.

Now let’s compare genuinely different distributions:

# Compare samples from different distributions
sample_control = np.random.normal(loc=100, scale=15, size=200)
sample_treatment = np.random.normal(loc=110, scale=15, size=200)

statistic, p_value = stats.ks_2samp(sample_control, sample_treatment)

print(f"KS Statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Significant difference detected between distributions")

Output:

KS Statistic: 0.2350
P-value: 0.0001
Significant difference detected between distributions

The ks_2samp function also accepts an alternative parameter for one-sided tests. Note SciPy's convention: the alternative describes the CDFs, not the values. With alternative='greater', the alternative hypothesis is that the first sample's CDF lies above the second's, which means the first sample's values tend to be smaller.

# One-sided test: is sample_treatment stochastically greater?
# With (control, treatment), alternative='greater' tests whether the control
# CDF lies above the treatment CDF, i.e. treatment values tend to be larger
stat, p_val = stats.ks_2samp(sample_control, sample_treatment, alternative='greater')
print(f"One-sided p-value: {p_val:.4f}")
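Because the convention is easy to invert, a quick sanity check with obviously shifted samples (synthetic data, illustrative names) can save you from reading a one-sided result backwards:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
low = rng.normal(0, 1, size=1000)
high = rng.normal(1, 1, size=1000)  # clearly shifted upward

# alternative='greater': the FIRST sample's CDF lies above the second's,
# i.e. the first sample's values tend to be smaller
p = stats.ks_2samp(low, high, alternative='greater').pvalue
print(p)  # tiny p-value: 'low' is stochastically smaller than 'high'
```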

Visualizing KS Test Results

The KS test’s elegance lies in its geometric interpretation. Plotting CDFs makes the D statistic immediately visible as the largest vertical gap between curves.

def plot_ks_test(data, dist_name='norm', dist_args=(0, 1)):
    """Visualize one-sample KS test with annotated D statistic."""
    
    # Sort data for empirical CDF
    sorted_data = np.sort(data)
    n = len(sorted_data)
    empirical_cdf = np.arange(1, n + 1) / n
    
    # Get theoretical CDF
    dist = getattr(stats, dist_name)
    theoretical_cdf = dist.cdf(sorted_data, *dist_args)
    
    # Find maximum deviation (evaluated at the upper step points of the
    # ECDF; stats.kstest also checks just below each step, so its D can
    # differ very slightly)
    differences = np.abs(empirical_cdf - theoretical_cdf)
    max_idx = np.argmax(differences)
    d_statistic = differences[max_idx]
    
    # Create plot
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # Plot CDFs
    ax.step(sorted_data, empirical_cdf, where='post', label='Empirical CDF', 
            color='blue', linewidth=2)
    ax.plot(sorted_data, theoretical_cdf, label=f'Theoretical CDF ({dist_name})', 
            color='red', linewidth=2, linestyle='--')
    
    # Annotate maximum deviation
    x_max = sorted_data[max_idx]
    ax.vlines(x_max, theoretical_cdf[max_idx], empirical_cdf[max_idx], 
              colors='green', linewidth=3, label=f'D = {d_statistic:.4f}')
    ax.plot(x_max, empirical_cdf[max_idx], 'go', markersize=10)
    ax.plot(x_max, theoretical_cdf[max_idx], 'go', markersize=10)
    
    ax.set_xlabel('Value', fontsize=12)
    ax.set_ylabel('Cumulative Probability', fontsize=12)
    ax.set_title('Kolmogorov-Smirnov Test Visualization', fontsize=14)
    ax.legend(loc='lower right')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return d_statistic

# Visualize the test
d_stat = plot_ks_test(normal_data, 'norm', (0, 1))

This visualization immediately communicates what the KS test measures: the green vertical line shows exactly where and how large the maximum deviation occurs.

Common Pitfalls and Best Practices

Sample size sensitivity is the KS test’s Achilles heel. With large samples (n > 1000), the test detects tiny, practically meaningless deviations. A p-value of 0.001 might indicate a difference invisible to the naked eye.

# Demonstration of sample size sensitivity
large_sample = np.random.normal(0.05, 1, size=10000)  # mean shifted by only 0.05 sd

stat, p_val = stats.kstest(large_sample, 'norm', args=(0, 1))
print(f"Large sample - D: {stat:.4f}, p-value: {p_val:.6f}")
# At this sample size the test will typically reject, yet a shift of
# 0.05 standard deviations is negligible for most practical purposes
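To see why, note the widely used large-sample approximation for the one-sample critical value at α = 0.05: reject when D exceeds roughly 1.36/√n. The detectable deviation therefore shrinks toward zero as n grows:

```python
import numpy as np

# Asymptotic approximation: at alpha = 0.05, reject when D > 1.36 / sqrt(n)
for n in (50, 500, 5000, 50000):
    print(f"n={n:>6}: reject when D > {1.36 / np.sqrt(n):.4f}")
```

At n = 50,000 the test flags deviations smaller than 0.01 on the CDF scale, far below anything visible on a plot.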

Never estimate parameters from the same data you’re testing. If you fit a normal distribution to your data and then test whether the data is normal using those fitted parameters, you’re cheating. The p-values will be invalid (biased toward non-rejection).

# WRONG approach - don't do this!
data = np.random.exponential(2, 500)
fitted_mean = np.mean(data)
fitted_std = np.std(data)
# Testing against parameters estimated from the same data - INVALID
stat, p_val = stats.kstest(data, 'norm', args=(fitted_mean, fitted_std))

For proper testing with estimated parameters, use the Lilliefors test or bootstrap methods.
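The Lilliefors test is available in statsmodels as statsmodels.stats.diagnostic.lilliefors. If you'd rather stay in SciPy, a parametric bootstrap that re-estimates the parameters inside every simulated sample gives valid p-values. A minimal sketch (ks_bootstrap_pvalue is an illustrative helper, not a library function):

```python
import numpy as np
from scipy import stats

def ks_bootstrap_pvalue(data, n_boot=500, seed=0):
    """Parametric bootstrap p-value for normality when mean/std are
    estimated from the same data (re-fits parameters in every replicate)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    mu, sigma = np.mean(data), np.std(data, ddof=1)
    d_obs = stats.kstest(data, 'norm', args=(mu, sigma)).statistic
    d_boot = np.empty(n_boot)
    for i in range(n_boot):
        sim = rng.normal(mu, sigma, size=n)
        # Crucially, re-fit on the simulated sample to mimic the estimation step
        m, s = np.mean(sim), np.std(sim, ddof=1)
        d_boot[i] = stats.kstest(sim, 'norm', args=(m, s)).statistic
    return np.mean(d_boot >= d_obs)

# Exponential data is clearly non-normal: the bootstrap p-value comes out near 0
p_exp = ks_bootstrap_pvalue(np.random.default_rng(1).exponential(2, size=300))
print(f"Bootstrap p-value: {p_exp:.3f}")
```

The bootstrap null distribution of D is shifted toward smaller values than the textbook KS distribution, which is exactly the correction the naive approach misses.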

Consider alternatives for specific use cases:

  • Shapiro-Wilk: More powerful for normality testing with small samples
  • Anderson-Darling: Gives more weight to distribution tails
  • Chi-square test: Better for discrete distributions
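All three SciPy alternatives above are one function call away; a quick sketch on the same synthetic data (illustrative seed and sample size) shows how their outputs differ in shape:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(size=200)

# Shapiro-Wilk: returns a statistic and a p-value directly
sw = stats.shapiro(data)
print(f"Shapiro-Wilk: W={sw.statistic:.4f}, p={sw.pvalue:.4f}")

# Anderson-Darling: returns critical values instead of a p-value;
# index 2 corresponds to the 5% significance level for the normal case
ad = stats.anderson(data, dist='norm')
print(f"Anderson-Darling: A2={ad.statistic:.4f}, "
      f"5% critical value={ad.critical_values[2]:.4f}")
```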

Practical Example: Real-World Application

Let’s work through a complete example analyzing API response times from two server configurations:

def analyze_response_times():
    """Complete KS test workflow for response time analysis."""
    
    # Simulate response times (milliseconds)
    np.random.seed(123)
    
    # Server A: baseline configuration
    server_a = np.random.lognormal(mean=4.5, sigma=0.5, size=500)
    
    # Server B: optimized configuration  
    server_b = np.random.lognormal(mean=4.3, sigma=0.4, size=500)
    
    # Step 1: Summary statistics
    print("=== Response Time Analysis ===\n")
    print(f"Server A - Mean: {np.mean(server_a):.2f}ms, "
          f"Median: {np.median(server_a):.2f}ms, Std: {np.std(server_a):.2f}ms")
    print(f"Server B - Mean: {np.mean(server_b):.2f}ms, "
          f"Median: {np.median(server_b):.2f}ms, Std: {np.std(server_b):.2f}ms")
    
    # Step 2: Two-sample KS test
    ks_stat, p_value = stats.ks_2samp(server_a, server_b)
    
    print(f"\n=== Two-Sample KS Test ===")
    print(f"KS Statistic: {ks_stat:.4f}")
    print(f"P-value: {p_value:.6f}")
    
    # Step 3: Interpretation
    alpha = 0.05
    print(f"\n=== Interpretation (α = {alpha}) ===")
    if p_value < alpha:
        print("REJECT null hypothesis: Distributions are significantly different")
        print("The server optimization has measurably changed response time distribution")
    else:
        print("FAIL TO REJECT null hypothesis: No significant difference detected")
    
    # Step 4: One-sided test - is Server B faster (stochastically smaller)?
    # alternative='greater' tests whether server_b's CDF lies above
    # server_a's, i.e. server_b's values tend to be smaller
    stat_less, p_less = stats.ks_2samp(server_b, server_a, alternative='greater')
    print(f"\n=== One-Sided Test (Server B faster?) ===")
    print(f"P-value: {p_less:.6f}")
    
    if p_less < alpha:
        print("Server B shows significantly lower response times")
    
    return server_a, server_b

# Run the analysis
server_a, server_b = analyze_response_times()

Output:

=== Response Time Analysis ===

Server A - Mean: 113.89ms, Median: 92.15ms, Std: 62.34ms
Server B - Mean: 85.43ms, Median: 73.52ms, Std: 37.28ms

=== Two-Sample KS Test ===
KS Statistic: 0.1980
P-value: 0.000001

=== Interpretation (α = 0.05) ===
REJECT null hypothesis: Distributions are significantly different
The server optimization has measurably changed response time distribution

=== One-Sided Test (Server B faster?) ===
P-value: 0.000001
Server B shows significantly lower response times

This workflow demonstrates the KS test’s practical value: we’ve confirmed that the optimized server configuration produces a meaningfully different (and faster) response time distribution, not just a slightly different mean.

The KS test remains a fundamental tool in any data scientist’s toolkit. Its non-parametric nature, intuitive geometric interpretation, and flexibility across one-sample and two-sample scenarios make it applicable across domains—from validating statistical assumptions to comparing A/B test results. Just remember to respect its limitations with large samples and always visualize your distributions alongside the statistical results.
