How to Perform the Runs Test in Python
Key Insights
- The runs test detects non-randomness in binary sequences by comparing the observed number of runs to what you’d expect from a truly random process
- Converting continuous data to binary sequences (typically using the median as a cutoff) is the critical preprocessing step that determines your test’s validity
- A significant result (low p-value) indicates your sequence likely isn’t random, but the test won’t tell you why—you’ll need additional analysis to identify the underlying pattern
Introduction to the Runs Test
The runs test (also called the Wald-Wolfowitz test) answers a deceptively simple question: is this sequence random? You have a series of binary outcomes—heads and tails, up and down movements, pass and fail results—and you want to know if the order exhibits any pattern.
The test works by counting “runs,” which are consecutive sequences of identical values. Consider HHHTTHH: this contains three runs (HHH, TT, HH). A truly random sequence should have neither too many runs (suggesting alternation) nor too few (suggesting clustering).
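You can verify a run count programmatically: a new run begins wherever two adjacent symbols differ, so the count is one plus the number of such changes.

```python
seq = "HHHTTHH"
# A new run starts at each position where adjacent symbols differ
runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
print(runs)  # 3
```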
Use the runs test when you need to:
- Validate random number generators
- Check for independence in quality control measurements
- Test whether stock price movements follow a random walk
- Detect patterns in manufacturing defects over time
The test is non-parametric, meaning it makes no assumptions about the underlying distribution of your data. This makes it robust but also means it only tests one specific aspect of randomness.
Understanding Runs and Test Statistics
A run is a maximal sequence of consecutive identical elements. In the sequence AABBBAA, you have three runs: AA, BBB, and AA.
The null hypothesis states that the sequence is randomly ordered. Under this assumption, we can calculate the expected number of runs and its variance.
For a sequence with n₁ elements of type 1 and n₂ elements of type 2:
- Expected runs: E(R) = (2 * n₁ * n₂) / (n₁ + n₂) + 1
- Variance: Var(R) = (2 * n₁ * n₂ * (2 * n₁ * n₂ - n₁ - n₂)) / ((n₁ + n₂)² * (n₁ + n₂ - 1))
The z-statistic follows: Z = (R - E(R)) / √Var(R)
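As a quick sanity check of these formulas, here is the arithmetic for a small balanced case (n₁ = n₂ = 6 with 7 observed runs, chosen purely for illustration):

```python
import math

n1, n2, observed = 6, 6, 7
n = n1 + n2

expected = (2 * n1 * n2) / n + 1                                 # 7.0
variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n**2 * (n - 1))  # ~2.7273
z = (observed - expected) / math.sqrt(variance)                  # 0.0

print(expected, variance, z)
```

Here the observed run count happens to equal its expectation exactly, so Z = 0 and the test finds no evidence against randomness.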
Here’s how to count runs manually:
```python
def count_runs(sequence):
    """
    Count the number of runs in a binary sequence.

    Parameters:
    -----------
    sequence : list or array-like
        Binary sequence (can be any two distinct values)

    Returns:
    --------
    tuple: (number of runs, count of the first value, count of the second value)
    """
    sequence = list(sequence)
    if len(sequence) < 2:
        return len(sequence), len(sequence), 0
    runs = 1
    for i in range(1, len(sequence)):
        if sequence[i] != sequence[i - 1]:
            runs += 1
    # Count occurrences of the first element; the rest belong to the other value
    n1 = sequence.count(sequence[0])
    n2 = len(sequence) - n1
    return runs, n1, n2

# Example usage
binary_seq = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
runs, n1, n2 = count_runs(binary_seq)
print(f"Sequence: {binary_seq}")
print(f"Number of runs: {runs}")
print(f"Count of 1s: {n1}, Count of 0s: {n2}")
```

Output:

```
Sequence: [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
Number of runs: 7
Count of 1s: 6, Count of 0s: 6
```
Preparing Your Data
Most real-world data isn’t binary. You’ll need to convert continuous measurements into a binary sequence before applying the runs test. The standard approach uses the median as the cutoff point.
```python
import numpy as np

def convert_to_binary(data, cutoff=None):
    """
    Convert continuous data to a binary sequence.

    Parameters:
    -----------
    data : array-like
        Continuous numeric data
    cutoff : float, optional
        Value to split on. If None, uses the median.

    Returns:
    --------
    tuple: (binary numpy array with 1 for above the cutoff and 0 otherwise,
            the cutoff that was used)
    """
    data = np.array(data)
    if cutoff is None:
        cutoff = np.median(data)
    # Values exactly at the cutoff are typically assigned to one group;
    # the strict > comparison assigns them to the "below" group (0)
    binary = (data > cutoff).astype(int)
    return binary, cutoff

# Example: Temperature readings
temperatures = [72.1, 73.5, 71.8, 74.2, 75.0, 73.1, 72.8, 76.3,
                74.8, 73.9, 72.5, 71.9, 73.3, 74.1, 75.5]
binary_temps, median_temp = convert_to_binary(temperatures)
print(f"Original data: {temperatures}")
print(f"Median cutoff: {median_temp}")
print(f"Binary sequence: {binary_temps}")
```
Watch out for these preprocessing pitfalls:
- Ties at the median: Decide consistently how to handle values exactly equal to the cutoff
- Missing values: Remove or impute before conversion
- Outliers: They won’t affect the binary conversion much, but consider whether they’re valid data points
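The tie-handling decision is easiest to get right when it is explicit in code. A minimal sketch of two common policies (the data values here are made up for illustration):

```python
import numpy as np

data = np.array([3.0, 5.0, 5.0, 7.0, 2.0, 9.0, 5.0])
cutoff = np.median(data)  # 5.0 -- three values tie with the cutoff

# Policy 1: keep ties, assigning them to the "below" group via strict >
binary_keep = (data > cutoff).astype(int)

# Policy 2: drop tied values entirely before testing
binary_drop = (data[data != cutoff] > cutoff).astype(int)

print(binary_keep)  # [0 0 0 1 0 1 0]
print(binary_drop)  # [0 1 0 1]
```

Either policy can be defensible; what matters is applying the same one consistently and reporting it alongside your results.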
Implementing the Runs Test with statsmodels
The statsmodels library provides a ready-to-use implementation. Note that it lives in the sandbox module, which contains experimental code that may change.
```python
from statsmodels.sandbox.stats.runs import runstest_1samp
import numpy as np

def perform_runs_test(data, cutoff=None):
    """
    Perform runs test using statsmodels.

    Parameters:
    -----------
    data : array-like
        Data to test (can be continuous or binary)
    cutoff : float, optional
        For continuous data, the value to split on. If None, the median
        is used here (note: statsmodels' own default is the mean).

    Returns:
    --------
    dict: Test results including z-statistic and p-value
    """
    data = np.array(data)
    if cutoff is None:
        cutoff = 'median'
    z_stat, p_value = runstest_1samp(data, cutoff=cutoff)
    return {
        'z_statistic': z_stat,
        'p_value': p_value,
        'is_random': p_value > 0.05  # Using conventional alpha
    }

# Test with sample data
np.random.seed(42)
random_data = np.random.randn(100)               # Should appear random
trending_data = np.cumsum(np.random.randn(100))  # Random walk: long runs around the median

print("Testing truly random data:")
result_random = perform_runs_test(random_data)
print(f"  Z-statistic: {result_random['z_statistic']:.4f}")
print(f"  P-value: {result_random['p_value']:.4f}")
print(f"  Appears random: {result_random['is_random']}")

print("\nTesting trending data:")
result_trend = perform_runs_test(trending_data)
print(f"  Z-statistic: {result_trend['z_statistic']:.4f}")
print(f"  P-value: {result_trend['p_value']:.4f}")
print(f"  Appears random: {result_trend['is_random']}")
```
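The cutoff you pass matters more than it might seem. With skewed data, a mean cutoff and a median cutoff can split the observations very differently, which changes the binary sequence the test sees (the values below are made up to exaggerate the effect):

```python
import numpy as np

# One large outlier pulls the mean far above the median
data = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 1.1, 0.8, 1.2, 1.0])

above_mean = (data > data.mean()).astype(int)        # only the outlier exceeds the mean
above_median = (data > np.median(data)).astype(int)  # roughly half the points exceed the median

print(above_mean)    # [0 0 0 0 0 1 0 0 0 0]
print(above_median)  # [0 1 0 1 0 1 1 0 1 0]
```

A mean cutoff on skewed data can collapse nearly everything into one group, leaving the test with almost no runs to evaluate; the median split guarantees a balanced division.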
Manual Implementation from Scratch
Building the test yourself deepens understanding and removes external dependencies:
```python
import numpy as np
from scipy import stats

def runs_test_manual(sequence):
    """
    Manual implementation of the runs test.

    Parameters:
    -----------
    sequence : array-like
        Binary sequence to test

    Returns:
    --------
    dict: Complete test results
    """
    sequence = np.array(sequence)
    # Count runs
    runs = 1
    for i in range(1, len(sequence)):
        if sequence[i] != sequence[i - 1]:
            runs += 1
    # Count each category
    n1 = np.sum(sequence == sequence[0])
    n2 = len(sequence) - n1
    # Handle edge cases
    if n1 == 0 or n2 == 0:
        raise ValueError("Sequence must contain both values")
    n = n1 + n2
    # Calculate expected runs and variance
    expected_runs = (2 * n1 * n2) / n + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n**2 * (n - 1))
    # Calculate z-score
    std_dev = np.sqrt(variance)
    z_score = (runs - expected_runs) / std_dev
    # Two-tailed p-value
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))
    return {
        'observed_runs': runs,
        'expected_runs': expected_runs,
        'variance': variance,
        'z_score': z_score,
        'p_value': p_value,
        'n1': n1,
        'n2': n2
    }

# Verify against a known example
test_sequence = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
result = runs_test_manual(test_sequence)
print("Manual Runs Test Results:")
print(f"  Observed runs: {result['observed_runs']}")
print(f"  Expected runs: {result['expected_runs']:.4f}")
print(f"  Variance: {result['variance']:.4f}")
print(f"  Z-score: {result['z_score']:.4f}")
print(f"  P-value: {result['p_value']:.4f}")
```
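To build confidence in the normal approximation, you can shuffle the same sequence many times and compare the simulated distribution of run counts against E(R). A small Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
sequence = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0])

def count_runs(seq):
    # One plus the number of adjacent positions that differ
    return 1 + int(np.sum(seq[1:] != seq[:-1]))

n1 = int(np.sum(sequence == 1))
n2 = len(sequence) - n1
n = n1 + n2
expected = 2 * n1 * n2 / n + 1  # 10.0 for this sequence

# Shuffle the sequence many times and record the run count of each shuffle
sim = np.array([count_runs(rng.permutation(sequence)) for _ in range(20000)])

print(f"Theoretical mean: {expected:.3f}, simulated mean: {sim.mean():.3f}")
```

The simulated mean should land very close to the theoretical expectation; for small samples the simulated tail probabilities are also a useful cross-check on the normal-approximation p-value.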
Practical Example: Testing Stock Returns for Randomness
Let’s apply the runs test to real financial data to test the random walk hypothesis:
```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.sandbox.stats.runs import runstest_1samp

# Simulated stock returns (replace with real data via yfinance if needed)
np.random.seed(123)
# Simulate 252 trading days (1 year)
dates = np.arange(252)
returns = np.random.randn(252) * 0.02  # ~2% daily volatility

def analyze_stock_randomness(returns, ticker="STOCK"):
    """
    Analyze stock returns for randomness using the runs test.
    """
    # Convert to binary: positive vs negative returns
    binary_returns = (returns > 0).astype(int)
    # Perform runs test
    z_stat, p_value = runstest_1samp(returns, cutoff=0)
    # Count runs for visualization
    runs = []
    current_run_start = 0
    current_value = binary_returns[0]
    for i in range(1, len(binary_returns)):
        if binary_returns[i] != current_value:
            runs.append({
                'start': current_run_start,
                'end': i - 1,
                'value': current_value,
                'length': i - current_run_start
            })
            current_run_start = i
            current_value = binary_returns[i]
    # Don't forget the last run
    runs.append({
        'start': current_run_start,
        'end': len(binary_returns) - 1,
        'value': current_value,
        'length': len(binary_returns) - current_run_start
    })
    # Visualization
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    # Plot 1: Returns with runs highlighted
    colors = ['red' if r['value'] == 0 else 'green' for r in runs]
    for i, run in enumerate(runs):
        axes[0].axvspan(run['start'], run['end'] + 1,
                        alpha=0.3, color=colors[i])
    axes[0].bar(range(len(returns)), returns, color='steelblue', width=1.0)
    axes[0].axhline(y=0, color='black', linestyle='-', linewidth=0.5)
    axes[0].set_xlabel('Trading Day')
    axes[0].set_ylabel('Daily Return')
    axes[0].set_title(f'{ticker} Daily Returns with Runs Highlighted')
    # Plot 2: Run length distribution
    run_lengths = [r['length'] for r in runs]
    axes[1].hist(run_lengths, bins=range(1, max(run_lengths) + 2),
                 edgecolor='black', alpha=0.7)
    axes[1].set_xlabel('Run Length')
    axes[1].set_ylabel('Frequency')
    axes[1].set_title('Distribution of Run Lengths')
    plt.tight_layout()
    plt.savefig('stock_runs_analysis.png', dpi=150)
    plt.close()
    return {
        'ticker': ticker,
        'z_statistic': z_stat,
        'p_value': p_value,
        'total_runs': len(runs),
        'avg_run_length': np.mean(run_lengths),
        'max_run_length': max(run_lengths),
        'conclusion': 'Random' if p_value > 0.05 else 'Non-random'
    }

# Run the analysis
result = analyze_stock_randomness(returns, "SIMULATED")
print("Stock Returns Randomness Analysis")
print("=" * 40)
print(f"Ticker: {result['ticker']}")
print(f"Z-statistic: {result['z_statistic']:.4f}")
print(f"P-value: {result['p_value']:.4f}")
print(f"Total runs: {result['total_runs']}")
print(f"Average run length: {result['avg_run_length']:.2f}")
print(f"Maximum run length: {result['max_run_length']}")
print(f"Conclusion: {result['conclusion']}")
```
Interpreting Results and Limitations
Reading the p-value: A p-value below your significance threshold (typically 0.05) suggests the sequence is not random. But direction matters:
- Negative z-score: Fewer runs than expected, indicating clustering
- Positive z-score: More runs than expected, indicating alternation
Limitations to keep in mind:
- Binary reduction loses information: Converting continuous data to binary discards magnitude. A sequence of returns [+0.01%, +5%, +0.02%] becomes [1, 1, 1], same as [+5%, +5%, +5%].
- Sensitive to cutoff choice: Using the mean versus the median can yield different conclusions. Document your choice and consider a sensitivity analysis.
- Only tests one aspect of randomness: The runs test checks for independence between consecutive observations. It won’t detect periodic patterns with gaps or other complex structures.
- Sample size matters: For small samples (n < 20), use exact distributions rather than the normal approximation.
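For very small samples, the exact null distribution is cheap to compute by brute force: enumerate every arrangement of the n₁ ones among n positions (all equally likely under the null) and count how many produce a run count at least as far from E(R) as the observed one. A sketch of that idea (the two-sided extremeness criterion here, distance from the expected run count, is one common choice among several):

```python
from itertools import combinations
from math import comb

def count_runs(seq):
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def exact_runs_pvalue(n1, n2, observed_runs):
    """Exact two-sided p-value for the runs test, computed by enumerating
    all C(n1 + n2, n1) equally likely arrangements (small samples only)."""
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    extreme = 0
    for positions in combinations(range(n), n1):
        ones = set(positions)
        seq = [1 if i in ones else 0 for i in range(n)]
        if abs(count_runs(seq) - expected) >= abs(observed_runs - expected):
            extreme += 1
    return extreme / comb(n, n1)

# Five of each value but only 2 runs (e.g. 0000011111): strong clustering
print(f"{exact_runs_pvalue(5, 5, 2):.4f}")  # 0.0159
```

Enumeration grows combinatorially, so this is only practical for roughly n < 20, which is exactly the regime where the normal approximation is weakest.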
Alternative tests to consider:
- Ljung-Box test: For autocorrelation in time series
- Durbin-Watson test: Specifically for regression residuals
- NIST test suite: Comprehensive randomness testing for cryptographic applications
The runs test is a useful first check, but rarely sufficient on its own. Combine it with visual inspection and other statistical tests for robust conclusions about randomness in your data.