How to Perform the Runs Test in Python
Key Insights
- The runs test detects non-randomness in binary sequences by comparing the observed number of runs to what you’d expect from a truly random process
- Converting continuous data to binary sequences (typically using the median as a cutoff) is the critical preprocessing step that determines your test’s validity
- A significant result (low p-value) indicates your sequence likely isn’t random, but the test won’t tell you why—you’ll need additional analysis to identify the underlying pattern
Introduction to the Runs Test
The runs test (also called the Wald-Wolfowitz test) answers a deceptively simple question: is this sequence random? You have a series of binary outcomes—heads and tails, up and down movements, pass and fail results—and you want to know if the order exhibits any pattern.
The test works by counting “runs,” which are consecutive sequences of identical values. Consider HHHTTHH: this contains three runs (HHH, TT, HH). A truly random sequence should have neither too many runs (suggesting alternation) nor too few (suggesting clustering).
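You can verify a run count programmatically: a new run begins wherever two adjacent symbols differ, so the count is one plus the number of such changes.

```python
seq = "HHHTTHH"
# A new run starts at each position where adjacent symbols differ
runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
print(runs)  # 3
```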
Use the runs test when you need to:
- Validate random number generators
- Check for independence in quality control measurements
- Test whether stock price movements follow a random walk
- Detect patterns in manufacturing defects over time
The test is non-parametric, meaning it makes no assumptions about the underlying distribution of your data. This makes it robust but also means it only tests one specific aspect of randomness.
Understanding Runs and Test Statistics
A run is a maximal sequence of consecutive identical elements. In the sequence AABBBAA, you have three runs: AA, BBB, and AA.
The null hypothesis states that the sequence is randomly ordered. Under this assumption, we can calculate the expected number of runs and its variance.
For a sequence with n₁ elements of type 1 and n₂ elements of type 2:
- Expected runs: E(R) = (2 * n₁ * n₂) / (n₁ + n₂) + 1
- Variance: Var(R) = (2 * n₁ * n₂ * (2 * n₁ * n₂ - n₁ - n₂)) / ((n₁ + n₂)² * (n₁ + n₂ - 1))
The z-statistic follows: Z = (R - E(R)) / √Var(R)
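As a quick sanity check of these formulas, here is the arithmetic for a small balanced case (n₁ = n₂ = 6 with 7 observed runs, chosen purely for illustration):

```python
import math

n1, n2, observed = 6, 6, 7
n = n1 + n2

expected = (2 * n1 * n2) / n + 1                                 # 7.0
variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n**2 * (n - 1))  # ~2.7273
z = (observed - expected) / math.sqrt(variance)                  # 0.0

print(expected, variance, z)
```

Here the observed run count happens to equal its expectation exactly, so Z = 0 and the test finds no evidence against randomness.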
Here’s how to count runs manually:
```python
def count_runs(sequence):
    """
    Count the number of runs in a binary sequence.

    Parameters:
    -----------
    sequence : list or array-like
        Binary sequence (can be any two distinct values)

    Returns:
    --------
    tuple: (number of runs, count of the first value, count of the second value)
    """
    sequence = list(sequence)
    if len(sequence) < 2:
        return len(sequence), len(sequence), 0
    runs = 1
    for i in range(1, len(sequence)):
        if sequence[i] != sequence[i - 1]:
            runs += 1
    # Count occurrences of the first element; the rest belong to the other value
    n1 = sequence.count(sequence[0])
    n2 = len(sequence) - n1
    return runs, n1, n2

# Example usage
binary_seq = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
runs, n1, n2 = count_runs(binary_seq)
print(f"Sequence: {binary_seq}")
print(f"Number of runs: {runs}")
print(f"Count of 1s: {n1}, Count of 0s: {n2}")
```

Output:

```
Sequence: [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
Number of runs: 7
Count of 1s: 6, Count of 0s: 6
```
Preparing Your Data
Most real-world data isn’t binary. You’ll need to convert continuous measurements into a binary sequence before applying the runs test. The standard approach uses the median as the cutoff point.
```python
import numpy as np

def convert_to_binary(data, cutoff=None):
    """
    Convert continuous data to a binary sequence.

    Parameters:
    -----------
    data : array-like
        Continuous numeric data
    cutoff : float, optional
        Value to split on. If None, uses the median.

    Returns:
    --------
    tuple: (binary numpy array with 1 for above the cutoff and 0 otherwise,
            the cutoff that was used)
    """
    data = np.array(data)
    if cutoff is None:
        cutoff = np.median(data)
    # Values exactly at the cutoff are typically assigned to one group;
    # the strict > comparison assigns them to the "below" group (0)
    binary = (data > cutoff).astype(int)
    return binary, cutoff

# Example: Temperature readings
temperatures = [72.1, 73.5, 71.8, 74.2, 75.0, 73.1, 72.8, 76.3,
                74.8, 73.9, 72.5, 71.9, 73.3, 74.1, 75.5]
binary_temps, median_temp = convert_to_binary(temperatures)
print(f"Original data: {temperatures}")
print(f"Median cutoff: {median_temp}")
print(f"Binary sequence: {binary_temps}")
```
Watch out for these preprocessing pitfalls:
- Ties at the median: Decide consistently how to handle values exactly equal to the cutoff
- Missing values: Remove or impute before conversion
- Outliers: They won’t affect the binary conversion much, but consider whether they’re valid data points
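The tie-handling decision is easiest to get right when it is explicit in code. A minimal sketch of two common policies (the data values here are made up for illustration):

```python
import numpy as np

data = np.array([3.0, 5.0, 5.0, 7.0, 2.0, 9.0, 5.0])
cutoff = np.median(data)  # 5.0 -- three values tie with the cutoff

# Policy 1: keep ties, assigning them to the "below" group via strict >
binary_keep = (data > cutoff).astype(int)

# Policy 2: drop tied values entirely before testing
binary_drop = (data[data != cutoff] > cutoff).astype(int)

print(binary_keep)  # [0 0 0 1 0 1 0]
print(binary_drop)  # [0 1 0 1]
```

Either policy can be defensible; what matters is applying the same one consistently and reporting it alongside your results.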
Implementing the Runs Test with statsmodels
The statsmodels library provides a ready-to-use implementation. Note that it lives in the sandbox module, which contains experimental code that may change.
```python
from statsmodels.sandbox.stats.runs import runstest_1samp
import numpy as np

def perform_runs_test(data, cutoff=None):
    """
    Perform runs test using statsmodels.

    Parameters:
    -----------
    data : array-like
        Data to test (can be continuous or binary)
    cutoff : float, optional
        For continuous data, the value to split on. If None, the median
        is used here (note: statsmodels' own default is the mean).

    Returns:
    --------
    dict: Test results including z-statistic and p-value
    """
    data = np.array(data)
    if cutoff is None:
        cutoff = 'median'
    z_stat, p_value = runstest_1samp(data, cutoff=cutoff)
    return {
        'z_statistic': z_stat,
        'p_value': p_value,
        'is_random': p_value > 0.05  # Using conventional alpha
    }

# Test with sample data
np.random.seed(42)
random_data = np.random.randn(100)               # Should appear random
trending_data = np.cumsum(np.random.randn(100))  # Random walk: long runs around the median

print("Testing truly random data:")
result_random = perform_runs_test(random_data)
print(f"  Z-statistic: {result_random['z_statistic']:.4f}")
print(f"  P-value: {result_random['p_value']:.4f}")
print(f"  Appears random: {result_random['is_random']}")

print("\nTesting trending data:")
result_trend = perform_runs_test(trending_data)
print(f"  Z-statistic: {result_trend['z_statistic']:.4f}")
print(f"  P-value: {result_trend['p_value']:.4f}")
print(f"  Appears random: {result_trend['is_random']}")
```
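The cutoff you pass matters more than it might seem. With skewed data, a mean cutoff and a median cutoff can split the observations very differently, which changes the binary sequence the test sees (the values below are made up to exaggerate the effect):

```python
import numpy as np

# One large outlier pulls the mean far above the median
data = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 1.1, 0.8, 1.2, 1.0])

above_mean = (data > data.mean()).astype(int)        # only the outlier exceeds the mean
above_median = (data > np.median(data)).astype(int)  # roughly half the points exceed the median

print(above_mean)    # [0 0 0 0 0 1 0 0 0 0]
print(above_median)  # [0 1 0 1 0 1 1 0 1 0]
```

A mean cutoff on skewed data can collapse nearly everything into one group, leaving the test with almost no runs to evaluate; the median split guarantees a balanced division.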
Manual Implementation from Scratch
Building the test yourself deepens understanding and removes external dependencies:
```python
import numpy as np
from scipy import stats

def runs_test_manual(sequence):
    """
    Manual implementation of the runs test.

    Parameters:
    -----------
    sequence : array-like
        Binary sequence to test

    Returns:
    --------
    dict: Complete test results
    """
    sequence = np.array(sequence)
    # Count runs
    runs = 1
    for i in range(1, len(sequence)):
        if sequence[i] != sequence[i - 1]:
            runs += 1
    # Count each category
    n1 = np.sum(sequence == sequence[0])
    n2 = len(sequence) - n1
    # Handle edge cases
    if n1 == 0 or n2 == 0:
        raise ValueError("Sequence must contain both values")
    n = n1 + n2
    # Calculate expected runs and variance
    expected_runs = (2 * n1 * n2) / n + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n**2 * (n - 1))
    # Calculate z-score
    std_dev = np.sqrt(variance)
    z_score = (runs - expected_runs) / std_dev
    # Two-tailed p-value
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))
    return {
        'observed_runs': runs,
        'expected_runs': expected_runs,
        'variance': variance,
        'z_score': z_score,
        'p_value': p_value,
        'n1': n1,
        'n2': n2
    }

# Verify against a known example
test_sequence = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
result = runs_test_manual(test_sequence)
print("Manual Runs Test Results:")
print(f"  Observed runs: {result['observed_runs']}")
print(f"  Expected runs: {result['expected_runs']:.4f}")
print(f"  Variance: {result['variance']:.4f}")
print(f"  Z-score: {result['z_score']:.4f}")
print(f"  P-value: {result['p_value']:.4f}")
```
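To build confidence in the normal approximation, you can shuffle the same sequence many times and compare the simulated distribution of run counts against E(R). A small Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
sequence = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0])

def count_runs(seq):
    # One plus the number of adjacent positions that differ
    return 1 + int(np.sum(seq[1:] != seq[:-1]))

n1 = int(np.sum(sequence == 1))
n2 = len(sequence) - n1
n = n1 + n2
expected = 2 * n1 * n2 / n + 1  # 10.0 for this sequence

# Shuffle the sequence many times and record the run count of each shuffle
sim = np.array([count_runs(rng.permutation(sequence)) for _ in range(20000)])

print(f"Theoretical mean: {expected:.3f}, simulated mean: {sim.mean():.3f}")
```

The simulated mean should land very close to the theoretical expectation; for small samples the simulated tail probabilities are also a useful cross-check on the normal-approximation p-value.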
Practical Example: Testing Stock Returns for Randomness
Let’s apply the runs test to real financial data to test the random walk hypothesis:
```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.sandbox.stats.runs import runstest_1samp

# Simulated stock returns (replace with real data via yfinance if needed)
np.random.seed(123)
# Simulate 252 trading days (1 year)
dates = np.arange(252)
returns = np.random.randn(252) * 0.02  # ~2% daily volatility

def analyze_stock_randomness(returns, ticker="STOCK"):
    """
    Analyze stock returns for randomness using the runs test.
    """
    # Convert to binary: positive vs negative returns
    binary_returns = (returns > 0).astype(int)
    # Perform runs test
    z_stat, p_value = runstest_1samp(returns, cutoff=0)
    # Count runs for visualization
    runs = []
    current_run_start = 0
    current_value = binary_returns[0]
    for i in range(1, len(binary_returns)):
        if binary_returns[i] != current_value:
            runs.append({
                'start': current_run_start,
                'end': i - 1,
                'value': current_value,
                'length': i - current_run_start
            })
            current_run_start = i
            current_value = binary_returns[i]
    # Don't forget the last run
    runs.append({
        'start': current_run_start,
        'end': len(binary_returns) - 1,
        'value': current_value,
        'length': len(binary_returns) - current_run_start
    })
    # Visualization
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    # Plot 1: Returns with runs highlighted
    colors = ['red' if r['value'] == 0 else 'green' for r in runs]
    for i, run in enumerate(runs):
        axes[0].axvspan(run['start'], run['end'] + 1,
                        alpha=0.3, color=colors[i])
    axes[0].bar(range(len(returns)), returns, color='steelblue', width=1.0)
    axes[0].axhline(y=0, color='black', linestyle='-', linewidth=0.5)
    axes[0].set_xlabel('Trading Day')
    axes[0].set_ylabel('Daily Return')
    axes[0].set_title(f'{ticker} Daily Returns with Runs Highlighted')
    # Plot 2: Run length distribution
    run_lengths = [r['length'] for r in runs]
    axes[1].hist(run_lengths, bins=range(1, max(run_lengths) + 2),
                 edgecolor='black', alpha=0.7)
    axes[1].set_xlabel('Run Length')
    axes[1].set_ylabel('Frequency')
    axes[1].set_title('Distribution of Run Lengths')
    plt.tight_layout()
    plt.savefig('stock_runs_analysis.png', dpi=150)
    plt.close()
    return {
        'ticker': ticker,
        'z_statistic': z_stat,
        'p_value': p_value,
        'total_runs': len(runs),
        'avg_run_length': np.mean(run_lengths),
        'max_run_length': max(run_lengths),
        'conclusion': 'Random' if p_value > 0.05 else 'Non-random'
    }

# Run the analysis
result = analyze_stock_randomness(returns, "SIMULATED")
print("Stock Returns Randomness Analysis")
print("=" * 40)
print(f"Ticker: {result['ticker']}")
print(f"Z-statistic: {result['z_statistic']:.4f}")
print(f"P-value: {result['p_value']:.4f}")
print(f"Total runs: {result['total_runs']}")
print(f"Average run length: {result['avg_run_length']:.2f}")
print(f"Maximum run length: {result['max_run_length']}")
print(f"Conclusion: {result['conclusion']}")
```
Interpreting Results and Limitations
Reading the p-value: A p-value below your significance threshold (typically 0.05) suggests the sequence is not random. But direction matters:
- Negative z-score: Fewer runs than expected, indicating clustering
- Positive z-score: More runs than expected, indicating alternation
Limitations to keep in mind:
- Binary reduction loses information: Converting continuous data to binary discards magnitude. A sequence of returns [+0.01%, +5%, +0.02%] becomes [1, 1, 1], same as [+5%, +5%, +5%].
- Sensitive to cutoff choice: Using the mean versus the median can yield different conclusions. Document your choice and consider a sensitivity analysis.
- Only tests one aspect of randomness: The runs test checks for independence between consecutive observations. It won’t detect periodic patterns with gaps or other complex structures.
- Sample size matters: For small samples (n < 20), use exact distributions rather than the normal approximation.
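For very small samples, the exact null distribution is cheap to compute by brute force: enumerate every arrangement of the n₁ ones among n positions (all equally likely under the null) and count how many produce a run count at least as far from E(R) as the observed one. A sketch of that idea (the two-sided extremeness criterion here, distance from the expected run count, is one common choice among several):

```python
from itertools import combinations
from math import comb

def count_runs(seq):
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def exact_runs_pvalue(n1, n2, observed_runs):
    """Exact two-sided p-value for the runs test, computed by enumerating
    all C(n1 + n2, n1) equally likely arrangements (small samples only)."""
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    extreme = 0
    for positions in combinations(range(n), n1):
        ones = set(positions)
        seq = [1 if i in ones else 0 for i in range(n)]
        if abs(count_runs(seq) - expected) >= abs(observed_runs - expected):
            extreme += 1
    return extreme / comb(n, n1)

# Five of each value but only 2 runs (e.g. 0000011111): strong clustering
print(f"{exact_runs_pvalue(5, 5, 2):.4f}")  # 0.0159
```

Enumeration grows combinatorially, so this is only practical for roughly n < 20, which is exactly the regime where the normal approximation is weakest.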
Alternative tests to consider:
- Ljung-Box test: For autocorrelation in time series
- Durbin-Watson test: Specifically for regression residuals
- NIST test suite: Comprehensive randomness testing for cryptographic applications
The runs test is a useful first check, but rarely sufficient on its own. Combine it with visual inspection and other statistical tests for robust conclusions about randomness in your data.