How to Perform Bootstrap Resampling in Python
Key Insights
- Bootstrap resampling lets you estimate uncertainty and confidence intervals without making assumptions about your data’s underlying distribution—just resample with replacement thousands of times
- For most applications, 10,000 bootstrap iterations provide stable estimates; use the percentile method for confidence intervals unless you have strong reasons for bias correction
- SciPy’s `scipy.stats.bootstrap()` handles the heavy lifting, but understanding a manual implementation helps you customize for complex scenarios like regression coefficients
Introduction to Bootstrap Resampling
Bootstrap resampling solves a fundamental problem in statistics: how do you estimate uncertainty when you don’t know the underlying distribution of your data?
Traditional statistical methods often assume your data follows a normal distribution or require large sample sizes for asymptotic properties to kick in. Real-world data rarely cooperates. You might have 47 measurements from an expensive experiment, a skewed distribution, or no theoretical basis for assuming any particular shape.
Bootstrap resampling sidesteps these issues entirely. Instead of relying on theoretical distributions, you treat your sample as a stand-in for the population and repeatedly resample from it. The variability across these resamples estimates the variability you’d see if you could actually repeat your experiment thousands of times.
Bradley Efron introduced the bootstrap in 1979, and it’s since become one of the most practical tools in applied statistics. Use it when you need confidence intervals for any statistic (means, medians, correlations, regression coefficients), when your sample size is too small for asymptotic methods, or when you’re working with statistics that have no closed-form variance formula.
The Bootstrap Algorithm Explained
The bootstrap algorithm is deceptively simple:
1. Start with your original sample of n observations
2. Draw n observations from your sample with replacement
3. Calculate your statistic of interest on this bootstrap sample
4. Repeat steps 2-3 many times (typically 1,000-10,000)
5. Use the distribution of bootstrap statistics to estimate uncertainty
The “with replacement” part is crucial. Each bootstrap sample can include some original observations multiple times and others not at all. On average, about 63.2% of the original observations (that is, 1 − 1/e) appear at least once in each bootstrap sample; the remaining slots are filled by duplicates.
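The 63.2% figure is easy to verify empirically. A quick sketch that counts distinct indices in repeated resamples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # sample size

# Fraction of distinct original observations in each of 500 bootstrap samples
fractions = [
    len(np.unique(rng.choice(n, size=n, replace=True))) / n
    for _ in range(500)
]

print(f"Average unique fraction: {np.mean(fractions):.3f}")  # close to 1 - 1/e
```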
Here’s a manual implementation:
```python
import numpy as np

def bootstrap_statistic(data, statistic_func, n_iterations=10000, random_state=None):
    """
    Perform bootstrap resampling for any statistic.

    Parameters
    ----------
    data : array-like
        Original sample data
    statistic_func : callable
        Function that computes the statistic of interest
    n_iterations : int
        Number of bootstrap iterations
    random_state : int, optional
        Seed for reproducibility

    Returns
    -------
    bootstrap_statistics : ndarray
        Array of bootstrap statistic values
    """
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    bootstrap_statistics = np.empty(n_iterations)
    for i in range(n_iterations):
        # Sample with replacement
        bootstrap_sample = rng.choice(data, size=n, replace=True)
        bootstrap_statistics[i] = statistic_func(bootstrap_sample)
    return bootstrap_statistics

# Example usage
sample_data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62])
bootstrap_means = bootstrap_statistic(sample_data, np.mean, random_state=42)
print(f"Original mean: {np.mean(sample_data):.2f}")
print(f"Bootstrap mean of means: {np.mean(bootstrap_means):.2f}")
print(f"Bootstrap std of means: {np.std(bootstrap_means):.2f}")
```
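As a sanity check, the bootstrap standard error of the mean should land in the same ballpark as the textbook formula s/√n (slightly smaller for tiny samples, since the bootstrap uses the plug-in variance). A self-contained, vectorized sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62], dtype=float)
n = len(data)

# Vectorized bootstrap: draw all resample indices at once, one mean per row
indices = rng.choice(n, size=(10_000, n), replace=True)
boot_means = data[indices].mean(axis=1)

analytic_se = data.std(ddof=1) / np.sqrt(n)
print(f"Bootstrap SE: {boot_means.std(ddof=1):.3f}")
print(f"Analytic SE:  {analytic_se:.3f}")
```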
Estimating Confidence Intervals
The percentile method is the simplest approach to constructing bootstrap confidence intervals. For a 95% CI, take the 2.5th and 97.5th percentiles of your bootstrap distribution:
```python
def bootstrap_confidence_interval(data, statistic_func, confidence=0.95,
                                  n_iterations=10000, random_state=None):
    """
    Calculate bootstrap confidence interval using the percentile method.
    """
    bootstrap_stats = bootstrap_statistic(data, statistic_func,
                                          n_iterations, random_state)
    alpha = 1 - confidence
    lower_percentile = (alpha / 2) * 100
    upper_percentile = (1 - alpha / 2) * 100
    ci_lower = np.percentile(bootstrap_stats, lower_percentile)
    ci_upper = np.percentile(bootstrap_stats, upper_percentile)
    return ci_lower, ci_upper, bootstrap_stats

# Calculate 95% CI for the mean
sample_data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62,
                        55, 48, 71, 39, 58, 44, 63, 51, 47, 69])
ci_low, ci_high, boot_means = bootstrap_confidence_interval(
    sample_data, np.mean, confidence=0.95, random_state=42
)
print(f"Sample mean: {np.mean(sample_data):.2f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```
The percentile method works well when the bootstrap distribution is roughly symmetric. For skewed distributions, consider the bias-corrected and accelerated (BCa) method, which adjusts for bias and skewness. However, BCa requires more computation and can be unstable with small samples.
How many iterations do you need? For standard errors, 1,000 is often sufficient. For confidence intervals, use at least 10,000. For precise tail probabilities or p-values, consider 100,000 or more. When in doubt, run your analysis twice with different seeds—if results differ meaningfully, increase iterations.
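The two-seed check described above is easy to automate. A standalone sketch (the `percentile_ci` helper here is a hypothetical, compressed version of the function defined earlier) that shows seed-to-seed disagreement shrinking as iterations grow:

```python
import numpy as np

def percentile_ci(data, n_iterations, seed):
    """95% percentile bootstrap CI for the mean."""
    rng = np.random.default_rng(seed)
    n = len(data)
    idx = rng.choice(n, size=(n_iterations, n), replace=True)
    boot = data[idx].mean(axis=1)
    return np.percentile(boot, [2.5, 97.5])

data = np.array([23., 45, 67, 32, 89, 54, 78, 41, 36, 62,
                 55, 48, 71, 39, 58, 44, 63, 51, 47, 69])

for n_iter in (1_000, 10_000):
    ci_a = percentile_ci(data, n_iter, seed=1)
    ci_b = percentile_ci(data, n_iter, seed=2)
    # Gap between CI endpoints across seeds is pure Monte Carlo error
    print(f"{n_iter:>6} iters: max endpoint gap = {np.abs(ci_a - ci_b).max():.3f}")
```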
Bootstrap for Regression Coefficients
Bootstrap becomes particularly valuable for regression analysis, where you want confidence intervals for coefficients without relying on normality assumptions.
Two approaches exist: resampling cases (rows) and resampling residuals. Resampling cases is more robust and handles heteroscedasticity naturally. Resampling residuals assumes the model is correctly specified but can be more efficient.
Here’s how to bootstrap linear regression coefficients:
```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bootstrap_regression_coefficients(X, y, n_iterations=10000, random_state=None):
    """
    Bootstrap confidence intervals for linear regression coefficients.
    Uses case resampling approach.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = len(y)
    # Ensure X is 2D
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n_features = X.shape[1]
    # Store bootstrap coefficients
    boot_intercepts = np.empty(n_iterations)
    boot_coefficients = np.empty((n_iterations, n_features))
    model = LinearRegression()
    for i in range(n_iterations):
        # Resample indices with replacement
        indices = rng.choice(n_samples, size=n_samples, replace=True)
        X_boot = X[indices]
        y_boot = y[indices]
        model.fit(X_boot, y_boot)
        boot_intercepts[i] = model.intercept_
        boot_coefficients[i] = model.coef_
    return boot_intercepts, boot_coefficients

# Generate example data
np.random.seed(42)
X = np.random.randn(50, 2)
true_coefficients = np.array([2.5, -1.3])
y = X @ true_coefficients + 3.0 + np.random.randn(50) * 0.5

# Bootstrap the coefficients
boot_intercepts, boot_coefs = bootstrap_regression_coefficients(
    X, y, n_iterations=10000, random_state=42
)

# Calculate confidence intervals
print("Coefficient estimates with 95% CIs:")
print(f"Intercept: {np.mean(boot_intercepts):.3f} "
      f"[{np.percentile(boot_intercepts, 2.5):.3f}, "
      f"{np.percentile(boot_intercepts, 97.5):.3f}]")
for i in range(boot_coefs.shape[1]):
    print(f"Coef {i+1}: {np.mean(boot_coefs[:, i]):.3f} "
          f"[{np.percentile(boot_coefs[:, i], 2.5):.3f}, "
          f"{np.percentile(boot_coefs[:, i], 97.5):.3f}]")
```
Using scipy.stats.bootstrap
SciPy 1.7+ includes a robust bootstrap implementation that handles edge cases and supports multiple confidence interval methods:
```python
import numpy as np
from scipy import stats

# Sample data
data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62,
                 55, 48, 71, 39, 58, 44, 63, 51, 47, 69])

# One-liner bootstrap for the mean
result = stats.bootstrap(
    (data,),              # data must be a tuple of arrays
    np.mean,              # statistic function
    n_resamples=10000,
    confidence_level=0.95,
    method='percentile',  # or 'basic', 'bca'
    random_state=42
)
print(f"95% CI: [{result.confidence_interval.low:.2f}, "
      f"{result.confidence_interval.high:.2f}]")
print(f"Standard error: {result.standard_error:.2f}")

# Bootstrap for the median, where simple normal-theory CIs don't apply
median_result = stats.bootstrap(
    (data,),
    np.median,
    n_resamples=10000,
    random_state=42
)
print(f"Median 95% CI: [{median_result.confidence_interval.low:.2f}, "
      f"{median_result.confidence_interval.high:.2f}]")
```
The method parameter offers three options: 'percentile' (simple and robust), 'basic' (reflects bootstrap distribution around the estimate), and 'bca' (bias-corrected and accelerated, best for skewed distributions but computationally heavier).
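On symmetric data the three methods agree closely; the differences show up on skewed samples. A sketch comparing them on a right-skewed (exponential) sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=10, size=50)  # right-skewed sample

# Same data and statistic, three interval constructions
for method in ('percentile', 'basic', 'bca'):
    res = stats.bootstrap((skewed,), np.mean, n_resamples=10_000,
                          method=method, random_state=42)
    ci = res.confidence_interval
    print(f"{method:>10}: [{ci.low:.2f}, {ci.high:.2f}]")
```

Expect the BCa interval to sit slightly to the right of the percentile interval here, reflecting the correction for skewness.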
Performance Optimization
For large datasets or complex statistics, bootstrap can become slow. Vectorization eliminates Python loops:
```python
import numpy as np
from scipy import stats
from joblib import Parallel, delayed
from tqdm import tqdm

def vectorized_bootstrap_mean(data, n_iterations=10000, random_state=None):
    """Fully vectorized bootstrap for the mean."""
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    # Generate all bootstrap indices at once
    indices = rng.choice(n, size=(n_iterations, n), replace=True)
    # Compute all means in one operation
    bootstrap_means = data[indices].mean(axis=1)
    return bootstrap_means

# For complex statistics, use parallel processing
def parallel_bootstrap(data, statistic_func, n_iterations=10000,
                       n_jobs=-1, random_state=None):
    """
    Parallelized bootstrap with progress tracking.
    """
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    # Generate an independent seed for each iteration
    seeds = rng.integers(0, 2**31, size=n_iterations)

    def single_bootstrap(seed):
        local_rng = np.random.default_rng(seed)
        indices = local_rng.choice(n, size=n, replace=True)
        return statistic_func(data[indices])

    results = Parallel(n_jobs=n_jobs)(
        delayed(single_bootstrap)(seed)
        for seed in tqdm(seeds, desc="Bootstrapping")
    )
    return np.array(results)

# Example: Bootstrap a complex statistic
data = np.random.randn(1000)

def trimmed_mean(x, trim=0.1):
    """10% trimmed mean."""
    return stats.trim_mean(x, trim)

boot_results = parallel_bootstrap(data, trimmed_mean,
                                  n_iterations=10000, n_jobs=4)
```
Common Pitfalls and Best Practices
Bootstrap isn’t magic. It fails in predictable ways:
Heavy-tailed distributions: When estimating the mean of a Cauchy distribution or similar heavy-tailed data, bootstrap confidence intervals can be wildly unstable. The sample mean itself is a poor estimator here.
Dependent data: Standard bootstrap assumes independent observations. For time series, use block bootstrap or stationary bootstrap methods instead.
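A minimal sketch of the moving-block idea: resample contiguous blocks and stitch them together, so short-range dependence survives within each block. The block length (20 here) is a tuning choice, not a recommendation:

```python
import numpy as np

def moving_block_bootstrap(data, statistic_func, block_length=20,
                           n_iterations=2000, random_state=None):
    """Moving-block bootstrap: resample contiguous blocks, then concatenate."""
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    n_blocks = int(np.ceil(n / block_length))
    starts_max = n - block_length + 1  # valid block start positions
    stats_out = np.empty(n_iterations)
    for i in range(n_iterations):
        starts = rng.integers(0, starts_max, size=n_blocks)
        # Concatenate blocks, then trim back to the original length
        sample = np.concatenate([data[s:s + block_length] for s in starts])[:n]
        stats_out[i] = statistic_func(sample)
    return stats_out

# AR(1)-style dependent series: the iid bootstrap would understate the SE
rng = np.random.default_rng(0)
series = np.empty(500)
series[0] = rng.normal()
for t in range(1, 500):
    series[t] = 0.7 * series[t - 1] + rng.normal()

block_se = moving_block_bootstrap(series, np.mean, random_state=0).std(ddof=1)
print(f"Block-bootstrap SE of mean: {block_se:.3f}")
print(f"Naive iid SE:               {series.std(ddof=1) / np.sqrt(500):.3f}")
```

With positive autocorrelation, the block-bootstrap standard error comes out larger than the naive independent-sample formula, which is exactly the point.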
Extreme quantiles: Bootstrapping the maximum or minimum of your data, or extreme percentiles, performs poorly because these statistics depend on observations that may not appear in every bootstrap sample.
Small samples: While bootstrap helps with small samples, it can’t create information that isn’t there. With n < 20, confidence intervals may be unreliable. Consider whether your question is even answerable with so little data.
Minimum recommendations:
- Use at least 10,000 iterations for confidence intervals
- Always set a random seed for reproducibility
- Visualize your bootstrap distribution—if it’s multimodal or highly skewed, interpret results cautiously
- Compare bootstrap CIs to parametric CIs when available as a sanity check
- For regression, prefer case resampling unless you have strong reasons to believe your model is correctly specified
Bootstrap resampling democratizes uncertainty quantification. You no longer need to derive complex variance formulas or assume convenient distributions. Understand the algorithm, know its limitations, and it becomes one of the most versatile tools in your statistical toolkit.