How to Perform Bootstrap Resampling in Python
Key Insights
- Bootstrap resampling lets you estimate uncertainty and confidence intervals without making assumptions about your data’s underlying distribution—just resample with replacement thousands of times
- For most applications, 10,000 bootstrap iterations provide stable estimates; use the percentile method for confidence intervals unless you have strong reasons for bias correction
- SciPy’s `scipy.stats.bootstrap()` handles the heavy lifting, but understanding a manual implementation helps you customize for complex scenarios like regression coefficients
Introduction to Bootstrap Resampling
Bootstrap resampling solves a fundamental problem in statistics: how do you estimate uncertainty when you don’t know the underlying distribution of your data?
Traditional statistical methods often assume your data follows a normal distribution or require large sample sizes for asymptotic properties to kick in. Real-world data rarely cooperates. You might have 47 measurements from an expensive experiment, a skewed distribution, or no theoretical basis for assuming any particular shape.
Bootstrap resampling sidesteps these issues entirely. Instead of relying on theoretical distributions, you treat your sample as a stand-in for the population and repeatedly resample from it. The variability across these resamples estimates the variability you’d see if you could actually repeat your experiment thousands of times.
Bradley Efron introduced the bootstrap in 1979, and it’s since become one of the most practical tools in applied statistics. Use it when you need confidence intervals for any statistic (means, medians, correlations, regression coefficients), when your sample size is too small for asymptotic methods, or when you’re working with statistics that have no closed-form variance formula.
The Bootstrap Algorithm Explained
The bootstrap algorithm is deceptively simple:
1. Start with your original sample of n observations
2. Draw n observations from your sample with replacement
3. Calculate your statistic of interest on this bootstrap sample
4. Repeat steps 2-3 many times (typically 1,000-10,000)
5. Use the distribution of bootstrap statistics to estimate uncertainty
The “with replacement” part is crucial. Each bootstrap sample can include some original observations multiple times and others not at all. On average, about 63.2% of the original observations (that is, 1 − 1/e) appear at least once in each bootstrap sample; the remaining slots are filled by duplicates.
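The 63.2% figure is easy to verify empirically. A quick sketch that counts distinct indices in repeated resamples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # sample size

# Fraction of distinct original observations in each of 500 bootstrap samples
fractions = [
    len(np.unique(rng.choice(n, size=n, replace=True))) / n
    for _ in range(500)
]

print(f"Average unique fraction: {np.mean(fractions):.3f}")  # close to 1 - 1/e
```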
Here’s a manual implementation:
```python
import numpy as np

def bootstrap_statistic(data, statistic_func, n_iterations=10000, random_state=None):
    """
    Perform bootstrap resampling for any statistic.

    Parameters
    ----------
    data : array-like
        Original sample data
    statistic_func : callable
        Function that computes the statistic of interest
    n_iterations : int
        Number of bootstrap iterations
    random_state : int, optional
        Seed for reproducibility

    Returns
    -------
    bootstrap_statistics : ndarray
        Array of bootstrap statistic values
    """
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    bootstrap_statistics = np.empty(n_iterations)
    for i in range(n_iterations):
        # Sample with replacement
        bootstrap_sample = rng.choice(data, size=n, replace=True)
        bootstrap_statistics[i] = statistic_func(bootstrap_sample)
    return bootstrap_statistics

# Example usage
sample_data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62])
bootstrap_means = bootstrap_statistic(sample_data, np.mean, random_state=42)
print(f"Original mean: {np.mean(sample_data):.2f}")
print(f"Bootstrap mean of means: {np.mean(bootstrap_means):.2f}")
print(f"Bootstrap std of means: {np.std(bootstrap_means):.2f}")
```
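As a sanity check, the bootstrap standard error of the mean should land in the same ballpark as the textbook formula s/√n (slightly smaller for tiny samples, since the bootstrap uses the plug-in variance). A self-contained, vectorized sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62], dtype=float)
n = len(data)

# Vectorized bootstrap: draw all resample indices at once, one mean per row
indices = rng.choice(n, size=(10_000, n), replace=True)
boot_means = data[indices].mean(axis=1)

analytic_se = data.std(ddof=1) / np.sqrt(n)
print(f"Bootstrap SE: {boot_means.std(ddof=1):.3f}")
print(f"Analytic SE:  {analytic_se:.3f}")
```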
Estimating Confidence Intervals
The percentile method is the simplest approach to constructing bootstrap confidence intervals. For a 95% CI, take the 2.5th and 97.5th percentiles of your bootstrap distribution:
```python
def bootstrap_confidence_interval(data, statistic_func, confidence=0.95,
                                  n_iterations=10000, random_state=None):
    """
    Calculate bootstrap confidence interval using the percentile method.
    """
    bootstrap_stats = bootstrap_statistic(data, statistic_func,
                                          n_iterations, random_state)
    alpha = 1 - confidence
    lower_percentile = (alpha / 2) * 100
    upper_percentile = (1 - alpha / 2) * 100
    ci_lower = np.percentile(bootstrap_stats, lower_percentile)
    ci_upper = np.percentile(bootstrap_stats, upper_percentile)
    return ci_lower, ci_upper, bootstrap_stats

# Calculate 95% CI for the mean
sample_data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62,
                        55, 48, 71, 39, 58, 44, 63, 51, 47, 69])
ci_low, ci_high, boot_means = bootstrap_confidence_interval(
    sample_data, np.mean, confidence=0.95, random_state=42
)
print(f"Sample mean: {np.mean(sample_data):.2f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```
The percentile method works well when the bootstrap distribution is roughly symmetric. For skewed distributions, consider the bias-corrected and accelerated (BCa) method, which adjusts for bias and skewness. However, BCa requires more computation and can be unstable with small samples.
How many iterations do you need? For standard errors, 1,000 is often sufficient. For confidence intervals, use at least 10,000. For precise tail probabilities or p-values, consider 100,000 or more. When in doubt, run your analysis twice with different seeds—if results differ meaningfully, increase iterations.
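The two-seed check described above is easy to automate. A standalone sketch (the `percentile_ci` helper here is a hypothetical, compressed version of the function defined earlier) that shows seed-to-seed disagreement shrinking as iterations grow:

```python
import numpy as np

def percentile_ci(data, n_iterations, seed):
    """95% percentile bootstrap CI for the mean."""
    rng = np.random.default_rng(seed)
    n = len(data)
    idx = rng.choice(n, size=(n_iterations, n), replace=True)
    boot = data[idx].mean(axis=1)
    return np.percentile(boot, [2.5, 97.5])

data = np.array([23., 45, 67, 32, 89, 54, 78, 41, 36, 62,
                 55, 48, 71, 39, 58, 44, 63, 51, 47, 69])

for n_iter in (1_000, 10_000):
    ci_a = percentile_ci(data, n_iter, seed=1)
    ci_b = percentile_ci(data, n_iter, seed=2)
    # Gap between CI endpoints across seeds is pure Monte Carlo error
    print(f"{n_iter:>6} iters: max endpoint gap = {np.abs(ci_a - ci_b).max():.3f}")
```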
Bootstrap for Regression Coefficients
Bootstrap becomes particularly valuable for regression analysis, where you want confidence intervals for coefficients without relying on normality assumptions.
Two approaches exist: resampling cases (rows) and resampling residuals. Resampling cases is more robust and handles heteroscedasticity naturally. Resampling residuals assumes the model is correctly specified but can be more efficient.
Here’s how to bootstrap linear regression coefficients:
```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bootstrap_regression_coefficients(X, y, n_iterations=10000, random_state=None):
    """
    Bootstrap confidence intervals for linear regression coefficients.
    Uses case resampling approach.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = len(y)
    # Ensure X is 2D
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n_features = X.shape[1]
    # Store bootstrap coefficients
    boot_intercepts = np.empty(n_iterations)
    boot_coefficients = np.empty((n_iterations, n_features))
    model = LinearRegression()
    for i in range(n_iterations):
        # Resample indices with replacement
        indices = rng.choice(n_samples, size=n_samples, replace=True)
        X_boot = X[indices]
        y_boot = y[indices]
        model.fit(X_boot, y_boot)
        boot_intercepts[i] = model.intercept_
        boot_coefficients[i] = model.coef_
    return boot_intercepts, boot_coefficients

# Generate example data
np.random.seed(42)
X = np.random.randn(50, 2)
true_coefficients = np.array([2.5, -1.3])
y = X @ true_coefficients + 3.0 + np.random.randn(50) * 0.5

# Bootstrap the coefficients
boot_intercepts, boot_coefs = bootstrap_regression_coefficients(
    X, y, n_iterations=10000, random_state=42
)

# Calculate confidence intervals
print("Coefficient estimates with 95% CIs:")
print(f"Intercept: {np.mean(boot_intercepts):.3f} "
      f"[{np.percentile(boot_intercepts, 2.5):.3f}, "
      f"{np.percentile(boot_intercepts, 97.5):.3f}]")
for i in range(boot_coefs.shape[1]):
    print(f"Coef {i+1}: {np.mean(boot_coefs[:, i]):.3f} "
          f"[{np.percentile(boot_coefs[:, i], 2.5):.3f}, "
          f"{np.percentile(boot_coefs[:, i], 97.5):.3f}]")
```
Using scipy.stats.bootstrap
SciPy 1.7+ includes a robust bootstrap implementation that handles edge cases and supports multiple confidence interval methods:
```python
import numpy as np
from scipy import stats

# Sample data
data = np.array([23, 45, 67, 32, 89, 54, 78, 41, 36, 62,
                 55, 48, 71, 39, 58, 44, 63, 51, 47, 69])

# One-liner bootstrap for the mean
result = stats.bootstrap(
    (data,),              # data must be a tuple of arrays
    np.mean,              # statistic function
    n_resamples=10000,
    confidence_level=0.95,
    method='percentile',  # or 'basic', 'bca'
    random_state=42
)
print(f"95% CI: [{result.confidence_interval.low:.2f}, "
      f"{result.confidence_interval.high:.2f}]")
print(f"Standard error: {result.standard_error:.2f}")

# Bootstrap for the median, where simple normal-theory CIs don't apply
median_result = stats.bootstrap(
    (data,),
    np.median,
    n_resamples=10000,
    random_state=42
)
print(f"Median 95% CI: [{median_result.confidence_interval.low:.2f}, "
      f"{median_result.confidence_interval.high:.2f}]")
```
The method parameter offers three options: 'percentile' (simple and robust), 'basic' (reflects bootstrap distribution around the estimate), and 'bca' (bias-corrected and accelerated, best for skewed distributions but computationally heavier).
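On symmetric data the three methods agree closely; the differences show up on skewed samples. A sketch comparing them on a right-skewed (exponential) sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=10, size=50)  # right-skewed sample

# Same data and statistic, three interval constructions
for method in ('percentile', 'basic', 'bca'):
    res = stats.bootstrap((skewed,), np.mean, n_resamples=10_000,
                          method=method, random_state=42)
    ci = res.confidence_interval
    print(f"{method:>10}: [{ci.low:.2f}, {ci.high:.2f}]")
```

Expect the BCa interval to sit slightly to the right of the percentile interval here, reflecting the correction for skewness.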
Performance Optimization
For large datasets or complex statistics, bootstrap can become slow. Vectorization eliminates Python loops:
```python
import numpy as np
from scipy import stats
from joblib import Parallel, delayed
from tqdm import tqdm

def vectorized_bootstrap_mean(data, n_iterations=10000, random_state=None):
    """Fully vectorized bootstrap for the mean."""
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    # Generate all bootstrap indices at once
    indices = rng.choice(n, size=(n_iterations, n), replace=True)
    # Compute all means in one operation
    bootstrap_means = data[indices].mean(axis=1)
    return bootstrap_means

# For complex statistics, use parallel processing
def parallel_bootstrap(data, statistic_func, n_iterations=10000,
                       n_jobs=-1, random_state=None):
    """
    Parallelized bootstrap with progress tracking.
    """
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    # Generate an independent seed for each iteration
    seeds = rng.integers(0, 2**31, size=n_iterations)

    def single_bootstrap(seed):
        local_rng = np.random.default_rng(seed)
        indices = local_rng.choice(n, size=n, replace=True)
        return statistic_func(data[indices])

    results = Parallel(n_jobs=n_jobs)(
        delayed(single_bootstrap)(seed)
        for seed in tqdm(seeds, desc="Bootstrapping")
    )
    return np.array(results)

# Example: Bootstrap a complex statistic
data = np.random.randn(1000)

def trimmed_mean(x, trim=0.1):
    """10% trimmed mean."""
    return stats.trim_mean(x, trim)

boot_results = parallel_bootstrap(data, trimmed_mean,
                                  n_iterations=10000, n_jobs=4)
```
Common Pitfalls and Best Practices
Bootstrap isn’t magic. It fails in predictable ways:
Heavy-tailed distributions: When estimating the mean of a Cauchy distribution or similar heavy-tailed data, bootstrap confidence intervals can be wildly unstable. The sample mean itself is a poor estimator here.
Dependent data: Standard bootstrap assumes independent observations. For time series, use block bootstrap or stationary bootstrap methods instead.
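A minimal sketch of the moving-block idea: resample contiguous blocks and stitch them together, so short-range dependence survives within each block. The block length (20 here) is a tuning choice, not a recommendation:

```python
import numpy as np

def moving_block_bootstrap(data, statistic_func, block_length=20,
                           n_iterations=2000, random_state=None):
    """Moving-block bootstrap: resample contiguous blocks, then concatenate."""
    rng = np.random.default_rng(random_state)
    data = np.asarray(data)
    n = len(data)
    n_blocks = int(np.ceil(n / block_length))
    starts_max = n - block_length + 1  # valid block start positions
    stats_out = np.empty(n_iterations)
    for i in range(n_iterations):
        starts = rng.integers(0, starts_max, size=n_blocks)
        # Concatenate blocks, then trim back to the original length
        sample = np.concatenate([data[s:s + block_length] for s in starts])[:n]
        stats_out[i] = statistic_func(sample)
    return stats_out

# AR(1)-style dependent series: the iid bootstrap would understate the SE
rng = np.random.default_rng(0)
series = np.empty(500)
series[0] = rng.normal()
for t in range(1, 500):
    series[t] = 0.7 * series[t - 1] + rng.normal()

block_se = moving_block_bootstrap(series, np.mean, random_state=0).std(ddof=1)
print(f"Block-bootstrap SE of mean: {block_se:.3f}")
print(f"Naive iid SE:               {series.std(ddof=1) / np.sqrt(500):.3f}")
```

With positive autocorrelation, the block-bootstrap standard error comes out larger than the naive independent-sample formula, which is exactly the point.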
Extreme quantiles: Bootstrapping the maximum or minimum of your data, or extreme percentiles, performs poorly because these statistics depend on observations that may not appear in every bootstrap sample.
Small samples: While bootstrap helps with small samples, it can’t create information that isn’t there. With n < 20, confidence intervals may be unreliable. Consider whether your question is even answerable with so little data.
Minimum recommendations:
- Use at least 10,000 iterations for confidence intervals
- Always set a random seed for reproducibility
- Visualize your bootstrap distribution—if it’s multimodal or highly skewed, interpret results cautiously
- Compare bootstrap CIs to parametric CIs when available as a sanity check
- For regression, prefer case resampling unless you have strong reasons to believe your model is correctly specified
Bootstrap resampling democratizes uncertainty quantification. You no longer need to derive complex variance formulas or assume convenient distributions. Understand the algorithm, know its limitations, and it becomes one of the most versatile tools in your statistical toolkit.