How to Create an ECDF Plot in Seaborn

Key Insights

ECDF plots eliminate the binning decisions that histograms require and show exact percentile information at every data point, which makes them well suited to precise distribution analysis
Seaborn’s ecdfplot() function supports comparative analysis through the hue parameter, enabling direct visual comparison of distributions across multiple groups on a single set of axes
ECDFs excel in A/B testing and model validation scenarios where you need to answer questions like “what percentage of users experienced load times under 2 seconds?” directly from the data rather than from binned estimates

Introduction to ECDF Plots

The Empirical Cumulative Distribution Function (ECDF) is one of the most underutilized visualization tools in data science. An ECDF shows the proportion of data points less than or equal to each value in your dataset. For any value x on the horizontal axis, the ECDF tells you what fraction of your data falls at or below x.
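As a quick numeric check of this definition, the sketch below evaluates the ECDF at a single value for a tiny made-up sample. The sample and the helper name ecdf_at are illustrative assumptions, not part of seaborn.

```python
import numpy as np

def ecdf_at(sample, x):
    """Fraction of observations in `sample` that are <= x."""
    sample = np.asarray(sample)
    return np.mean(sample <= x)

# Hypothetical sample of five bills
bills = [10.0, 12.5, 12.5, 20.0, 31.0]
print(ecdf_at(bills, 12.5))  # 3 of 5 values are <= 12.5 -> 0.6
```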

Unlike histograms, ECDFs require no binning decisions. Every data point is represented exactly, and you can read percentiles directly from the plot. Want to know the median? Find where the ECDF crosses 0.5. Need the 95th percentile? Look at 0.95 on the y-axis. This precision makes ECDFs invaluable for comparing distributions, detecting outliers, and communicating statistical findings without the artifacts introduced by arbitrary bin choices.
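Reading a percentile off the plot amounts to finding the smallest x at which the curve reaches a given height. A minimal sketch of that lookup, assuming a hypothetical helper named ecdf_quantile and a made-up sample:

```python
import math
import numpy as np

def ecdf_quantile(sample, p):
    """Smallest observed value whose ECDF height is >= p (0 < p <= 1)."""
    ordered = np.sort(np.asarray(sample))
    n = len(ordered)
    # The k-th smallest value (1-based) has ECDF height k / n, so we need
    # the smallest k with k / n >= p. The tiny tolerance guards against
    # floating-point round-off in p * n.
    k = max(math.ceil(p * n - 1e-9), 1)
    return ordered[k - 1]

values = [3, 1, 4, 1, 5, 9, 2, 6]    # hypothetical sample, n = 8
print(ecdf_quantile(values, 0.5))    # 3: the 4th smallest value
print(ecdf_quantile(values, 0.95))   # 9: the largest value
```

Note that np.quantile interpolates between observations by default, so its results can differ slightly from values read off the step function.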

Seaborn added native ECDF support in version 0.11.0, making it trivial to create publication-quality ECDF plots with minimal code.

Basic ECDF Plot with Seaborn

The ecdfplot() function requires just a dataset and a variable to plot. Let’s start with the classic tips dataset:

import seaborn as sns
import matplotlib.pyplot as plt

# Load sample data
tips = sns.load_dataset('tips')

# Create basic ECDF plot
sns.ecdfplot(data=tips, x='total_bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.title('ECDF of Restaurant Bills')
plt.show()

This creates a step function showing the cumulative proportion of bills at each price point. The plot shows that 50% of bills are under approximately $18 and 75% are under approximately $24. These values come straight from the data, with no dependence on bin edges or smoothing.

The ECDF rises steeply where data is dense and flattens where data is sparse. A long flat section indicates a gap in your data, while a steep rise shows a concentration of values.

Customizing ECDF Plots

Seaborn provides several parameters to customize ECDF appearance and behavior. The stat parameter controls what the y-axis represents:

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Proportion (default)
sns.ecdfplot(data=tips, x='total_bill', ax=axes[0])
axes[0].set_title('Proportion ECDF')
axes[0].set_ylabel('Proportion')

# Count
sns.ecdfplot(data=tips, x='total_bill', stat='count', ax=axes[1])
axes[1].set_title('Count ECDF')
axes[1].set_ylabel('Count')

plt.tight_layout()
plt.show()

The stat='count' option shows the cumulative count instead of proportion, useful when absolute numbers matter more than percentages.

For complementary ECDFs (showing the proportion above each value), use the complementary parameter:

# Complementary ECDF (1 - ECDF)
sns.ecdfplot(data=tips, x='total_bill', complementary=True)
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion Above')
plt.title('Complementary ECDF: Proportion of Bills Above Each Value')
plt.show()

This is particularly useful for survival analysis or when you’re interested in exceedance probabilities (e.g., “What proportion of requests took longer than X seconds?”).
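The same exceedance question can be answered numerically. The sketch below uses made-up latency values and an arbitrary threshold to show that the proportion above a value is one minus the ECDF at that value:

```python
import numpy as np

# Hypothetical request latencies in seconds
latencies = np.array([0.2, 0.4, 0.4, 0.9, 1.5, 2.1, 3.0, 4.8])
threshold = 2.0

ecdf_value = np.mean(latencies <= threshold)  # proportion at or below
exceedance = np.mean(latencies > threshold)   # proportion strictly above

print(ecdf_value, exceedance)  # 0.625 0.375
```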

You can also customize line appearance:

sns.ecdfplot(data=tips, x='total_bill', 
             linewidth=2.5, 
             color='darkblue',
             linestyle='--')
plt.show()

Comparing Multiple Distributions

The real power of ECDFs emerges when comparing distributions. The hue parameter creates separate ECDFs for each category:

# Compare bill distributions by day
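ax = sns.ecdfplot(data=tips, x='total_bill', hue='day')
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.title('Bill Distribution by Day of Week')
# Retitle seaborn's legend; a bare plt.legend() call would replace it with an empty one
ax.get_legend().set_title('Day')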
plt.show()

This immediately reveals differences between groups. If one line sits consistently above another, that group tends to have lower values, because a larger share of its data falls at or below any given x. Where two lines cross, both groups have the same proportion of observations at or below that value, even if they differ elsewhere.
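To check the "consistently above" reading numerically, you can evaluate both ECDFs on a shared grid. This is a sketch on synthetic data; the helper ecdf_on_grid and both groups are assumptions made for illustration.

```python
import numpy as np

def ecdf_on_grid(sample, grid):
    """ECDF of `sample` evaluated at every point in `grid`."""
    ordered = np.sort(np.asarray(sample))
    # side="right" counts the observations <= each grid point
    return np.searchsorted(ordered, grid, side="right") / len(ordered)

# Hypothetical groups: group_low is group_high shifted down by 5
rng = np.random.default_rng(0)
group_high = rng.normal(loc=25, scale=5, size=200)
group_low = group_high - 5

grid = np.linspace(0, 50, 201)
f_low = ecdf_on_grid(group_low, grid)
f_high = ecdf_on_grid(group_high, grid)

# True when the lower-valued group's curve is never below the other
print(np.all(f_low >= f_high))
```

Because one group here is a shifted copy of the other, its curve can never fall below the other's. With real data the comparison may hold only over part of the range, which is exactly what crossing lines indicate.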

For comparing smoker vs non-smoker bills:

ax = sns.ecdfplot(data=tips, x='total_bill', hue='smoker',
                  palette=['#2ecc71', '#e74c3c'])
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.title('Bill Distribution: Smokers vs Non-Smokers')
plt.axhline(0.5, color='gray', linestyle=':', alpha=0.5)
# Retitle seaborn's legend instead of calling plt.legend(), which would discard it
ax.get_legend().set_title('Smoker')
plt.show()

The horizontal line at 0.5 helps identify median differences at a glance.

Advanced Techniques

For datasets with sampling weights or importance weights, use the weights parameter:

import numpy as np

# Create synthetic weights (e.g., survey weights)
tips['weight'] = np.random.uniform(0.5, 2.0, len(tips))

# Weighted ECDF
sns.ecdfplot(data=tips, x='total_bill', weights='weight')
plt.xlabel('Total Bill ($)')
plt.ylabel('Weighted Proportion')
plt.title('Weighted ECDF of Bills')
plt.show()
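One common way to define a weighted ECDF is the cumulative share of total weight at or below each value. The sketch below implements that definition on made-up numbers; the function name weighted_ecdf is hypothetical, and this is not a copy of seaborn's internals.

```python
import numpy as np

def weighted_ecdf(values, weights):
    """Return sorted values and the cumulative share of total weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cumulative = np.cumsum(weights[order])
    return values[order], cumulative / cumulative[-1]

# Hypothetical data: the largest value carries half of the total weight
x, y = weighted_ecdf([10, 30, 20], [1, 2, 1])
print(x)  # sorted values: 10, 20, 30
print(y)  # cumulative weight shares: 0.25, 0.5, 1.0
```

With equal weights this reduces to the ordinary ECDF, which is a useful sanity check when weights come from an external source.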

Combining ECDFs with other visualizations provides comprehensive distribution insights:

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
sns.histplot(data=tips, x='total_bill', bins=30, ax=axes[0])
axes[0].set_title('Histogram')
axes[0].set_xlabel('Total Bill ($)')

# ECDF
sns.ecdfplot(data=tips, x='total_bill', ax=axes[1])
axes[1].set_title('ECDF')
axes[1].set_xlabel('Total Bill ($)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

For complete control, you can calculate ECDFs manually using NumPy:

def calculate_ecdf(data):
    """Calculate ECDF values manually"""
    sorted_data = np.sort(data)
    n = len(sorted_data)
    y = np.arange(1, n + 1) / n
    return sorted_data, y

# Manual ECDF calculation
x_vals, y_vals = calculate_ecdf(tips['total_bill'].values)

plt.figure(figsize=(10, 6))
# Draw as a step function; plt.plot would connect points with sloped lines
plt.step(x_vals, y_vals, where='post', linewidth=2)
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.title('Manually Calculated ECDF')
plt.grid(True, alpha=0.3)
plt.show()

This approach is useful when you need to perform custom calculations or integrate ECDFs into larger analysis pipelines.
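When the manual ECDF feeds a lookup table or a join, repeated values can be inconvenient because each tie produces its own row. A possible variant, sketched here with a hypothetical ecdf_unique helper, returns one height per distinct value:

```python
import numpy as np

def ecdf_unique(data):
    """ECDF evaluated once per distinct value (ties collapsed)."""
    values, counts = np.unique(np.asarray(data), return_counts=True)
    return values, np.cumsum(counts) / counts.sum()

x, y = ecdf_unique([2, 1, 2, 3, 2])
print(x)  # distinct values: 1, 2, 3
print(y)  # heights: 0.2, 0.8, 1.0
```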

Practical Use Cases and Interpretation

ECDFs shine in A/B testing scenarios. Suppose you’re testing two website designs and measuring page load times:

# Simulate A/B test data
np.random.seed(42)
design_a = np.random.gamma(2, 0.5, 1000)  # Control
design_b = np.random.gamma(1.8, 0.48, 1000)  # Treatment

# Combine into DataFrame
import pandas as pd
ab_data = pd.DataFrame({
    'load_time': np.concatenate([design_a, design_b]),
    'design': ['A'] * 1000 + ['B'] * 1000
})

# Create comparison ECDF
plt.figure(figsize=(10, 6))
ax = sns.ecdfplot(data=ab_data, x='load_time', hue='design',
                  palette=['#3498db', '#e67e22'])
# Reference line at the 95th percentile
plt.axhline(0.95, color='red', linestyle='--', alpha=0.5)
plt.xlabel('Load Time (seconds)')
plt.ylabel('Proportion of Users')
plt.title('A/B Test: Page Load Time Distribution')
# Retitle seaborn's legend; plt.legend() would drop the design entries
ax.get_legend().set_title('Design')
plt.grid(True, alpha=0.3)
plt.show()

# Calculate specific percentiles
print(f"Design A - 95th percentile: {np.percentile(design_a, 95):.2f}s")
print(f"Design B - 95th percentile: {np.percentile(design_b, 95):.2f}s")

The ECDF makes it immediately obvious whether Design B provides consistent improvements across all percentiles or only benefits certain user segments. If the lines diverge at high percentiles, you know the treatment affects worst-case performance differently than typical performance.
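The same comparison can be made in numbers. The sketch below regenerates the simulated samples from the example above, reports the share of users at or under a 2-second load time (the height of each ECDF at x = 2), and lists the gap between designs at a few percentiles. The 2-second threshold and the chosen percentile levels are illustrative assumptions.

```python
import numpy as np

# Same simulated load times as the A/B example above
np.random.seed(42)
design_a = np.random.gamma(2, 0.5, 1000)     # Control
design_b = np.random.gamma(1.8, 0.48, 1000)  # Treatment

# Share of users at or under 2 seconds (ECDF height at x = 2)
for name, sample in [("A", design_a), ("B", design_b)]:
    print(f"Design {name}: {np.mean(sample <= 2.0):.1%} at or under 2 s")

# Gap between the designs at several percentiles
levels = [50, 75, 90, 95, 99]
gaps = np.percentile(design_a, levels) - np.percentile(design_b, levels)
for level, gap in zip(levels, gaps):
    print(f"p{level}: A - B = {gap:+.2f}s")
```

A gap that grows with the percentile level corresponds to lines that diverge toward the top of the plot.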

For outlier detection, ECDFs help identify long tails:

# Identify potential outliers
plt.figure(figsize=(10, 6))
sns.ecdfplot(data=tips, x='total_bill')
plt.axhline(0.95, color='red', linestyle='--', alpha=0.5)
plt.axhline(0.99, color='red', linestyle='--', alpha=0.5)
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.title('ECDF with Percentile Markers')
plt.grid(True, alpha=0.3)
plt.show()

# Find values above 95th percentile
threshold = tips['total_bill'].quantile(0.95)
outliers = tips[tips['total_bill'] > threshold]
print(f"Bills above 95th percentile (>${threshold:.2f}): {len(outliers)}")

Conclusion

ECDF plots provide unambiguous, precise distribution visualization without the arbitrary decisions required by histograms. Use them when you need to compare distributions across groups, identify exact percentiles, or communicate statistical findings to stakeholders who need concrete answers.

Choose ECDFs over histograms when precision matters more than showing density patterns. Choose them over box plots when you need to see the entire distribution rather than summary statistics. For A/B testing, performance monitoring, and model validation, ECDFs should be your default distribution visualization.

The seaborn.ecdfplot() function makes these plots quick to create, and the resulting visualizations answer questions that other plots leave ambiguous. Start using ECDFs in your next analysis; your stakeholders will appreciate the clarity.