How to Create a KDE Plot in Seaborn

Key Insights

  • KDE plots reveal smooth probability distributions where histograms show blocky approximations, making them ideal for understanding underlying data patterns and comparing multiple distributions
  • Bandwidth adjustment is critical: too small a bandwidth creates noisy plots, too large a bandwidth over-smooths important features. Start with the default bw_adjust=1 and iterate based on your data characteristics
  • Bivariate KDE plots expose relationships between two variables through density contours, providing more insight than scatter plots alone for overlapping or dense data

Introduction to KDE Plots

Kernel Density Estimation (KDE) plots visualize the probability density function of a continuous variable by placing a kernel (typically Gaussian) at each data point and summing the results. Unlike histograms that depend heavily on bin width and positioning, KDE plots produce smooth, continuous curves that better represent the underlying distribution.
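To make the "place a kernel at each point and sum" idea concrete, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. The function name gaussian_kde_1d and the fixed bandwidth of 0.4 are illustrative assumptions, not part of Seaborn's API, and Seaborn's own implementation may differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE on `grid` by summing one kernel per sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample,
    # shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the kernels, then divide by n * bandwidth so the curve integrates to 1
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
samples = rng.normal(0, 1, 200)
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)  # bandwidth chosen by hand

# Riemann-sum check that the estimate behaves like a density
area = density.sum() * (grid[1] - grid[0])
print(f"area under curve: {area:.3f}")
```

The area check is the useful part: whatever the bandwidth, the summed kernels should still integrate to roughly 1 over a grid that covers the data.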

Use KDE plots when you need to understand distribution shape, identify multiple peaks (multimodal distributions), or compare distributions across categories. They excel at revealing subtle patterns that histograms might obscure through arbitrary binning. However, histograms remain superior for discrete data or when exact counts matter more than distribution shape.

Here’s a direct comparison:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.concatenate([np.random.normal(0, 1, 300), 
                       np.random.normal(4, 1.5, 200)])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Histogram
ax1.hist(data, bins=30, edgecolor='black', alpha=0.7)
ax1.set_title('Histogram')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')

# KDE plot
sns.kdeplot(data=data, ax=ax2, fill=True)
ax2.set_title('KDE Plot')
ax2.set_xlabel('Value')
ax2.set_ylabel('Density')

plt.tight_layout()
plt.show()

The KDE plot immediately reveals the bimodal nature of this distribution, while the histogram’s appearance varies dramatically with bin choice.

Basic KDE Plot with kdeplot()

The sns.kdeplot() function requires minimal setup. Pass your data directly or use a DataFrame with column specification. For univariate plots, Seaborn uses a Gaussian kernel and calculates the bandwidth automatically.

import seaborn as sns
import matplotlib.pyplot as plt

# Load sample dataset
tips = sns.load_dataset('tips')

# Basic KDE plot
plt.figure(figsize=(10, 6))
sns.kdeplot(data=tips['total_bill'])
plt.title('Distribution of Total Bill Amounts')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
plt.show()

# Customized with color and fill
plt.figure(figsize=(10, 6))
sns.kdeplot(data=tips['total_bill'], 
            fill=True, 
            color='steelblue',
            alpha=0.6,
            linewidth=2)
plt.title('Distribution of Total Bill Amounts')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
plt.show()

The fill=True parameter adds shading under the curve, making the plot more visually appealing and easier to interpret. The alpha parameter controls transparency, useful when overlaying multiple distributions.

Bivariate KDE Plots

Bivariate KDE plots visualize the joint probability density of two variables using contour lines or filled regions. This reveals correlation patterns and density concentrations that scatter plots might miss in crowded data.

# Load iris dataset
iris = sns.load_dataset('iris')

# Basic bivariate KDE with contours
plt.figure(figsize=(10, 8))
sns.kdeplot(data=iris, 
            x='sepal_length', 
            y='sepal_width',
            cmap='Blues',
            fill=True,
            thresh=0.05)
plt.title('Bivariate KDE: Sepal Dimensions')
plt.show()

# Contour lines only with multiple levels
plt.figure(figsize=(10, 8))
sns.kdeplot(data=iris, 
            x='sepal_length', 
            y='sepal_width',
            levels=10,
            color='darkblue',
            linewidths=2)
plt.title('Density Contours: Sepal Dimensions')
plt.show()

# Filled contours with custom levels
plt.figure(figsize=(10, 8))
sns.kdeplot(data=iris, 
            x='sepal_length', 
            y='sepal_width',
            fill=True,
            levels=6,
            cmap='viridis',
            alpha=0.7)
plt.title('Filled Density Contours')
plt.show()

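The thresh parameter sets the lowest contour level to draw, expressed as a proportion of the total probability mass rather than as a raw density value. With thresh=0.05, the sparse outer region holding roughly 5% of the mass is left blank, which removes noise at the distribution edges. The levels parameter controls contour granularity: more levels provide finer detail but can clutter the visualization.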
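If the idea of a contour level as a share of probability mass feels abstract, the sketch below works it out by hand on synthetic data. It assumes SciPy is installed; the helper density_level_for is a hypothetical name, and Seaborn's internal computation may differ in detail.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Synthetic correlated 2D data standing in for two measured columns
xy = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500).T  # shape (2, 500)

kde = gaussian_kde(xy)
gx, gy = np.meshgrid(np.linspace(-4, 4, 100), np.linspace(-4, 4, 100))
density = kde(np.vstack([gx.ravel(), gy.ravel()]))  # density at each grid cell

def density_level_for(prop_outside, density):
    """Density cutoff that leaves roughly `prop_outside` of the gridded mass outside."""
    ordered = np.sort(density)[::-1]                   # highest density first
    cumulative = np.cumsum(ordered) / ordered.sum()    # share of mass enclosed so far
    idx = np.searchsorted(cumulative, 1 - prop_outside)
    return ordered[min(idx, len(ordered) - 1)]

level = density_level_for(0.05, density)
enclosed = density[density >= level].sum() / density.sum()
print(f"density cutoff: {level:.4f}, mass inside contour: {enclosed:.3f}")
```

The point of the exercise: a setting like 0.05 is converted into a density cutoff that depends on the data, so the same thresh value draws a different outer contour for every dataset.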

Customizing KDE Appearance

Bandwidth adjustment is your most powerful customization tool. The bw_adjust parameter scales the automatically calculated bandwidth. Values less than 1 create more detailed (potentially noisy) plots; values greater than 1 produce smoother curves.
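To see what "scaling the automatically calculated bandwidth" means numerically, here is a rough sketch using SciPy's gaussian_kde on synthetic data. It assumes the automatic bandwidth follows Scott's rule (SciPy's default) and that bw_adjust simply multiplies that rule-of-thumb factor; treat both as assumptions about the behaviour rather than a statement of Seaborn's internals.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
data = rng.normal(20, 8, 250)   # synthetic stand-in for a numeric column

kde = gaussian_kde(data)        # Scott's rule by default
base_factor = kde.factor        # n ** (-1/5) for one-dimensional data

for bw_adjust in (0.3, 1.0, 3.0):
    # Mimic bw_adjust by scaling the rule-of-thumb factor (assumed behaviour)
    scaled = gaussian_kde(data, bw_method=base_factor * bw_adjust)
    # Each kernel's standard deviation is the factor times the sample standard deviation
    kernel_sd = scaled.factor * data.std(ddof=1)
    print(f"bw_adjust={bw_adjust}: kernel standard deviation is about {kernel_sd:.2f}")
```

A kernel three times wider blurs every data point over three times the distance, which is why large bw_adjust values erase secondary peaks.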

# Compare different bandwidth values
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
bandwidths = [0.3, 0.7, 1.5, 3.0]

for ax, bw in zip(axes.flat, bandwidths):
    sns.kdeplot(data=tips['total_bill'], 
                ax=ax, 
                fill=True,
                bw_adjust=bw,
                color='coral')
    ax.set_title(f'Bandwidth Adjustment: {bw}')
    ax.set_xlabel('Total Bill ($)')
    ax.set_ylabel('Density')

plt.tight_layout()
plt.show()

# Multiple overlapping KDEs for comparison
plt.figure(figsize=(12, 6))
for day in tips['day'].unique():
    subset = tips[tips['day'] == day]
    sns.kdeplot(data=subset['total_bill'], 
                label=day,
                fill=True,
                alpha=0.4,
                linewidth=2)

plt.title('Total Bill Distribution by Day')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
plt.legend(title='Day')
plt.show()

For categorical comparisons, manually filtering and plotting works, but the hue parameter (covered next) provides cleaner syntax.

KDE Plot Parameters and Options

Seaborn’s KDE implementation includes powerful parameters for specialized visualizations:

# Cumulative distribution
plt.figure(figsize=(10, 6))
sns.kdeplot(data=tips['total_bill'], 
            cumulative=True,
            fill=True,
            color='green',
            alpha=0.5)
plt.title('Cumulative Distribution of Total Bills')
plt.xlabel('Total Bill ($)')
plt.ylabel('Cumulative Probability')
plt.show()

# Clipping to specific range
plt.figure(figsize=(10, 6))
sns.kdeplot(data=tips['total_bill'], 
            clip=(10, 40),
            fill=True,
            color='purple')
plt.title('KDE Clipped to $10-$40 Range')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
plt.show()

# Using hue for categorical grouping
plt.figure(figsize=(12, 6))
sns.kdeplot(data=tips, 
            x='total_bill',
            hue='time',
            fill=True,
            alpha=0.5,
            linewidth=2,
            common_norm=False)
plt.title('Bill Distribution: Lunch vs Dinner')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
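# Retitle the legend seaborn already drew; plt.legend() would replace it with an empty one
plt.gca().get_legend().set_title('Meal Time')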
plt.show()

# Bivariate with hue
plt.figure(figsize=(12, 8))
sns.kdeplot(data=iris, 
            x='sepal_length',
            y='sepal_width',
            hue='species',
            fill=True,
            alpha=0.4,
            levels=5)
plt.title('Sepal Dimensions by Species')
plt.show()

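The common_norm=False parameter normalizes each category independently, so every curve has an area of 1. With the default common_norm=True, each curve is scaled by its group's share of the observations, so smaller groups look flatter. Independent normalization makes shape comparisons easier when sample sizes differ significantly. The clip parameter stops the density from being evaluated outside the given limits, which keeps the curve out of impossible value ranges (like negative prices).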
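The effect of common_norm is easiest to see as areas under the curves. The sketch below reproduces both normalizations by hand with SciPy on synthetic groups; the group names, sizes, and distributions are made up for illustration and are not the tips data.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
groups = {
    "Lunch": rng.normal(17, 6, 70),     # smaller group
    "Dinner": rng.normal(21, 9, 180),   # larger group
}
grid = np.linspace(-30, 80, 1101)
step = grid[1] - grid[0]
total = sum(len(values) for values in groups.values())

for name, values in groups.items():
    independent = gaussian_kde(values)(grid)      # like common_norm=False: area 1 per group
    shared = independent * len(values) / total    # like common_norm=True: areas sum to 1
    print(f"{name}: independent area {independent.sum() * step:.2f}, "
          f"shared area {shared.sum() * step:.2f}")
```

Under shared normalization each group's area is its share of the rows (70/250 and 180/250 here), so a small group can look unimportant even when its shape is the interesting part.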

The gridsize parameter controls calculation resolution. Default is 200 points; increase for smoother curves with large datasets or decrease for faster rendering:

# High resolution for publication
sns.kdeplot(data=tips['total_bill'], 
            gridsize=500,
            fill=True)

Practical Applications and Best Practices

KDE plots shine in specific scenarios. Use them for comparing distributions across categories, identifying distribution shapes (normal, skewed, multimodal), or presenting smooth probability densities in reports. Avoid them for discrete data, small sample sizes (under 30 points), or when exact frequencies matter.

Here’s a real-world example analyzing customer transaction patterns:

import numpy as np
import pandas as pd

# Simulate customer transaction data
np.random.seed(123)
transactions = {
    'amount': np.concatenate([
        np.random.lognormal(3.5, 0.5, 400),  # Regular customers
        np.random.lognormal(5, 0.3, 100)     # Premium customers
    ]),
    'customer_type': ['Regular']*400 + ['Premium']*100
}

df_transactions = pd.DataFrame(transactions)

# Compare spending patterns
plt.figure(figsize=(14, 6))
sns.kdeplot(data=df_transactions, 
            x='amount',
            hue='customer_type',
            fill=True,
            alpha=0.5,
            linewidth=2.5,
            bw_adjust=0.8,
            common_norm=False)
plt.title('Customer Spending Patterns by Segment', fontsize=16)
plt.xlabel('Transaction Amount ($)', fontsize=12)
plt.ylabel('Density', fontsize=12)
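# Restyle the legend seaborn already drew; plt.legend() would replace it with an empty one
legend = plt.gca().get_legend()
legend.set_title('Customer Type')
plt.setp(legend.get_texts(), fontsize=11)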
plt.xlim(0, 400)
plt.show()

Performance considerations: KDE evaluation cost grows with the number of data points multiplied by the number of grid points, so it becomes expensive on large datasets. For datasets exceeding 10,000 points, consider sampling or lowering gridsize to reduce calculation points. For interactive dashboards, cache KDE results rather than recalculating on every render.
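As a sketch of the sampling approach, the snippet below fits the KDE on a random subsample rather than the full array. The subsample size of 5,000 and the 200-point grid are arbitrary illustrative choices, and the data is synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
large = rng.lognormal(3.5, 0.5, 200_000)   # stand-in for a large numeric column

# Draw a random subsample without replacement before fitting
sample = rng.choice(large, size=5_000, replace=False)

grid = np.linspace(large.min(), large.max(), 200)   # 200 evaluation points
density = gaussian_kde(sample)(grid)
print(f"fitted on {sample.size} of {large.size} points, evaluated at {grid.size} grid points")
```

Plot the subsampled curve next to a full-data curve once to confirm the shape survives, then keep the cheaper version for repeated rendering.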

Common pitfalls to avoid:

  1. Over-smoothing: Default bandwidth works well for normally distributed data but may hide important features in complex distributions. Always visualize with multiple bw_adjust values initially.

  2. Edge effects: KDEs extend beyond your data range. Use clip to constrain the plot to meaningful bounds, especially for bounded variables like percentages or prices.

  3. Comparing unequal samples: When comparing groups with vastly different sample sizes, set common_norm=False to normalize each distribution independently.

  4. Misinterpreting density: The y-axis shows probability density, not probability. The area under the curve equals 1, but individual y-values can exceed 1 for tightly concentrated distributions.
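The last pitfall is easy to verify numerically. This sketch uses SciPy and synthetic data packed into a narrow range; the exact peak height depends on the random draw, so no specific value is claimed.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
tight = rng.normal(0.5, 0.05, 1000)   # values packed into a narrow range

grid = np.linspace(0, 1, 2001)
density = gaussian_kde(tight)(grid)

peak = density.max()
area = density.sum() * (grid[1] - grid[0])
print(f"peak density: {peak:.1f}, area under curve: {area:.2f}")
```

The peak sits well above 1 while the area stays close to 1, which is exactly the difference between a density and a probability.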

KDE plots are your tool for understanding continuous distributions with nuance that histograms cannot provide. Master bandwidth adjustment, leverage the hue parameter for comparisons, and always consider whether smooth density estimation serves your analytical goals better than discrete binning.
