How to Create a Density Plot in Seaborn

Key Insights

  • Density plots smooth histograms into continuous curves using kernel density estimation (KDE), which makes them well suited to comparing multiple distributions and spotting subtle patterns in continuous data.
  • Seaborn’s kdeplot() offers precise control over bandwidth, fill, and styling, while displot(kind='kde') provides figure-level functionality for creating faceted grids and complex multi-plot layouts.
  • Bandwidth selection strongly affects interpretation: lower values reveal local variation but can turn sampling noise into spurious bumps, while higher values show general trends but may smooth away real features.

Introduction to Density Plots

Density plots visualize the probability distribution of continuous variables by estimating the underlying probability density function. Unlike histograms that depend on arbitrary bin sizes, density plots create smooth curves that reveal the shape of your data distribution without binning artifacts.
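
To make "estimating the probability density function" concrete, here is a minimal sketch of a one-dimensional Gaussian KDE written by hand with NumPy. The function name gaussian_kde_1d and the fixed bandwidth of 0.5 are illustrative assumptions, not part of seaborn; seaborn chooses its bandwidth automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated on `grid`.

    Hypothetical helper for illustration; not seaborn's implementation.
    """
    samples = np.asarray(samples, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (samples.size * bandwidth)

rng = np.random.default_rng(42)
samples = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1.5, 200)])

# Pad the grid so the tails of the outermost bumps are included
grid = np.linspace(samples.min() - 3, samples.max() + 3, 400)
density = gaussian_kde_1d(samples, grid, bandwidth=0.5)

# A density should integrate to roughly 1 (simple Riemann sum)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

Each observation contributes one small bell curve, and the estimate is their average. The bandwidth sets how wide each bell is, which is why it matters so much for the final shape.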

Use density plots when comparing multiple distributions, presenting to non-technical audiences who find smooth curves more intuitive, or when bin selection in histograms creates misleading interpretations. They excel at showing multimodal distributions and subtle distribution differences that histograms might obscure.

Here’s a direct comparison:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 1, 300),
    np.random.normal(4, 1.5, 200)
])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Histogram
ax1.hist(data, bins=30, edgecolor='black', alpha=0.7)
ax1.set_title('Histogram (bin-dependent)')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')

# Density plot
sns.kdeplot(data=data, ax=ax2, fill=True)
ax2.set_title('Density Plot (smooth)')
ax2.set_xlabel('Value')
ax2.set_ylabel('Density')

plt.tight_layout()
plt.show()

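The density plot shows the bimodal shape of this distribution as two smooth peaks, whereas the histogram’s appearance shifts with the number of bins. The density curve is not assumption-free either: its smoothness depends on the bandwidth, which is covered in the customization section.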
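
To see the bin dependence for yourself, the sketch below draws the same kind of bimodal sample with three bin counts. The bin counts (5, 30, 100) are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1.5, 200)])

bin_counts = [5, 30, 100]  # arbitrary values chosen to exaggerate the effect
fig, axes = plt.subplots(1, len(bin_counts), figsize=(12, 3.5), sharex=True)

for ax, bins in zip(axes, bin_counts):
    ax.hist(data, bins=bins, edgecolor='black', alpha=0.7)
    ax.set_title(f'bins={bins}')
    ax.set_xlabel('Value')

axes[0].set_ylabel('Frequency')
plt.tight_layout()
```

With very few bins the two modes can blur together, and with very many the bars become noisy.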

Basic Density Plot with kdeplot()

Seaborn’s kdeplot() function is the primary tool for creating density plots. It accepts data in multiple formats: arrays, pandas Series, or DataFrame columns.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load sample data
df = sns.load_dataset('tips')

# Basic univariate density plot
plt.figure(figsize=(10, 6))
sns.kdeplot(data=df, x='total_bill')
plt.title('Distribution of Total Bill Amounts')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
plt.show()

The x parameter specifies which variable to plot. Passing y instead puts the variable on the vertical axis and draws the density along the horizontal axis. By default kdeplot() uses a Gaussian kernel and picks the bandwidth with Scott’s rule (bw_method='scott'), though you’ll often want to adjust the smoothing.

Key parameters for basic usage:

  • data: DataFrame or array-like object
  • x or y: Variable name (if using DataFrame) or direct data
  • fill: Boolean to fill area under curve
  • color: Line and fill color
  • label: Legend label for multiple plots

# Multiple approaches to the same plot
plt.figure(figsize=(10, 6))

# Using DataFrame column
sns.kdeplot(data=df, x='total_bill', fill=True, color='steelblue', label='Total Bill')

# Using Series directly
sns.kdeplot(x=df['tip'], fill=True, color='coral', alpha=0.6, label='Tip Amount')

plt.legend()
plt.title('Comparing Bill and Tip Distributions')
plt.show()

Customizing Density Plot Appearance

Bandwidth is the most critical setting because it controls the smoothness of your density estimate. The bw_adjust parameter is a multiplier applied to the automatically chosen bandwidth. Its default is 1.0; lower values create more detailed curves, higher values create smoother ones.

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
bandwidths = [0.3, 0.7, 1.5, 3.0]

for ax, bw in zip(axes.flat, bandwidths):
    sns.kdeplot(data=df, x='total_bill', fill=True, bw_adjust=bw, ax=ax)
    ax.set_title(f'Bandwidth Adjustment: {bw}')
    ax.set_xlabel('Total Bill ($)')
    ax.set_ylabel('Density')

plt.tight_layout()
plt.show()

Additional styling options:

plt.figure(figsize=(10, 6))

sns.kdeplot(
    data=df, 
    x='total_bill',
    fill=True,              # Fill area under curve
    color='#2ecc71',        # Custom color
    alpha=0.5,              # Transparency
    linewidth=2.5,          # Border thickness
    linestyle='--',         # Line style
    bw_adjust=0.8           # Bandwidth adjustment
)

plt.title('Customized Density Plot', fontsize=14, fontweight='bold')
plt.xlabel('Total Bill ($)', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.grid(alpha=0.3)
plt.show()

Bivariate Density Plots

Bivariate density plots visualize the joint distribution of two continuous variables, revealing correlations and clusters that univariate plots miss.

# Load iris dataset
iris = sns.load_dataset('iris')

plt.figure(figsize=(12, 5))

# Contour plot
plt.subplot(1, 2, 1)
sns.kdeplot(
    data=iris, 
    x='sepal_length', 
    y='sepal_width',
    levels=10,           # Number of contour lines
    color='darkblue',
    linewidths=2
)
plt.title('Contour Density Plot')

# Filled contour plot
plt.subplot(1, 2, 2)
sns.kdeplot(
    data=iris, 
    x='sepal_length', 
    y='sepal_width',
    fill=True,
    cmap='viridis',      # Color map for filled regions
    levels=15,
    alpha=0.7
)
plt.title('Filled Contour Density Plot')

plt.tight_layout()
plt.show()

Combine bivariate density with scatter plots for comprehensive visualization:

plt.figure(figsize=(10, 8))

# Bivariate density
sns.kdeplot(
    data=iris,
    x='petal_length',
    y='petal_width',
    fill=True,
    cmap='Blues',
    alpha=0.6,
    levels=20
)

# Overlay scatter plot
plt.scatter(iris['petal_length'], iris['petal_width'], 
            s=20, alpha=0.4, color='darkblue', edgecolor='white')

plt.title('Petal Dimensions: Density + Scatter')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()

Multiple Distributions with hue Parameter

The hue parameter splits your data by category, creating overlaid density plots for direct comparison.

plt.figure(figsize=(12, 6))

sns.kdeplot(
    data=iris,
    x='sepal_length',
    hue='species',
    fill=True,
    alpha=0.5,
    linewidth=2,
    palette='Set2',
    common_norm=False      # Normalize each distribution separately
)

plt.title('Sepal Length Distribution by Species', fontsize=14)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Density')
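# seaborn already drew the hue legend; plt.legend() would replace it with an empty one
legend = plt.gca().get_legend()
legend.set_title('Species')
plt.setp(legend.get_texts(), fontsize=10)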
plt.grid(alpha=0.3)
plt.show()

For bivariate plots with categories:

plt.figure(figsize=(10, 8))

sns.kdeplot(
    data=iris,
    x='petal_length',
    y='petal_width',
    hue='species',
    fill=True,
    alpha=0.4,
    levels=8,
    palette='husl'
)

plt.title('Petal Dimensions by Species (Bivariate)')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()

Alternative: displot() with kind='kde'

Use displot(kind='kde') when you need figure-level functionality: automatic figure sizing, built-in legends, or faceted grids. It wraps kdeplot() with additional layout capabilities.

# Simple figure-level density plot
sns.displot(
    data=iris,
    x='sepal_length',
    hue='species',
    kind='kde',
    fill=True,
    height=6,
    aspect=1.5,
    palette='viridis'
)
plt.show()

The real power comes with faceting:

# Create faceted grid
g = sns.displot(
    data=iris,
    x='sepal_length',
    col='species',           # Separate column for each species
    kind='kde',
    fill=True,
    height=4,
    aspect=1.2,
    color='steelblue',
    bw_adjust=0.8
)

g.set_titles('Species: {col_name}')
g.set_axis_labels('Sepal Length (cm)', 'Density')
plt.show()

For complex layouts with multiple variables:

# Load tips dataset
tips = sns.load_dataset('tips')

g = sns.displot(
    data=tips,
    x='total_bill',
    col='time',              # Lunch vs Dinner
    row='sex',               # Male vs Female
    kind='kde',
    fill=True,
    height=3,
    aspect=1.5,
    color='coral'
)

g.set_titles('{row_name} - {col_name}')
plt.show()

Practical Tips and Common Use Cases

Bandwidth Selection Strategy: Start with the default bandwidth, which already narrows as the sample grows, then adjust by eye. As rough starting points rather than fixed rules: for large datasets (n > 1000), try bw_adjust between 0.5 and 0.8 to reveal detail; for small datasets (n < 100), try bw_adjust between 1.5 and 2.0 to avoid fitting noise.
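
For intuition about how the default reacts to sample size, the sketch below computes a Scott's-rule bandwidth by hand. It assumes seaborn's default bw_method='scott' with bw_adjust acting as a plain multiplier; scott_bandwidth is a hypothetical helper, not a seaborn function.

```python
import numpy as np

def scott_bandwidth(samples, bw_adjust=1.0):
    """Scott's rule for 1-D data: std * n**(-1/5), scaled by bw_adjust."""
    samples = np.asarray(samples, dtype=float)
    return bw_adjust * samples.std(ddof=1) * samples.size ** (-1 / 5)

rng = np.random.default_rng(0)
bandwidths = {}
for n in [50, 1_000, 10_000]:
    samples = rng.normal(0, 1, n)
    bandwidths[n] = scott_bandwidth(samples)
    print(n, round(bandwidths[n], 3))
```

The bandwidth shrinks only with the fifth root of n, so going from 50 to 10,000 points narrows it by a factor of about three. That slow shrinkage is one reason a multiplier below 1 can still be useful on large samples.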

Performance Considerations: Density estimation gets slower as the dataset grows, because the kernel sum is evaluated at every grid point. For datasets exceeding 10,000 points, consider sampling, or lower the gridsize parameter to reduce the number of evaluation points:

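# Faster rendering for large datasets (large_dataset is a placeholder DataFrame)
large_dataset = pd.DataFrame({'value': np.random.normal(size=50_000)})
sns.kdeplot(data=large_dataset, x='value', gridsize=50)  # Default is 200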

Real-World Example: Analyzing customer purchase patterns:

# Simulate customer purchase data
np.random.seed(42)
purchase_data = pd.DataFrame({
    'amount': np.concatenate([
        np.random.gamma(2, 20, 500),      # Regular customers
        np.random.gamma(5, 50, 200)       # Premium customers
    ]),
    'customer_type': ['Regular']*500 + ['Premium']*200,
    'payment_method': np.random.choice(['Credit', 'Debit', 'Cash'], 700)
})

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Compare customer types
sns.kdeplot(
    data=purchase_data,
    x='amount',
    hue='customer_type',
    fill=True,
    alpha=0.5,
    ax=axes[0],
    common_norm=False
)
axes[0].set_title('Purchase Amount by Customer Type')
axes[0].set_xlabel('Purchase Amount ($)')

# Univariate densities, one curve per payment method
for method in purchase_data['payment_method'].unique():
    subset = purchase_data[purchase_data['payment_method'] == method]
    sns.kdeplot(
        data=subset,
        x='amount',
        ax=axes[1],
        label=method,
        fill=True,
        alpha=0.4,
        bw_adjust=0.7
    )
axes[1].set_title('Purchase Distribution by Payment Method')
axes[1].set_xlabel('Purchase Amount ($)')
axes[1].legend()

plt.tight_layout()
plt.show()

Common Pitfalls: Avoid using density plots for discrete data (use count plots instead). When comparing groups with very different sample sizes through hue, set common_norm=False so each curve is normalized on its own. Finally, check that your data has enough points for a meaningful density estimate; 30 to 50 is a reasonable rule-of-thumb minimum.

Density plots transform raw data into interpretable distributions. Master bandwidth adjustment, leverage the hue parameter for comparisons, and choose between kdeplot() and displot() based on whether you need axes-level control or figure-level layouts.
