How to Create a Histogram in Seaborn
Key Insights
- Seaborn’s histplot() offers richer statistical defaults than matplotlib’s basic histogram, with built-in KDE overlays, automatic bin selection, and easy pairing with rugplot() for rug plots.
- The hue parameter turns a basic histogram into a comparative tool, enabling side-by-side or stacked views across a categorical variable without manual subplot management.
- Bin selection strongly affects how a histogram reads. Seaborn’s default (bins='auto') defers to NumPy’s 'auto' rule, which takes the narrower of the Sturges and Freedman-Diaconis bin widths; switch to fixed bin edges when comparing distributions across multiple datasets.
Introduction to Histograms and Seaborn
Histograms visualize the distribution of numerical data by dividing values into bins and counting observations in each bin. They answer critical questions: Is my data normally distributed? Are there outliers? What’s the central tendency and spread?
Seaborn builds on matplotlib with statistical defaults that make sense out of the box. While matplotlib requires manual tweaking for aesthetically pleasing plots, Seaborn applies color palettes, proper spacing, and statistical enhancements automatically. For data scientists, this means less time formatting and more time analyzing.
This guide covers everything from basic histograms to advanced multi-group comparisons. You’ll learn practical techniques for real-world data analysis, not just toy examples.
Basic Histogram with sns.histplot()
Start with the essential imports and sample data. We’ll use the classic tips dataset, but these techniques apply to any numerical data.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Load sample data
tips = sns.load_dataset('tips')
# Create a basic histogram
plt.figure(figsize=(10, 6))
sns.histplot(data=tips, x='total_bill')
plt.show()
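This produces a histogram showing the distribution of total bill amounts. Seaborn automatically:
- Chooses a bin width with NumPy’s 'auto' rule (the narrower of the Sturges and Freedman-Diaconis widths)
- Applies a clean color scheme
- Labels the x-axis with the column name and the y-axis with the statistic ("Count")
The figure size comes from the plt.figure(figsize=...) call, not from Seaborn.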
The y-axis shows count by default—the number of observations in each bin. The x-axis represents your numerical variable divided into ranges.
You can also pass an array or Series directly instead of a DataFrame and column name:
# Alternative syntax with a NumPy array
bill_amounts = tips['total_bill'].to_numpy()
sns.histplot(bill_amounts)
Both approaches work, but the DataFrame syntax is clearer for complex visualizations.
Customizing Histogram Appearance
Default settings work well, but customization helps tell specific stories with your data.
plt.figure(figsize=(12, 6))
# Customized histogram
sns.histplot(
    data=tips,
    x='total_bill',
    bins=20,            # Explicit bin count
    color='#2ecc71',    # Custom color
    edgecolor='black',  # Bin edge color
    alpha=0.7,          # Transparency
    linewidth=1.2       # Edge width
)
plt.title('Distribution of Total Bill Amounts', fontsize=16, fontweight='bold')
plt.xlabel('Total Bill ($)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
Bin selection matters. Too few bins obscure distribution details; too many create noise. Start with Seaborn’s default, then adjust:
# Compare different binning strategies
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Fixed number of bins
sns.histplot(data=tips, x='total_bill', bins=10, ax=axes[0])
axes[0].set_title('10 Bins')
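# Default: bins='auto' (NumPy picks between Sturges and Freedman-Diaconis)
sns.histplot(data=tips, x='total_bill', ax=axes[1])
axes[1].set_title('Auto Bins')
# Explicit bin edges; the last edge (55) must exceed the largest bill (50.81),
# because values outside the edges are silently dropped
bin_edges = np.arange(0, 60, 5)
sns.histplot(data=tips, x='total_bill', bins=bin_edges, ax=axes[2])
axes[2].set_title('Custom Edges (5-unit intervals)')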
plt.tight_layout()
plt.show()
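Pass bins='auto' (the default), bins='fd' (Freedman-Diaconis), or bins='sturges' for rule-based binning; the string is forwarded to numpy.histogram_bin_edges(). When you need identical bins across multiple plots, pass explicit bin edges, or an integer together with a fixed binrange, since an integer alone still spans each plot’s own data range.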
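To see how the named rules differ before plotting, you can ask NumPy for the edges directly. This is a small sketch on synthetic data (the sample and its parameters are assumptions for illustration); the same rule names are what histplot() accepts for bins.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=20, scale=8, size=1000)  # synthetic sample

bin_counts = {}
for rule in ("auto", "fd", "sturges"):
    # Returns the bin edges the rule would produce for this data
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1
    print(f"{rule:>8}: {bin_counts[rule]} bins, width {edges[1] - edges[0]:.2f}")
```

Because 'auto' takes the narrower of the two widths, it yields at least as many bins as either 'fd' or 'sturges' on the same data.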
Adding Statistical Elements
Histograms become more informative with statistical overlays.
plt.figure(figsize=(12, 6))
# Histogram with KDE overlay
ax = sns.histplot(
    data=tips,
    x='total_bill',
    bins=25,
    kde=True,            # Add kernel density estimate
    color='skyblue',
    edgecolor='black',
    alpha=0.6,
    line_kws={'linewidth': 2.5}
)
# The KDE line takes the bar color, and a 'color' entry in line_kws may be
# ignored, so recolor the line on the returned Axes instead
ax.lines[0].set_color('darkblue')
plt.title('Total Bill Distribution with KDE', fontsize=14)
plt.show()
The KDE (Kernel Density Estimate) curve smooths the distribution, revealing underlying patterns obscured by binning. It’s particularly useful for identifying multimodal distributions.
Add rug plots to show individual data points:
fig, ax = plt.subplots(figsize=(12, 6))
# Combined histogram, KDE, and rug plot
sns.histplot(
    data=tips,
    x='total_bill',
    bins=20,
    kde=True,
    color='coral',
    alpha=0.5,
    ax=ax
)
# Add rug plot
sns.rugplot(
    data=tips,
    x='total_bill',
    height=0.05,
    color='darkred',
    alpha=0.6,
    ax=ax
)
plt.title('Complete Distribution Visualization', fontsize=14)
plt.show()
Rug plots work best with moderate-sized datasets (under 1000 points). Beyond that, they create visual clutter.
Advanced Techniques
Compare distributions across categories using the hue parameter:
plt.figure(figsize=(12, 6))
# Side-by-side comparison by meal time (Lunch vs Dinner)
ax = sns.histplot(
    data=tips,
    x='total_bill',
    hue='time',         # Split by categorical variable
    bins=20,
    multiple='dodge',   # Place bars side-by-side
    palette='Set2',
    edgecolor='black',
    alpha=0.7
)
plt.title('Bill Distribution: Lunch vs Dinner', fontsize=14)
# Seaborn already draws the hue legend; retitle it rather than calling
# plt.legend(), which would replace it with an empty legend
ax.get_legend().set_title('Time')
plt.show()
The multiple parameter controls how overlapping histograms display:
- 'layer': Overlay with transparency
- 'dodge': Side-by-side bars
- 'stack': Stack bars vertically
- 'fill': Normalize to show proportions
For stacked histograms:
plt.figure(figsize=(12, 6))
sns.histplot(
    data=tips,
    x='total_bill',
    hue='sex',
    bins=20,
    multiple='stack',
    palette='viridis',
    edgecolor='white',
    linewidth=0.5
)
plt.title('Bill Distribution by Gender (Stacked)', fontsize=14)
plt.show()
Use FacetGrid for complex multi-dimensional comparisons:
# Create separate subplots for each day
g = sns.FacetGrid(tips, col='day', col_wrap=2, height=4, aspect=1.5)
g.map(sns.histplot, 'total_bill', bins=15, color='teal', kde=True)
g.set_axis_labels('Total Bill ($)', 'Count')
g.set_titles(col_template='{col_name}')
plt.tight_layout()
plt.show()
This creates a 2x2 grid showing bill distributions for each of the four days in the dataset (Thur, Fri, Sat, Sun). FacetGrid excels when comparing the same variable across multiple categorical levels.
Best Practices and Common Pitfalls
Use histplot(), not distplot(). The older distplot() function is deprecated. For distribution visualization, use:
- histplot(): Histograms with optional KDE
- kdeplot(): Standalone kernel density estimates
- displot(): Figure-level interface combining multiple plot types
# Modern approach with displot (figure-level)
sns.displot(
    data=tips,
    x='total_bill',
    hue='time',
    kind='hist',
    bins=20,
    kde=True,
    height=6,
    aspect=1.5
)
plt.show()
Normalize when comparing different sample sizes. With hue, also set common_norm=False so each group is normalized on its own rather than against the combined total:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Count (misleading with different sample sizes)
sns.histplot(data=tips, x='total_bill', hue='time',
             multiple='dodge', ax=axes[0])
axes[0].set_title('Count (Raw)')
# Density (each group normalized independently)
sns.histplot(data=tips, x='total_bill', hue='time',
             multiple='dodge', stat='density', common_norm=False,
             ax=axes[1])
axes[1].set_title('Density (Normalized)')
plt.tight_layout()
plt.show()
The stat parameter accepts:
- 'count': Raw frequencies (default)
- 'density': Normalize so the total bar area equals 1
- 'probability': Normalize so bar heights sum to 1
- 'percent': Normalize so bar heights sum to 100
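The arithmetic behind these options can be reproduced with NumPy alone. This is a sketch of the relationships rather than Seaborn’s internal code, run on a synthetic sample.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=10.0, size=200)   # synthetic sample

counts, edges = np.histogram(data, bins=8)     # stat='count'
widths = np.diff(edges)
n = counts.sum()

probability = counts / n                       # heights sum to 1
percent = 100 * probability                    # heights sum to 100
density = counts / (n * widths)                # height * width sums to 1

print(probability.sum(), percent.sum(), (density * widths).sum())
```

Note that density depends on the bin width, so density values can exceed 1 when bins are narrower than one unit.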
Performance with large datasets: For datasets exceeding roughly 100,000 rows, set the bin count explicitly and skip the KDE:
# Efficient approach for large data
rng = np.random.default_rng(0)
large_data = rng.standard_normal(500_000)
plt.figure(figsize=(12, 6))
sns.histplot(large_data, bins=50, kde=False)  # kde=False is the default; shown for emphasis
plt.title('Large Dataset (500K points)')
plt.show()
As a rule of thumb, leave KDE off for datasets over 10,000 points unless you specifically need it: the computation becomes expensive and the visual benefit diminishes.
Choose bins consistently when creating multiple related plots. Fixed bin edges ensure fair comparisons, provided they span the full data range:
# Consistent bins across subplots (2.5-unit bins; top edge above the 50.81 maximum)
bin_edges = np.linspace(0, 55, 23)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.histplot(data=tips[tips['time'] == 'Lunch'], x='total_bill',
             bins=bin_edges, ax=axes[0])
sns.histplot(data=tips[tips['time'] == 'Dinner'], x='total_bill',
             bins=bin_edges, ax=axes[1])
plt.show()
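If you would rather not hard-code the range, one option is to derive the shared edges from the pooled data. The sketch below does this with NumPy on two synthetic groups (lunch and dinner here are hypothetical stand-ins, not the tips columns) and confirms that no observation is dropped.

```python
import numpy as np

rng = np.random.default_rng(6)
lunch = rng.gamma(4.0, 4.0, 70)      # hypothetical stand-ins for two groups
dinner = rng.gamma(4.0, 5.5, 180)

# Derive one set of edges from the pooled data so every value falls inside
shared_edges = np.histogram_bin_edges(np.concatenate([lunch, dinner]), bins=20)

lunch_counts, _ = np.histogram(lunch, bins=shared_edges)
dinner_counts, _ = np.histogram(dinner, bins=shared_edges)
print(lunch_counts.sum(), dinner_counts.sum())
```

The resulting shared_edges array can be passed as bins= to each sns.histplot() call in place of a hand-written range.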
Histograms are fundamental to data exploration. Master these techniques and you’ll quickly identify distributions, outliers, and patterns that inform downstream analysis. Seaborn makes this straightforward—use it.