How to Create a Strip Plot in Seaborn
Key Insights
- Strip plots excel at showing individual data points across categories, making them ideal for small to medium datasets where you want to preserve granular information that summary statistics would hide
- The jitter parameter is critical for readability: with too little of it, overlapping points stack on top of one another and hide how many observations share similar values, obscuring your actual data distribution
- Layering strip plots with violin or box plots combines individual observations with distributional summaries, giving viewers both statistical context and raw data transparency
Understanding Strip Plots and When to Use Them
Strip plots display individual data points along a categorical axis, with each observation shown as a single marker. Unlike box plots or bar charts that aggregate data into summary statistics, strip plots preserve every data point, making them valuable when you need to show the complete picture of your data distribution.
Use strip plots when you have categorical data with continuous values and want to reveal patterns that summaries might hide: outliers, clustering, gaps in the distribution, or sample size differences between categories. They’re particularly effective for datasets with fewer than a few hundred points per category. Beyond that threshold, consider violin plots or histograms to avoid visual clutter.
The main advantage of strip plots is transparency. When you show a box plot, viewers see medians and quartiles but lose information about individual observations. Strip plots expose bimodal distributions, unusual gaps, or the presence of just a handful of extreme values that might otherwise go unnoticed.
Creating Your First Strip Plot
Let’s start with the fundamentals. Seaborn makes creating strip plots straightforward with the stripplot() function.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load sample dataset
tips = sns.load_dataset('tips')
# Create basic strip plot
plt.figure(figsize=(10, 6))
sns.stripplot(data=tips, x='day', y='total_bill')
plt.title('Restaurant Bills by Day of Week')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day')
plt.show()
This basic example plots restaurant bills across different days. However, you’ll immediately notice a problem: stripplot applies only a small amount of jitter by default (jitter=True), so points in dense regions pile up in narrow vertical bands and the plot understates how many observations share similar values. This is where customization becomes essential.
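To see the overlap problem directly, the sketch below draws the same data twice: once with jitter=False and once with stripplot’s default jitter. It uses a small synthetic dataset (an assumption standing in for tips, so it runs offline) whose bills are rounded to whole dollars so that many values tie exactly. The helper names make_rounded_bills, compare_jitter, and x_positions are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def make_rounded_bills(seed=0, n_per_day=60):
    """Hypothetical stand-in for the tips data: bills rounded to whole dollars,
    so many observations share exactly the same value."""
    rng = np.random.default_rng(seed)
    days = ["Thur", "Fri", "Sat", "Sun"]
    return pd.DataFrame({
        "day": np.repeat(days, n_per_day),
        "total_bill": np.round(rng.gamma(shape=4.0, scale=5.0, size=len(days) * n_per_day)),
    })


def compare_jitter(df):
    """Draw the same data with jitter disabled and with stripplot's default jitter."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
    sns.stripplot(data=df, x="day", y="total_bill", jitter=False, ax=axes[0])
    axes[0].set_title("jitter=False: tied values stack exactly")
    sns.stripplot(data=df, x="day", y="total_bill", jitter=True, ax=axes[1])
    axes[1].set_title("jitter=True (default): narrow random spread")
    return fig, axes


def x_positions(ax):
    """Collect the horizontal position of every marker drawn on an Axes."""
    return np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax.collections])


if __name__ == "__main__":
    fig, axes = compare_jitter(make_rounded_bills())
    plt.show()
```

With jitter disabled, every marker sits exactly on its category position, so tied values collapse into what looks like a single dot; the default jitter separates them only slightly.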
Customizing for Clarity and Impact
The most important parameter for strip plots is jitter, which adds random noise to point positions along the categorical axis. This spreads overlapping points horizontally, revealing the true distribution.
# Strip plot with jitter and transparency
plt.figure(figsize=(10, 6))
sns.stripplot(data=tips, x='day', y='total_bill',
              jitter=0.3, alpha=0.6, size=6)
plt.title('Restaurant Bills by Day (with Jitter)')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day')
plt.show()
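The jitter parameter accepts True (a small default spread), False (no jitter), or a number giving half the width of the random horizontal spread, in category units. Adjacent categories sit one unit apart, so values approaching 0.5 make neighboring categories run together. Start with 0.2-0.3 and adjust based on your data density. The alpha parameter controls transparency (0 to 1), so overlapping markers build up into darker regions that indicate higher density.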
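One way to check the half-width reading of jitter is to measure how far markers actually land from their category centers. The sketch below assumes categories are drawn at integer positions 0, 1, 2, and so on (as seaborn does for a categorical axis) and reads the marker positions back from the plotted collections. max_jitter_offset is a hypothetical helper, and it is only meaningful for jitter values below 0.5.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def max_jitter_offset(df, x, y, jitter):
    """Return the largest horizontal distance between any marker and the
    center of its category. Assumes jitter < 0.5, so rounding to the nearest
    integer recovers each marker's own category position."""
    fig, ax = plt.subplots()
    sns.stripplot(data=df, x=x, y=y, jitter=jitter, ax=ax)
    xs = np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax.collections])
    plt.close(fig)
    return float(np.max(np.abs(xs - np.round(xs))))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 100),
                       "value": rng.normal(size=300)})
    for j in (0.1, 0.2, 0.3, 0.4):
        offset = max_jitter_offset(df, "group", "value", j)
        print(f"jitter={j}: largest offset from center = {offset:.3f}")
```

Each measured offset should stay at or just under its jitter value, which is why 0.5 is the practical ceiling: at that point markers from adjacent categories meet.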
You can also customize colors and marker sizes to improve readability:
# Customized appearance
plt.figure(figsize=(10, 6))
sns.stripplot(data=tips, x='day', y='total_bill',
              jitter=0.25, alpha=0.7, size=7,
              color='steelblue', edgecolor='gray', linewidth=0.5)
plt.title('Customized Strip Plot')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day')
plt.show()
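Adding edge colors to markers creates definition between points, which is particularly useful when moderate overlap remains even after jittering. Edges only appear when linewidth is greater than zero; stripplot’s default linewidth is 0, which is why the example sets linewidth=0.5.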
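A quick way to confirm that edgecolor alone is not enough is to read the marker edge widths back from the plot. This sketch assumes stripplot’s default linewidth of 0 (as in the seaborn versions we are aware of) and uses synthetic data; marker_edge_widths is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def marker_edge_widths(df, x, y, **strip_kws):
    """Draw a strip plot and return the edge line widths of its markers."""
    fig, ax = plt.subplots()
    sns.stripplot(data=df, x=x, y=y, ax=ax, **strip_kws)
    widths = np.concatenate([np.atleast_1d(c.get_linewidths()) for c in ax.collections])
    plt.close(fig)
    return widths


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    df = pd.DataFrame({"day": np.repeat(["Thur", "Fri"], 50),
                       "total_bill": rng.gamma(4.0, 5.0, 100)})
    # edgecolor alone: with the default linewidth of 0, edges stay invisible
    print(marker_edge_widths(df, "day", "total_bill", edgecolor="gray"))
    # edgecolor plus a positive linewidth: edges are drawn
    print(marker_edge_widths(df, "day", "total_bill", edgecolor="gray", linewidth=0.5))
```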
Adding Dimensions with Multiple Categories
Strip plots become more powerful when you encode additional categorical variables using the hue parameter:
# Strip plot with hue for additional category
plt.figure(figsize=(12, 6))
sns.stripplot(data=tips, x='day', y='total_bill',
              hue='time', jitter=0.3, alpha=0.7, size=6,
              palette='Set2', dodge=True)
plt.title('Restaurant Bills by Day and Time')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day')
plt.legend(title='Meal Time', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
The dodge=True parameter separates different hue categories within each x-axis category, making comparisons easier. Without it, points from different hue groups overlap at the same x-position.
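To make the effect of dodge concrete, the sketch below turns jitter off and reports the distinct horizontal offsets of markers from their category centers. The synthetic data and the helper name hue_offsets are assumptions. With two hue levels, seaborn splits a default total width of 0.8 between the groups, so the dodged groups should sit about 0.2 to either side of each category center, while dodge=False leaves every marker at the center.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def hue_offsets(df, x, y, hue, dodge):
    """Return the sorted distinct offsets of markers from their category centers.
    jitter=False isolates the horizontal shift that dodge introduces."""
    fig, ax = plt.subplots()
    sns.stripplot(data=df, x=x, y=y, hue=hue, dodge=dodge, jitter=False, ax=ax)
    xs = np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax.collections])
    plt.close(fig)
    # Round to absorb floating-point noise; adding 0.0 turns -0.0 into 0.0.
    return sorted(set(np.round(xs - np.round(xs), 6) + 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    bills = pd.DataFrame({
        "day": np.tile(np.repeat(["Thur", "Fri"], 20), 2),
        "time": np.repeat(["Lunch", "Dinner"], 40),
        "total_bill": rng.normal(20, 5, 80),
    })
    print("dodge=True: ", hue_offsets(bills, "day", "total_bill", "time", dodge=True))
    print("dodge=False:", hue_offsets(bills, "day", "total_bill", "time", dodge=False))
```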
For analyzing multiple relationships simultaneously, use catplot() with the col or row parameters:
# Multiple strip plots using catplot
g = sns.catplot(data=tips, x='day', y='total_bill',
                col='time', kind='strip',
                jitter=0.3, alpha=0.6, height=5, aspect=1.2,
                palette='viridis')
g.set_axis_labels('Day', 'Total Bill ($)')
g.set_titles('{col_name} Service')
plt.tight_layout()
plt.show()
This creates separate strip plots for each time category, making it easier to spot patterns specific to lunch versus dinner service.
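catplot also accepts row alongside col, producing one panel per combination. The sketch below runs on a hypothetical synthetic dataset (make_visits and its smoker column are assumptions, not part of the tips example above) so that it works offline; faceted_strip is likewise a hypothetical helper.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def make_visits(seed=0, n=200):
    """Hypothetical restaurant-visit data with two extra categorical columns."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
        "time": rng.choice(["Lunch", "Dinner"], size=n),
        "smoker": rng.choice(["Yes", "No"], size=n),
        "total_bill": rng.gamma(4.0, 5.0, size=n),
    })


def faceted_strip(df):
    """One strip-plot panel per (smoker, time) pair: rows follow smoker, columns follow time."""
    g = sns.catplot(data=df, x="day", y="total_bill",
                    row="smoker", col="time", kind="strip",
                    order=["Thur", "Fri", "Sat", "Sun"],
                    jitter=0.3, alpha=0.6, height=3.5, aspect=1.2)
    g.set_axis_labels("Day", "Total Bill ($)")
    g.set_titles("Smoker: {row_name} | {col_name}")
    return g


if __name__ == "__main__":
    faceted_strip(make_visits())
    plt.show()
```

Adding row multiplies the number of panels, so keep the number of levels small enough that each panel still holds a readable number of points.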
Layering Strip Plots with Statistical Summaries
One of the most effective techniques is combining strip plots with other plot types. This gives you both individual data points and statistical context:
# Strip plot overlaid on violin plot
plt.figure(figsize=(12, 6))
# First create violin plot
sns.violinplot(data=tips, x='day', y='total_bill',
               palette='muted', alpha=0.5, inner=None)
# Overlay strip plot
sns.stripplot(data=tips, x='day', y='total_bill',
              jitter=0.2, alpha=0.7, size=4, color='black')
plt.title('Bills by Day: Distribution and Individual Points')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day')
plt.show()
The violin plot shows the kernel density estimate while the strip plot shows actual observations. Setting inner=None on the violin plot removes its internal markers, preventing visual clutter.
Another powerful combination uses box plots:
# Strip plot with box plot overlay
plt.figure(figsize=(12, 6))
# Create strip plot first (background)
sns.stripplot(data=tips, x='day', y='total_bill',
              jitter=0.3, alpha=0.4, size=5, color='skyblue')
# Overlay box plot
sns.boxplot(data=tips, x='day', y='total_bill',
            width=0.3, palette='Set2',
            boxprops=dict(alpha=0.7),
            showfliers=False)  # Strip plot already shows outliers
plt.title('Bills by Day: Box Plot with Individual Points')
plt.ylabel('Total Bill ($)')
plt.xlabel('Day')
plt.show()
Setting showfliers=False on the box plot prevents duplicate outlier markers since the strip plot already displays all points.
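Call order is not the only thing that decides which layer ends up in front: matplotlib stacks artists by zorder first and by drawing order second. The sketch below keeps the strip-first, box-second order of the example above but passes zorder=3 to stripplot (extra keyword arguments are forwarded to matplotlib’s scatter) so the points are explicitly raised above the box elements. The data and the helper name strip_then_box are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.collections import PathCollection


def strip_then_box(df, x, y, ax=None):
    """Draw points first and a box plot second, but raise the points
    explicitly so the boxes cannot hide them."""
    if ax is None:
        ax = plt.gca()
    # zorder is passed through to matplotlib's scatter; 3 is above matplotlib's
    # default zorder for patches (1) and lines (2).
    sns.stripplot(data=df, x=x, y=y, jitter=0.3, alpha=0.4, size=5,
                  color="skyblue", zorder=3, ax=ax)
    sns.boxplot(data=df, x=x, y=y, width=0.3, color="lightgray",
                showfliers=False, ax=ax)
    return ax


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    demo = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 30),
                         "value": rng.normal(size=90)})
    strip_then_box(demo, "group", "value")
    plt.show()
```

Setting zorder explicitly makes the intended stacking independent of call order and of whatever defaults a given seaborn version applies.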
Practical Application: Survey Response Analysis
Let’s work through a realistic scenario, analyzing simulated employee satisfaction survey data across departments:
# Create a simulated survey dataset
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(42)
departments = ['Engineering', 'Sales', 'Marketing', 'Support', 'HR']
data = []
for dept in departments:
    if dept == 'Engineering':
        scores = np.random.normal(7.5, 1.2, 45)
    elif dept == 'Sales':
        scores = np.random.normal(6.8, 1.5, 38)
    elif dept == 'Marketing':
        scores = np.random.normal(7.2, 1.0, 32)
    elif dept == 'Support':
        scores = np.random.normal(6.5, 1.8, 42)
    else:  # HR
        scores = np.random.normal(7.8, 0.9, 28)
    scores = np.clip(scores, 1, 10)  # Keep within 1-10 scale
    for score in scores:
        data.append({'Department': dept, 'Satisfaction': score})
survey_df = pd.DataFrame(data)
# Create comprehensive visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
# Left plot: Strip plot with box plot
sns.stripplot(data=survey_df, x='Department', y='Satisfaction',
              jitter=0.3, alpha=0.5, size=6, color='lightcoral',
              ax=axes[0])
sns.boxplot(data=survey_df, x='Department', y='Satisfaction',
            width=0.4, palette='Set3', ax=axes[0],
            boxprops=dict(alpha=0.7), showfliers=False)
axes[0].set_title('Employee Satisfaction by Department', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Satisfaction Score (1-10)')
axes[0].axhline(y=7, color='gray', linestyle='--', alpha=0.5, label='Target Score')
axes[0].legend()
# Right plot: Strip plot with violin plot
sns.violinplot(data=survey_df, x='Department', y='Satisfaction',
               palette='pastel', alpha=0.6, inner=None, ax=axes[1])
sns.stripplot(data=survey_df, x='Department', y='Satisfaction',
              jitter=0.25, alpha=0.6, size=5, color='darkblue',
              ax=axes[1])
axes[1].set_title('Distribution Shape and Individual Responses', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Satisfaction Score (1-10)')
plt.tight_layout()
plt.show()
# Print summary statistics
print(survey_df.groupby('Department')['Satisfaction'].describe().round(2))
This example demonstrates strip plots in a business context: the box and violin layers summarize each department’s typical score and spread, while the individual points reveal variability and low outliers that might indicate specific issues requiring investigation.
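To follow up on low scores numerically, a small summary table can complement the plot. The sketch below is a hypothetical helper (share_below_target is not a pandas or seaborn function) that reports, per department, the number of responses, the median, and the fraction of responses below a target; its default target of 7 mirrors the dashed Target Score line in the figure.

```python
import pandas as pd


def share_below_target(df, group_col="Department", value_col="Satisfaction", target=7.0):
    """Per-group response count, median score, and share of responses below target,
    sorted so the groups with the most below-target responses come first."""
    summary = (
        df.assign(below_target=df[value_col] < target)
        .groupby(group_col)
        .agg(
            n=(value_col, "size"),
            median=(value_col, "median"),
            share_below=("below_target", "mean"),  # mean of booleans = fraction True
        )
    )
    return summary.sort_values("share_below", ascending=False)


if __name__ == "__main__":
    example = pd.DataFrame({
        "Department": ["Sales", "Sales", "Sales", "HR", "HR"],
        "Satisfaction": [5.5, 6.0, 8.0, 7.5, 9.0],
    })
    print(share_below_target(example))
```

Calling share_below_target(survey_df) applies the same summary to all five simulated departments; because those scores are random draws, treat any ranking it produces as illustrative.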
Best Practices and Guidelines
Choose strip plots when you have fewer than 200 points per category and want to show individual observations. For larger datasets, switch to violin plots or histograms to avoid overplotting.
Set jitter appropriately. Too little jitter leaves points overlapping; too much makes categories blend together. Start with 0.2-0.3 and adjust based on point density.
Use transparency (alpha between 0.4-0.7) to reveal overlapping points. Darker areas indicate higher density, providing visual cues about distribution shape.
Consider accessibility. Don’t rely solely on color to convey information. Use the dodge parameter to physically separate hue groups, and ensure sufficient color contrast. Add reference lines or annotations for key thresholds.
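Layer strategically. When combining plot types, render the strip plot first with higher transparency, then overlay statistical summaries with lower transparency. This keeps individual points visible while emphasizing aggregate patterns. Keep in mind that matplotlib stacks artists by zorder before drawing order, so if a layer does not end up where you expect, pass zorder explicitly.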
Mind your sample sizes. Strip plots naturally show sample size differences—categories with fewer points have visibly fewer markers. This transparency is valuable but can be misleading if viewers don’t recognize it. Consider adding sample size annotations.
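One way to add sample size annotations is to fold each category’s count into its tick label. The sketch below assumes categories are drawn at positions 0, 1, 2, and so on in the given order, and relabels those ticks; stripplot_with_counts is a hypothetical helper and the demo data are synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def stripplot_with_counts(df, x, y, order=None, ax=None, **strip_kws):
    """Strip plot whose category tick labels include each category's sample size."""
    if ax is None:
        ax = plt.gca()
    if order is None:
        order = list(pd.unique(df[x]))  # categories in order of first appearance
    counts = df[x].value_counts()
    sns.stripplot(data=df, x=x, y=y, order=order, ax=ax, **strip_kws)
    # Categories sit at positions 0, 1, 2, ...; relabel those ticks as "name (n=...)".
    ax.set_xticks(range(len(order)))
    ax.set_xticklabels([f"{cat}\n(n={counts.get(cat, 0)})" for cat in order])
    return ax


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    demo = pd.DataFrame({
        "Department": np.repeat(["Support", "HR"], [42, 28]),
        "Satisfaction": np.clip(rng.normal(7, 1.5, 70), 1, 10),
    })
    stripplot_with_counts(demo, "Department", "Satisfaction", jitter=0.25, alpha=0.6)
    plt.show()
```

Putting the count in the tick label keeps it attached to its category even when the plot is resized, without adding text inside the data area.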
Strip plots occupy a sweet spot in data visualization: detailed enough to preserve individual observations, yet structured enough to facilitate categorical comparisons. Master them, and you’ll have a powerful tool for honest, transparent data communication.