How to Create a Residual Plot in Seaborn

Key Insights

  • Residual plots reveal violations of model assumptions that summary statistics miss—look for random scatter around zero to confirm a good linear fit
  • Seaborn’s residplot() works for quick checks, but manually calculating residuals from sklearn models gives you more control and flexibility for complex diagnostics
  • Patterns like funnels indicate heteroscedasticity, curves suggest non-linearity, and both mean your linear model needs rethinking or transformation

Understanding Residual Plots and Why They Matter

Residual plots are your first line of defense against bad regression models. A residual is the difference between an observed value and the value predicted by your model. When you plot these residuals against predicted values or independent variables, you get a diagnostic tool that reveals problems invisible to R² or RMSE metrics.

A good residual plot shows random scatter around zero—no patterns, no trends, just noise. This randomness indicates your model captures the underlying relationship well. Bad residual plots show patterns: curved shapes mean you’re missing non-linear relationships, funnel shapes indicate heteroscedasticity (non-constant variance), and extreme outliers suggest data quality issues or model inadequacy.

Ignore residual plots at your peril. I’ve seen data scientists celebrate high R² values while their residual plots screamed “your model is fundamentally wrong.” Summary statistics lie; residual plots don’t.

Setting Up Your Environment

You need seaborn, matplotlib, pandas, numpy, and scikit-learn. Install them if you haven’t already:

pip install seaborn matplotlib pandas numpy scikit-learn

Here’s your import block:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Set style for better-looking plots
sns.set_style("whitegrid")
sns.set_palette("husl")

For this article, we’ll use seaborn’s built-in tips dataset and create a synthetic dataset to demonstrate different patterns:

# Load tips dataset
tips = sns.load_dataset('tips')

# Create synthetic data with known characteristics
np.random.seed(42)
X_good = np.linspace(0, 10, 100)
y_good = 2 * X_good + 3 + np.random.normal(0, 1, 100)

X_hetero = np.linspace(0, 10, 100)
y_hetero = 2 * X_hetero + 3 + np.random.normal(0, X_hetero * 0.5, 100)

X_nonlinear = np.linspace(0, 10, 100)
y_nonlinear = X_nonlinear**2 + np.random.normal(0, 3, 100)

Quick Residual Plots with residplot()

Seaborn’s residplot() function creates residual plots with minimal code. It fits a simple regression model internally and plots residuals automatically:

# Basic residual plot
plt.figure(figsize=(10, 6))
sns.residplot(data=tips, x='total_bill', y='tip', 
              scatter_kws={'alpha': 0.5})
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.title('Residual Plot: Tips vs Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Residuals')
plt.tight_layout()
plt.show()

The residplot() function accepts several useful parameters:

  • lowess=True: Adds a LOWESS smoothing line to detect non-linear patterns (requires the statsmodels package)
  • order: Fits polynomial regression of specified order
  • scatter_kws: Dictionary of keyword arguments for scatter plot styling

Here’s an enhanced version with LOWESS smoothing:

plt.figure(figsize=(10, 6))
sns.residplot(data=tips, x='total_bill', y='tip',
              lowess=True,
              scatter_kws={'alpha': 0.5, 's': 50},
              line_kws={'color': 'red', 'linewidth': 2})
plt.axhline(y=0, color='black', linestyle='--', linewidth=1)
plt.title('Residual Plot with LOWESS Smoothing')
plt.xlabel('Total Bill ($)')
plt.ylabel('Residuals')
plt.tight_layout()
plt.show()

The LOWESS line should hover around zero if your model is appropriate. Any systematic deviation indicates problems.

Manual Residual Plots with Sklearn Models

For production work, you’ll typically fit models with sklearn and want residual plots from those specific models. This approach gives you complete control:

# Prepare data
X = tips[['total_bill']].values
y = tips['tip'].values

# Fit linear regression
model = LinearRegression()
model.fit(X, y)

# Calculate predictions and residuals
y_pred = model.predict(X)
residuals = y - y_pred

# Create residual plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x=y_pred, y=residuals, alpha=0.6, s=60)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot: Manual Calculation')
plt.tight_layout()
plt.show()

This approach works with any sklearn model—ridge regression, lasso, or even ensemble methods. You’re not limited to simple linear regression.
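
As a quick illustration, swapping in a regularized estimator changes nothing about the residual calculation. Here is a minimal sketch using Ridge on synthetic data (made up for the example, not the tips dataset):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data standing in for any feature matrix
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X.ravel() + 3.0 + rng.normal(0, 1, 100)

# Swap LinearRegression for Ridge; residuals are computed identically
model = Ridge(alpha=1.0)
model.fit(X, y)
residuals = y - model.predict(X)
```

From here, the plotting code is the same as above: scatter the residuals against `model.predict(X)` and look for structure.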

For multiple features, plot residuals against fitted values rather than individual features:

# Multiple regression example
X_multi = tips[['total_bill', 'size']].values
y = tips['tip'].values

model_multi = LinearRegression()
model_multi.fit(X_multi, y)

y_pred_multi = model_multi.predict(X_multi)
residuals_multi = y - y_pred_multi

plt.figure(figsize=(10, 6))
sns.scatterplot(x=y_pred_multi, y=residuals_multi, alpha=0.6)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot: Multiple Regression')
plt.tight_layout()
plt.show()

Customizing for Publication-Quality Plots

Default plots work for exploration, but you’ll want customization for presentations and reports:

# Create a polished residual plot
fig, ax = plt.subplots(figsize=(12, 7))

# Scatter plot with custom styling
sns.scatterplot(x=y_pred, y=residuals, 
                alpha=0.6, 
                s=80,
                color='#2E86AB',
                edgecolor='white',
                linewidth=0.5,
                ax=ax)

# Reference line at zero
ax.axhline(y=0, color='#A23B72', linestyle='--', linewidth=2.5, 
           label='Zero Residual Line')

# Add standard deviation bands
std_resid = np.std(residuals)
ax.axhline(y=2*std_resid, color='gray', linestyle=':', linewidth=1.5, 
           alpha=0.7, label='±2 Std Dev')
ax.axhline(y=-2*std_resid, color='gray', linestyle=':', linewidth=1.5, 
           alpha=0.7)

# Labels and title
ax.set_xlabel('Fitted Values', fontsize=12, fontweight='bold')
ax.set_ylabel('Residuals', fontsize=12, fontweight='bold')
ax.set_title('Residual Plot: Tips Prediction Model', 
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left', frameon=True, shadow=True)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

For comparing multiple models, use subplots:

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Good fit
model_good = LinearRegression()
model_good.fit(X_good.reshape(-1, 1), y_good)
resid_good = y_good - model_good.predict(X_good.reshape(-1, 1))
sns.scatterplot(x=X_good, y=resid_good, alpha=0.6, ax=axes[0])
axes[0].axhline(y=0, color='red', linestyle='--')
axes[0].set_title('Good Fit: Random Scatter')
axes[0].set_xlabel('X')
axes[0].set_ylabel('Residuals')

# Heteroscedasticity
model_hetero = LinearRegression()
model_hetero.fit(X_hetero.reshape(-1, 1), y_hetero)
resid_hetero = y_hetero - model_hetero.predict(X_hetero.reshape(-1, 1))
sns.scatterplot(x=X_hetero, y=resid_hetero, alpha=0.6, ax=axes[1])
axes[1].axhline(y=0, color='red', linestyle='--')
axes[1].set_title('Heteroscedasticity: Funnel Pattern')
axes[1].set_xlabel('X')
axes[1].set_ylabel('Residuals')

# Non-linearity
model_nonlinear = LinearRegression()
model_nonlinear.fit(X_nonlinear.reshape(-1, 1), y_nonlinear)
resid_nonlinear = y_nonlinear - model_nonlinear.predict(X_nonlinear.reshape(-1, 1))
sns.scatterplot(x=X_nonlinear, y=resid_nonlinear, alpha=0.6, ax=axes[2])
axes[2].axhline(y=0, color='red', linestyle='--')
axes[2].set_title('Non-linearity: Curved Pattern')
axes[2].set_xlabel('X')
axes[2].set_ylabel('Residuals')

plt.tight_layout()
plt.show()

Reading Residual Plots Like a Pro

Random Scatter (Good): Points spread randomly around zero with constant variance across the range. This is what you want. Your model assumptions hold, and linear regression is appropriate.

Curved Pattern (Bad): A U-shape or inverted U-shape indicates non-linearity. Your relationship isn’t linear. Solutions: add polynomial terms, try non-linear regression, or transform variables (log, square root).
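
To see how adding a polynomial term removes the curvature, here is a short sketch on quadratic synthetic data, mirroring the X_nonlinear example from earlier:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data, mirroring the article's X_nonlinear example
rng = np.random.default_rng(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 3, 100)

# Misspecified straight-line fit vs. a degree-2 fit
linear = LinearRegression().fit(X, y)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quadratic = LinearRegression().fit(X_poly, y)

resid_linear = y - linear.predict(X)
resid_quadratic = y - quadratic.predict(X_poly)

# The curvature shows up as a much larger residual spread
print(resid_linear.std() > resid_quadratic.std())
```

Plot both residual sets side by side and the U-shape disappears from the degree-2 fit.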

Funnel Shape (Bad): Variance increases or decreases systematically. This is heteroscedasticity. It violates the constant variance assumption. Solutions: transform the dependent variable, use weighted least squares, or try robust regression methods.
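
Weighted least squares is straightforward with sklearn's sample_weight argument. A sketch, assuming the error variance grows with X as in the synthetic y_hetero data; the 1/X**2 weights encode that assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Heteroscedastic data like y_hetero: noise scale grows with X
rng = np.random.default_rng(42)
X = np.linspace(0.1, 10, 100).reshape(-1, 1)
y = 2 * X.ravel() + 3 + rng.normal(0, X.ravel() * 0.5)

# Weighted least squares: weight each point by the inverse of its
# (assumed) error variance, here proportional to X**2
weights = 1.0 / X.ravel() ** 2
wls = LinearRegression().fit(X, y, sample_weight=weights)
```

With well-chosen weights the coefficient estimates are much less distorted by the high-variance region, and the residual plot of the weighted fit loses its funnel.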

Outliers (Investigate): Points far from zero might be genuine outliers or influential observations. Calculate Cook’s distance to assess their impact. Don’t automatically remove them—understand why they’re different.
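
Cook's distance can be computed by hand with numpy using the textbook hat-matrix formula. The sketch below injects one outlier; the 4/n flagging threshold is a common rule of thumb, not a hard cutoff:

```python
import numpy as np

# Simple linear data with one injected outlier at index 25
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 3 + rng.normal(0, 1, 50)
y[25] += 15

# Ordinary least-squares fit via the normal equations
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leverage values from the hat matrix, then Cook's distance
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
p = X.shape[1]
mse = resid @ resid / (len(y) - p)
cooks_d = (resid**2 / (p * mse)) * (h / (1 - h) ** 2)

# Rule of thumb: flag points with D > 4/n for closer inspection
flagged = np.where(cooks_d > 4 / len(y))[0]
print(flagged)
```

The injected outlier should land in `flagged`; any other flagged points are exactly the ones worth investigating rather than deleting.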

Clusters (Bad): Distinct groups suggest missing categorical variables or interactions. Your model is too simple for the data structure.

When and How to Use Residual Plots

Make residual plots mandatory in your regression workflow. Check them before trusting any model metrics. Here’s my standard process:

  1. Fit your model
  2. Create a residual plot immediately
  3. Look for patterns before checking R² or p-values
  4. If patterns exist, fix the model before proceeding
  5. Only interpret coefficients after residuals look random
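
The workflow above can be wrapped in a small helper you call right after fitting. A sketch; the function name check_residuals and the crude variance-ratio screen are my own conventions, not a standard API:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def check_residuals(model, X, y):
    """Return residuals plus a crude constant-variance check.

    The check compares residual variance in the lower and upper
    halves of the fitted-value range: a rough heteroscedasticity
    screen, not a formal statistical test.
    """
    y_pred = model.predict(X)
    residuals = y - y_pred
    mid = np.median(y_pred)
    var_low = residuals[y_pred <= mid].var()
    var_high = residuals[y_pred > mid].var()
    ratio = max(var_low, var_high) / max(min(var_low, var_high), 1e-12)
    return residuals, ratio

# Usage on well-behaved synthetic data
rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 2 * X.ravel() + 3 + rng.normal(0, 1, 200)
model = LinearRegression().fit(X, y)
resid, ratio = check_residuals(model, X, y)
```

A ratio far above 1 is a hint to plot the residuals and look for a funnel before reading any coefficients.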

Use residual plots during model comparison. A model with slightly lower R² but better residual patterns is often the better choice. Summary statistics can mislead; residual plots reveal the truth about model adequacy.

For time series data, plot residuals against time to check for autocorrelation. For spatial data, create spatial residual plots to check for geographic clustering. The principle remains the same: residuals should be random with respect to any variable you can think of.
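
For the time-series check, the Durbin-Watson statistic is a standard numeric companion to plotting residuals against time. A self-contained sketch on synthetic residual series:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 means no autocorrelation,
    values near 0 indicate positive autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
white = rng.normal(0, 1, 500)   # independent residuals
ar1 = np.zeros(500)             # AR(1)-correlated residuals
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal(0, 1)

print(round(durbin_watson(white), 1))  # near 2
print(round(durbin_watson(ar1), 1))    # well below 2
```

If the statistic drifts far from 2, the residuals-versus-time plot will usually show runs of same-signed residuals confirming the autocorrelation.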

Residual plots aren’t optional diagnostics—they’re essential validation tools. Learn to read them fluently, and you’ll avoid the trap of deploying fundamentally flawed models that look good on paper but fail in production.
