How to Calculate Adjusted R-Squared in Python
Key Insights
- Adjusted R-squared penalizes model complexity by accounting for the number of predictors, making it essential for comparing models with different feature counts
- While scikit-learn doesn’t provide adjusted R² directly, you can calculate it from the standard R² score using a simple formula involving sample size and predictor count
- Statsmodels offers adjusted R² out of the box via the `.rsquared_adj` attribute, making it the more convenient choice for statistical analysis
Introduction to R-Squared and Its Limitations
R-squared (R²) measures how well your regression model explains the variance in your target variable. A value of 0.85 means your model explains 85% of the variance—sounds straightforward. But there’s a catch that trips up many practitioners.
R² has a fundamental flaw: it never decreases when you add more predictors. Throw in a completely random variable? R² goes up (or stays the same). Add your grandmother’s birthday as a feature? R² increases. This behavior makes R² unreliable for comparing models with different numbers of features.
This is where adjusted R-squared becomes essential. It modifies the R² calculation to penalize additional predictors that don’t meaningfully improve the model. When you add a useless feature, adjusted R² actually decreases, giving you honest feedback about your model’s true explanatory power.
You need adjusted R² when:
- Comparing models with different numbers of predictors
- Performing feature selection
- Detecting overfitting from unnecessary variables
- Reporting model performance in academic or business contexts
The Adjusted R-Squared Formula
The adjusted R-squared formula accounts for both sample size and the number of predictors:
Adjusted R² = 1 - [(1 - R²) × (n - 1) / (n - p - 1)]
Where:
- R² is the standard coefficient of determination
- n is the number of observations (samples)
- p is the number of predictors (features)
The term (n - 1) / (n - p - 1) acts as a penalty factor. As you add predictors (increasing p), this ratio grows larger, which inflates the (1 - R²) term and reduces the adjusted R². The penalty becomes more severe when you have fewer observations relative to predictors.
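To see the penalty in action, here's a minimal sketch using illustrative values (n = 50 observations, raw R² held fixed at 0.80, both assumptions for demonstration) that shows the adjusted value shrinking, and even going negative, as p grows:

```python
# Illustrative: how the penalty grows as predictors are added
# (n = 50 observations, raw R² held fixed at 0.80)
n = 50
r2 = 0.80

for p in [1, 5, 10, 20, 40]:
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(f"p = {p:2d}  ->  adjusted R² = {adj_r2:.4f}")
```

Note how the adjusted value collapses once p approaches n, which previews the interpretation guidelines below.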
Interpretation guidelines:
- Values closer to 1.0 indicate better fit
- Negative values are possible (and indicate a terrible model)
- A decrease when adding features suggests those features hurt more than help
- Generally, values above 0.7 indicate reasonable explanatory power, though this varies by domain
Here’s how to calculate adjusted R² manually using NumPy:
```python
import numpy as np

def adjusted_r_squared_manual(y_true, y_pred, n_features):
    """
    Calculate adjusted R-squared from scratch.

    Parameters:
    -----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted target values
    n_features : int
        Number of predictors in the model

    Returns:
    --------
    float : Adjusted R-squared value
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    n = len(y_true)

    # Calculate R-squared
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - np.mean(y_true)) ** 2)
    r_squared = 1 - (ss_residual / ss_total)

    # Calculate adjusted R-squared
    adjusted_r2 = 1 - ((1 - r_squared) * (n - 1) / (n - n_features - 1))
    return adjusted_r2

# Example usage
y_true = np.array([3, 5, 7, 9, 11, 13, 15])
y_pred = np.array([2.8, 5.2, 6.9, 9.1, 10.8, 13.2, 14.9])
n_features = 2

adj_r2 = adjusted_r_squared_manual(y_true, y_pred, n_features)
print(f"Adjusted R²: {adj_r2:.4f}")
```
Calculating Adjusted R² with Scikit-learn
Scikit-learn’s LinearRegression provides R² through the score() method, but it doesn’t offer adjusted R² directly. You need to calculate it yourself using the formula above.
Here’s a complete workflow with a synthetic dataset:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

def adjusted_r_squared(model, X, y):
    """
    Calculate adjusted R-squared for a fitted sklearn model.

    Parameters:
    -----------
    model : fitted sklearn estimator
        Must have a score() method
    X : array-like of shape (n_samples, n_features)
        Feature matrix
    y : array-like of shape (n_samples,)
        Target values

    Returns:
    --------
    tuple : (r_squared, adjusted_r_squared)
    """
    r2 = model.score(X, y)
    n = X.shape[0]
    p = X.shape[1]
    adjusted_r2 = 1 - ((1 - r2) * (n - 1) / (n - p - 1))
    return r2, adjusted_r2

# Generate synthetic regression data
X, y = make_regression(
    n_samples=200,
    n_features=5,
    n_informative=3,  # Only 3 features actually matter
    noise=10,
    random_state=42
)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model
model = LinearRegression()
model.fit(X_train, y_train)

# Calculate metrics on test set
r2, adj_r2 = adjusted_r_squared(model, X_test, y_test)
print(f"R-squared: {r2:.4f}")
print(f"Adjusted R-squared: {adj_r2:.4f}")
print(f"Number of samples: {X_test.shape[0]}")
print(f"Number of features: {X_test.shape[1]}")
```
This function is reusable across any scikit-learn estimator that implements the score() method, including Ridge, Lasso, and ElasticNet.
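As a brief sketch of that reuse, the same adjusted R² logic applies unchanged to a regularized model; here it is inlined with `Ridge` on the same synthetic data (alpha=1.0 is just an illustrative value):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Same synthetic data as the example above
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10, random_state=42)

# Fit a regularized model; score() still returns R²
ridge = Ridge(alpha=1.0).fit(X, y)

# The helper's logic, inlined so this snippet is self-contained
r2 = ridge.score(X, y)
n, p = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"Ridge R²: {r2:.4f}, adjusted R²: {adj_r2:.4f}")
```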
Calculating Adjusted R² with Statsmodels
Statsmodels is the more statistics-focused library and provides adjusted R² directly. This makes it the preferred choice when you need comprehensive regression diagnostics.
```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_regression

# Generate data
X, y = make_regression(
    n_samples=200,
    n_features=5,
    n_informative=3,
    noise=10,
    random_state=42
)

# Statsmodels requires manually adding a constant for the intercept
X_with_const = sm.add_constant(X)

# Fit OLS model
model = sm.OLS(y, X_with_const)
results = model.fit()

# Access adjusted R-squared directly
print(f"R-squared: {results.rsquared:.4f}")
print(f"Adjusted R-squared: {results.rsquared_adj:.4f}")

# Full summary includes both metrics
print("\n" + "=" * 60)
print(results.summary())
```
The summary output provides a wealth of information including confidence intervals, p-values, and diagnostic statistics. The adjusted R² appears in the top-right section of the summary table.
Let’s verify that statsmodels matches our manual calculation:
```python
# Verify manual calculation matches statsmodels
y_pred = results.predict(X_with_const)
n = len(y)
p = X.shape[1]  # Don't count the constant

ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2_manual = 1 - (ss_res / ss_tot)
adj_r2_manual = 1 - ((1 - r2_manual) * (n - 1) / (n - p - 1))

print(f"\nManual R²: {r2_manual:.4f}")
print(f"Statsmodels R²: {results.rsquared:.4f}")
print(f"Manual Adjusted R²: {adj_r2_manual:.4f}")
print(f"Statsmodels Adjusted R²: {results.rsquared_adj:.4f}")
```
Comparing R² vs Adjusted R² in Practice
The real value of adjusted R² becomes apparent when you start adding features to your model. Let’s demonstrate how the two metrics diverge as we add irrelevant noise features:
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

def compare_metrics_with_noise_features(max_noise_features=15):
    """
    Demonstrate how R² and Adjusted R² diverge when adding noise features.
    """
    # Create base dataset with 3 informative features
    np.random.seed(42)
    X_base, y = make_regression(
        n_samples=100,
        n_features=3,
        n_informative=3,
        noise=15,
        random_state=42
    )

    results = []
    for n_noise in range(max_noise_features + 1):
        # Add noise features
        if n_noise > 0:
            noise_features = np.random.randn(100, n_noise)
            X = np.hstack([X_base, noise_features])
        else:
            X = X_base

        # Fit model
        model = LinearRegression()
        model.fit(X, y)

        # Calculate metrics
        r2 = model.score(X, y)
        n, p = X.shape
        adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - p - 1))

        results.append({
            'noise_features': n_noise,
            'total_features': p,
            'r_squared': r2,
            'adjusted_r_squared': adj_r2,
            'difference': r2 - adj_r2
        })

    return pd.DataFrame(results)

# Run comparison
df = compare_metrics_with_noise_features(15)
print("Impact of Adding Noise Features to a Model")
print("=" * 65)
print(df.to_string(index=False, float_format='{:.4f}'.format))

print("\n\nKey Observations:")
print(f"- R² increased from {df['r_squared'].iloc[0]:.4f} to {df['r_squared'].iloc[-1]:.4f}")
print(f"- Adjusted R² decreased from {df['adjusted_r_squared'].iloc[0]:.4f} to {df['adjusted_r_squared'].iloc[-1]:.4f}")
print(f"- Gap between metrics grew from {df['difference'].iloc[0]:.4f} to {df['difference'].iloc[-1]:.4f}")
```
This example clearly shows the problem: R² keeps climbing even though we’re adding pure noise. Meanwhile, adjusted R² correctly signals that model quality is degrading. When adjusted R² starts declining while R² increases, you have strong evidence of overfitting.
Best Practices and Common Pitfalls
When to use adjusted R² vs other metrics:
Adjusted R² works well for comparing linear models with different numbers of features on the same dataset. However, consider alternatives in these situations:
- AIC/BIC: Better for model selection when comparing non-nested models or when you want stronger penalties for complexity. BIC penalizes more heavily than adjusted R².
- Cross-validation scores: More robust for predictive modeling since they estimate out-of-sample performance.
- RMSE/MAE: When you need interpretable error units rather than variance explained.
```python
import statsmodels.api as sm

# Quick comparison of model selection criteria
# (reuses X and y from the earlier examples)
X_with_const = sm.add_constant(X)
results = sm.OLS(y, X_with_const).fit()

print(f"Adjusted R²: {results.rsquared_adj:.4f}")
print(f"AIC: {results.aic:.2f}")
print(f"BIC: {results.bic:.2f}")
```
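Cross-validation scores are just as easy to obtain. A brief sketch using `cross_val_score` (which uses R² as the default scorer for regressors) on the same synthetic setup as the earlier examples:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

# Same synthetic setup as the earlier examples
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10, random_state=42)

# 5-fold cross-validated R² (the default scorer for regressors)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(f"CV R² per fold: {np.round(cv_scores, 4)}")
print(f"Mean CV R²: {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")
```

Unlike adjusted R², these scores are computed on held-out folds, so they directly estimate out-of-sample performance.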
Limitations to keep in mind:
- Non-linear relationships: Adjusted R² assumes a linear model. A low value might mean you need polynomial features or a different model type, not necessarily bad predictors.
- Heteroscedasticity: When error variance isn't constant, R² and adjusted R² can be misleading. Check residual plots.
- Sample size sensitivity: With small samples, adjusted R² becomes unstable. The penalty term (n - 1) / (n - p - 1) can produce extreme values when n is close to p.
- Not comparable across datasets: You can't compare adjusted R² values between different target variables or datasets.
Quick reference for adjusted R² calculation:
```python
# Scikit-learn (manual calculation required)
adj_r2 = 1 - ((1 - model.score(X, y)) * (n - 1) / (n - p - 1))

# Statsmodels (built-in)
adj_r2 = results.rsquared_adj
```
Use adjusted R² as one tool among many. It’s excellent for catching overfitting from unnecessary features, but combine it with cross-validation, residual analysis, and domain knowledge for robust model evaluation.