How to Calculate Adjusted R-Squared in Python
Key Insights
- Adjusted R-squared penalizes model complexity by accounting for the number of predictors, making it essential for comparing models with different feature counts
- While scikit-learn doesn’t provide adjusted R² directly, you can calculate it from the standard R² score using a simple formula involving sample size and predictor count
- Statsmodels offers adjusted R² out of the box via the `.rsquared_adj` attribute, making it the more convenient choice for statistical analysis
Introduction to R-Squared and Its Limitations
R-squared (R²) measures how well your regression model explains the variance in your target variable. A value of 0.85 means your model explains 85% of the variance—sounds straightforward. But there’s a catch that trips up many practitioners.
R² has a fundamental flaw: it never decreases when you add more predictors. Throw in a completely random variable? R² goes up (or stays the same). Add your grandmother’s birthday as a feature? R² increases. This behavior makes R² unreliable for comparing models with different numbers of features.
This is where adjusted R-squared becomes essential. It modifies the R² calculation to penalize additional predictors that don’t meaningfully improve the model. When you add a useless feature, adjusted R² actually decreases, giving you honest feedback about your model’s true explanatory power.
You need adjusted R² when:
- Comparing models with different numbers of predictors
- Performing feature selection
- Detecting overfitting from unnecessary variables
- Reporting model performance in academic or business contexts
The Adjusted R-Squared Formula
The adjusted R-squared formula accounts for both sample size and the number of predictors:
Adjusted R² = 1 - [(1 - R²) × (n - 1) / (n - p - 1)]
Where:
- R² is the standard coefficient of determination
- n is the number of observations (samples)
- p is the number of predictors (features)
The term (n - 1) / (n - p - 1) acts as a penalty factor. As you add predictors (increasing p), this ratio grows larger, which inflates the (1 - R²) term and reduces the adjusted R². The penalty becomes more severe when you have fewer observations relative to predictors.
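To see the penalty in action, here's a minimal sketch using illustrative values (n = 50 observations, raw R² held fixed at 0.80, both assumptions for demonstration) that shows the adjusted value shrinking, and even going negative, as p grows:

```python
# Illustrative: how the penalty grows as predictors are added
# (n = 50 observations, raw R² held fixed at 0.80)
n = 50
r2 = 0.80

for p in [1, 5, 10, 20, 40]:
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(f"p = {p:2d}  ->  adjusted R² = {adj_r2:.4f}")
```

Note how the adjusted value collapses once p approaches n, which previews the interpretation guidelines below.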
Interpretation guidelines:
- Values closer to 1.0 indicate better fit
- Negative values are possible (and indicate a terrible model)
- A decrease when adding features suggests those features hurt more than help
- Generally, values above 0.7 indicate reasonable explanatory power, though this varies by domain
Here’s how to calculate adjusted R² manually using NumPy:
```python
import numpy as np

def adjusted_r_squared_manual(y_true, y_pred, n_features):
    """
    Calculate adjusted R-squared from scratch.

    Parameters:
    -----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted target values
    n_features : int
        Number of predictors in the model

    Returns:
    --------
    float : Adjusted R-squared value
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    n = len(y_true)

    # Calculate R-squared
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - np.mean(y_true)) ** 2)
    r_squared = 1 - (ss_residual / ss_total)

    # Calculate adjusted R-squared
    adjusted_r2 = 1 - ((1 - r_squared) * (n - 1) / (n - n_features - 1))
    return adjusted_r2

# Example usage
y_true = np.array([3, 5, 7, 9, 11, 13, 15])
y_pred = np.array([2.8, 5.2, 6.9, 9.1, 10.8, 13.2, 14.9])
n_features = 2

adj_r2 = adjusted_r_squared_manual(y_true, y_pred, n_features)
print(f"Adjusted R²: {adj_r2:.4f}")
```
Calculating Adjusted R² with Scikit-learn
Scikit-learn’s LinearRegression provides R² through the score() method, but it doesn’t offer adjusted R² directly. You need to calculate it yourself using the formula above.
Here’s a complete workflow with a synthetic dataset:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

def adjusted_r_squared(model, X, y):
    """
    Calculate adjusted R-squared for a fitted sklearn model.

    Parameters:
    -----------
    model : fitted sklearn estimator
        Must have a score() method
    X : array-like of shape (n_samples, n_features)
        Feature matrix
    y : array-like of shape (n_samples,)
        Target values

    Returns:
    --------
    tuple : (r_squared, adjusted_r_squared)
    """
    r2 = model.score(X, y)
    n = X.shape[0]
    p = X.shape[1]
    adjusted_r2 = 1 - ((1 - r2) * (n - 1) / (n - p - 1))
    return r2, adjusted_r2

# Generate synthetic regression data
X, y = make_regression(
    n_samples=200,
    n_features=5,
    n_informative=3,  # Only 3 features actually matter
    noise=10,
    random_state=42
)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model
model = LinearRegression()
model.fit(X_train, y_train)

# Calculate metrics on test set
r2, adj_r2 = adjusted_r_squared(model, X_test, y_test)
print(f"R-squared: {r2:.4f}")
print(f"Adjusted R-squared: {adj_r2:.4f}")
print(f"Number of samples: {X_test.shape[0]}")
print(f"Number of features: {X_test.shape[1]}")
```
This function is reusable across any scikit-learn estimator that implements the score() method, including Ridge, Lasso, and ElasticNet.
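As a brief sketch of that reuse, the same adjusted R² logic applies unchanged to a regularized model; here it is inlined with `Ridge` on the same synthetic data (alpha=1.0 is just an illustrative value):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Same synthetic data as the example above
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10, random_state=42)

# Fit a regularized model; score() still returns R²
ridge = Ridge(alpha=1.0).fit(X, y)

# The helper's logic, inlined so this snippet is self-contained
r2 = ridge.score(X, y)
n, p = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"Ridge R²: {r2:.4f}, adjusted R²: {adj_r2:.4f}")
```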
Calculating Adjusted R² with Statsmodels
Statsmodels is the more statistics-focused library and provides adjusted R² directly. This makes it the preferred choice when you need comprehensive regression diagnostics.
```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_regression

# Generate data
X, y = make_regression(
    n_samples=200,
    n_features=5,
    n_informative=3,
    noise=10,
    random_state=42
)

# Statsmodels requires manually adding a constant for the intercept
X_with_const = sm.add_constant(X)

# Fit OLS model
model = sm.OLS(y, X_with_const)
results = model.fit()

# Access adjusted R-squared directly
print(f"R-squared: {results.rsquared:.4f}")
print(f"Adjusted R-squared: {results.rsquared_adj:.4f}")

# Full summary includes both metrics
print("\n" + "=" * 60)
print(results.summary())
```
The summary output provides a wealth of information including confidence intervals, p-values, and diagnostic statistics. The adjusted R² appears in the top-right section of the summary table.
Let’s verify that statsmodels matches our manual calculation:
```python
# Verify manual calculation matches statsmodels
y_pred = results.predict(X_with_const)
n = len(y)
p = X.shape[1]  # Don't count the constant

ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2_manual = 1 - (ss_res / ss_tot)
adj_r2_manual = 1 - ((1 - r2_manual) * (n - 1) / (n - p - 1))

print(f"\nManual R²: {r2_manual:.4f}")
print(f"Statsmodels R²: {results.rsquared:.4f}")
print(f"Manual Adjusted R²: {adj_r2_manual:.4f}")
print(f"Statsmodels Adjusted R²: {results.rsquared_adj:.4f}")
```
Comparing R² vs Adjusted R² in Practice
The real value of adjusted R² becomes apparent when you start adding features to your model. Let’s demonstrate how the two metrics diverge as we add irrelevant noise features:
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

def compare_metrics_with_noise_features(max_noise_features=15):
    """
    Demonstrate how R² and Adjusted R² diverge when adding noise features.
    """
    # Create base dataset with 3 informative features
    np.random.seed(42)
    X_base, y = make_regression(
        n_samples=100,
        n_features=3,
        n_informative=3,
        noise=15,
        random_state=42
    )

    results = []
    for n_noise in range(max_noise_features + 1):
        # Add noise features
        if n_noise > 0:
            noise_features = np.random.randn(100, n_noise)
            X = np.hstack([X_base, noise_features])
        else:
            X = X_base

        # Fit model
        model = LinearRegression()
        model.fit(X, y)

        # Calculate metrics
        r2 = model.score(X, y)
        n, p = X.shape
        adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - p - 1))

        results.append({
            'noise_features': n_noise,
            'total_features': p,
            'r_squared': r2,
            'adjusted_r_squared': adj_r2,
            'difference': r2 - adj_r2
        })

    return pd.DataFrame(results)

# Run comparison
df = compare_metrics_with_noise_features(15)
print("Impact of Adding Noise Features to a Model")
print("=" * 65)
print(df.to_string(index=False, float_format='{:.4f}'.format))

print("\n\nKey Observations:")
print(f"- R² increased from {df['r_squared'].iloc[0]:.4f} to {df['r_squared'].iloc[-1]:.4f}")
print(f"- Adjusted R² decreased from {df['adjusted_r_squared'].iloc[0]:.4f} to {df['adjusted_r_squared'].iloc[-1]:.4f}")
print(f"- Gap between metrics grew from {df['difference'].iloc[0]:.4f} to {df['difference'].iloc[-1]:.4f}")
```
This example clearly shows the problem: R² keeps climbing even though we’re adding pure noise. Meanwhile, adjusted R² correctly signals that model quality is degrading. When adjusted R² starts declining while R² increases, you have strong evidence of overfitting.
Best Practices and Common Pitfalls
When to use adjusted R² vs other metrics:
Adjusted R² works well for comparing linear models with different numbers of features on the same dataset. However, consider alternatives in these situations:
- AIC/BIC: Better for model selection when comparing non-nested models or when you want stronger penalties for complexity. BIC penalizes more heavily than adjusted R².
- Cross-validation scores: More robust for predictive modeling since they estimate out-of-sample performance.
- RMSE/MAE: When you need interpretable error units rather than variance explained.
```python
import statsmodels.api as sm

# Quick comparison of model selection criteria
# (reuses X and y from the earlier examples)
X_with_const = sm.add_constant(X)
results = sm.OLS(y, X_with_const).fit()

print(f"Adjusted R²: {results.rsquared_adj:.4f}")
print(f"AIC: {results.aic:.2f}")
print(f"BIC: {results.bic:.2f}")
```
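Cross-validation scores are just as easy to obtain. A brief sketch using `cross_val_score` (which uses R² as the default scorer for regressors) on the same synthetic setup as the earlier examples:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

# Same synthetic setup as the earlier examples
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10, random_state=42)

# 5-fold cross-validated R² (the default scorer for regressors)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(f"CV R² per fold: {np.round(cv_scores, 4)}")
print(f"Mean CV R²: {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")
```

Unlike adjusted R², these scores are computed on held-out folds, so they directly estimate out-of-sample performance.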
Limitations to keep in mind:
- Non-linear relationships: Adjusted R² assumes a linear model. A low value might mean you need polynomial features or a different model type, not necessarily bad predictors.
- Heteroscedasticity: When error variance isn't constant, R² and adjusted R² can be misleading. Check residual plots.
- Sample size sensitivity: With small samples, adjusted R² becomes unstable. The penalty term (n - 1) / (n - p - 1) can produce extreme values when n is close to p.
- Not comparable across datasets: You can't compare adjusted R² values between different target variables or datasets.
Quick reference for adjusted R² calculation:
```python
# Scikit-learn (manual calculation required)
adj_r2 = 1 - ((1 - model.score(X, y)) * (n - 1) / (n - p - 1))

# Statsmodels (built-in)
adj_r2 = results.rsquared_adj
```
Use adjusted R² as one tool among many. It’s excellent for catching overfitting from unnecessary features, but combine it with cross-validation, residual analysis, and domain knowledge for robust model evaluation.