How to Calculate R-Squared in Python
Key Insights
- R-squared measures the proportion of variance in your dependent variable explained by your model, ranging from 0 to 1, but negative values are possible when predictions are worse than using the mean
- Scikit-learn’s `r2_score` is the production standard, but understanding the manual calculation helps you debug unexpected results and interpret edge cases correctly
- Always use adjusted R-squared when comparing models with different numbers of predictors, and never rely on R-squared alone for non-linear relationships or heteroscedastic data
Introduction to R-Squared
R-squared, also called the coefficient of determination, answers a simple question: how much of the variation in your target variable does your model explain? If you’re predicting house prices and your R² is 0.85, your model explains 85% of the variance in prices. The remaining 15% is unexplained—either noise or factors your model doesn’t capture.
The metric ranges from 0 to 1 for well-fitted models, where 1 means perfect predictions and 0 means your model performs no better than simply predicting the mean. Here’s the catch that trips up many practitioners: R² can go negative. This happens when your model’s predictions are worse than the baseline mean prediction—a sign something has gone seriously wrong.
Python gives you multiple ways to calculate R², from manual NumPy implementations to one-liners with scikit-learn. Each approach has its place. Let’s work through them systematically.
The Mathematics Behind R-Squared
The formula for R-squared is deceptively simple:
R² = 1 - (SS_res / SS_tot)
Where:
- SS_res (Residual Sum of Squares): The sum of squared differences between actual values and predicted values
- SS_tot (Total Sum of Squares): The sum of squared differences between actual values and the mean of actual values
Think of it this way: SS_tot represents the total variance in your data. SS_res represents the variance your model failed to explain. The ratio tells you what fraction of variance remains unexplained, and subtracting from 1 gives you the explained fraction.
```python
import numpy as np

# Sample data
y_true = np.array([3, 5, 7, 9, 11])
y_pred = np.array([2.8, 5.2, 6.8, 9.1, 11.1])

# Calculate components
y_mean = np.mean(y_true)
ss_res = np.sum((y_true - y_pred) ** 2)  # Residual sum of squares
ss_tot = np.sum((y_true - y_mean) ** 2)  # Total sum of squares
r_squared = 1 - (ss_res / ss_tot)

print(f"Mean of y_true: {y_mean}")
print(f"SS_res: {ss_res:.2f}")
print(f"SS_tot: {ss_tot}")
print(f"R-squared: {r_squared:.4f}")
```
Output:

```
Mean of y_true: 7.0
SS_res: 0.14
SS_tot: 40.0
R-squared: 0.9965
```
This model explains 99.65% of the variance—excellent performance. The residual sum of squares (0.14) is tiny compared to the total variance (40.0).
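As a sanity check, the manual calculation should agree with scikit-learn’s `r2_score` to floating-point precision. A minimal sketch using the same sample data as above:

```python
import numpy as np
from sklearn.metrics import r2_score

# Same sample data as the manual calculation above
y_true = np.array([3, 5, 7, 9, 11])
y_pred = np.array([2.8, 5.2, 6.8, 9.1, 11.1])

# Manual computation
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
manual_r2 = 1 - ss_res / ss_tot

# Library computation -- the two should match
library_r2 = r2_score(y_true, y_pred)
print(f"Manual:  {manual_r2:.4f}")   # 0.9965
print(f"Library: {library_r2:.4f}")  # 0.9965
```

If these two numbers ever diverge, the usual culprit is accidentally comparing arrays of different shapes or passing the arguments in the wrong order.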
Calculating R-Squared from Scratch with NumPy
Wrapping the calculation in a reusable function makes sense for any serious work. Here’s a robust implementation:
```python
import numpy as np

def r_squared_numpy(y_true, y_pred):
    """
    Calculate R-squared using NumPy.

    Parameters
    ----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted values from model

    Returns
    -------
    float
        R-squared value
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Residual sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)

    # Handle edge case where all y values are identical
    if ss_tot == 0:
        return 0.0 if ss_res > 0 else 1.0

    return 1 - (ss_res / ss_tot)

# Test with realistic data
np.random.seed(42)
X = np.linspace(0, 10, 100)
y_true = 2 * X + 1 + np.random.normal(0, 1, 100)  # Linear relationship plus noise
y_pred = 2 * X + 1  # Predictions from the noiseless true line

r2 = r_squared_numpy(y_true, y_pred)
print(f"R-squared: {r2:.4f}")
```
Output:

```
R-squared: 0.9737
```
The edge case handling matters. If all your true values are identical (zero variance), you’ll get a division by zero without it. This happens more often than you’d expect in production systems with filtered or aggregated data.
Using Scikit-learn’s r2_score Function
For production code, use scikit-learn. It’s tested, optimized, and handles edge cases you haven’t thought of:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.datasets import make_regression

# Generate synthetic regression data
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Calculate R-squared for both sets
r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)

print(f"Training R²: {r2_train:.4f}")
print(f"Test R²: {r2_test:.4f}")
print(f"Difference: {r2_train - r2_test:.4f}")
```
Output:

```
Training R²: 0.9712
Test R²: 0.9648
Difference: 0.0064
```
The small gap between training and test R² indicates the model generalizes well. Large gaps signal overfitting—your model memorized the training data instead of learning the underlying pattern.
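To see what a large gap looks like, here is a deliberately overfit sketch. The setup is hypothetical (a degree-12 polynomial fit to 15 noisy points), and the exact scores will vary with library versions, but the shape of the result is reliable: a near-perfect training score and a much weaker test score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 30)

# Interleaved split: odd-indexed points held out for testing
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

# Degree-12 polynomial on 15 training points: nearly enough
# parameters to interpolate the noise outright
model = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
model.fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"Training R²: {r2_train:.4f}")
print(f"Test R²: {r2_test:.4f}")  # far below the training score
```

The model threads through the training noise, so the held-out points between them pay the price.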
R-Squared with Statsmodels
When you need comprehensive regression diagnostics, statsmodels delivers. It provides R² alongside p-values, confidence intervals, and diagnostic tests:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Create sample data
np.random.seed(42)
n = 100
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)
y = 3 + 2 * X1 - 1.5 * X2 + np.random.normal(0, 0.5, n)

# Create DataFrame for clarity
df = pd.DataFrame({'X1': X1, 'X2': X2, 'y': y})

# Add constant for intercept (statsmodels requires this explicitly)
X = sm.add_constant(df[['X1', 'X2']])

# Fit OLS model
model = sm.OLS(df['y'], X).fit()

# Access R-squared directly
print(f"R-squared: {model.rsquared:.4f}")
print(f"Adjusted R-squared: {model.rsquared_adj:.4f}")
print(f"Number of observations: {int(model.nobs)}")

# Full summary for detailed analysis
print("\n" + "=" * 60)
print(model.summary())
```
Output (summary truncated):

```
R-squared: 0.9649
Adjusted R-squared: 0.9642
Number of observations: 100
============================================================
                 OLS Regression Results
==============================================================================
Dep. Variable:            y   R-squared:             0.965
Model:                  OLS   Adj. R-squared:        0.964
...
```
Statsmodels shines for statistical analysis and reporting. The summary includes everything a statistician needs: standard errors, t-statistics, F-statistic, and more. For machine learning pipelines, stick with scikit-learn.
Adjusted R-Squared: When and How to Use It
Regular R² has a fatal flaw: it never decreases when you add predictors, even useless ones. Add random noise as a feature, and R² will stay the same or increase slightly. This makes comparing models with different numbers of features misleading.
Adjusted R² penalizes model complexity:
Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - p - 1)]
Where n is the number of observations and p is the number of predictors.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r_squared(y_true, y_pred, n_features):
    """
    Calculate adjusted R-squared.

    Parameters
    ----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted values
    n_features : int
        Number of predictor variables in the model

    Returns
    -------
    float
        Adjusted R-squared value
    """
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    # Adjusted R-squared formula
    adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - n_features - 1))
    return adj_r2

# Demonstrate the difference
np.random.seed(42)
n_samples = 50

# True relationship: y depends on X1 only
X1 = np.random.normal(0, 1, n_samples)
y = 2 * X1 + np.random.normal(0, 0.5, n_samples)

# Model 1: Correct model with 1 feature
y_pred_1 = 2 * X1

# Model 2: Overfit model with 10 random features
X_random = np.random.normal(0, 1, (n_samples, 10))
model_overfit = LinearRegression()
model_overfit.fit(X_random, y)
y_pred_2 = model_overfit.predict(X_random)

# Compare metrics
r2_simple = r2_score(y, y_pred_1)
r2_complex = r2_score(y, y_pred_2)
adj_r2_simple = adjusted_r_squared(y, y_pred_1, 1)
adj_r2_complex = adjusted_r_squared(y, y_pred_2, 10)

print("Simple Model (1 feature):")
print(f"  R²: {r2_simple:.4f}")
print(f"  Adjusted R²: {adj_r2_simple:.4f}")
print("\nComplex Model (10 random features):")
print(f"  R²: {r2_complex:.4f}")
print(f"  Adjusted R²: {adj_r2_complex:.4f}")
```
Output:

```
Simple Model (1 feature):
  R²: 0.9413
  Adjusted R²: 0.9401

Complex Model (10 random features):
  R²: 0.2377
  Adjusted R²: 0.0423
```
The complex model's apparent R² comes entirely from fitting noise in ten irrelevant features, and adjusted R² exposes this by collapsing toward zero. Always report adjusted R² when comparing models with different feature counts.
Common Pitfalls and Best Practices
R² is useful but easily misused. Here are the traps I see developers fall into repeatedly.
Negative R² values: Your model is worse than predicting the mean. This often indicates a bug—wrong features, data leakage during preprocessing, or applying a model trained on different data.
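A quick illustration with made-up numbers: predictions that run opposite to the true trend score far below zero, while the mean baseline itself scores exactly zero.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical example: predictions trend the wrong way entirely
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([5, 4, 3, 2, 1])  # inversely related to the truth

# SS_res = 40, SS_tot = 10, so R² = 1 - 40/10 = -3.0
print(r2_score(y_true, y_pred))  # -3.0

# Predicting the mean everywhere scores exactly 0
baseline = np.full_like(y_true, fill_value=3, dtype=float)
print(r2_score(y_true, baseline))  # 0.0
```

Anything below the baseline's zero means the model is actively adding error, so treat a negative score as a bug report, not a metric.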
High R² doesn’t mean good predictions: Anscombe’s quartet famously demonstrates this. Four completely different datasets can produce identical R² values:
```python
import numpy as np
from sklearn.metrics import r2_score

# Anscombe's quartet - four datasets with nearly identical R²
datasets = {
    'Linear': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
    },
    'Quadratic': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
    },
    'Outlier': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
    },
    'Leverage': {
        'x': [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        'y': [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
    }
}

print("R² values for Anscombe's Quartet:")
print("-" * 40)
for name, data in datasets.items():
    x = np.array(data['x'])
    y = np.array(data['y'])
    # Fit linear model (ddof=1 so the variance matches np.cov's default)
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    y_pred = slope * x + intercept
    r2 = r2_score(y, y_pred)
    print(f"{name:12s}: R² = {r2:.4f}")
```
Output:

```
R² values for Anscombe's Quartet:
----------------------------------------
Linear      : R² = 0.6665
Quadratic   : R² = 0.6662
Outlier     : R² = 0.6663
Leverage    : R² = 0.6667
```
Four datasets with nearly identical R² values, but only one is actually linear. The lesson: always visualize your data. R² alone tells you nothing about whether a linear model is appropriate.
Best practices:
- Compare training and test R²—large gaps indicate overfitting
- Use adjusted R² when comparing models with different feature counts
- Visualize residuals to check for patterns R² won’t reveal
- Consider domain-specific metrics alongside R²
- Remember that R² measures relative performance against the mean baseline, not absolute prediction quality
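The residual check from the list above is worth seeing once in code. A sketch with made-up quadratic data: a straight line fitted to curved data can still post a high R², while its residuals show an unmistakable U-shaped pattern that no amount of score-staring would reveal.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up example: quadratic data, linear fit
x = np.linspace(0, 10, 50)
y = x ** 2

# Least-squares straight line via polyfit
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept
residuals = y - y_pred

print(f"R²: {r2_score(y, y_pred):.4f}")  # high despite the wrong model form

# Residuals from a correct model look like unstructured noise; these
# are U-shaped: positive at both ends, negative in the middle
print(residuals[0] > 0, residuals[25] < 0, residuals[-1] > 0)
```

A scatter plot of residuals against `x` (or against predictions) makes the curvature obvious at a glance; the structure itself is the warning sign.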
R-squared remains one of the most useful quick diagnostics for regression models. Calculate it correctly, interpret it carefully, and never let it be your only metric.