How to Calculate R-Squared in Python

Key Insights

  • R-squared measures the proportion of variance in your dependent variable explained by your model, ranging from 0 to 1, but negative values are possible when predictions are worse than using the mean
  • Scikit-learn’s r2_score is the production standard, but understanding the manual calculation helps you debug unexpected results and interpret edge cases correctly
  • Always use adjusted R-squared when comparing models with different numbers of predictors, and never rely on R-squared alone for non-linear relationships or heteroscedastic data

Introduction to R-Squared

R-squared, also called the coefficient of determination, answers a simple question: how much of the variation in your target variable does your model explain? If you’re predicting house prices and your R² is 0.85, your model explains 85% of the variance in prices. The remaining 15% is unexplained—either noise or factors your model doesn’t capture.

The metric ranges from 0 to 1 for well-fitted models, where 1 means perfect predictions and 0 means your model performs no better than simply predicting the mean. Here’s the catch that trips up many practitioners: R² can go negative. This happens when your model’s predictions are worse than the baseline mean prediction—a sign something has gone seriously wrong.

Python gives you multiple ways to calculate R², from manual NumPy implementations to one-liners with scikit-learn. Each approach has its place. Let’s work through them systematically.

The Mathematics Behind R-Squared

The formula for R-squared is deceptively simple:

R² = 1 - (SS_res / SS_tot)

Where:

  • SS_res (Residual Sum of Squares): The sum of squared differences between actual values and predicted values
  • SS_tot (Total Sum of Squares): The sum of squared differences between actual values and the mean of actual values

Think of it this way: SS_tot represents the total variance in your data. SS_res represents the variance your model failed to explain. The ratio tells you what fraction of variance remains unexplained, and subtracting from 1 gives you the explained fraction.

import numpy as np

# Sample data
y_true = np.array([3, 5, 7, 9, 11])
y_pred = np.array([2.8, 5.2, 6.8, 9.1, 11.1])

# Calculate components
y_mean = np.mean(y_true)

ss_res = np.sum((y_true - y_pred) ** 2)  # Residual sum of squares
ss_tot = np.sum((y_true - y_mean) ** 2)  # Total sum of squares

r_squared = 1 - (ss_res / ss_tot)

print(f"Mean of y_true: {y_mean}")
print(f"SS_res: {ss_res}")
print(f"SS_tot: {ss_tot}")
print(f"R-squared: {r_squared:.4f}")

Output:

Mean of y_true: 7.0
SS_res: 0.14
SS_tot: 40.0
R-squared: 0.9965

This model explains 99.65% of the variance—excellent performance. The residual sum of squares (0.14) is tiny compared to the total variance (40.0).
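A useful sanity check on the manual calculation: for an ordinary least-squares line that includes an intercept (and only in that case), R² equals the squared Pearson correlation between x and y. The data below is made up for illustration; the point is that the two computations agree for a least-squares fit:

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# R² from the definition
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1 - ss_res / ss_tot

# Squared Pearson correlation coefficient
r2_from_corr = np.corrcoef(x, y)[0, 1] ** 2

print(f"R² from definition:  {r2:.6f}")
print(f"Squared correlation: {r2_from_corr:.6f}")
```

The equivalence breaks down for predictions that aren't a least-squares line with an intercept, which is why the general SS_res/SS_tot definition is the one to remember.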

Calculating R-Squared from Scratch with NumPy

Wrapping the calculation in a reusable function makes sense for any serious work. Here’s a robust implementation:

import numpy as np

def r_squared_numpy(y_true, y_pred):
    """
    Calculate R-squared using NumPy.
    
    Parameters:
    -----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted values from model
        
    Returns:
    --------
    float
        R-squared value
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    
    # Residual sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    
    # Total sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    
    # Handle edge case where all y values are identical
    if ss_tot == 0:
        return 0.0 if ss_res > 0 else 1.0
    
    return 1 - (ss_res / ss_tot)


# Test with realistic data
np.random.seed(42)
X = np.linspace(0, 10, 100)
y_true = 2 * X + 1 + np.random.normal(0, 1, 100)  # Linear with noise
y_pred = 2 * X + 1  # Predictions from the true underlying line (noise unexplained)

r2 = r_squared_numpy(y_true, y_pred)
print(f"R-squared: {r2:.4f}")

Output:

R-squared: 0.9737

The edge case handling matters. If all your true values are identical (zero variance), you’ll get a division by zero without it. This happens more often than you’d expect in production systems with filtered or aggregated data.
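To see the edge case concretely, here is a minimal sketch (the helper name `r_squared_safe` is mine, not a library function) applying the same zero-variance guard to constant targets:

```python
import numpy as np

def r_squared_safe(y_true, y_pred):
    """R² with the zero-variance edge case handled explicitly."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    if ss_tot == 0:
        # All targets identical: R² is undefined. By convention, return
        # 1.0 only for perfect predictions, 0.0 otherwise.
        return 1.0 if ss_res == 0 else 0.0
    return 1 - ss_res / ss_tot

# Aggregated data where every target collapsed to the same value
y_const = [5.0, 5.0, 5.0, 5.0]
print(r_squared_safe(y_const, [5.0, 5.0, 5.0, 5.0]))  # perfect -> 1.0
print(r_squared_safe(y_const, [5.1, 4.9, 5.0, 5.0]))  # imperfect -> 0.0
```

Without the guard, the same inputs would raise a division-by-zero warning and return NaN, which then silently propagates through any downstream averaging.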

Using Scikit-learn’s r2_score Function

For production code, use scikit-learn. It’s tested, optimized, and handles edge cases you haven’t thought of:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.datasets import make_regression

# Generate synthetic regression data
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Calculate R-squared for both sets
r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)

print(f"Training R²: {r2_train:.4f}")
print(f"Test R²: {r2_test:.4f}")
print(f"Difference: {r2_train - r2_test:.4f}")

Output:

Training R²: 0.9712
Test R²: 0.9648
Difference: 0.0064

The small gap between training and test R² indicates the model generalizes well. Large gaps signal overfitting—your model memorized the training data instead of learning the underlying pattern.
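A single train/test split can be noisy, especially with small datasets. One way to get a more stable estimate, sketched here with scikit-learn's `cross_val_score` on the same synthetic data as above, is to average R² across cross-validation folds:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Same synthetic data as the train/test example
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)

# 5-fold cross-validated R²: each fold is held out once for scoring
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

print(f"Per-fold R²: {np.round(scores, 4)}")
print(f"Mean R²: {scores.mean():.4f} (std {scores.std():.4f})")
```

A high fold-to-fold standard deviation is itself a warning sign: the model's performance depends heavily on which observations it happens to see.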

R-Squared with Statsmodels

When you need comprehensive regression diagnostics, statsmodels delivers. It provides R² alongside p-values, confidence intervals, and diagnostic tests:

import statsmodels.api as sm
import numpy as np
import pandas as pd

# Create sample data
np.random.seed(42)
n = 100
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)
y = 3 + 2*X1 - 1.5*X2 + np.random.normal(0, 0.5, n)

# Create DataFrame for clarity
df = pd.DataFrame({'X1': X1, 'X2': X2, 'y': y})

# Add constant for intercept (statsmodels requires this explicitly)
X = sm.add_constant(df[['X1', 'X2']])

# Fit OLS model
model = sm.OLS(df['y'], X).fit()

# Access R-squared directly
print(f"R-squared: {model.rsquared:.4f}")
print(f"Adjusted R-squared: {model.rsquared_adj:.4f}")
print(f"Number of observations: {int(model.nobs)}")

# Full summary for detailed analysis
print("\n" + "="*60)
print(model.summary())

Output:

R-squared: 0.9649
Adjusted R-squared: 0.9642
Number of observations: 100

============================================================
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.965
Model:                            OLS   Adj. R-squared:                  0.964
...

Statsmodels shines for statistical analysis and reporting. The summary includes everything a statistician needs: standard errors, t-statistics, F-statistic, and more. For machine learning pipelines, stick with scikit-learn.

Adjusted R-Squared: When and How to Use It

Regular R² has a fatal flaw: it never decreases when you add predictors, even useless ones. Add random noise as a feature, and R² will stay the same or increase slightly. This makes comparing models with different numbers of features misleading.

Adjusted R² penalizes model complexity:

Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - p - 1)]

Where n is the number of observations and p is the number of predictors.

import numpy as np
from sklearn.metrics import r2_score

def adjusted_r_squared(y_true, y_pred, n_features):
    """
    Calculate adjusted R-squared.
    
    Parameters:
    -----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted values
    n_features : int
        Number of predictor variables in the model
        
    Returns:
    --------
    float
        Adjusted R-squared value
    """
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    
    # Adjusted R-squared formula
    adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - n_features - 1))
    
    return adj_r2


# Demonstrate the difference
np.random.seed(42)
n_samples = 50

# True relationship: y depends on X1 only
X1 = np.random.normal(0, 1, n_samples)
y = 2 * X1 + np.random.normal(0, 0.5, n_samples)

# Model 1: Correct model with 1 feature
y_pred_1 = 2 * X1

# Model 2: Overfit model with 10 random features
X_random = np.random.normal(0, 1, (n_samples, 10))
from sklearn.linear_model import LinearRegression
model_overfit = LinearRegression()
model_overfit.fit(X_random, y)
y_pred_2 = model_overfit.predict(X_random)

# Compare metrics
r2_simple = r2_score(y, y_pred_1)
r2_complex = r2_score(y, y_pred_2)
adj_r2_simple = adjusted_r_squared(y, y_pred_1, 1)
adj_r2_complex = adjusted_r_squared(y, y_pred_2, 10)

print("Simple Model (1 feature):")
print(f"  R²: {r2_simple:.4f}")
print(f"  Adjusted R²: {adj_r2_simple:.4f}")

print("\nComplex Model (10 random features):")
print(f"  R²: {r2_complex:.4f}")
print(f"  Adjusted R²: {adj_r2_complex:.4f}")

Output:

Simple Model (1 feature):
  R²: 0.9413
  Adjusted R²: 0.9401

Complex Model (10 random features):
  R²: 0.2377
  Adjusted R²: 0.0423

The complex model with random features shows much lower adjusted R², correctly penalizing the unnecessary complexity. Always report adjusted R² when comparing models.

Common Pitfalls and Best Practices

R² is useful but easily misused. Here are the traps I see developers fall into repeatedly.

Negative R² values: Your model is worse than predicting the mean. This often indicates a bug—wrong features, data leakage during preprocessing, or applying a model trained on different data.
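A quick sketch makes the negative case concrete (the bias of 10 units is an arbitrary illustration): predicting the mean yields an R² of exactly 0, and any model that does worse goes negative.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
y_true = rng.normal(10, 2, 50)

# Baseline: always predict the mean of y_true -> R² is exactly 0
y_mean = np.full_like(y_true, y_true.mean())
print(f"Mean baseline R²: {r2_score(y_true, y_mean):.4f}")

# A badly biased model: every prediction is off by 10 units.
# Its squared errors dwarf the total variance, so R² goes negative.
y_biased = y_true + 10
print(f"Biased model R²: {r2_score(y_true, y_biased):.4f}")
```

If you see a strongly negative R² in production, check for a systematic shift like this before anything else: a units mismatch, an unscaled feature, or a model applied to data from a different distribution.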

High R² doesn’t mean good predictions: Anscombe’s quartet famously demonstrates this. Four completely different datasets can produce identical R² values:

import numpy as np
from sklearn.metrics import r2_score

# Anscombe's quartet - four datasets with identical R²
datasets = {
    'Linear': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
    },
    'Quadratic': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
    },
    'Outlier': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
    },
    'Leverage': {
        'x': [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        'y': [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
    }
}

print("R² values for Anscombe's Quartet:")
print("-" * 40)

for name, data in datasets.items():
    x = np.array(data['x'])
    y = np.array(data['y'])
    
    # Fit linear model
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # ddof=1 matches np.cov's default
    intercept = np.mean(y) - slope * np.mean(x)
    y_pred = slope * x + intercept
    
    r2 = r2_score(y, y_pred)
    print(f"{name:12s}: R² = {r2:.4f}")

Output:

R² values for Anscombe's Quartet:
----------------------------------------
Linear      : R² = 0.6665
Quadratic   : R² = 0.6662
Outlier     : R² = 0.6663
Leverage    : R² = 0.6667

Four datasets with nearly identical R² values, but only one is actually linear. The lesson: always visualize your data. R² alone tells you nothing about whether a linear model is appropriate.

Best practices:

  1. Compare training and test R²—large gaps indicate overfitting
  2. Use adjusted R² when comparing models with different feature counts
  3. Visualize residuals to check for patterns R² won’t reveal
  4. Consider domain-specific metrics alongside R²
  5. Remember that R² measures relative performance against the mean baseline, not absolute prediction quality
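On point 3: even before plotting, a quick numeric check can surface residual structure. The sketch below (assumed setup: data with a genuinely quadratic relationship, deliberately fit with a straight line) uses the lag-1 autocorrelation of residuals ordered by x; a value far from zero flags a pattern that R² alone won't reveal:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 10, 100))
y = 0.5 * x**2 + rng.normal(0, 1, 100)  # truly quadratic relationship

# Misspecified model: a straight-line fit
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Lag-1 autocorrelation of residuals ordered by x. Independent residuals
# hover near 0; a smooth leftover curve pushes this toward 1.
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print(f"Lag-1 residual autocorrelation: {lag1:.3f}")
```

Here the residuals trace out the leftover parabola, so adjacent residuals are strongly correlated even though the linear fit's R² looks respectable. A residual-vs-predicted scatter plot tells the same story visually.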

R-squared remains one of the most useful quick diagnostics for regression models. Calculate it correctly, interpret it carefully, and never let it be your only metric.
