How to Calculate R-Squared in Python
Key Insights
- R-squared measures the proportion of variance in your dependent variable explained by your model, ranging from 0 to 1, but negative values are possible when predictions are worse than using the mean
- Scikit-learn’s `r2_score` is the production standard, but understanding the manual calculation helps you debug unexpected results and interpret edge cases correctly
- Always use adjusted R-squared when comparing models with different numbers of predictors, and never rely on R-squared alone for non-linear relationships or heteroscedastic data
Introduction to R-Squared
R-squared, also called the coefficient of determination, answers a simple question: how much of the variation in your target variable does your model explain? If you’re predicting house prices and your R² is 0.85, your model explains 85% of the variance in prices. The remaining 15% is unexplained—either noise or factors your model doesn’t capture.
The metric ranges from 0 to 1 for well-fitted models, where 1 means perfect predictions and 0 means your model performs no better than simply predicting the mean. Here’s the catch that trips up many practitioners: R² can go negative. This happens when your model’s predictions are worse than the baseline mean prediction—a sign something has gone seriously wrong.
Python gives you multiple ways to calculate R², from manual NumPy implementations to one-liners with scikit-learn. Each approach has its place. Let’s work through them systematically.
The Mathematics Behind R-Squared
The formula for R-squared is deceptively simple:
R² = 1 - (SS_res / SS_tot)
Where:
- SS_res (Residual Sum of Squares): The sum of squared differences between actual values and predicted values
- SS_tot (Total Sum of Squares): The sum of squared differences between actual values and the mean of actual values
Think of it this way: SS_tot represents the total variance in your data. SS_res represents the variance your model failed to explain. The ratio tells you what fraction of variance remains unexplained, and subtracting from 1 gives you the explained fraction.
```python
import numpy as np

# Sample data
y_true = np.array([3, 5, 7, 9, 11])
y_pred = np.array([2.8, 5.2, 6.8, 9.1, 11.1])

# Calculate components
y_mean = np.mean(y_true)
ss_res = np.sum((y_true - y_pred) ** 2)  # Residual sum of squares
ss_tot = np.sum((y_true - y_mean) ** 2)  # Total sum of squares
r_squared = 1 - (ss_res / ss_tot)

print(f"Mean of y_true: {y_mean}")
print(f"SS_res: {ss_res:.2f}")
print(f"SS_tot: {ss_tot}")
print(f"R-squared: {r_squared:.4f}")
```
Output:

```
Mean of y_true: 7.0
SS_res: 0.14
SS_tot: 40.0
R-squared: 0.9965
```
This model explains 99.65% of the variance—excellent performance. The residual sum of squares (0.14) is tiny compared to the total variance (40.0).
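As a sanity check, the manual calculation should agree with scikit-learn’s `r2_score` to floating-point precision. A minimal sketch using the same sample data as above:

```python
import numpy as np
from sklearn.metrics import r2_score

# Same sample data as the manual calculation above
y_true = np.array([3, 5, 7, 9, 11])
y_pred = np.array([2.8, 5.2, 6.8, 9.1, 11.1])

# Manual computation
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
manual_r2 = 1 - ss_res / ss_tot

# Library computation -- the two should match
library_r2 = r2_score(y_true, y_pred)
print(f"Manual:  {manual_r2:.4f}")   # 0.9965
print(f"Library: {library_r2:.4f}")  # 0.9965
```

If these two numbers ever diverge, the usual culprit is accidentally comparing arrays of different shapes or passing the arguments in the wrong order.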
Calculating R-Squared from Scratch with NumPy
Wrapping the calculation in a reusable function makes sense for any serious work. Here’s a robust implementation:
```python
import numpy as np

def r_squared_numpy(y_true, y_pred):
    """
    Calculate R-squared using NumPy.

    Parameters
    ----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted values from model

    Returns
    -------
    float
        R-squared value
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Residual sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)

    # Handle edge case where all y values are identical
    if ss_tot == 0:
        return 0.0 if ss_res > 0 else 1.0

    return 1 - (ss_res / ss_tot)

# Test with realistic data
np.random.seed(42)
X = np.linspace(0, 10, 100)
y_true = 2 * X + 1 + np.random.normal(0, 1, 100)  # Linear relationship plus noise
y_pred = 2 * X + 1  # Predictions from the noiseless true line

r2 = r_squared_numpy(y_true, y_pred)
print(f"R-squared: {r2:.4f}")
```
Output:

```
R-squared: 0.9737
```
The edge case handling matters. If all your true values are identical (zero variance), you’ll get a division by zero without it. This happens more often than you’d expect in production systems with filtered or aggregated data.
Using Scikit-learn’s r2_score Function
For production code, use scikit-learn. It’s tested, optimized, and handles edge cases you haven’t thought of:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.datasets import make_regression

# Generate synthetic regression data
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Calculate R-squared for both sets
r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)

print(f"Training R²: {r2_train:.4f}")
print(f"Test R²: {r2_test:.4f}")
print(f"Difference: {r2_train - r2_test:.4f}")
```
Output:

```
Training R²: 0.9712
Test R²: 0.9648
Difference: 0.0064
```
The small gap between training and test R² indicates the model generalizes well. Large gaps signal overfitting—your model memorized the training data instead of learning the underlying pattern.
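To see what a large gap looks like, here is a deliberately overfit sketch. The setup is hypothetical (a degree-12 polynomial fit to 15 noisy points), and the exact scores will vary with library versions, but the shape of the result is reliable: a near-perfect training score and a much weaker test score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 30)

# Interleaved split: odd-indexed points held out for testing
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

# Degree-12 polynomial on 15 training points: nearly enough
# parameters to interpolate the noise outright
model = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
model.fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"Training R²: {r2_train:.4f}")
print(f"Test R²: {r2_test:.4f}")  # far below the training score
```

The model threads through the training noise, so the held-out points between them pay the price.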
R-Squared with Statsmodels
When you need comprehensive regression diagnostics, statsmodels delivers. It provides R² alongside p-values, confidence intervals, and diagnostic tests:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Create sample data
np.random.seed(42)
n = 100
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)
y = 3 + 2 * X1 - 1.5 * X2 + np.random.normal(0, 0.5, n)

# Create DataFrame for clarity
df = pd.DataFrame({'X1': X1, 'X2': X2, 'y': y})

# Add constant for intercept (statsmodels requires this explicitly)
X = sm.add_constant(df[['X1', 'X2']])

# Fit OLS model
model = sm.OLS(df['y'], X).fit()

# Access R-squared directly
print(f"R-squared: {model.rsquared:.4f}")
print(f"Adjusted R-squared: {model.rsquared_adj:.4f}")
print(f"Number of observations: {int(model.nobs)}")

# Full summary for detailed analysis
print("\n" + "=" * 60)
print(model.summary())
```
Output (summary truncated):

```
R-squared: 0.9649
Adjusted R-squared: 0.9642
Number of observations: 100
============================================================
                 OLS Regression Results
==============================================================================
Dep. Variable:            y   R-squared:             0.965
Model:                  OLS   Adj. R-squared:        0.964
...
```
Statsmodels shines for statistical analysis and reporting. The summary includes everything a statistician needs: standard errors, t-statistics, F-statistic, and more. For machine learning pipelines, stick with scikit-learn.
Adjusted R-Squared: When and How to Use It
Regular R² has a fatal flaw: it never decreases when you add predictors, even useless ones. Add random noise as a feature, and R² will stay the same or increase slightly. This makes comparing models with different numbers of features misleading.
Adjusted R² penalizes model complexity:
Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - p - 1)]
Where n is the number of observations and p is the number of predictors.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r_squared(y_true, y_pred, n_features):
    """
    Calculate adjusted R-squared.

    Parameters
    ----------
    y_true : array-like
        Actual target values
    y_pred : array-like
        Predicted values
    n_features : int
        Number of predictor variables in the model

    Returns
    -------
    float
        Adjusted R-squared value
    """
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    # Adjusted R-squared formula
    adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - n_features - 1))
    return adj_r2

# Demonstrate the difference
np.random.seed(42)
n_samples = 50

# True relationship: y depends on X1 only
X1 = np.random.normal(0, 1, n_samples)
y = 2 * X1 + np.random.normal(0, 0.5, n_samples)

# Model 1: Correct model with 1 feature
y_pred_1 = 2 * X1

# Model 2: Overfit model with 10 random features
X_random = np.random.normal(0, 1, (n_samples, 10))
model_overfit = LinearRegression()
model_overfit.fit(X_random, y)
y_pred_2 = model_overfit.predict(X_random)

# Compare metrics
r2_simple = r2_score(y, y_pred_1)
r2_complex = r2_score(y, y_pred_2)
adj_r2_simple = adjusted_r_squared(y, y_pred_1, 1)
adj_r2_complex = adjusted_r_squared(y, y_pred_2, 10)

print("Simple Model (1 feature):")
print(f"  R²: {r2_simple:.4f}")
print(f"  Adjusted R²: {adj_r2_simple:.4f}")
print("\nComplex Model (10 random features):")
print(f"  R²: {r2_complex:.4f}")
print(f"  Adjusted R²: {adj_r2_complex:.4f}")
```
Output:

```
Simple Model (1 feature):
  R²: 0.9413
  Adjusted R²: 0.9401

Complex Model (10 random features):
  R²: 0.2377
  Adjusted R²: 0.0423
```
The complex model's apparent R² comes entirely from fitting noise in ten irrelevant features, and adjusted R² exposes this by collapsing toward zero. Always report adjusted R² when comparing models with different feature counts.
Common Pitfalls and Best Practices
R² is useful but easily misused. Here are the traps I see developers fall into repeatedly.
Negative R² values: Your model is worse than predicting the mean. This often indicates a bug—wrong features, data leakage during preprocessing, or applying a model trained on different data.
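A quick illustration with made-up numbers: predictions that run opposite to the true trend score far below zero, while the mean baseline itself scores exactly zero.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical example: predictions trend the wrong way entirely
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([5, 4, 3, 2, 1])  # inversely related to the truth

# SS_res = 40, SS_tot = 10, so R² = 1 - 40/10 = -3.0
print(r2_score(y_true, y_pred))  # -3.0

# Predicting the mean everywhere scores exactly 0
baseline = np.full_like(y_true, fill_value=3, dtype=float)
print(r2_score(y_true, baseline))  # 0.0
```

Anything below the baseline's zero means the model is actively adding error, so treat a negative score as a bug report, not a metric.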
High R² doesn’t mean good predictions: Anscombe’s quartet famously demonstrates this. Four completely different datasets can produce identical R² values:
```python
import numpy as np
from sklearn.metrics import r2_score

# Anscombe's quartet - four datasets with nearly identical R²
datasets = {
    'Linear': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
    },
    'Quadratic': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
    },
    'Outlier': {
        'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        'y': [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
    },
    'Leverage': {
        'x': [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        'y': [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
    }
}

print("R² values for Anscombe's Quartet:")
print("-" * 40)
for name, data in datasets.items():
    x = np.array(data['x'])
    y = np.array(data['y'])
    # Fit linear model (ddof=1 so the variance matches np.cov's default)
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    y_pred = slope * x + intercept
    r2 = r2_score(y, y_pred)
    print(f"{name:12s}: R² = {r2:.4f}")
```
Output:

```
R² values for Anscombe's Quartet:
----------------------------------------
Linear      : R² = 0.6665
Quadratic   : R² = 0.6662
Outlier     : R² = 0.6663
Leverage    : R² = 0.6667
```
Four datasets with nearly identical R² values, but only one is actually linear. The lesson: always visualize your data. R² alone tells you nothing about whether a linear model is appropriate.
Best practices:
- Compare training and test R²—large gaps indicate overfitting
- Use adjusted R² when comparing models with different feature counts
- Visualize residuals to check for patterns R² won’t reveal
- Consider domain-specific metrics alongside R²
- Remember that R² measures relative performance against the mean baseline, not absolute prediction quality
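The residual check from the list above is worth seeing once in code. A sketch with made-up quadratic data: a straight line fitted to curved data can still post a high R², while its residuals show an unmistakable U-shaped pattern that no amount of score-staring would reveal.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up example: quadratic data, linear fit
x = np.linspace(0, 10, 50)
y = x ** 2

# Least-squares straight line via polyfit
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept
residuals = y - y_pred

print(f"R²: {r2_score(y, y_pred):.4f}")  # high despite the wrong model form

# Residuals from a correct model look like unstructured noise; these
# are U-shaped: positive at both ends, negative in the middle
print(residuals[0] > 0, residuals[25] < 0, residuals[-1] > 0)
```

A scatter plot of residuals against `x` (or against predictions) makes the curvature obvious at a glance; the structure itself is the warning sign.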
R-squared remains one of the most useful quick diagnostics for regression models. Calculate it correctly, interpret it carefully, and never let it be your only metric.