How to Calculate Mean Squared Error in Python

Key Insights

  • Mean Squared Error (MSE) measures prediction accuracy by squaring differences between actual and predicted values, making it highly sensitive to outliers—use it when large errors should be heavily penalized
  • You can calculate MSE manually with NumPy in one line (np.mean((actual - predicted)**2)) or use scikit-learn’s mean_squared_error() function for additional features like sample weighting
  • Always calculate MSE on your test set, not training data, and consider using RMSE instead when you need results in the same units as your target variable for easier interpretation

Introduction to Mean Squared Error

Mean Squared Error (MSE) is the workhorse metric for evaluating regression models. It quantifies how far your predictions deviate from actual values by calculating the average of squared differences. The mathematical formula is straightforward:

MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

Where n is the number of samples, yᵢ represents actual values, and ŷᵢ represents predicted values.

The squaring operation serves two critical purposes: it eliminates negative values (so positive and negative errors don’t cancel out) and it heavily penalizes larger errors. An error of 10 contributes 100 to your MSE, while an error of 2 contributes only 4. This makes MSE particularly useful when large prediction errors are unacceptable in your application.
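You can see this quadratic penalty with a few illustrative error values:

```python
import numpy as np

# Illustrative errors: three small, one large
errors = np.array([2.0, 2.0, 2.0, 10.0])

print(f"MAE: {np.mean(np.abs(errors))}")  # MAE: 4.0
print(f"MSE: {np.mean(errors ** 2)}")     # MSE: 28.0 -- the single error of 10 contributes 25 of the 28
```

Under the mean absolute error, the large error counts just 2.5 times as much as a small one; under MSE it counts 25 times as much.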

When should you use MSE over alternatives? Choose MSE when you want to heavily penalize outliers and large errors. Use RMSE (Root Mean Squared Error) when you need results in the same units as your target variable for easier interpretation. Opt for MAE (Mean Absolute Error) when you want equal weighting for all errors or when your data contains outliers that shouldn’t dominate the metric.

Calculating MSE from Scratch with NumPy

Understanding MSE at a fundamental level means implementing it yourself. With NumPy, this is remarkably concise:

import numpy as np

def calculate_mse(actual, predicted):
    """
    Calculate Mean Squared Error manually.
    
    Parameters:
    actual (array-like): Ground truth values
    predicted (array-like): Predicted values
    
    Returns:
    float: Mean Squared Error
    """
    actual = np.array(actual)
    predicted = np.array(predicted)
    
    # Calculate squared differences
    squared_errors = (actual - predicted) ** 2
    
    # Return the mean
    return np.mean(squared_errors)

# Example usage
actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])

mse = calculate_mse(actual_values, predicted_values)
print(f"Manual MSE: {mse}")  # Output: Manual MSE: 0.375

Let’s break down what happens step-by-step:

  1. Convert inputs to NumPy arrays for vectorized operations
  2. Subtract predicted from actual values element-wise: [0.5, -0.5, 0.0, -1.0]
  3. Square each difference: [0.25, 0.25, 0.0, 1.0]
  4. Calculate the mean: (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375

This implementation works efficiently even with large arrays thanks to NumPy’s vectorization. You can condense it further to a one-liner: np.mean((actual - predicted)**2).

Using Scikit-learn’s Built-in MSE Function

For production code, use scikit-learn’s optimized implementation:

from sklearn.metrics import mean_squared_error
import numpy as np

actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])

# Calculate MSE using sklearn
mse_sklearn = mean_squared_error(actual_values, predicted_values)
print(f"Sklearn MSE: {mse_sklearn}")  # Output: Sklearn MSE: 0.375

# Verify it matches our manual calculation
mse_manual = np.mean((actual_values - predicted_values) ** 2)
print(f"Match: {np.isclose(mse_sklearn, mse_manual)}")  # Output: Match: True

The scikit-learn function offers additional functionality like sample weighting and multioutput handling:

# Weighted MSE - give more importance to certain samples
sample_weights = np.array([1.0, 1.0, 2.0, 1.0])
weighted_mse = mean_squared_error(
    actual_values, 
    predicted_values, 
    sample_weight=sample_weights
)
print(f"Weighted MSE: {weighted_mse}")  # Output: Weighted MSE: 0.3
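Under the hood, sample weighting computes a weighted average of the squared errors. You can verify this yourself with np.average:

```python
import numpy as np

actual_values = np.array([3.0, -0.5, 2.0, 7.0])
predicted_values = np.array([2.5, 0.0, 2.0, 8.0])
sample_weights = np.array([1.0, 1.0, 2.0, 1.0])

# Weighted mean of squared errors: sum(w_i * e_i^2) / sum(w_i)
manual_weighted = np.average((actual_values - predicted_values) ** 2, weights=sample_weights)
print(f"Manual weighted MSE: {manual_weighted}")  # Manual weighted MSE: 0.3
```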

Real-World Example: Linear Regression Model Evaluation

Here’s a complete workflow evaluating a linear regression model on the California housing dataset:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Load dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate MSE for both sets
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)

print(f"Training MSE: {train_mse:.4f}")    # Training MSE: 0.5243
print(f"Test MSE: {test_mse:.4f}")          # Test MSE: 0.5558

# Interpret the results
print(f"\nTarget variable range: [{y.min():.2f}, {y.max():.2f}]")
print(f"Average squared error on test set: {test_mse:.4f}")
print(f"Typical prediction error: ±{np.sqrt(test_mse):.4f} (RMSE)")

The test MSE of 0.5558 tells us the average squared error in our predictions. Since the target variable (median house value in $100,000s) ranges from 0.15 to 5.0, this MSE represents reasonable but imperfect predictions. The training and test MSE are similar, indicating no significant overfitting.

Comparing MSE, RMSE, and MAE

MSE rarely stands alone. Here’s how it compares to related metrics:

from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# Generate predictions with some outliers
np.random.seed(42)
actual = np.random.randn(100) * 10
predicted = actual + np.random.randn(100) * 2

# Add a few large errors (outliers)
predicted[0] += 20
predicted[1] -= 15

# Calculate different metrics
mse = mean_squared_error(actual, predicted)
rmse = np.sqrt(mse)  # or root_mean_squared_error() in scikit-learn >= 1.4
mae = mean_absolute_error(actual, predicted)

print(f"MSE:  {mse:.4f}")   # MSE:  35.2847
print(f"RMSE: {rmse:.4f}")  # RMSE: 5.9402
print(f"MAE:  {mae:.4f}")   # MAE:  3.0129

# Calculate metrics without outliers
actual_no_outliers = actual[2:]
predicted_no_outliers = predicted[2:]

mse_clean = mean_squared_error(actual_no_outliers, predicted_no_outliers)
mae_clean = mean_absolute_error(actual_no_outliers, predicted_no_outliers)

print(f"\nWithout outliers:")
print(f"MSE:  {mse_clean:.4f}")  # MSE:  4.2156
print(f"MAE:  {mae_clean:.4f}")  # MAE:  1.6321

print(f"\nMSE increased by: {(mse/mse_clean - 1)*100:.1f}%")
print(f"MAE increased by: {(mae/mae_clean - 1)*100:.1f}%")

Notice how MSE increased by ~737% due to outliers, while MAE only increased by ~85%. This demonstrates MSE’s sensitivity to large errors.

RMSE converts MSE back to the original units by taking the square root. Use RMSE when communicating with non-technical stakeholders:

# RMSE is more interpretable; np.sqrt(mse) works on any scikit-learn version.
# (scikit-learn >= 1.4 also provides root_mean_squared_error; the older
# squared=False argument was removed in 1.6.)
rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
print(f"Average prediction error: ±${rmse * 100000:.0f}")

Weighted MSE gives different importance to different samples:

# Give more weight to expensive houses
weights = np.where(y_test > np.median(y_test), 2.0, 1.0)
weighted_mse = mean_squared_error(y_test, y_test_pred, sample_weight=weights)

Best Practices and Common Pitfalls

Always evaluate on test data. Training MSE is optimistically biased and doesn’t reflect real-world performance. Your model has already seen the training data, so low training MSE is expected.

Consider the scale of your target variable. An MSE of 100 is excellent if your target ranges from 0 to 10,000, but terrible if it ranges from 0 to 10. Always contextualize MSE by comparing it to the variance of your target variable:

# Calculate baseline MSE (predicting the mean)
baseline_mse = np.var(y_test)
model_mse = mean_squared_error(y_test, y_test_pred)

improvement = (1 - model_mse / baseline_mse) * 100
print(f"Model improves over baseline by {improvement:.1f}%")
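This improvement ratio is exactly the coefficient of determination (R²), since the baseline MSE equals the variance of the target. You can cross-check it against scikit-learn’s r2_score (a quick sketch with illustrative values):

```python
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Illustrative values
y_test = np.array([3.0, 2.5, 4.0, 1.5, 3.5])
y_pred = np.array([2.8, 2.6, 3.7, 1.9, 3.4])

baseline_mse = np.var(y_test)          # MSE of always predicting the mean
model_mse = mean_squared_error(y_test, y_pred)

improvement = 1 - model_mse / baseline_mse
print(np.isclose(improvement, r2_score(y_test, y_pred)))  # True
```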

Watch for outliers. MSE’s quadratic penalty means a few bad predictions can dominate your metric. If outliers are measurement errors rather than legitimate extreme values, consider using MAE or robust regression techniques.
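One such robust option is scikit-learn’s HuberRegressor, which downweights large residuals rather than squaring them. A minimal sketch on synthetic data (illustrative values, not from the housing example):

```python
from sklearn.linear_model import HuberRegressor, LinearRegression
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=200)
y[:10] += 25.0  # corrupt a few samples with large measurement errors

huber = HuberRegressor().fit(X, y)
ols = LinearRegression().fit(X, y)

# Huber's slope estimate is typically pulled less by the corrupted samples
print(f"Huber slope: {huber.coef_[0]:.3f}, OLS slope: {ols.coef_[0]:.3f}")
```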

Beware of different scales across features. If you’re predicting multiple outputs with different scales, MSE will be dominated by the large-scale outputs. Use normalized MSE or evaluate each output separately.
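For multioutput problems, scikit-learn’s multioutput='raw_values' option returns a separate MSE per output, which makes scale imbalances visible:

```python
from sklearn.metrics import mean_squared_error
import numpy as np

# Two outputs on very different scales (illustrative values)
y_true = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 1500.0]])
y_pred = np.array([[1.1, 1100.0], [2.2, 1900.0], [2.9, 1600.0]])

per_output = mean_squared_error(y_true, y_pred, multioutput='raw_values')
print(per_output)  # the second output's error dwarfs the first

# The default ('uniform_average') averages across outputs, hiding the imbalance
print(mean_squared_error(y_true, y_pred))
```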

Performance considerations: For datasets with millions of samples, the vectorized MSE calculation is fast, but the temporary arrays it allocates (the element-wise differences and their squares) can be as large as your data. Process in batches if memory is tight:

def batched_mse(actual, predicted, batch_size=10000):
    """Calculate MSE in batches for memory efficiency."""
    n_samples = len(actual)
    total_squared_error = 0
    
    for i in range(0, n_samples, batch_size):
        batch_actual = actual[i:i+batch_size]
        batch_predicted = predicted[i:i+batch_size]
        total_squared_error += np.sum((batch_actual - batch_predicted) ** 2)
    
    return total_squared_error / n_samples
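A quick way to sanity-check the batching is to compare it against the direct one-liner on random data (the function is repeated here so the snippet runs standalone):

```python
import numpy as np

def batched_mse(actual, predicted, batch_size=10000):
    """Calculate MSE in batches for memory efficiency."""
    n_samples = len(actual)
    total_squared_error = 0.0
    for i in range(0, n_samples, batch_size):
        total_squared_error += np.sum((actual[i:i+batch_size] - predicted[i:i+batch_size]) ** 2)
    return total_squared_error / n_samples

rng = np.random.default_rng(0)
actual = rng.normal(size=25_000)
predicted = actual + rng.normal(scale=0.1, size=25_000)

direct = np.mean((actual - predicted) ** 2)
print(np.isclose(batched_mse(actual, predicted), direct))  # True
```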

MSE is a fundamental metric, but it’s not universally appropriate. Use it when squared errors align with your business objectives, when you want to penalize large errors heavily, and when your data doesn’t contain excessive outliers. Combine it with other metrics like MAE and R² for a complete picture of model performance.
