How to Calculate Mean Absolute Error in Python

Key Insights

  • Mean Absolute Error (MAE) measures the average magnitude of prediction errors in the same units as your target variable, making it highly interpretable for stakeholders
  • MAE is more robust to outliers than MSE/RMSE because it doesn’t square the errors, making it ideal when your dataset contains anomalies you don’t want to overpenalize
  • Always validate MAE alongside other metrics like R² and RMSE—no single metric tells the complete story of model performance

Introduction to Mean Absolute Error (MAE)

Mean Absolute Error is one of the most intuitive regression metrics you’ll encounter in machine learning. It measures the average absolute difference between predicted and actual values, giving you a straightforward answer to the question: “On average, how far off are my predictions?”

The mathematical formula is refreshingly simple:

MAE = (1/n) * Σ|yᵢ - ŷᵢ|

Where n is the number of observations, yᵢ is the actual value, and ŷᵢ is the predicted value. The absolute value ensures that overestimates and underestimates don’t cancel each other out.

Use MAE when you need a metric that’s easy to explain to non-technical stakeholders. If you’re predicting house prices and your MAE is $15,000, that means your model is off by an average of $15,000—no complex interpretation needed. Choose MAE over MSE when your dataset contains outliers that you don’t want to dominate the error calculation. Unlike MSE, which squares errors and thus heavily penalizes large mistakes, MAE treats all errors proportionally.
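To see that robustness concretely, here is a small sketch (with made-up numbers) comparing MAE and MSE when a single prediction is badly wrong:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical data: four small errors of 0.5, plus one outlier error of 20
y_true = np.array([10.0, 12.0, 11.0, 9.0, 10.0])
y_pred = np.array([10.5, 11.5, 11.5, 9.5, 30.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

# MAE grows linearly with the outlier; MSE grows quadratically
print(f"MAE: {mae:.2f}")  # (0.5*4 + 20) / 5 = 4.40
print(f"MSE: {mse:.2f}")  # (0.25*4 + 400) / 5 = 80.20
```

The single outlier dominates MSE almost entirely, while MAE reflects the typical error magnitude much more faithfully.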

Calculating MAE from Scratch

Understanding the underlying calculation helps you debug issues and builds intuition about what the metric actually measures. Here’s how to implement MAE using only NumPy:

import numpy as np

def calculate_mae(y_true, y_pred):
    """
    Calculate Mean Absolute Error manually.
    
    Parameters:
    y_true: array-like of actual values
    y_pred: array-like of predicted values
    
    Returns:
    float: Mean Absolute Error
    """
    # Convert to numpy arrays for element-wise operations
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    
    # Calculate absolute differences
    absolute_errors = np.abs(y_true - y_pred)
    
    # Return the mean
    return np.mean(absolute_errors)

# Example usage
actual_values = np.array([100, 120, 90, 110, 150])
predicted_values = np.array([105, 115, 95, 105, 145])

mae = calculate_mae(actual_values, predicted_values)
print(f"Manual MAE: {mae}")  # Output: Manual MAE: 5.0

This implementation breaks down each step clearly. We calculate the absolute difference for each prediction, then average those differences. The result tells us that, on average, our predictions are off by 5 units.

Using Scikit-learn’s Built-in MAE Function

In production code, use scikit-learn’s optimized implementation. It’s faster, handles edge cases, and integrates seamlessly with the sklearn ecosystem:

from sklearn.metrics import mean_absolute_error
import numpy as np

# Same example data
actual_values = np.array([100, 120, 90, 110, 150])
predicted_values = np.array([105, 115, 95, 105, 145])

# Calculate MAE using sklearn
mae_sklearn = mean_absolute_error(actual_values, predicted_values)
print(f"Sklearn MAE: {mae_sklearn}")  # Output: Sklearn MAE: 5.0

# Verify it matches our manual calculation
mae_manual = calculate_mae(actual_values, predicted_values)
print(f"Match: {mae_sklearn == mae_manual}")  # Output: Match: True

The mean_absolute_error() function also supports sample weights if you need to give certain predictions more importance:

# Weight the first three predictions more heavily
sample_weights = np.array([2, 2, 2, 1, 1])
weighted_mae = mean_absolute_error(
    actual_values, 
    predicted_values, 
    sample_weight=sample_weights
)
print(f"Weighted MAE: {weighted_mae}")
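In the example above every absolute error happens to be 5, so the weighted MAE also comes out to 5.0. With hypothetical data where the errors differ, you can see the weighting take effect, and confirm that the result is simply a weighted average of the absolute errors:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical data with different errors, so weighting matters
y_true = np.array([100, 120, 90])
y_pred = np.array([110, 121, 90])   # absolute errors: 10, 1, 0
weights = np.array([3, 1, 1])

weighted = mean_absolute_error(y_true, y_pred, sample_weight=weights)

# Equivalent manual computation: weighted average of absolute errors
manual = np.average(np.abs(y_true - y_pred), weights=weights)

print(weighted, manual)  # both 6.2, i.e. (3*10 + 1*1 + 1*0) / 5
```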

Practical Example: Evaluating a Regression Model

Let’s work through a complete regression workflow using the California housing dataset. This demonstrates how MAE fits into real model evaluation:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

# Load dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on test set
y_pred = model.predict(X_test)

# Calculate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae:.4f} (target is in units of $100,000)")
print(f"Average prediction is off by about ${mae * 100000:,.2f}")

# Compare with other metrics for context
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"\nModel Performance Metrics:")
print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²:   {r2:.4f}")

This example shows why MAE is valuable: it tells you in concrete terms (dollars, in this case) how far off your predictions are. An MAE of roughly 0.53 in target units means your model's predictions are off by an average of about $53,000, information that's immediately actionable for deciding whether the model is good enough for production.

Comparing Multiple Models with MAE

MAE shines when comparing different algorithms. Here's how to evaluate multiple models systematically, reusing the train/test split from the previous example:

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
import pandas as pd

# Dictionary to store results
results = {}

# Define models to compare
models = {
    'Linear Regression': LinearRegression(),
    'Ridge Regression': Ridge(alpha=1.0),
    'Lasso Regression': Lasso(alpha=0.1),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, random_state=42)
}

# Train and evaluate each model
for name, model in models.items():
    # Train
    model.fit(X_train, y_train)
    
    # Predict
    y_pred = model.predict(X_test)
    
    # Calculate MAE
    mae = mean_absolute_error(y_test, y_pred)
    results[name] = mae

# Create comparison DataFrame
comparison_df = pd.DataFrame.from_dict(
    results, 
    orient='index', 
    columns=['MAE']
).sort_values('MAE')

print("\nModel Comparison (sorted by MAE):")
print(comparison_df)
print(f"\nBest Model: {comparison_df.index[0]}")
print(f"Best MAE: {comparison_df['MAE'].iloc[0]:.4f}")

This systematic comparison reveals which algorithm performs best for your specific problem. The model with the lowest MAE makes the most accurate predictions on average, though you should always consider training time, interpretability, and other factors before making a final decision.
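Keep in mind that a single train/test split can be noisy. For a more stable comparison, you can score each model with cross-validation instead. A minimal sketch, using synthetic data so it runs standalone (sklearn's scoring string is 'neg_mean_absolute_error', which returns the negated MAE so that higher is better):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data stands in for your real dataset
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)

# Negate the scores to recover positive MAE values
scores = cross_val_score(
    LinearRegression(), X, y,
    scoring="neg_mean_absolute_error", cv=5
)
mae_scores = -scores

print(f"MAE per fold: {np.round(mae_scores, 2)}")
print(f"Mean MAE: {mae_scores.mean():.2f} ± {mae_scores.std():.2f}")
```

Reporting the mean and spread across folds makes it easier to tell whether one model's advantage is real or an artifact of a lucky split.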

Best Practices and Common Pitfalls

Never rely solely on MAE. While it’s interpretable and robust to outliers, it doesn’t tell you about the direction of errors or capture the variance in your predictions. Always use it alongside R² (to understand explained variance) and RMSE (to understand if large errors exist).

MAE treats all errors equally, which isn’t always desirable. If underestimating house prices by $50,000 is worse than overestimating by the same amount (perhaps due to business constraints), MAE won’t capture this asymmetry. Consider using asymmetric loss functions or custom metrics in such cases.
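As a sketch of what such a custom metric might look like, here is a hypothetical asymmetric_mae that penalizes underestimates more heavily (the function name and weights are illustrative, not a standard API):

```python
import numpy as np

def asymmetric_mae(y_true, y_pred, under_weight=2.0, over_weight=1.0):
    """Illustrative asymmetric MAE: underestimates count more than overestimates."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred                       # positive => model underestimated
    weights = np.where(errors > 0, under_weight, over_weight)
    return np.mean(weights * np.abs(errors))

# Underestimating by 50 counts double; overestimating by 50 counts once
print(asymmetric_mae([200, 200], [150, 250]))  # (2*50 + 1*50) / 2 = 75.0
```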

Here’s robust code that handles common edge cases:

from sklearn.metrics import mean_absolute_error
import numpy as np

def safe_mae(y_true, y_pred):
    """
    Calculate MAE with validation and error handling.
    """
    # Convert to numpy arrays
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    
    # Validate inputs
    if len(y_true) == 0 or len(y_pred) == 0:
        raise ValueError("Input arrays cannot be empty")
    
    if len(y_true) != len(y_pred):
        raise ValueError(
            f"Array length mismatch: y_true has {len(y_true)} "
            f"elements, y_pred has {len(y_pred)} elements"
        )
    
    # Check for NaN or infinite values
    if np.any(np.isnan(y_true)) or np.any(np.isnan(y_pred)):
        raise ValueError("Input arrays contain NaN values")
    
    if np.any(np.isinf(y_true)) or np.any(np.isinf(y_pred)):
        raise ValueError("Input arrays contain infinite values")
    
    # Calculate MAE
    return mean_absolute_error(y_true, y_pred)

# Test error handling
try:
    mae = safe_mae([1, 2, 3], [1, 2])  # Mismatched lengths
except ValueError as e:
    print(f"Caught error: {e}")

try:
    mae = safe_mae([1, 2, np.nan], [1, 2, 3])  # NaN value
except ValueError as e:
    print(f"Caught error: {e}")

When comparing models across different datasets or target variables, remember that MAE isn't scale-invariant. An MAE of 5 is excellent for predicting apartment sizes in square meters but terrible for predicting stock prices in dollars. Use Mean Absolute Percentage Error (MAPE) or scale your targets when comparing across different domains.
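Scikit-learn ships a MAPE implementation (mean_absolute_percentage_error, available since version 0.24); note that it returns a fraction rather than a percentage. A quick sketch with made-up numbers shows how the same absolute error looks very different once scale is accounted for:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Same absolute error (5 units) at two very different scales
small_true = np.array([50.0, 60.0])
small_pred = np.array([55.0, 65.0])
large_true = np.array([5000.0, 6000.0])
large_pred = np.array([5005.0, 6005.0])

# sklearn returns a fraction, not a percentage (0.09 ≈ 9%)
mape_small = mean_absolute_percentage_error(small_true, small_pred)
mape_large = mean_absolute_percentage_error(large_true, large_pred)

print(f"Small scale MAPE: {mape_small:.4f}")
print(f"Large scale MAPE: {mape_large:.4f}")
```

The raw MAE is identical in both cases, but MAPE makes clear that the error is two orders of magnitude more significant at the smaller scale.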

Document your MAE results with context: the units, the dataset size, the distribution of your target variable, and what constitutes an acceptable error for your use case. An MAE of $10,000 might be excellent for commercial real estate but unacceptable for residential property under $100,000. The metric is only meaningful with business context.
