How to Calculate Accuracy in Python
Key Insights
- Accuracy measures the ratio of correct predictions to total predictions, making it intuitive but potentially misleading for imbalanced datasets, where a naive model can achieve high accuracy by always predicting the majority class
- Python offers multiple approaches to calculating accuracy: manual implementation with NumPy for understanding the fundamentals, scikit-learn's accuracy_score() for production code, and built-in model methods for streamlined workflows
- Always evaluate accuracy alongside precision, recall, and F1-score, especially when class distributions are skewed: a 95% accurate model on a 95:5 imbalanced dataset may simply be predicting the majority class every time
Introduction to Accuracy Metrics
Accuracy is the most straightforward classification metric in machine learning. It answers a simple question: what percentage of predictions did my model get right? The formula is equally simple:
Accuracy = (Correct Predictions) / (Total Predictions)
This metric works well when you have balanced classes and when false positives and false negatives carry similar costs. For example, if you’re building a model to classify images of cats versus dogs with roughly equal numbers of each, accuracy provides a clear performance indicator.
Here’s the conceptual calculation with a small dataset:
# Actual labels
actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
# Model predictions
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
# Manual calculation
correct = sum([1 for i in range(len(actual)) if actual[i] == predicted[i]])
total = len(actual)
accuracy = correct / total
print(f"Correct predictions: {correct}/{total}")
print(f"Accuracy: {accuracy:.2%}")
# Output: Correct predictions: 8/10
# Output: Accuracy: 80.00%
This example shows 8 correct predictions out of 10 total, yielding 80% accuracy. Simple, intuitive, and easy to communicate to stakeholders.
Manual Accuracy Calculation
Understanding how to calculate accuracy from scratch builds intuition and helps you recognize edge cases. Let’s implement accuracy calculation using basic Python operations and then optimize with NumPy.
Basic Python Implementation
def calculate_accuracy_basic(y_true, y_pred):
    """Calculate accuracy using basic Python operations."""
    if len(y_true) != len(y_pred):
        raise ValueError("Arrays must have the same length")
    if len(y_true) == 0:
        raise ValueError("Arrays cannot be empty")
    correct = 0
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            correct += 1
    return correct / len(y_true)
# Example usage
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
accuracy = calculate_accuracy_basic(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}") # Output: Accuracy: 0.8000
NumPy-Based Implementation
NumPy provides vectorized operations that are faster and more concise:
import numpy as np
def calculate_accuracy_numpy(y_true, y_pred):
    """Calculate accuracy using NumPy for better performance."""
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("Arrays must have the same shape")
    if y_true.size == 0:
        raise ValueError("Arrays cannot be empty")
    # Vectorized comparison
    correct = np.sum(y_true == y_pred)
    total = len(y_true)
    return correct / total
# Example with larger dataset
np.random.seed(42)
y_true = np.random.randint(0, 2, 1000)
y_pred = np.random.randint(0, 2, 1000)
accuracy = calculate_accuracy_numpy(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}")
The NumPy version leverages element-wise comparison (y_true == y_pred) which returns a boolean array, then np.sum() counts the True values. This approach is significantly faster for large datasets.
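In fact, because a boolean array averages directly (True counts as 1, False as 0), the entire calculation collapses to a one-liner with np.mean. A minimal sketch:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

# The mean of the element-wise comparison IS the accuracy:
# True -> 1, False -> 0, so the average equals correct/total.
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.4f}")  # Accuracy: 0.8000
```

This is the idiom scikit-learn itself builds on, and it is worth knowing when you want accuracy inside a tight loop without the function-call overhead.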
Using Scikit-learn’s accuracy_score()
For production code, use scikit-learn’s battle-tested implementation. It handles edge cases and integrates seamlessly with the scikit-learn ecosystem.
Basic Usage
from sklearn.metrics import accuracy_score
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
# Default: returns accuracy as a float
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}") # Output: Accuracy: 0.8000
# Get the count of correct predictions instead
correct_count = accuracy_score(y_true, y_pred, normalize=False)
print(f"Correct predictions: {correct_count}") # Output: Correct predictions: 8
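accuracy_score also accepts a sample_weight argument, useful when some observations should count more than others (for example, cost-sensitive evaluation). A small sketch reusing the labels above:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Upweight the two misclassified samples (indices 3 and 7);
# the weighted accuracy drops below the unweighted 80%.
weights = [1, 1, 1, 3, 1, 1, 1, 3, 1, 1]
weighted_acc = accuracy_score(y_true, y_pred, sample_weight=weights)
print(f"Weighted accuracy: {weighted_acc:.4f}")
```

The result is the weighted fraction of correct predictions: 8 units of correct weight out of 14 total, roughly 0.5714.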
Multi-class Classification
Accuracy works identically for multi-class problems:
from sklearn.metrics import accuracy_score
# Multi-class example: classifying digits 0-9
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 7, 9, 0, 2, 2] # Two errors
accuracy = accuracy_score(y_true, y_pred)
print(f"Multi-class Accuracy: {accuracy:.4f}") # Output: Multi-class Accuracy: 0.8462
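Overall multi-class accuracy hides which classes are being confused. One way to see this is to read per-class accuracy (per-class recall) off the diagonal of the confusion matrix; a sketch using the same digits example:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 7, 9, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred)
# Diagonal = correct predictions per class; row sums = true counts per class
per_class_acc = cm.diagonal() / cm.sum(axis=1)
for cls, acc in enumerate(per_class_acc):
    print(f"Class {cls}: {acc:.2f}")
```

Here the 84.6% overall accuracy decomposes into perfect scores for most classes, 50% for class 1, and 0% for class 8, which is exactly the kind of detail a single accuracy number conceals.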
Calculating Accuracy for Different ML Models
Let’s see accuracy calculation in real model evaluation workflows.
Logistic Regression with Train/Test Split
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
# Calculate accuracies
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)
train_accuracy = accuracy_score(y_train, train_pred)
test_accuracy = accuracy_score(y_test, test_pred)
print(f"Training Accuracy: {train_accuracy:.4f}")
print(f"Testing Accuracy: {test_accuracy:.4f}")
Cross-Validation Accuracy
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
# Create model
dt_model = DecisionTreeClassifier(random_state=42)
# Calculate cross-validated accuracy
cv_scores = cross_val_score(dt_model, X, y, cv=5, scoring='accuracy')
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV Accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")
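cross_val_score handles one metric at a time; when you want accuracy alongside other scores, sklearn's cross_validate evaluates several metrics in a single cross-validation pass. A sketch on the same breast-cancer data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=42)

# One cross-validation run, several metrics scored per fold
scores = cross_validate(model, X, y, cv=5,
                        scoring=['accuracy', 'recall', 'f1'])
for metric in ['test_accuracy', 'test_recall', 'test_f1']:
    print(f"{metric}: {scores[metric].mean():.4f}")
```

Each entry in the returned dictionary is an array of five per-fold scores, so you get mean and spread for every metric from one call.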
Comparing Multiple Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
models = {
    'Logistic Regression': LogisticRegression(max_iter=10000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(random_state=42)
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = accuracy
    print(f"{name}: {accuracy:.4f}")
Understanding Accuracy Limitations
Accuracy can be dangerously misleading with imbalanced datasets. Consider a fraud detection system where only 1% of transactions are fraudulent:
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Simulated imbalanced dataset: 99% legitimate (0), 1% fraud (1)
np.random.seed(42)
y_true = np.concatenate([np.zeros(990), np.ones(10)])
# Naive model that always predicts "legitimate"
y_pred_naive = np.zeros(1000)
# Slightly better model that catches some fraud
y_pred_better = y_true.copy()
y_pred_better[np.random.choice(np.where(y_true == 1)[0], 5, replace=False)] = 0
print("Naive Model (always predicts 0):")
print(f" Accuracy: {accuracy_score(y_true, y_pred_naive):.4f}")
print(f" Recall: {recall_score(y_true, y_pred_naive):.4f}")
print("\nBetter Model:")
print(f" Accuracy: {accuracy_score(y_true, y_pred_better):.4f}")
print(f" Recall: {recall_score(y_true, y_pred_better):.4f}")
The naive model achieves 99% accuracy by never detecting fraud! This demonstrates why you need multiple metrics:
from sklearn.metrics import classification_report, confusion_matrix
def evaluate_model_comprehensive(y_true, y_pred, model_name):
    """Comprehensive evaluation beyond just accuracy."""
    print(f"\n{'='*50}")
    print(f"Evaluation for: {model_name}")
    print(f"{'='*50}")
    print(f"\nAccuracy: {accuracy_score(y_true, y_pred):.4f}")
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred))
    print("\nConfusion Matrix:")
    print(confusion_matrix(y_true, y_pred))
# Evaluate both models
evaluate_model_comprehensive(y_true, y_pred_naive, "Naive Model")
evaluate_model_comprehensive(y_true, y_pred_better, "Better Model")
Best Practices and Tips
Custom Accuracy Function with Validation
def robust_accuracy(y_true, y_pred, verbose=True):
    """Production-ready accuracy calculation with logging."""
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    # Validation: raise explicit errors (asserts are stripped under python -O)
    if y_true.shape != y_pred.shape:
        raise ValueError("Shape mismatch")
    if len(y_true) == 0:
        raise ValueError("Empty arrays")
    accuracy = accuracy_score(y_true, y_pred)
    if verbose:
        total = len(y_true)
        # round(), not int(): int() truncates and can undercount by one
        # due to floating-point representation
        correct = round(accuracy * total)
        print(f"Correct: {correct}/{total} ({accuracy:.2%})")
    return accuracy
Tracking Accuracy Across Training Epochs
from sklearn.neural_network import MLPClassifier
def train_with_accuracy_tracking(X_train, y_train, X_val, y_val, epochs=10):
    """Track accuracy during training."""
    train_accuracies = []
    val_accuracies = []
    # max_iter=1 with warm_start=True trains one epoch per fit() call
    # (expect ConvergenceWarning; it is harmless here)
    model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1, warm_start=True)
    for epoch in range(epochs):
        model.fit(X_train, y_train)
        train_acc = accuracy_score(y_train, model.predict(X_train))
        val_acc = accuracy_score(y_val, model.predict(X_val))
        train_accuracies.append(train_acc)
        val_accuracies.append(val_acc)
        print(f"Epoch {epoch+1}: Train={train_acc:.4f}, Val={val_acc:.4f}")
    return train_accuracies, val_accuracies
# Usage
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
train_accs, val_accs = train_with_accuracy_tracking(X_train, y_train, X_val, y_val)
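A common follow-up is picking the epoch with the best validation accuracy and checking the train/validation gap as an overfitting signal. A minimal sketch, using hypothetical tracked values in place of the lists returned above:

```python
# Hypothetical per-epoch accuracies for illustration
train_accs = [0.80, 0.88, 0.93, 0.96, 0.98]
val_accs = [0.78, 0.85, 0.89, 0.88, 0.86]

# Best epoch = highest validation accuracy (1-indexed for reporting)
best_epoch = max(range(len(val_accs)), key=lambda i: val_accs[i]) + 1
# Train/validation gap at that epoch; a widening gap suggests overfitting
gap = train_accs[best_epoch - 1] - val_accs[best_epoch - 1]
print(f"Best epoch: {best_epoch} "
      f"(val={val_accs[best_epoch - 1]:.2f}, train-val gap={gap:.2f})")
```

In this made-up trace, validation accuracy peaks at epoch 3 and declines afterward even as training accuracy keeps rising, which is the classic overfitting pattern the tracking function is designed to reveal.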
Complete Evaluation Report
def create_evaluation_report(model, X_test, y_test):
    """Generate comprehensive evaluation report."""
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average='weighted'
    )
    report = {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'total_samples': len(y_test)
    }
    print("\nModel Evaluation Report")
    print("-" * 40)
    for metric, value in report.items():
        if metric != 'total_samples':
            print(f"{metric.capitalize():15}: {value:.4f}")
        else:
            print(f"{metric.capitalize():15}: {value}")
    return report
Accuracy is your first metric, not your only metric. Calculate it efficiently, understand its limitations, and always pair it with precision, recall, and domain-specific metrics for robust model evaluation.