How to Calculate Accuracy in Python
Key Insights
- Accuracy measures the ratio of correct predictions to total predictions, making it intuitive but potentially misleading for imbalanced datasets, where a naive model can achieve high accuracy by always predicting the majority class
- Python offers multiple approaches to calculating accuracy: manual implementation with NumPy for understanding the fundamentals, scikit-learn's accuracy_score() for production code, and built-in model methods for streamlined workflows
- Always evaluate accuracy alongside precision, recall, and F1-score, especially when class distributions are skewed: a 95% accurate model on a 95:5 imbalanced dataset may simply be predicting the majority class every time
Introduction to Accuracy Metrics
Accuracy is the most straightforward classification metric in machine learning. It answers a simple question: what percentage of predictions did my model get right? The formula is equally simple:
Accuracy = (Correct Predictions) / (Total Predictions)
This metric works well when you have balanced classes and when false positives and false negatives carry similar costs. For example, if you’re building a model to classify images of cats versus dogs with roughly equal numbers of each, accuracy provides a clear performance indicator.
Here’s the conceptual calculation with a small dataset:
# Actual labels
actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
# Model predictions
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
# Manual calculation
correct = sum([1 for i in range(len(actual)) if actual[i] == predicted[i]])
total = len(actual)
accuracy = correct / total
print(f"Correct predictions: {correct}/{total}")
print(f"Accuracy: {accuracy:.2%}")
# Output: Correct predictions: 8/10
# Output: Accuracy: 80.00%
This example shows 8 correct predictions out of 10 total, yielding 80% accuracy. Simple, intuitive, and easy to communicate to stakeholders.
Manual Accuracy Calculation
Understanding how to calculate accuracy from scratch builds intuition and helps you recognize edge cases. Let’s implement accuracy calculation using basic Python operations and then optimize with NumPy.
Basic Python Implementation
def calculate_accuracy_basic(y_true, y_pred):
    """Calculate accuracy using basic Python operations."""
    if len(y_true) != len(y_pred):
        raise ValueError("Arrays must have the same length")
    if len(y_true) == 0:
        raise ValueError("Arrays cannot be empty")
    correct = 0
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            correct += 1
    return correct / len(y_true)
# Example usage
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
accuracy = calculate_accuracy_basic(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}") # Output: Accuracy: 0.8000
NumPy-Based Implementation
NumPy provides vectorized operations that are faster and more concise:
import numpy as np
def calculate_accuracy_numpy(y_true, y_pred):
    """Calculate accuracy using NumPy for better performance."""
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("Arrays must have the same shape")
    if y_true.size == 0:
        raise ValueError("Arrays cannot be empty")
    # Vectorized comparison
    correct = np.sum(y_true == y_pred)
    total = len(y_true)
    return correct / total
# Example with larger dataset
np.random.seed(42)
y_true = np.random.randint(0, 2, 1000)
y_pred = np.random.randint(0, 2, 1000)
accuracy = calculate_accuracy_numpy(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}")
The NumPy version leverages element-wise comparison (y_true == y_pred) which returns a boolean array, then np.sum() counts the True values. This approach is significantly faster for large datasets.
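In fact, because a boolean array averages directly (True counts as 1, False as 0), the entire calculation collapses to a one-liner with np.mean. A minimal sketch:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

# The mean of the element-wise comparison IS the accuracy:
# True -> 1, False -> 0, so the average equals correct/total.
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.4f}")  # Accuracy: 0.8000
```

This is the idiom scikit-learn itself builds on, and it is worth knowing when you want accuracy inside a tight loop without the function-call overhead.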
Using Scikit-learn’s accuracy_score()
For production code, use scikit-learn’s battle-tested implementation. It handles edge cases and integrates seamlessly with the scikit-learn ecosystem.
Basic Usage
from sklearn.metrics import accuracy_score
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
# Default: returns accuracy as a float
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}") # Output: Accuracy: 0.8000
# Get the count of correct predictions instead
correct_count = accuracy_score(y_true, y_pred, normalize=False)
print(f"Correct predictions: {correct_count}") # Output: Correct predictions: 8
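accuracy_score also accepts a sample_weight argument, useful when some observations should count more than others (for example, cost-sensitive evaluation). A small sketch reusing the labels above:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Upweight the two misclassified samples (indices 3 and 7);
# the weighted accuracy drops below the unweighted 80%.
weights = [1, 1, 1, 3, 1, 1, 1, 3, 1, 1]
weighted_acc = accuracy_score(y_true, y_pred, sample_weight=weights)
print(f"Weighted accuracy: {weighted_acc:.4f}")
```

The result is the weighted fraction of correct predictions: 8 units of correct weight out of 14 total, roughly 0.5714.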
Multi-class Classification
Accuracy works identically for multi-class problems:
from sklearn.metrics import accuracy_score
# Multi-class example: classifying digits 0-9
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 7, 9, 0, 2, 2] # Two errors
accuracy = accuracy_score(y_true, y_pred)
print(f"Multi-class Accuracy: {accuracy:.4f}") # Output: Multi-class Accuracy: 0.8462
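Overall multi-class accuracy hides which classes are being confused. One way to see this is to read per-class accuracy (per-class recall) off the diagonal of the confusion matrix; a sketch using the same digits example:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 7, 9, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred)
# Diagonal = correct predictions per class; row sums = true counts per class
per_class_acc = cm.diagonal() / cm.sum(axis=1)
for cls, acc in enumerate(per_class_acc):
    print(f"Class {cls}: {acc:.2f}")
```

Here the 84.6% overall accuracy decomposes into perfect scores for most classes, 50% for class 1, and 0% for class 8, which is exactly the kind of detail a single accuracy number conceals.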
Calculating Accuracy for Different ML Models
Let’s see accuracy calculation in real model evaluation workflows.
Logistic Regression with Train/Test Split
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
# Calculate accuracies
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)
train_accuracy = accuracy_score(y_train, train_pred)
test_accuracy = accuracy_score(y_test, test_pred)
print(f"Training Accuracy: {train_accuracy:.4f}")
print(f"Testing Accuracy: {test_accuracy:.4f}")
Cross-Validation Accuracy
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
# Create model
dt_model = DecisionTreeClassifier(random_state=42)
# Calculate cross-validated accuracy
cv_scores = cross_val_score(dt_model, X, y, cv=5, scoring='accuracy')
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV Accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")
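cross_val_score handles one metric at a time; when you want accuracy alongside other scores, sklearn's cross_validate evaluates several metrics in a single cross-validation pass. A sketch on the same breast-cancer data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=42)

# One cross-validation run, several metrics scored per fold
scores = cross_validate(model, X, y, cv=5,
                        scoring=['accuracy', 'recall', 'f1'])
for metric in ['test_accuracy', 'test_recall', 'test_f1']:
    print(f"{metric}: {scores[metric].mean():.4f}")
```

Each entry in the returned dictionary is an array of five per-fold scores, so you get mean and spread for every metric from one call.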
Comparing Multiple Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
models = {
    'Logistic Regression': LogisticRegression(max_iter=10000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(random_state=42)
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = accuracy
    print(f"{name}: {accuracy:.4f}")
Understanding Accuracy Limitations
Accuracy can be dangerously misleading with imbalanced datasets. Consider a fraud detection system where only 1% of transactions are fraudulent:
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Simulated imbalanced dataset: 99% legitimate (0), 1% fraud (1)
np.random.seed(42)
y_true = np.concatenate([np.zeros(990), np.ones(10)])
# Naive model that always predicts "legitimate"
y_pred_naive = np.zeros(1000)
# Slightly better model that catches some fraud
y_pred_better = y_true.copy()
y_pred_better[np.random.choice(np.where(y_true == 1)[0], 5, replace=False)] = 0
print("Naive Model (always predicts 0):")
print(f" Accuracy: {accuracy_score(y_true, y_pred_naive):.4f}")
print(f" Recall: {recall_score(y_true, y_pred_naive):.4f}")
print("\nBetter Model:")
print(f" Accuracy: {accuracy_score(y_true, y_pred_better):.4f}")
print(f" Recall: {recall_score(y_true, y_pred_better):.4f}")
The naive model achieves 99% accuracy by never detecting fraud! This demonstrates why you need multiple metrics:
from sklearn.metrics import classification_report, confusion_matrix
def evaluate_model_comprehensive(y_true, y_pred, model_name):
    """Comprehensive evaluation beyond just accuracy."""
    print(f"\n{'='*50}")
    print(f"Evaluation for: {model_name}")
    print(f"{'='*50}")
    print(f"\nAccuracy: {accuracy_score(y_true, y_pred):.4f}")
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred))
    print("\nConfusion Matrix:")
    print(confusion_matrix(y_true, y_pred))
# Evaluate both models
evaluate_model_comprehensive(y_true, y_pred_naive, "Naive Model")
evaluate_model_comprehensive(y_true, y_pred_better, "Better Model")
Best Practices and Tips
Custom Accuracy Function with Validation
def robust_accuracy(y_true, y_pred, verbose=True):
    """Production-ready accuracy calculation with logging."""
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    # Validation: raise explicit errors (asserts are stripped under python -O)
    if y_true.shape != y_pred.shape:
        raise ValueError("Shape mismatch")
    if len(y_true) == 0:
        raise ValueError("Empty arrays")
    accuracy = accuracy_score(y_true, y_pred)
    if verbose:
        total = len(y_true)
        # round(), not int(): int() truncates and can undercount by one
        # due to floating-point representation
        correct = round(accuracy * total)
        print(f"Correct: {correct}/{total} ({accuracy:.2%})")
    return accuracy
Tracking Accuracy Across Training Epochs
from sklearn.neural_network import MLPClassifier
def train_with_accuracy_tracking(X_train, y_train, X_val, y_val, epochs=10):
    """Track accuracy during training."""
    train_accuracies = []
    val_accuracies = []
    # max_iter=1 with warm_start=True trains one epoch per fit() call
    # (expect ConvergenceWarning; it is harmless here)
    model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1, warm_start=True)
    for epoch in range(epochs):
        model.fit(X_train, y_train)
        train_acc = accuracy_score(y_train, model.predict(X_train))
        val_acc = accuracy_score(y_val, model.predict(X_val))
        train_accuracies.append(train_acc)
        val_accuracies.append(val_acc)
        print(f"Epoch {epoch+1}: Train={train_acc:.4f}, Val={val_acc:.4f}")
    return train_accuracies, val_accuracies
# Usage
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
train_accs, val_accs = train_with_accuracy_tracking(X_train, y_train, X_val, y_val)
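A common follow-up is picking the epoch with the best validation accuracy and checking the train/validation gap as an overfitting signal. A minimal sketch, using hypothetical tracked values in place of the lists returned above:

```python
# Hypothetical per-epoch accuracies for illustration
train_accs = [0.80, 0.88, 0.93, 0.96, 0.98]
val_accs = [0.78, 0.85, 0.89, 0.88, 0.86]

# Best epoch = highest validation accuracy (1-indexed for reporting)
best_epoch = max(range(len(val_accs)), key=lambda i: val_accs[i]) + 1
# Train/validation gap at that epoch; a widening gap suggests overfitting
gap = train_accs[best_epoch - 1] - val_accs[best_epoch - 1]
print(f"Best epoch: {best_epoch} "
      f"(val={val_accs[best_epoch - 1]:.2f}, train-val gap={gap:.2f})")
```

In this made-up trace, validation accuracy peaks at epoch 3 and declines afterward even as training accuracy keeps rising, which is the classic overfitting pattern the tracking function is designed to reveal.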
Complete Evaluation Report
def create_evaluation_report(model, X_test, y_test):
    """Generate comprehensive evaluation report."""
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average='weighted'
    )
    report = {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'total_samples': len(y_test)
    }
    print("\nModel Evaluation Report")
    print("-" * 40)
    for metric, value in report.items():
        if metric != 'total_samples':
            print(f"{metric.capitalize():15}: {value:.4f}")
        else:
            print(f"{metric.capitalize():15}: {value}")
    return report
Accuracy is your first metric, not your only metric. Calculate it efficiently, understand its limitations, and always pair it with precision, recall, and domain-specific metrics for robust model evaluation.