Test Pyramid: Unit, Integration, E2E Balance
Key Insights
- The test pyramid isn’t about hitting exact ratios—it’s about optimizing for fast feedback, maintainability, and confidence in your deployments
- Integration tests often provide the best ROI for existing codebases because they catch the bugs that actually break production: component boundaries and data flow
- Delete tests that don’t earn their keep; a slow, flaky test suite that developers ignore is worse than a lean suite they trust
The Test Pyramid Explained
Mike Cohn introduced the test pyramid in 2009, and despite being over fifteen years old, teams still get it wrong. The concept is simple: structure your test suite like a pyramid with many unit tests at the base, fewer integration tests in the middle, and even fewer end-to-end tests at the top.
The shape isn’t arbitrary. It reflects three fundamental tradeoffs:
Speed: Unit tests run in milliseconds. Integration tests take seconds. E2E tests take minutes. When your test suite takes 45 minutes, developers stop running it locally.
Cost: Unit tests are cheap to write and maintain. E2E tests require infrastructure, test data management, and constant maintenance as your UI evolves.
Reliability: Unit tests are deterministic. E2E tests fight timing issues, network flakiness, and browser quirks. A flaky test suite erodes trust faster than no tests at all.
The pyramid optimizes for fast, reliable feedback while still validating that your system works as a whole.
Unit Tests: The Foundation
A unit test verifies a single function or class in isolation. Dependencies are mocked or stubbed. The test runs entirely in memory with no I/O.
The benefits are clear: sub-millisecond execution, precise failure localization, and the ability to test edge cases that are difficult to reproduce in a running system.
Here’s a price calculator that applies discounts and tax and validates its inputs:
```python
# price_calculator.py
from decimal import Decimal
from typing import Optional


class InvalidPriceError(Exception):
    pass


class InvalidDiscountError(Exception):
    pass


def calculate_total(
    base_price: Decimal,
    quantity: int,
    discount_percent: Optional[Decimal] = None,
    tax_rate: Decimal = Decimal("0.08")
) -> Decimal:
    if base_price < 0:
        raise InvalidPriceError("Price cannot be negative")
    if quantity < 1:
        raise InvalidPriceError("Quantity must be at least 1")
    if discount_percent is not None and not (0 <= discount_percent <= 100):
        raise InvalidDiscountError("Discount must be between 0 and 100")

    subtotal = base_price * quantity
    if discount_percent:
        discount = subtotal * (discount_percent / 100)
        subtotal -= discount

    tax = subtotal * tax_rate
    return (subtotal + tax).quantize(Decimal("0.01"))
```
```python
# test_price_calculator.py
import pytest
from decimal import Decimal

from price_calculator import calculate_total, InvalidPriceError, InvalidDiscountError


class TestCalculateTotal:
    def test_basic_calculation(self):
        result = calculate_total(Decimal("10.00"), 2)
        assert result == Decimal("21.60")  # (10 * 2) * 1.08

    def test_with_discount(self):
        result = calculate_total(
            Decimal("100.00"), 1,
            discount_percent=Decimal("20")
        )
        assert result == Decimal("86.40")  # (100 - 20) * 1.08

    def test_custom_tax_rate(self):
        result = calculate_total(
            Decimal("100.00"), 1,
            tax_rate=Decimal("0.10")
        )
        assert result == Decimal("110.00")

    def test_negative_price_raises_error(self):
        with pytest.raises(InvalidPriceError):
            calculate_total(Decimal("-10.00"), 1)

    def test_zero_quantity_raises_error(self):
        with pytest.raises(InvalidPriceError):
            calculate_total(Decimal("10.00"), 0)

    def test_invalid_discount_raises_error(self):
        with pytest.raises(InvalidDiscountError):
            calculate_total(Decimal("10.00"), 1, discount_percent=Decimal("150"))
```
These tests run in under 10 milliseconds combined. When one fails, you know exactly which calculation is broken.
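Edge cases are where unit tests earn their keep, and table-driven tests keep them cheap to add. Here is a minimal sketch with `calculate_total` inlined (validation omitted for brevity) so the example stands alone; the case values are illustrative:

```python
from decimal import Decimal

# Inlined, validation-free copy of calculate_total so this sketch runs standalone
def calculate_total(base_price, quantity, discount_percent=None,
                    tax_rate=Decimal("0.08")):
    subtotal = base_price * quantity
    if discount_percent:
        subtotal -= subtotal * (discount_percent / 100)
    tax = subtotal * tax_rate
    return (subtotal + tax).quantize(Decimal("0.01"))

# Each row: (base_price, quantity, discount_percent, expected total)
CASES = [
    (Decimal("10.00"), 2, None, Decimal("21.60")),
    (Decimal("100.00"), 1, Decimal("20"), Decimal("86.40")),
    (Decimal("0.00"), 5, None, Decimal("0.00")),            # free item
    (Decimal("0.01"), 1, None, Decimal("0.01")),            # rounding floor
    (Decimal("99.99"), 3, Decimal("100"), Decimal("0.00")), # full discount
]

for base, qty, discount, expected in CASES:
    assert calculate_total(base, qty, discount) == expected
```

In a real suite, `pytest.mark.parametrize` expresses the same table with per-case failure reporting, so one bad row doesn’t hide the rest.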
Integration Tests: The Middle Layer
Integration tests verify that components work together correctly. They cross boundaries—databases, external services, message queues—but don’t require the entire system to be running.
This is where most production bugs live. Your unit tests pass because each component works in isolation. But the database query returns dates in a different format than expected, or the email service rejects your payload structure.
```python
# user_service.py
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: Optional[int]
    email: str
    name: str


class UserRepository:
    def __init__(self, db_connection):
        self.db = db_connection

    def save(self, user: User) -> User:
        # RETURNING requires SQLite 3.35 or newer
        cursor = self.db.execute(
            "INSERT INTO users (email, name) VALUES (?, ?) RETURNING id",
            (user.email, user.name)
        )
        user.id = cursor.fetchone()[0]
        return user

    def find_by_email(self, email: str) -> Optional[User]:
        cursor = self.db.execute(
            "SELECT id, email, name FROM users WHERE email = ?",
            (email,)
        )
        row = cursor.fetchone()
        return User(id=row[0], email=row[1], name=row[2]) if row else None


class EmailService:
    def send_welcome_email(self, user: User) -> bool:
        # In production, this calls an external API
        raise NotImplementedError


class UserService:
    def __init__(self, repository: UserRepository, email_service: EmailService):
        self.repository = repository
        self.email_service = email_service

    def register(self, email: str, name: str) -> User:
        existing = self.repository.find_by_email(email)
        if existing:
            raise ValueError("Email already registered")
        user = self.repository.save(User(id=None, email=email, name=name))
        self.email_service.send_welcome_email(user)
        return user
```
```python
# test_user_service_integration.py
import pytest
import sqlite3
from unittest.mock import Mock

from user_service import UserService, UserRepository, EmailService, User


@pytest.fixture
def db_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            email TEXT UNIQUE NOT NULL,
            name TEXT NOT NULL
        )
    """)
    yield conn
    conn.close()


@pytest.fixture
def mock_email_service():
    service = Mock(spec=EmailService)
    service.send_welcome_email.return_value = True
    return service


@pytest.fixture
def user_service(db_connection, mock_email_service):
    repository = UserRepository(db_connection)
    return UserService(repository, mock_email_service)


class TestUserRegistration:
    def test_successful_registration_persists_user(self, user_service, db_connection):
        user = user_service.register("test@example.com", "Test User")

        # Verify the user was persisted to the database
        cursor = db_connection.execute(
            "SELECT email, name FROM users WHERE id = ?",
            (user.id,)
        )
        row = cursor.fetchone()
        assert row == ("test@example.com", "Test User")

    def test_registration_sends_welcome_email(self, user_service, mock_email_service):
        user_service.register("test@example.com", "Test User")

        mock_email_service.send_welcome_email.assert_called_once()
        call_arg = mock_email_service.send_welcome_email.call_args[0][0]
        assert call_arg.email == "test@example.com"

    def test_duplicate_email_raises_error(self, user_service):
        user_service.register("test@example.com", "First User")
        with pytest.raises(ValueError, match="Email already registered"):
            user_service.register("test@example.com", "Second User")
```
Notice the pattern: we use a real database (SQLite in-memory) but mock the email service. This catches database interaction bugs while avoiding external dependencies.
E2E Tests: The Critical Paths
End-to-end tests validate your system from the user’s perspective. They’re expensive, slow, and prone to flakiness. Use them sparingly for critical business flows.
Here’s a Playwright test for a checkout flow:
```typescript
// tests/checkout.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Checkout Flow', () => {
  test.beforeEach(async ({ request }) => {
    // Seed test data via API to avoid UI setup; Playwright's request
    // fixture serializes the body and sets the JSON content type
    await request.post(`${process.env.API_URL}/test/seed`, {
      data: { scenario: 'checkout_test' }
    });
  });

  test('complete purchase from cart to confirmation', async ({ page }) => {
    // Login
    await page.goto('/login');
    await page.fill('[data-testid="email"]', 'test@example.com');
    await page.fill('[data-testid="password"]', 'testpassword');
    await page.click('[data-testid="login-button"]');
    await expect(page).toHaveURL('/dashboard');

    // Add item to cart
    await page.goto('/products/test-product');
    await page.click('[data-testid="add-to-cart"]');
    await expect(page.locator('[data-testid="cart-count"]')).toHaveText('1');

    // Proceed to checkout
    await page.click('[data-testid="cart-icon"]');
    await page.click('[data-testid="checkout-button"]');

    // Fill payment details (using test card)
    await page.fill('[data-testid="card-number"]', '4242424242424242');
    await page.fill('[data-testid="card-expiry"]', '12/25');
    await page.fill('[data-testid="card-cvc"]', '123');

    // Complete purchase
    await page.click('[data-testid="pay-button"]');

    // Verify confirmation
    await expect(page).toHaveURL(/\/orders\/\d+/);
    await expect(page.locator('[data-testid="order-status"]')).toHaveText('Confirmed');
    await expect(page.locator('[data-testid="order-total"]')).toBeVisible();
  });
});
```
This single test covers authentication, cart management, payment processing, and order creation. It’s worth the maintenance cost because checkout is your revenue stream.
Finding Your Balance
The 70/20/10 ratio (unit/integration/E2E) is a starting point, not a mandate. Your optimal balance depends on several factors:
Domain complexity: Financial calculations need exhaustive unit tests. CRUD apps benefit more from integration tests.
Team size: Small teams can’t maintain large E2E suites. Focus on integration tests that catch boundary issues.
Deployment frequency: Deploying ten times a day? You need fast tests. Weekly releases can tolerate slower suites.
Watch for anti-patterns. The “ice cream cone” (mostly E2E tests) leads to slow, flaky pipelines. The “hourglass” (unit and E2E but no integration) misses the bugs that actually break production.
Practical Implementation Strategy
For existing codebases, start with integration tests. They provide immediate value without requiring refactoring.
When code is too tangled for unit tests, use dependency injection to create seams:
```python
# Before: Tightly coupled, untestable
class OrderProcessor:
    def process(self, order_id: int):
        order = database.query(f"SELECT * FROM orders WHERE id = {order_id}")
        if order.total > 1000:
            EmailClient().send("manager@company.com", "Large order alert")
        PaymentGateway().charge(order.customer_id, order.total)


# After: Dependencies injected, testable
class OrderProcessor:
    def __init__(self, repository, email_service, payment_gateway):
        self.repository = repository
        self.email_service = email_service
        self.payment_gateway = payment_gateway

    def process(self, order_id: int):
        order = self.repository.find(order_id)
        if order.total > 1000:
            self.email_service.send("manager@company.com", "Large order alert")
        self.payment_gateway.charge(order.customer_id, order.total)
```
Now you can unit test the business logic with mocks, and integration test the real components separately.
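With the seams in place, the branching logic is a few lines of mock-based testing. A minimal sketch; the `Order` shape and the repository/gateway interfaces are assumptions matching the example above:

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class Order:  # assumed shape; only the fields process() reads
    customer_id: int
    total: int


class OrderProcessor:
    def __init__(self, repository, email_service, payment_gateway):
        self.repository = repository
        self.email_service = email_service
        self.payment_gateway = payment_gateway

    def process(self, order_id: int):
        order = self.repository.find(order_id)
        if order.total > 1000:
            self.email_service.send("manager@company.com", "Large order alert")
        self.payment_gateway.charge(order.customer_id, order.total)


repo, email, gateway = Mock(), Mock(), Mock()
repo.find.return_value = Order(customer_id=7, total=1500)

OrderProcessor(repo, email, gateway).process(order_id=42)

# The large-order branch fired, and the charge went through
email.send.assert_called_once_with("manager@company.com", "Large order alert")
gateway.charge.assert_called_once_with(7, 1500)
```

No database, no SMTP server, no payment sandbox: the test exercises only the decision of *when* to alert and *what* to charge, which is exactly what this class owns.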
Measuring Test Suite Health
Track these metrics:
Execution time: If your suite takes over 10 minutes, developers won’t run it locally. Parallelize or prune.
Flakiness rate: Any test that fails randomly more than 1% of the time needs fixing or deletion. Flaky tests train developers to ignore failures.
Failure localization: When a test fails, can you identify the bug within 30 seconds? If not, your tests are too coarse.
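The flakiness metric is easy to compute if CI archives per-run results: a test is flaky when the same revision produces both passes and failures. A minimal sketch over a hypothetical run history (test names and data are invented):

```python
from collections import defaultdict

# Hypothetical CI history: (test_name, passed) for repeated runs of one commit
runs = [
    ("test_checkout", True), ("test_checkout", False), ("test_checkout", True),
    ("test_login", True), ("test_login", True),
    ("test_search", False), ("test_search", False),
]

outcomes = defaultdict(list)
for name, passed in runs:
    outcomes[name].append(passed)

# Flaky = both passes and failures on identical code;
# a test that always fails is broken, not flaky
flaky = {name for name, results in outcomes.items()
         if any(results) and not all(results)}

failure_rate = {name: results.count(False) / len(results)
                for name, results in outcomes.items()}

assert flaky == {"test_checkout"}
assert failure_rate["test_search"] == 1.0  # broken, a different problem
```

Distinguishing the two matters: consistently failing tests get fixed or deleted immediately, while flaky ones need their nondeterminism (timing, ordering, shared state) diagnosed before they poison trust in the suite.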
Delete tests ruthlessly. A test that hasn’t caught a bug in two years, takes 30 seconds to run, and breaks every time you refactor the UI isn’t providing value. Remove it.
Your test suite is a living system. Review it quarterly. Add tests where bugs escape to production. Remove tests that no longer justify their cost. The goal isn’t maximum coverage—it’s maximum confidence per minute of execution time.