Mocking: Stubs, Mocks, Fakes, and Spies

Key Insights

Test doubles (stubs, mocks, fakes, and spies) solve different problems—choosing the wrong one leads to brittle tests that break on implementation changes rather than actual bugs.
Mocks verify behavior (was this method called?), while stubs provide canned data. Conflating the two is a common cause of over-mocking.
Fakes are underutilized; an in-memory implementation often produces more maintainable tests than a web of mock expectations.

Why Test Doubles Matter

Every non-trivial application has dependencies. Your code talks to databases, sends emails, processes payments, and calls external APIs. Testing this code in isolation requires replacing these dependencies with something you control. Enter test doubles.

The term “test double” comes from Gerard Meszaros’s book xUnit Test Patterns. Like a stunt double in film, a test double stands in for a real component during testing. But not all test doubles are created equal. Meszaros identified distinct categories, including stubs, mocks, fakes, and spies (plus dummy objects, which this article does not cover), each serving a different purpose.

Most developers use “mock” as a catch-all term, but this imprecision causes real problems. When you understand the distinctions, you write tests that are more expressive, less brittle, and actually test what matters.

Stubs: Providing Canned Answers

A stub provides pre-programmed responses to method calls. It doesn’t verify anything—it just returns data your test needs to proceed.

Use stubs when you need to isolate your system under test from a dependency that provides data. The stub removes uncertainty: instead of wondering what the payment gateway might return, you dictate the response.

# The dependency we want to stub
class PaymentGateway:
    def charge(self, amount: float, card_token: str) -> dict:
        # Real implementation calls external API
        pass

# A stub implementation
class StubPaymentGateway:
    def __init__(self, should_succeed: bool = True):
        self.should_succeed = should_succeed
    
    def charge(self, amount: float, card_token: str) -> dict:
        if self.should_succeed:
            return {"status": "success", "transaction_id": "txn_123"}
        return {"status": "failed", "error": "insufficient_funds"}

# The class under test
class OrderProcessor:
    def __init__(self, payment_gateway: PaymentGateway):
        self.payment_gateway = payment_gateway
    
    def process_order(self, order: dict) -> str:
        result = self.payment_gateway.charge(order["total"], order["card_token"])
        if result["status"] == "success":
            return f"Order confirmed: {result['transaction_id']}"
        return f"Payment failed: {result['error']}"

# Tests using stubs
def test_successful_payment():
    gateway = StubPaymentGateway(should_succeed=True)
    processor = OrderProcessor(gateway)
    
    result = processor.process_order({"total": 99.99, "card_token": "tok_abc"})
    
    assert "Order confirmed" in result

def test_failed_payment():
    gateway = StubPaymentGateway(should_succeed=False)
    processor = OrderProcessor(gateway)
    
    result = processor.process_order({"total": 99.99, "card_token": "tok_abc"})
    
    assert "Payment failed" in result

Notice what the stub doesn’t do: it doesn’t verify that charge() was called, or that it received the correct arguments. The stub’s job is purely to provide data. The assertions focus on the behavior of OrderProcessor, not on how it interacts with the gateway.
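For one-off cases, a hand-written stub class is not strictly required. The following is a minimal sketch, assuming Python's standard unittest.mock: a Mock configured with nothing but a return_value behaves as a stub, provided the test never asserts on how it was called. The OrderProcessor here is a condensed copy of the one above, repeated only so the snippet runs on its own, and make_stub_gateway is a hypothetical helper name.

```python
from unittest.mock import Mock


class OrderProcessor:
    """Condensed copy of the class under test, so this sketch is self-contained."""

    def __init__(self, payment_gateway):
        self.payment_gateway = payment_gateway

    def process_order(self, order: dict) -> str:
        result = self.payment_gateway.charge(order["total"], order["card_token"])
        if result["status"] == "success":
            return f"Order confirmed: {result['transaction_id']}"
        return f"Payment failed: {result['error']}"


def make_stub_gateway(response: dict) -> Mock:
    """Hypothetical helper: build a gateway whose charge() always returns `response`."""
    gateway = Mock()
    gateway.charge.return_value = response
    return gateway


def test_declined_card_is_reported():
    gateway = make_stub_gateway({"status": "failed", "error": "card_declined"})
    processor = OrderProcessor(gateway)

    result = processor.process_order({"total": 25.00, "card_token": "tok_abc"})

    # Assert only on the outcome; no assert_called_* checks, so this stays a stub.
    assert result == "Payment failed: card_declined"
```

Whether a double counts as a stub or a mock depends on how the test uses it, not on which class created it. The hand-written StubPaymentGateway is arguably easier to read when many tests share it; the Mock version may be handier for a response that only one test needs.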

Mocks: Verifying Behavior

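A mock is an object pre-programmed with expectations. Unlike a stub, a mock verifies that specific interactions occurred. (Python’s unittest.mock checks those expectations after the fact, through assert_* methods, rather than declaring them up front.) Use mocks when the interaction itself is the behavior you’re testing.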

The classic example is notification systems. When a user signs up, you want to verify that a welcome email gets sent—not just that the signup succeeded.

from unittest.mock import Mock

class EmailService:
    def send_email(self, to: str, subject: str, body: str) -> None:
        # Real implementation sends email via SMTP/API
        pass

class UserRegistration:
    def __init__(self, email_service: EmailService):
        self.email_service = email_service
    
    def register(self, email: str, name: str) -> dict:
        # Create user in database (simplified)
        user_id = "user_123"
        
        # Send welcome email
        self.email_service.send_email(
            to=email,
            subject="Welcome to Our Platform",
            body=f"Hello {name}, thanks for signing up!"
        )
        
        return {"user_id": user_id, "email": email}

def test_registration_sends_welcome_email():
    # Create a mock restricted to EmailService's interface; expectations are checked below
    mock_email_service = Mock(spec=EmailService)
    registration = UserRegistration(mock_email_service)
    
    registration.register("alice@example.com", "Alice")
    
    # Verify the interaction occurred with correct arguments
    mock_email_service.send_email.assert_called_once_with(
        to="alice@example.com",
        subject="Welcome to Our Platform",
        body="Hello Alice, thanks for signing up!"
    )

The test doesn’t care what send_email returns. It cares that the method was called with the right arguments. This is behavioral verification—the defining characteristic of mocks.
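The spec=EmailService argument in the test above is doing quiet but useful work. As a small sketch using only standard unittest.mock calls: a mock built with spec rejects attributes the real class lacks, and create_autospec goes one step further by also enforcing method signatures. The two helper function names below are illustrative, not part of any library.

```python
from unittest.mock import Mock, create_autospec


class EmailService:
    def send_email(self, to: str, subject: str, body: str) -> None:
        """Real implementation would send email via SMTP or an API."""


def misspelled_method_is_rejected() -> bool:
    """Return True if calling a method missing from EmailService raises AttributeError."""
    mock_service = Mock(spec=EmailService)
    try:
        mock_service.send_mail("alice@example.com")  # typo: send_mail vs send_email
    except AttributeError:
        return True
    return False


def wrong_signature_is_rejected() -> bool:
    """Return True if an autospecced mock rejects a call that omits required arguments."""
    mock_service = create_autospec(EmailService, instance=True)
    try:
        mock_service.send_email("alice@example.com")  # missing subject and body
    except TypeError:
        return True
    return False
```

A bare Mock() would accept both calls silently, so a test built on it could keep passing after the real EmailService interface changes. Constraining the mock to the real interface reduces that risk.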

Fakes: Working Implementations

A fake is a fully functional implementation that takes shortcuts unsuitable for production. The canonical example is an in-memory database replacing PostgreSQL.

Fakes shine when you need realistic behavior without external dependencies. They’re more work to create than stubs or mocks, but they pay dividends in test reliability.

from typing import Optional
from dataclasses import dataclass

@dataclass
class User:
    id: str
    email: str
    name: str

class UserRepository:
    def save(self, user: User) -> None:
        raise NotImplementedError
    
    def find_by_id(self, user_id: str) -> Optional[User]:
        raise NotImplementedError
    
    def find_by_email(self, email: str) -> Optional[User]:
        raise NotImplementedError

# Fake implementation
class InMemoryUserRepository(UserRepository):
    def __init__(self):
        self._users: dict[str, User] = {}
    
    def save(self, user: User) -> None:
        self._users[user.id] = user
    
    def find_by_id(self, user_id: str) -> Optional[User]:
        return self._users.get(user_id)
    
    def find_by_email(self, email: str) -> Optional[User]:
        for user in self._users.values():
            if user.email == email:
                return user
        return None

class UserService:
    def __init__(self, repository: UserRepository):
        self.repository = repository
    
    def create_user(self, email: str, name: str) -> User:
        existing = self.repository.find_by_email(email)
        if existing:
            raise ValueError("Email already registered")
        
        # Simplified ID: derived from the email, which the check above guarantees is unique
        user = User(id=f"user_{email}", email=email, name=name)
        self.repository.save(user)
        return user

def test_cannot_register_duplicate_email():
    repo = InMemoryUserRepository()
    service = UserService(repo)
    
    service.create_user("alice@example.com", "Alice")
    
    try:
        service.create_user("alice@example.com", "Alice Smith")
        assert False, "Should have raised ValueError"
    except ValueError as e:
        assert "already registered" in str(e)

def test_can_find_user_after_creation():
    repo = InMemoryUserRepository()
    service = UserService(repo)
    
    created = service.create_user("bob@example.com", "Bob")
    found = repo.find_by_id(created.id)
    
    assert found is not None
    assert found.email == "bob@example.com"

The fake repository behaves like a real database for these tests. You can save users, query them, and test edge cases like duplicate detection. No mock setup, no stubbed return values—just working code.
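For contrast, here is a sketch of roughly what the duplicate-email test might look like with a mock repository instead of the fake. The User and UserService definitions are condensed copies of the ones above so that the snippet runs on its own; the test name is illustrative.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class User:
    id: str
    email: str
    name: str


class UserService:
    """Condensed copy of the service above, so this sketch runs on its own."""

    def __init__(self, repository):
        self.repository = repository

    def create_user(self, email: str, name: str) -> User:
        if self.repository.find_by_email(email):
            raise ValueError("Email already registered")
        user = User(id=f"user_{email}", email=email, name=name)
        self.repository.save(user)
        return user


def test_duplicate_email_with_mock_repository():
    existing = User(id="user_1", email="alice@example.com", name="Alice")
    repo = Mock()
    # The test must script the repository's answers call by call:
    # the first lookup finds nothing, the second finds the saved user.
    repo.find_by_email.side_effect = [None, existing]
    service = UserService(repo)

    service.create_user("alice@example.com", "Alice")

    try:
        service.create_user("alice@example.com", "Alice Smith")
        raised = False
    except ValueError:
        raised = True

    assert raised
    # save() ran once, for the first registration only.
    repo.save.assert_called_once()
```

This version passes, but it encodes an assumption that create_user performs exactly one find_by_email lookup per registration. If the service later looked the email up twice, or switched to a different repository method, the scripted side_effect list would no longer line up and the test would need rewriting, whereas the fake-based test would likely keep working unchanged.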

Spies: Recording Interactions

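A spy records method calls for later verification. In Meszaros’s taxonomy a spy is a stub that also records its calls; in practice, and in this article, the term usually means a wrapper around a real object. Unlike mocks, such spies don’t replace behavior by default: they observe it.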

Spies are useful when you want the real implementation to run but need to verify it was called correctly. Logging is a perfect use case.

from unittest.mock import patch

class Logger:
    def info(self, message: str) -> None:
        print(f"INFO: {message}")
    
    def error(self, message: str) -> None:
        print(f"ERROR: {message}")

class DataProcessor:
    def __init__(self, logger: Logger):
        self.logger = logger
    
    def process(self, data: list[int]) -> int:
        self.logger.info(f"Processing {len(data)} items")
        
        if not data:
            self.logger.error("Empty data received")
            return 0
        
        result = sum(data)
        self.logger.info(f"Processing complete, result: {result}")
        return result

def test_processor_logs_operations():
    logger = Logger()
    processor = DataProcessor(logger)
    
    # Spy on the logger methods
    with patch.object(logger, 'info', wraps=logger.info) as spy_info:
        result = processor.process([1, 2, 3, 4, 5])
        
        assert result == 15
        # Verify logging happened without changing behavior
        assert spy_info.call_count == 2
        spy_info.assert_any_call("Processing 5 items")
        spy_info.assert_any_call("Processing complete, result: 15")

def test_processor_logs_error_on_empty_data():
    logger = Logger()
    processor = DataProcessor(logger)
    
    with patch.object(logger, 'error', wraps=logger.error) as spy_error:
        result = processor.process([])
        
        assert result == 0
        spy_error.assert_called_once_with("Empty data received")

The wraps parameter is key—it tells the spy to call the real method while still recording the interaction. The actual logging still happens; we’re just watching.
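The same wraps idea also works at the object level. As a sketch, assuming standard unittest.mock behavior: Mock(wraps=some_object) passes every method call through to the real object, and the parent mock's mock_calls list records calls across all methods in order, which makes it possible to check sequencing. Logger and DataProcessor are condensed copies of the classes above, repeated so the snippet is self-contained.

```python
from unittest.mock import Mock, call


class Logger:
    def info(self, message: str) -> None:
        print(f"INFO: {message}")

    def error(self, message: str) -> None:
        print(f"ERROR: {message}")


class DataProcessor:
    """Condensed copy of the class above, so this sketch runs on its own."""

    def __init__(self, logger):
        self.logger = logger

    def process(self, data: list[int]) -> int:
        self.logger.info(f"Processing {len(data)} items")
        if not data:
            self.logger.error("Empty data received")
            return 0
        result = sum(data)
        self.logger.info(f"Processing complete, result: {result}")
        return result


def test_empty_input_logs_info_then_error():
    # Every method call on the spy passes through to the real Logger.
    spy_logger = Mock(wraps=Logger())
    processor = DataProcessor(spy_logger)

    assert processor.process([]) == 0

    # mock_calls records calls on all child methods, in call order.
    assert spy_logger.mock_calls == [
        call.info("Processing 0 items"),
        call.error("Empty data received"),
    ]
```

Asserting on the exact call sequence is a stricter check than assert_any_call, so it is more sensitive to refactoring. It is probably best reserved for cases where the order genuinely matters.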

Choosing the Right Test Double

Here’s a decision framework:

Situation	Use
Need specific return values from a dependency	Stub
Need to verify a method was called correctly	Mock
Dependency has complex behavior worth preserving	Fake
Want real behavior but need to verify calls	Spy

The most common mistake is over-mocking. When every dependency is mocked, tests become coupled to implementation details. Change how two internal classes communicate, and dozens of tests break—even though the feature still works.

Here’s an over-mocked test:

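from unittest.mock import Mock

# CheckoutService is assumed to exist elsewhere; it takes a cart, an inventory, and a payment gateway
def test_checkout_overmocked():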
    mock_cart = Mock()
    mock_cart.get_items.return_value = [{"id": 1, "price": 10}]
    mock_cart.get_total.return_value = 10
    
    mock_inventory = Mock()
    mock_inventory.check_availability.return_value = True
    mock_inventory.reserve_items.return_value = True
    
    mock_payment = Mock()
    mock_payment.charge.return_value = {"status": "success"}
    
    checkout = CheckoutService(mock_cart, mock_inventory, mock_payment)
    result = checkout.process()
    
    # Brittle assertions tied to implementation
    mock_cart.get_items.assert_called_once()
    mock_inventory.check_availability.assert_called_once()
    mock_inventory.reserve_items.assert_called_once()
    mock_payment.charge.assert_called_once_with(10)

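This test will break under refactorings that leave the behavior identical: for example, dropping the separate check_availability() call because reserve_items() already fails on missing stock, or fetching the cart items twice. Each assert_called_once() pins down how CheckoutService does its work rather than what it achieves. (Merely reordering the calls would not break it, since these assertions don't check order.) Better approach: use fakes for cart and inventory, mock only the payment gateway (the true external dependency), and assert on outcomes.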
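One way the suggested alternative might look is sketched below. Everything here is hypothetical: the article does not show CheckoutService, so the implementation, along with FakeCart and FakeInventory, is invented purely for illustration and assumes each item appears in the cart at most once.

```python
from unittest.mock import Mock


class FakeCart:
    """In-memory cart; a hypothetical stand-in for the real one."""

    def __init__(self, items: list[dict]):
        self._items = list(items)

    def get_items(self) -> list[dict]:
        return list(self._items)

    def get_total(self) -> int:
        return sum(item["price"] for item in self._items)


class FakeInventory:
    """In-memory inventory mapping item id to units in stock."""

    def __init__(self, stock: dict[int, int]):
        self.stock = dict(stock)

    def check_availability(self, item_id: int) -> bool:
        return self.stock.get(item_id, 0) > 0

    def reserve_items(self, item_ids: list[int]) -> bool:
        # Assumes each id appears at most once in item_ids.
        if not all(self.check_availability(i) for i in item_ids):
            return False
        for i in item_ids:
            self.stock[i] -= 1
        return True


class CheckoutService:
    """Hypothetical implementation; the real one is not shown in the article."""

    def __init__(self, cart, inventory, payment):
        self.cart = cart
        self.inventory = inventory
        self.payment = payment

    def process(self) -> dict:
        item_ids = [item["id"] for item in self.cart.get_items()]
        if not self.inventory.reserve_items(item_ids):
            return {"status": "out_of_stock"}
        return self.payment.charge(self.cart.get_total())


def test_checkout_reserves_stock_and_charges_total():
    cart = FakeCart([{"id": 1, "price": 10}])
    inventory = FakeInventory({1: 3})
    payment = Mock()
    payment.charge.return_value = {"status": "success"}

    result = CheckoutService(cart, inventory, payment).process()

    # Outcomes: the order succeeded and one unit left the inventory.
    assert result == {"status": "success"}
    assert inventory.stock[1] == 2
    # The one interaction worth verifying: the external charge.
    payment.charge.assert_called_once_with(10)


def test_checkout_does_not_charge_when_out_of_stock():
    cart = FakeCart([{"id": 1, "price": 10}])
    inventory = FakeInventory({1: 0})
    payment = Mock()

    result = CheckoutService(cart, inventory, payment).process()

    assert result == {"status": "out_of_stock"}
    payment.charge.assert_not_called()
```

Neither test mentions which cart or inventory methods were called, so CheckoutService can reorganize those internal calls without breaking them. The only interaction pinned down is the charge, which is the side effect a caller actually cares about.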

Practical Guidelines

Start with these rules of thumb:

Prefer fakes for repositories and data stores. The investment pays off in test clarity.
Use mocks for external service calls where the interaction is what you’re testing (notifications, analytics events).
Use stubs when you just need data to flow through your code.
Use spies sparingly—usually for observing cross-cutting concerns like logging.
If a test has more than two mocks, reconsider your design. Either the code has too many dependencies, or you’re testing at the wrong level.

Popular mocking libraries to explore:

Python: unittest.mock, pytest-mock
Java: Mockito, EasyMock
JavaScript: Jest (built-in), Sinon.js
Go: testify/mock, gomock

The goal isn’t to eliminate dependencies from tests—it’s to control them. Understanding the difference between stubs, mocks, fakes, and spies gives you precision. Use that precision to write tests that catch bugs without handcuffing your ability to refactor.