Test Doubles: When to Use Mock vs Stub vs Fake

Key Insights

Stubs control indirect inputs, mocks verify interactions, and fakes simulate systems. Choosing the wrong type leads to brittle tests that break during refactoring or, worse, tests that pass while bugs slip through.
Default to stubs and fakes over mocks—behavior verification through mocks should be reserved for cases where the interaction itself is the requirement, not an implementation detail.
Over-mocking is one of the most common testing anti-patterns: when every test requires updating after an internal refactoring, you’ve coupled your tests to implementation rather than behavior.

The Test Double Taxonomy

Gerard Meszaros coined the term “test double” in his book xUnit Test Patterns to describe any object that stands in for a real dependency during testing, by analogy with the stunt doubles of the film industry. His taxonomy names five kinds (dummy, stub, spy, mock, and fake); this article focuses on stubs, mocks, and fakes.

The problem is that developers use “mock” as a catch-all term for any test double. This imprecision matters. When someone says “just mock it,” they might mean stub it, fake it, or actually mock it—three different techniques with different implications for test design.

Understanding the distinctions helps you write tests that verify the right things, remain stable during refactoring, and actually catch bugs. Let’s break down each type and when to reach for it.

Stubs: Providing Canned Answers

A stub provides predetermined responses to calls made during a test. It doesn’t verify anything—it simply returns what you tell it to return. Stubs control the indirect inputs to your system under test.

Use stubs when your code depends on external data or state that you need to control. The classic example: testing code that behaves differently based on payment processing results.

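from decimal import Decimal

# PaymentResult, OrderService, and OrderStatus are assumed to be
# defined in the application code.

# The dependency we need to stub
class PaymentGateway:
    def charge(self, amount: Decimal, card_token: str) -> PaymentResult:
        # Real implementation hits the Stripe API
        raise NotImplementedError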

# Our stub
class StubPaymentGateway:
    def __init__(self, result: PaymentResult):
        self._result = result
    
    def charge(self, amount: Decimal, card_token: str) -> PaymentResult:
        return self._result

# Testing the order service with controlled payment outcomes
class TestOrderService:
    def test_successful_payment_completes_order(self):
        gateway = StubPaymentGateway(PaymentResult(success=True, transaction_id="txn_123"))
        order_service = OrderService(payment_gateway=gateway)
        
        result = order_service.checkout(order_id="order_456", card_token="tok_visa")
        
        assert result.status == OrderStatus.COMPLETED
        assert result.transaction_id == "txn_123"
    
    def test_failed_payment_leaves_order_pending(self):
        gateway = StubPaymentGateway(PaymentResult(success=False, error="Card declined"))
        order_service = OrderService(payment_gateway=gateway)
        
        result = order_service.checkout(order_id="order_456", card_token="tok_visa")
        
        assert result.status == OrderStatus.PAYMENT_FAILED
        assert "Card declined" in result.error_message

Notice what we’re not doing: we’re not verifying that charge was called with specific arguments. We’re testing how OrderService responds to different payment outcomes. The stub is invisible infrastructure—it exists only to put the system in a testable state.

Stubs shine when testing conditional logic based on dependency responses: API success/failure, feature flags, configuration values, or time-sensitive operations (stub the clock).
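As a minimal sketch of the clock case, the example below injects the clock as a callable so a test can pin "now" to a fixed instant. The names Session, is_expired, system_clock, and fixed_clock are hypothetical, chosen for illustration, and are not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """Production clock: the real current time in UTC."""
    return datetime.now(timezone.utc)


@dataclass
class Session:
    created_at: datetime
    ttl: timedelta

    def is_expired(self, clock: Clock = system_clock) -> bool:
        # The current time is an indirect input, so it is injected
        # rather than read from datetime.now() inside the method.
        return clock() >= self.created_at + self.ttl


def fixed_clock(instant: datetime) -> Clock:
    """Stub clock: always reports the same instant."""
    return lambda: instant


def test_session_expires_after_ttl():
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    session = Session(created_at=start, ttl=timedelta(minutes=30))

    assert not session.is_expired(clock=fixed_clock(start + timedelta(minutes=29)))
    assert session.is_expired(clock=fixed_clock(start + timedelta(minutes=30)))
```

The test never sleeps and never depends on when it runs; the stub simply puts the code on each side of the expiry boundary.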

Mocks: Verifying Behavior

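A mock is pre-programmed with expectations about the calls it will receive, and the test fails if those expectations are not met. Unlike stubs, mocks verify interactions. You’re testing that your code communicates correctly with its collaborators. (Libraries such as unittest.mock record calls and let you assert on them afterwards, which Meszaros would strictly call a spy; this article follows common usage and calls both mocks.)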

Use mocks when the interaction itself is the requirement. Sending an email, publishing an event, or logging an audit trail—these are behaviors where the side effect is the point.

# Using a mocking library (pytest-mock / unittest.mock)
class TestUserRegistration:
    def test_sends_welcome_email_on_registration(self, mocker):
        # Create a mock email service
        mock_email_service = mocker.Mock(spec=EmailService)
        user_service = UserService(email_service=mock_email_service)
        
        user_service.register(
            email="alice@example.com",
            name="Alice Smith"
        )
        
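        # Verify the welcome email was sent with the correct parameters.
        # assert_any_call is used because register() also sends a
        # verification email through the same method (see the next test),
        # so assert_called_once_with would fail.
        mock_email_service.send_email.assert_any_call(
            to="alice@example.com",
            template="welcome",
            context={"name": "Alice Smith"}
        )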
    
    def test_sends_verification_email_before_welcome(self, mocker):
        mock_email_service = mocker.Mock(spec=EmailService)
        user_service = UserService(email_service=mock_email_service)
        
        user_service.register(email="bob@example.com", name="Bob Jones")
        
        # Verify call order, since ordering is part of this requirement
        templates = [
            call.kwargs["template"]
            for call in mock_email_service.send_email.call_args_list
        ]
        assert templates == ["verify_email", "welcome"]

The key distinction: we’re not testing what UserService.register returns. We’re testing that it does something—specifically, that it sends the right emails. The mock lets us verify this without actually sending emails.
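The spec=EmailService argument used in the tests above restricts the mock to the real class's interface. Here is a minimal sketch using the standard library's unittest.mock directly (mocker.Mock from pytest-mock is the same class), with a hypothetical EmailService definition standing in for the real one:

```python
from unittest.mock import Mock


class EmailService:
    # Hypothetical interface, assumed for illustration.
    def send_email(self, to: str, template: str, context: dict) -> None:
        raise NotImplementedError("real implementation would send mail")


def test_spec_rejects_unknown_methods():
    mock_email_service = Mock(spec=EmailService)

    # Methods that exist on EmailService are recorded as usual.
    mock_email_service.send_email(to="a@example.com", template="welcome", context={})
    mock_email_service.send_email.assert_called_once()

    # A misspelled or removed method raises instead of silently passing.
    try:
        mock_email_service.send_mail(to="a@example.com")
    except AttributeError:
        pass
    else:
        raise AssertionError("expected AttributeError for unknown method")
```

Plain spec only checks attribute names. unittest.mock.create_autospec goes further and also checks call signatures, which narrows the gap between the mock and the real collaborator.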

Be cautious with mocks. They couple your test to implementation details. If you later refactor to batch emails or use a different method signature, these tests break even if the system still works correctly.
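One way to soften this coupling is to assert on what was delivered rather than on which method delivered it. The sketch below is illustrative only: Email, RecordingEmailService, send_many, and both register functions are hypothetical names, not part of the examples above or of any library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Email:
    to: str
    template: str


@dataclass
class RecordingEmailService:
    """Hand-written double that records every email it is asked to deliver."""
    sent: list[Email] = field(default_factory=list)

    def send_email(self, to: str, template: str) -> None:
        self.sent.append(Email(to, template))

    def send_many(self, emails: list[Email]) -> None:
        self.sent.extend(emails)


def register_one_by_one(email: str, service: RecordingEmailService) -> None:
    # Original implementation: one call per email.
    service.send_email(to=email, template="verify_email")
    service.send_email(to=email, template="welcome")


def register_batched(email: str, service: RecordingEmailService) -> None:
    # Refactored implementation: a single batched call.
    service.send_many([Email(email, "verify_email"), Email(email, "welcome")])


def delivered_templates(register) -> list[str]:
    """Run a register function and report which templates were delivered."""
    service = RecordingEmailService()
    register("alice@example.com", service)
    return [e.template for e in service.sent]


def test_both_implementations_deliver_the_same_emails():
    expected = ["verify_email", "welcome"]
    assert delivered_templates(register_one_by_one) == expected
    assert delivered_templates(register_batched) == expected
```

An assertion on send_email's recorded calls would fail against the batched version even though the same emails go out; the outcome-based assertion passes for both.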

Fakes: Working Implementations

A fake is a lightweight but functional implementation of a dependency. Unlike stubs (which return canned data) or mocks (which verify calls), fakes actually work—they just take shortcuts unsuitable for production.

The canonical example is an in-memory repository replacing a real database:

# Production implementation
class PostgresUserRepository:
    def __init__(self, connection_pool):
        self._pool = connection_pool
    
    def save(self, user: User) -> None:
        with self._pool.connection() as conn:
            conn.execute(
                "INSERT INTO users (id, email, name) VALUES (%s, %s, %s)",
                (user.id, user.email, user.name)
            )
    
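    def find_by_email(self, email: str) -> User | None:
        with self._pool.connection() as conn:
            row = conn.execute(
                "SELECT id, email, name FROM users WHERE email = %s", (email,)
            ).fetchone()
            # User(**row) requires rows that behave like dicts
            # (for example, a dict-style row factory on the connection)
            return User(**row) if row else None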

# Fake implementation for testing
class InMemoryUserRepository:
    def __init__(self):
        self._users: dict[str, User] = {}
    
    def save(self, user: User) -> None:
        self._users[user.id] = user
    
    def find_by_email(self, email: str) -> User | None:
        for user in self._users.values():
            if user.email == email:
                return user
        return None

# Tests using the fake
class TestUserService:
    def test_prevents_duplicate_email_registration(self):
        repo = InMemoryUserRepository()
        existing_user = User(id="1", email="taken@example.com", name="First User")
        repo.save(existing_user)
        
        user_service = UserService(user_repository=repo, email_service=StubEmailService())
        
        with pytest.raises(DuplicateEmailError):
            user_service.register(email="taken@example.com", name="Second User")
    
    def test_can_retrieve_registered_user(self):
        repo = InMemoryUserRepository()
        user_service = UserService(user_repository=repo, email_service=StubEmailService())
        
        user_service.register(email="new@example.com", name="New User")
        
        found = repo.find_by_email("new@example.com")
        assert found is not None
        assert found.name == "New User"

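Fakes provide realistic behavior without external dependencies. They’re well suited to repositories, caches, queues, and file systems. The trade-off is that you must maintain them: when the real implementation gains features, the fake needs updates too, and subtle differences can creep in. For example, the in-memory save above silently overwrites a user with the same id, whereas the SQL INSERT would fail if id is a primary key.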
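A common way to keep a fake honest is to run one shared set of checks against both the fake and a real implementation. The sketch below assumes a simplified User and uses an in-memory SQLite repository as a stand-in for the production database; SqliteUserRepository and check_repository_contract are hypothetical names introduced for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    email: str
    name: str


class InMemoryUserRepository:
    def __init__(self) -> None:
        self._users: dict[str, User] = {}

    def save(self, user: User) -> None:
        self._users[user.id] = user

    def find_by_email(self, email: str) -> Optional[User]:
        return next((u for u in self._users.values() if u.email == email), None)


class SqliteUserRepository:
    """Stand-in for the production repository, backed by in-memory SQLite."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, name TEXT)"
        )

    def save(self, user: User) -> None:
        self._conn.execute(
            "INSERT INTO users (id, email, name) VALUES (?, ?, ?)",
            (user.id, user.email, user.name),
        )

    def find_by_email(self, email: str) -> Optional[User]:
        row = self._conn.execute(
            "SELECT id, email, name FROM users WHERE email = ?", (email,)
        ).fetchone()
        return User(*row) if row else None


def check_repository_contract(repo) -> None:
    """Behavior every user repository is expected to share."""
    assert repo.find_by_email("missing@example.com") is None

    alice = User(id="1", email="alice@example.com", name="Alice")
    repo.save(alice)
    assert repo.find_by_email("alice@example.com") == alice


def test_fake_and_real_repositories_agree():
    for repo in (InMemoryUserRepository(), SqliteUserRepository()):
        check_repository_contract(repo)
```

With pytest, the same idea is often expressed as a parametrized fixture that yields each implementation in turn, so every contract check runs once per repository.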

Decision Framework: Choosing the Right Double

When facing a dependency in your test, ask these questions in order:

1. Can I use the real thing? If the dependency is fast, deterministic, and has no side effects, use it. Don’t double what doesn’t need doubling.

2. Do I need to verify an interaction? If sending a notification, publishing an event, or writing an audit log is the requirement (not an implementation detail), use a mock.

3. Do I need realistic behavior across multiple operations? If your test involves saving and retrieving, or enqueueing and processing, use a fake.

4. Do I just need to control an input? If you need the dependency to return specific data so you can test your code’s response, use a stub.

Here’s the decision as code:

# STUB: Control input to test different code paths
def test_handles_rate_limiting(self):
    api_client = StubApiClient(raises=RateLimitError())
    # Test how our code handles rate limiting

# MOCK: Verify critical interaction occurred
def test_publishes_order_event(self, mocker):
    mock_publisher = mocker.Mock(spec=EventPublisher)
    # Verify publish() called with OrderCreated event

# FAKE: Need working implementation without infrastructure
def test_order_total_calculation_with_discounts(self):
    cart_repo = InMemoryCartRepository()
    # Save items, apply discounts, verify totals

Common Anti-Patterns

The most insidious testing problem is over-mocking. Here’s a test that verifies implementation rather than behavior:

# Anti-pattern: Testing implementation details
def test_checkout_process(self, mocker):
    mock_repo = mocker.Mock(spec=OrderRepository)
    mock_repo.find_by_id.return_value = Order(id="1", items=[...])
    mock_inventory = mocker.Mock(spec=InventoryService)
    mock_inventory.check_availability.return_value = True
    mock_inventory.reserve_items.return_value = ReservationResult(success=True)
    mock_payment = mocker.Mock(spec=PaymentGateway)
    mock_payment.charge.return_value = PaymentResult(success=True)
    
    service = CheckoutService(mock_repo, mock_inventory, mock_payment)
    service.checkout("1", "tok_visa")
    
    # Brittle: testing HOW, not WHAT
    mock_repo.find_by_id.assert_called_once_with("1")
    mock_inventory.check_availability.assert_called_once()
    mock_inventory.reserve_items.assert_called_once()
    mock_payment.charge.assert_called_once()
    mock_repo.save.assert_called_once()

This test breaks if you change the order of operations, add caching, or refactor internal methods—even if checkout still works. Refactor to use fakes and test outcomes:

# Better: Test behavior with fakes
def test_successful_checkout_creates_completed_order(self):
    order_repo = InMemoryOrderRepository()
    order_repo.save(Order(id="1", status=OrderStatus.PENDING, items=[...]))
    
    service = CheckoutService(
        order_repository=order_repo,
        inventory_service=FakeInventoryService(always_available=True),
        payment_gateway=StubPaymentGateway(PaymentResult(success=True))
    )
    
    result = service.checkout("1", "tok_visa")
    
    # Test WHAT happened, not HOW
    assert result.success is True
    saved_order = order_repo.find_by_id("1")
    assert saved_order.status == OrderStatus.COMPLETED

Pragmatic Test Double Usage

Test doubles are tools, not religions. The goal is tests that catch bugs, survive refactoring, and communicate intent.

Prefer stubs for controlling inputs—they’re simple and don’t overconstrain your implementation.

Use mocks sparingly for verifying interactions that are actual requirements, not implementation details.

Invest in fakes for complex dependencies you’ll use across many tests—the maintenance cost pays off in test clarity and flexibility.

When in doubt, ask: “If I refactor the internals without changing the behavior, will this test still pass?” If the answer is no, you’ve probably over-mocked. Step back and test the outcome, not the journey.