Test-Driven Development: Red-Green-Refactor Cycle
Key Insights
- TDD isn’t about testing—it’s a design discipline that forces you to think about interfaces and behavior before implementation, resulting in more modular, testable code.
- The “green” phase demands discipline: write the absolute minimum code to pass the test, even if it feels wrong. This prevents over-engineering and keeps you focused on actual requirements.
- Skipping the refactor step is the most common TDD failure. Without it, you accumulate technical debt faster than if you hadn’t used TDD at all.
Introduction to TDD
Test-Driven Development is a software development practice where you write a failing test before writing the production code that makes it pass. Kent Beck formalized TDD as part of Extreme Programming in the late 1990s, though the idea of writing tests first predates XP.
Here’s the counterintuitive truth about TDD: it’s not primarily a testing technique. It’s a design technique that happens to produce tests as a byproduct. When you write a test first, you’re forced to think about how your code will be used before you think about how it will be implemented. This subtle shift changes everything.
Writing tests first means you design your API from the consumer’s perspective. You can’t write a test for code that’s impossible to test, so TDD naturally pushes you toward loosely coupled, highly cohesive modules. The tests are almost secondary—the real value is in the design pressure.
The Red-Green-Refactor Cycle Explained
TDD follows a strict three-phase cycle:
Red: Write a test that fails. This test describes behavior you want but haven’t implemented yet. The test must fail—if it passes immediately, either your test is wrong or the behavior already exists.
Green: Write the minimum code necessary to make the test pass. Not elegant code. Not complete code. Just enough to turn that red test green.
Refactor: Clean up both your test and production code while keeping all tests passing. This is where you remove duplication, improve names, and extract abstractions.
Let’s walk through a complete example. We’re building a discount calculator for an e-commerce system:
# RED PHASE: Write a failing test
import pytest
from discount import calculate_discount
def test_no_discount_for_small_orders():
    assert calculate_discount(total=50, customer_type="regular") == 0
Run this test. It fails because discount.py doesn’t exist. Good—that’s red.
# GREEN PHASE: Minimal code to pass
# discount.py
def calculate_discount(total, customer_type):
    return 0
The test passes. Yes, this implementation is incomplete. That’s the point. We’ve proven our test works by seeing it fail, and now we have a green baseline.
# RED PHASE: Add another test
def test_ten_percent_discount_for_orders_over_100():
    assert calculate_discount(total=200, customer_type="regular") == 20
This fails. Now make it pass:
# GREEN PHASE: Extend the implementation
def calculate_discount(total, customer_type):
    if total > 100:
        return total * 0.10
    return 0
Green again. Now we add premium customers:
def test_twenty_percent_discount_for_premium_customers():
    assert calculate_discount(total=200, customer_type="premium") == 40
# GREEN PHASE
def calculate_discount(total, customer_type):
    if customer_type == "premium":
        return total * 0.20
    if total > 100:
        return total * 0.10
    return 0
Now the refactor phase. The code works but has magic numbers and unclear logic:
# REFACTOR PHASE
PREMIUM_DISCOUNT_RATE = 0.20
STANDARD_DISCOUNT_RATE = 0.10
MINIMUM_ORDER_FOR_DISCOUNT = 100

def calculate_discount(total, customer_type):
    if customer_type == "premium":
        return total * PREMIUM_DISCOUNT_RATE
    if total > MINIMUM_ORDER_FOR_DISCOUNT:
        return total * STANDARD_DISCOUNT_RATE
    return 0
All tests still pass. The code is cleaner. Cycle complete.
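A natural next red test probes the boundary: the comparison is strict, so an order of exactly 100 earns no discount. Here is a minimal, self-contained sketch of that test (the refactored implementation is inlined so the snippet runs on its own; the test name is illustrative):

```python
# Inlined from the refactor phase above.
PREMIUM_DISCOUNT_RATE = 0.20
STANDARD_DISCOUNT_RATE = 0.10
MINIMUM_ORDER_FOR_DISCOUNT = 100

def calculate_discount(total, customer_type):
    if customer_type == "premium":
        return total * PREMIUM_DISCOUNT_RATE
    if total > MINIMUM_ORDER_FOR_DISCOUNT:
        return total * STANDARD_DISCOUNT_RATE
    return 0

def test_no_discount_at_exactly_minimum():
    # The check is total > 100, not >=, so exactly 100 gets nothing
    assert calculate_discount(total=100, customer_type="regular") == 0
```

If the team decides 100 should qualify, this test makes that decision explicit instead of leaving it buried in a comparison operator.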
Writing Effective Failing Tests (Red Phase)
The red phase is where most TDD practitioners struggle. The key principle: test behavior, not implementation.
Bad tests verify that specific methods were called or that internal state changed in specific ways. Good tests verify that given certain inputs, you get expected outputs or side effects.
Let’s build a UserValidator class:
# Start with the simplest possible case
import pytest
from user_validator import UserValidator
def test_empty_username_is_invalid():
    validator = UserValidator()
    result = validator.validate("")
    assert result.is_valid is False
    assert "username" in result.errors
Notice what this test does and doesn’t do:
- It tests observable behavior (validation result)
- It doesn’t test how validation happens internally
- It uses a clear, descriptive name
- It follows Arrange-Act-Assert structure
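The same test, annotated with Arrange-Act-Assert comments. A minimal stand-in validator is included so the snippet runs on its own; it is a sketch, not the full implementation built later in this section:

```python
# Minimal stand-in so the test below is self-contained.
class ValidationResult:
    def __init__(self, is_valid, errors=None):
        self.is_valid = is_valid
        self.errors = errors or {}

class UserValidator:
    def validate(self, username):
        if not username:
            return ValidationResult(False, {"username": "Required"})
        return ValidationResult(True)

def test_empty_username_is_invalid():
    # Arrange: build the object under test
    validator = UserValidator()
    # Act: exercise exactly one behavior
    result = validator.validate("")
    # Assert: check observable outcomes only
    assert result.is_valid is False
    assert "username" in result.errors
```

Keeping the three phases visually separate makes it obvious when a test is doing too much; more than one act usually means two tests are hiding in one.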
Here’s a bad version of the same test:
# DON'T DO THIS
def test_validate_calls_check_empty():
    validator = UserValidator()
    validator.validate("")
    # Testing implementation details
    assert validator._check_empty_called is True
This test will break if you refactor the internals, even if the behavior remains correct. That’s a maintenance nightmare.
Add tests incrementally, one behavior at a time:
def test_username_with_spaces_only_is_invalid():
    validator = UserValidator()
    result = validator.validate(" ")
    assert result.is_valid is False

def test_username_too_short_is_invalid():
    validator = UserValidator()
    result = validator.validate("ab")
    assert result.is_valid is False
    assert "at least 3 characters" in result.errors["username"]

def test_valid_username_passes():
    validator = UserValidator()
    result = validator.validate("john_doe")
    assert result.is_valid is True
    assert result.errors == {}
Making Tests Pass (Green Phase)
The green phase requires discipline. Your instinct will be to write “proper” code immediately. Resist it.
The “fake it till you make it” technique means you can hardcode values to pass tests initially:
# First test: empty username invalid
class ValidationResult:
    def __init__(self, is_valid, errors=None):
        self.is_valid = is_valid
        self.errors = errors or {}

class UserValidator:
    def validate(self, username):
        # Fake it!
        return ValidationResult(False, {"username": "Required"})
This passes the first test. When the “valid username passes” test fails, you’re forced to add real logic:
class UserValidator:
    def validate(self, username):
        errors = {}
        if not username or not username.strip():
            errors["username"] = "Required"
        elif len(username.strip()) < 3:
            errors["username"] = "Must be at least 3 characters"
        return ValidationResult(
            is_valid=len(errors) == 0,
            errors=errors,
        )
The tests drove you from a fake implementation to a real one. Each test added forced the implementation to become more complete.
Refactoring with Confidence
Refactoring is the step most developers skip, and it’s the most important one. Without refactoring, TDD produces working but messy code.
Common refactoring patterns during TDD:
Extract Method: When you see logic that could be named:
# Before
class UserValidator:
    def validate(self, username):
        errors = {}
        if not username or not username.strip():
            errors["username"] = "Required"
        elif len(username.strip()) < 3:
            errors["username"] = "Must be at least 3 characters"
        elif not all(c.isalnum() or c == "_" for c in username.strip()):
            errors["username"] = "Only alphanumeric and underscore allowed"
        return ValidationResult(is_valid=len(errors) == 0, errors=errors)
# After
class UserValidator:
    MIN_LENGTH = 3

    def validate(self, username):
        errors = {}
        error = self._validate_username(username)
        if error:
            errors["username"] = error
        return ValidationResult(is_valid=len(errors) == 0, errors=errors)

    def _validate_username(self, username):
        cleaned = username.strip() if username else ""
        if not cleaned:
            return "Required"
        if len(cleaned) < self.MIN_LENGTH:
            return f"Must be at least {self.MIN_LENGTH} characters"
        if not self._is_valid_format(cleaned):
            return "Only alphanumeric and underscore allowed"
        return None

    def _is_valid_format(self, value):
        return all(c.isalnum() or c == "_" for c in value)
Run your tests after every small refactoring. If they pass, continue. If they fail, you’ve broken something—undo and try again.
Common TDD Pitfalls
Testing implementation details: If your test breaks when you refactor internals without changing behavior, you’re testing implementation. Test inputs and outputs, not how the sausage is made.
Skipping the refactor step: “I’ll clean it up later” is a lie. Later never comes. Refactor immediately while context is fresh.
Writing too many tests at once: One failing test at a time. If you write five tests before any code, you’ll be tempted to write code that passes all five at once, skipping the incremental design benefits.
Testing trivial code: Don’t write tests for getters, setters, or code with no logic. TDD isn’t about 100% coverage—it’s about design feedback.
Integrating TDD into Your Workflow
TDD pairs naturally with CI/CD. Your pipeline should run tests on every commit, giving you immediate feedback when something breaks. Fast tests are essential—if your test suite takes 20 minutes, you won’t run it frequently enough.
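One common way to keep the feedback loop fast (assuming pytest is your runner, with a `slow` marker registered in your pytest configuration) is to tag expensive tests and exclude them from the everyday run:

```python
import pytest

# Everyday loop:  pytest -m "not slow"   (fast feedback while coding)
# CI pipeline:    pytest                 (full suite on every commit)
@pytest.mark.slow
def test_full_checkout_flow():
    # Stand-in for an end-to-end test that would hit a real database.
    assert True
```

The unit tests you write during the red-green-refactor cycle stay in the fast set; integration and end-to-end tests run in CI.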
When does TDD add value? For business logic, algorithms, and any code where correctness matters. When is it overhead? For exploratory prototypes, UI layouts, or code you’ll throw away.
To practice TDD, try coding katas: small, repeatable exercises like the Bowling Game, Roman Numerals, or String Calculator. Sites like Exercism and Codewars offer structured practice.
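For instance, the Roman Numerals kata usually opens with a single red test and a deliberately faked green. This is a sketch of those opening moves, not the kata's canonical solution (`to_roman` is the function the kata asks you to drive out):

```python
def to_roman(n):
    # GREEN: fake it -- hardcode until another test forces generalization
    return "I"

def test_one_is_I():
    # This was RED until to_roman existed; now it passes
    assert to_roman(1) == "I"

# The next RED test, e.g. asserting to_roman(2) == "II",
# fails against the fake and forces a real implementation.
```

Repeating the same kata several times is the point: the problem stops being interesting, so your attention shifts to the rhythm of the cycle itself.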
TDD isn’t dogma. It’s a tool. Use it when it helps you write better code faster. Skip it when it doesn’t. But learn it properly first—you can’t make an informed decision about when to break the rules until you understand why the rules exist.