Test Data Management: Factories and Builders

Key Insights

  • Factories provide sensible defaults so tests only specify what’s relevant to the behavior being tested, making test intent crystal clear
  • The Builder pattern shines for complex objects with many optional fields or multi-step construction, offering a fluent API that reads like a specification
  • Traits and sequences let you define meaningful variations (like “admin” or “expired”) without duplicating factory logic, keeping your test data DRY and expressive

The Test Data Problem

Every test suite eventually drowns in test data. It starts innocently—a few inline object creations, some copied JSON fixtures, maybe a shared setup file. Then your User model gains three new required fields, and suddenly forty tests break. Not because the logic changed, but because the test data was wrong.

Hardcoded test data creates invisible coupling. When you write user = User(name="John", email="john@test.com") in every test file, you’ve coupled each test to the current shape of User. Add a required organization_id field? Every single instantiation breaks.

Worse, hardcoded data obscures test intent. When a test creates a user with fifteen fields specified, which ones actually matter for this test? Is the email important, or just noise? You can’t tell without reading the entire test carefully.

The solution is indirection: define how to create valid objects in one place, then let tests override only what matters.
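
To make the coupling concrete, here is a minimal sketch. The dataclass `User` and its fields are hypothetical stand-ins rather than a real model, and `organization_id` plays the part of the newly added required field. Direct construction breaks as soon as the field appears, while the single factory function absorbs the change.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical model; organization_id is the newly added required field.
    name: str
    email: str
    organization_id: int


def create_user(**overrides):
    # The only place that has to learn about new required fields.
    defaults = {
        "name": "Test User",
        "email": "test@example.com",
        "organization_id": 1,
    }
    return User(**{**defaults, **overrides})


def old_style_construction():
    # Written before organization_id existed; now raises TypeError.
    return User(name="John", email="john@test.com")


try:
    old_style_construction()
    broke = False
except TypeError:
    broke = True

# The factory call keeps working and still states only what the test cares about.
user = create_user(name="John")
```

Every test that built `User` by hand needs editing; tests that go through `create_user` do not.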

Factory Pattern Fundamentals

A factory is a function or class that creates objects with sensible defaults. Tests call the factory, optionally overriding specific fields, and get back a valid object.

Here’s the simplest possible factory in Python:

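from datetime import datetime

def create_user(**overrides):
    # Defaults describe a valid, typical user; overrides win on conflict.
    defaults = {
        "name": "Test User",
        "email": "test@example.com",
        "is_active": True,
        "created_at": datetime.now(),
    }
    return User(**{**defaults, **overrides})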

Now tests become expressive:

def test_inactive_users_cannot_login():
    user = create_user(is_active=False)
    assert not user.can_login()

def test_user_email_validation():
    user = create_user(email="invalid-email")
    assert not user.is_valid()

Each test specifies exactly what matters. The first test cares about is_active. The second cares about email format. Neither needs to know about created_at or any other field.

The JavaScript equivalent follows the same pattern:

function createUser(overrides = {}) {
  return {
    id: crypto.randomUUID(),
    name: "Test User",
    email: "test@example.com",
    isActive: true,
    createdAt: new Date(),
    ...overrides,
  };
}

// Usage
const adminUser = createUser({ role: "admin" });
const suspendedUser = createUser({ isActive: false, suspendedAt: new Date() });

The same pattern carries over to most languages. The key insight is that defaults should produce a valid, typical object. Edge cases come from overrides.

Builder Pattern for Complex Objects

Factories work well for simple objects, but some entities have complex construction requirements. An e-commerce order might have line items, shipping details, discounts, and payment information. A simple override dictionary becomes unwieldy.

The Builder pattern provides a fluent interface for step-by-step construction:

class OrderBuilder:
    def __init__(self):
        self._customer_id = "cust_default"
        self._items = []
        self._shipping = {"method": "standard", "address": "123 Test St"}
        self._discounts = []
        self._status = "pending"
    
    def for_customer(self, customer_id):
        self._customer_id = customer_id
        return self
    
    def with_item(self, product_id, quantity=1, price=10.00):
        self._items.append({
            "product_id": product_id,
            "quantity": quantity,
            "unit_price": price
        })
        return self
    
    def with_shipping(self, method, address=None):
        self._shipping = {
            "method": method,
            "address": address or self._shipping["address"]
        }
        return self
    
    def with_discount(self, code, amount):
        self._discounts.append({"code": code, "amount": amount})
        return self
    
    def as_completed(self):
        self._status = "completed"
        return self
    
    def build(self):
        # Copy the collections so a reused builder never shares state between orders.
        return Order(
            customer_id=self._customer_id,
            items=list(self._items),
            shipping=dict(self._shipping),
            discounts=list(self._discounts),
            status=self._status
        )

Now complex test scenarios read like specifications:

def test_free_shipping_on_large_orders():
    order = (OrderBuilder()
        .for_customer("cust_123")
        .with_item("prod_a", quantity=5, price=50.00)
        .with_item("prod_b", quantity=3, price=30.00)
        .with_shipping("express")
        .build())
    
    assert order.qualifies_for_free_shipping()

def test_discount_stacking():
    order = (OrderBuilder()
        .with_item("prod_a", price=100.00)
        .with_discount("SAVE10", 10.00)
        .with_discount("LOYALTY", 5.00)
        .build())
    
    assert order.total == 85.00

Builders excel when object construction has conditional logic or when you need to build object graphs with relationships.
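
As a rough illustration of conditional logic inside a builder, the sketch below uses a trimmed-down builder and a hypothetical `Order` dataclass. The `paid_at` field, the fallback item, and the fixed timestamp are assumptions made for the example. The point is that rules such as "a completed order has a payment time" live in `build()` once instead of in every test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    # Hypothetical stand-in for the Order model; paid_at is an assumed field.
    items: list
    status: str = "pending"
    paid_at: Optional[str] = None


class OrderBuilder:
    def __init__(self):
        self._items = []
        self._status = "pending"

    def with_item(self, product_id, quantity=1, price=10.00):
        self._items.append(
            {"product_id": product_id, "quantity": quantity, "unit_price": price}
        )
        return self

    def as_completed(self):
        self._status = "completed"
        return self

    def build(self):
        # An order with no items is invalid here, so fall back to one default item.
        items = list(self._items) or [
            {"product_id": "prod_default", "quantity": 1, "unit_price": 10.00}
        ]
        # Only completed orders carry a payment timestamp (fixed for repeatability).
        paid_at = "2024-01-01T00:00:00" if self._status == "completed" else None
        return Order(items=items, status=self._status, paid_at=paid_at)


completed = OrderBuilder().as_completed().build()
pending = OrderBuilder().with_item("prod_a", quantity=2).build()
```

A test that only cares about completion calls `as_completed()` and gets a consistent object without knowing which other fields that implies.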

Factory Libraries in Practice

Hand-rolled factories work, but mature codebases benefit from dedicated libraries. These provide sequences (for unique values), traits (for variations), lazy evaluation, and database integration.

factory_boy is a widely used choice for Python. Here’s a realistic example using its Django integration:

import factory
from myapp.models import User, Organization

class OrganizationFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Organization
    
    name = factory.Sequence(lambda n: f"Organization {n}")
    slug = factory.LazyAttribute(lambda obj: obj.name.lower().replace(" ", "-"))
    plan = "free"

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User
    
    email = factory.Sequence(lambda n: f"user{n}@example.com")
    name = factory.Faker("name")
    organization = factory.SubFactory(OrganizationFactory)
    is_active = True
    
    class Params:
        premium = factory.Trait(
            organization=factory.SubFactory(
                OrganizationFactory, 
                plan="premium"
            )
        )
        admin = factory.Trait(
            is_staff=True,
            is_superuser=True
        )

# Usage
regular_user = UserFactory()
admin_user = UserFactory(admin=True)
premium_user = UserFactory(premium=True)
specific_org_user = UserFactory(organization__name="Acme Corp")

For TypeScript, Fishery provides similar capabilities:

import { Factory } from "fishery";
import { User, Organization } from "./types";

const organizationFactory = Factory.define<Organization>(() => ({
  id: crypto.randomUUID(),
  name: "Test Organization",
  plan: "free",
}));

const userFactory = Factory.define<User>(({ sequence, associations }) => ({
  id: crypto.randomUUID(),
  email: `user${sequence}@example.com`,
  name: "Test User",
  organization: associations.organization || organizationFactory.build(),
  isActive: true,
}));

// Usage
const user = userFactory.build();
const premiumUser = userFactory.build({
  organization: organizationFactory.build({ plan: "premium" }),
});

Handling Relationships and State

Real applications have object graphs. A blog post has an author and comments. An order has a customer, line items, and a shipping address. Factories must handle these relationships cleanly.

from datetime import datetime

class AuthorFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Author
    
    name = factory.Faker("name")
    bio = factory.Faker("paragraph")

class CommentFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Comment
    
    post = factory.SubFactory("tests.factories.PostFactory")
    author_name = factory.Faker("name")
    content = factory.Faker("sentence")
    created_at = factory.LazyFunction(datetime.now)

class PostFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Post
    
    title = factory.Sequence(lambda n: f"Post Title {n}")
    # "paragraphs" returns a list of strings; "paragraph" returns a single string
    content = factory.Faker("paragraph", nb_sentences=5)
    author = factory.SubFactory(AuthorFactory)
    status = "draft"
    
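    @factory.post_generation
    def comments(self, create, extracted, **kwargs):
        # Runs after the post is created; skipped for PostFactory.build()
        if not create:
            return
        if extracted:
            # Assumes Comment.post declares related_name="comments".
            # bulk=False saves each unsaved comment as it is attached.
            self.comments.add(*extracted, bulk=False)
        elif kwargs.get("count"):
            CommentFactory.create_batch(
                kwargs["count"], 
                post=self
            )

# Create post with 5 comments
post = PostFactory(comments__count=5)

# Create post with specific comments (post=None avoids building throwaway posts)
comments = CommentFactory.build_batch(3, post=None, author_name="John")
post = PostFactory(comments=comments)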

The key distinction is build() versus create(). Building creates in-memory objects; creating persists to the database. Calling the factory class directly, as in PostFactory(), uses the create strategy by default. Use build() for unit tests, create() for integration tests that need database state.
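
The two strategies can be sketched without any library. The example below is a hand-rolled illustration, not factory_boy's implementation: the `User` dataclass, the `users` table, and the `UserFactory` class are all assumptions, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Hypothetical model; id stays None until the row is persisted.
    email: str
    is_active: bool = True
    id: Optional[int] = None


class UserFactory:
    """Hand-rolled sketch of the build and create strategies."""

    def __init__(self, connection):
        self._connection = connection

    def build(self, **overrides):
        # In-memory only: no database access.
        defaults = {"email": "test@example.com", "is_active": True}
        return User(**{**defaults, **overrides})

    def create(self, **overrides):
        # Build first, then persist and record the generated primary key.
        user = self.build(**overrides)
        cursor = self._connection.execute(
            "INSERT INTO users (email, is_active) VALUES (?, ?)",
            (user.email, int(user.is_active)),
        )
        user.id = cursor.lastrowid
        return user


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)"
)
factory = UserFactory(connection)

built = factory.build(email="unit@example.com")
created = factory.create(email="integration@example.com")
row_count = connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Only the `create()` call touches the database, which is why built objects suit fast unit tests and created objects suit tests that query stored state.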

Traits, Sequences, and Variations

Traits define named variations that you can combine. This is more maintainable than creating separate factories for each variation.

from datetime import datetime, timedelta

class SubscriptionFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Subscription
    
    user = factory.SubFactory(UserFactory)
    plan = "basic"
    status = "active"
    started_at = factory.LazyFunction(datetime.now)
    expires_at = factory.LazyAttribute(
        lambda obj: obj.started_at + timedelta(days=30)
    )
    
    class Params:
        expired = factory.Trait(
            status="expired",
            expires_at=factory.LazyFunction(
                lambda: datetime.now() - timedelta(days=1)
            )
        )
        trial = factory.Trait(
            plan="trial",
            expires_at=factory.LazyAttribute(
                lambda obj: obj.started_at + timedelta(days=14)
            )
        )
        premium = factory.Trait(plan="premium")
        annual = factory.Trait(
            expires_at=factory.LazyAttribute(
                lambda obj: obj.started_at + timedelta(days=365)
            )
        )

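# Traits combine cleanly when they set different fields
premium_annual = SubscriptionFactory(premium=True, annual=True)

# expired and trial both set expires_at, so only one value can win;
# assert on the field if the test depends on it
expired_trial = SubscriptionFactory(expired=True, trial=True)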
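
For codebases without a factory library, the same idea can be approximated by hand. The sketch below is an assumption-laden illustration: the `TRAITS` table, the `duration_days` helper field, and the fixed start date are invented for the example. Each trait is a dictionary of overrides, applied in the order the test lists them.

```python
from datetime import datetime, timedelta

# Hypothetical trait table: each trait is a dict of field overrides.
TRAITS = {
    "trial": {"plan": "trial", "duration_days": 14},
    "premium": {"plan": "premium"},
    "annual": {"duration_days": 365},
    "expired": {"status": "expired"},
}


def create_subscription(*traits, **overrides):
    fields = {"plan": "basic", "status": "active", "duration_days": 30}
    # Later traits win when two traits set the same field.
    for trait in traits:
        fields.update(TRAITS[trait])
    fields.update(overrides)

    # Derive expires_at from started_at, mirroring a lazily computed attribute.
    started_at = fields.pop("started_at", datetime(2024, 1, 1))
    fields["started_at"] = started_at
    fields["expires_at"] = started_at + timedelta(days=fields.pop("duration_days"))
    return fields


premium_annual = create_subscription("premium", "annual")
trial_then_annual = create_subscription("trial", "annual")
```

Making the precedence rule explicit (last trait wins) keeps overlapping traits predictable, which matters once a test combines more than one.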

Sequences generate unique values automatically, preventing collision errors in tests that create multiple objects.
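
A hand-rolled sequence needs nothing more than a counter. This minimal sketch assumes a dictionary-based `create_user` helper (a hypothetical name) and uses `itertools.count` from the standard library to number each object.

```python
import itertools

# Module-level counter; each call to next() yields a new integer.
_user_sequence = itertools.count(1)


def create_user(**overrides):
    n = next(_user_sequence)
    defaults = {
        "name": f"Test User {n}",
        "email": f"user{n}@example.com",
    }
    return {**defaults, **overrides}


users = [create_user() for _ in range(3)]
emails = [u["email"] for u in users]
```

Because every call draws a fresh number, a unique constraint on email cannot be violated by the defaults alone.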

Anti-Patterns to Avoid

Over-specified factories include too many fields in defaults. If your factory sets twenty fields, tests become coupled to all of them. Stick to the minimum required for a valid object.

Hidden dependencies occur when factories implicitly create related objects. If UserFactory always creates an Organization, tests might unknowingly depend on that organization existing. Make dependencies explicit or optional.

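External service calls in factories are a testing nightmare. Never let a factory hit an API, send an email, or trigger webhooks. If creating an object would normally trigger such a call, replace the dependency with a mock where the factory is defined, so no test can reach the real service by accident.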

The “god factory” tries to handle every possible variation through parameters. Instead of create_user(admin=True, premium=True, expired=True, trial=True), use traits that can be combined or create separate specialized factories.

Shared mutable state between tests causes flaky failures. Each test should create its own data. If you’re using database factories, ensure proper test isolation through transactions or truncation.

Keep factories close to the models they create. When a model changes, the factory should be updated in the same commit. This is the whole point—centralizing test data creation so changes propagate automatically.

Factories and builders aren’t just conveniences. They’re essential infrastructure for a maintainable test suite. Invest in them early, maintain them rigorously, and your tests will remain readable and resilient as your codebase grows.
