DRY Principle: Don't Repeat Yourself

Key Insights

DRY is about eliminating knowledge duplication, not just identical code—two pieces of code that look different might violate DRY, while identical code might not.
The “Rule of Three” provides a practical threshold: tolerate duplication twice, but extract on the third occurrence when patterns become clear.
The wrong abstraction is far more expensive than duplication—when in doubt, keep code separate until the right abstraction reveals itself.

What is DRY?

DRY—Don’t Repeat Yourself—originates from Andy Hunt and Dave Thomas’s The Pragmatic Programmer, where they define it as: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”

Notice what this definition doesn’t say. It doesn’t say “never write similar-looking code.” It doesn’t say “extract everything into functions.” The principle targets knowledge duplication, not syntactic similarity.

Consider two functions that both contain price * 0.08. If one calculates sales tax and the other calculates a completely unrelated commission rate that happens to be 8%, these aren’t DRY violations—they represent different pieces of knowledge that coincidentally share a value. But if you have the sales tax rate hardcoded in three different files? That’s a DRY violation, even if the surrounding code looks completely different.

This distinction matters because misunderstanding DRY leads developers to create abstractions that couple unrelated concepts, making systems harder to change rather than easier.

Why Repetition is Costly

When knowledge lives in multiple places, every change becomes a scavenger hunt. You update the business logic in one location, ship to production, and discover three days later that the same logic existed elsewhere—now inconsistent and generating incorrect data.

Consider this scenario: your e-commerce platform calculates order totals in the checkout flow, the order history display, and the admin refund calculator. Each implementation handles tax slightly differently:

# checkout.py
def calculate_checkout_total(items, state):
    subtotal = sum(item.price * item.quantity for item in items)
    tax = subtotal * 0.08 if state == "CA" else subtotal * 0.06
    return subtotal + tax

# order_history.py
def get_order_display_total(order):
    subtotal = sum(line.price * line.qty for line in order.lines)
    tax_rate = 0.08 if order.shipping_state == "CA" else 0.06
    return subtotal + (subtotal * tax_rate)

# admin_refunds.py
def calculate_refund_amount(order):
    subtotal = sum(item.unit_price * item.quantity for item in order.items)
    # Bug: forgot to handle state-specific tax
    tax = subtotal * 0.07
    return subtotal + tax

Three implementations, three opportunities for inconsistency. When California changes its tax rate, you’ll update checkout, probably remember order history, and almost certainly miss the admin tool.

The fix centralizes this knowledge:

# pricing.py
class TaxCalculator:
    RATES = {
        "CA": 0.08,
        "NY": 0.08,
        "TX": 0.0625,
    }
    DEFAULT_RATE = 0.06
    
    @classmethod
    def calculate_tax(cls, subtotal: float, state: str) -> float:
        rate = cls.RATES.get(state, cls.DEFAULT_RATE)
        return subtotal * rate

def calculate_order_total(items, state: str) -> float:
    subtotal = sum(item.price * item.quantity for item in items)
    tax = TaxCalculator.calculate_tax(subtotal, state)
    return subtotal + tax

Now tax logic lives in one place. Rate changes require one edit. State-specific rules have one home.

Recognizing DRY Violations

Obvious duplication—copy-pasted blocks—is easy to spot. Subtle duplication hides better. Watch for these patterns:

Magic numbers scattered throughout code: The same timeout, threshold, or limit appearing in multiple files without a named constant.

Parallel data structures: A list of country codes in your validation logic, another in your dropdown component, another in your API serializer.

Repeated conditionals: The same if user.role == "admin" check appearing in dozens of places instead of an is_admin() method.

Structural duplication with cosmetic differences:

// These functions encode the same knowledge
function validateUserEmail(email) {
    if (!email) return { valid: false, error: "Email required" };
    if (!email.includes("@")) return { valid: false, error: "Invalid email" };
    if (email.length > 255) return { valid: false, error: "Email too long" };
    return { valid: true, error: null };
}

function checkContactEmail(contactEmail) {
    if (!contactEmail) return { success: false, message: "Email required" };
    if (contactEmail.indexOf("@") === -1) return { success: false, message: "Invalid email" };
    if (contactEmail.length > 255) return { success: false, message: "Email too long" };
    return { success: true, message: null };
}

Different variable names, different property names in the return object, includes vs indexOf—but identical knowledge. When email validation rules change, both functions need updating.

Practical DRY Techniques

Function extraction is the most common technique. When you see repeated logic, pull it into a function with a clear name that describes what it does, not how.

Constants and configuration eliminate magic values:

# Before: magic numbers everywhere
if retry_count > 3:
    if timeout > 30:
        
# After: named constants
MAX_RETRIES = 3
REQUEST_TIMEOUT_SECONDS = 30

if retry_count > MAX_RETRIES:
    if timeout > REQUEST_TIMEOUT_SECONDS:

Composition over inheritance often provides cleaner DRY solutions. Rather than creating deep inheritance hierarchies, compose behaviors:

# Reusable validator components
class EmailValidator:
    def validate(self, value: str) -> list[str]:
        errors = []
        if not value:
            errors.append("Email is required")
        elif "@" not in value:
            errors.append("Invalid email format")
        return errors

class LengthValidator:
    def __init__(self, max_length: int):
        self.max_length = max_length
    
    def validate(self, value: str) -> list[str]:
        if value and len(value) > self.max_length:
            return [f"Must be {self.max_length} characters or less"]
        return []

# Compose validators as needed
class UserEmailField:
    validators = [EmailValidator(), LengthValidator(255)]
    
    def validate(self, value: str) -> list[str]:
        errors = []
        for validator in self.validators:
            errors.extend(validator.validate(value))
        return errors

DRY Across Layers

DRY violations often span architectural boundaries. Your TypeScript frontend defines an interface, your Python backend defines a dataclass, and your OpenAPI spec defines a schema—all representing the same entity.

Code generation establishes a single source of truth:

# openapi.yaml - the single source of truth
components:
  schemas:
    User:
      type: object
      required:
        - id
        - email
      properties:
        id:
          type: integer
        email:
          type: string
          format: email
        name:
          type: string
          nullable: true

Generate types from this spec:

# Generate TypeScript types
npx openapi-typescript openapi.yaml -o src/types/api.ts

# Generate Python models
datamodel-codegen --input openapi.yaml --output models.py

Now your API contract lives in one place. Frontend and backend types stay synchronized automatically.

Similar patterns apply to:

Database schemas generating ORM models
Protobuf definitions generating client/server code
JSON Schema generating validation logic

When DRY Goes Wrong

DRY becomes dangerous when developers extract abstractions prematurely. Consider this over-abstracted “solution”:

# Over-DRY'd: a "flexible" handler that's impossible to understand
class GenericEntityProcessor:
    def __init__(self, entity_type, pre_hooks=None, post_hooks=None, 
                 validators=None, transformers=None, error_strategy="raise"):
        self.entity_type = entity_type
        self.pre_hooks = pre_hooks or []
        self.post_hooks = post_hooks or []
        self.validators = validators or []
        self.transformers = transformers or []
        self.error_strategy = error_strategy
    
    def process(self, data, context=None, **kwargs):
        context = context or {}
        for hook in self.pre_hooks:
            data = hook(data, context, **kwargs)
        for validator in self.validators:
            if not validator.validate(data, context, self.entity_type):
                if self.error_strategy == "raise":
                    raise ValidationError(validator.get_errors())
                elif self.error_strategy == "collect":
                    context.setdefault("errors", []).extend(validator.get_errors())
        # ... 50 more lines of "flexible" processing

This abstraction handles users, orders, and products—poorly. When you need to add user-specific logic, you’re fighting the abstraction. When debugging, you’re tracing through layers of indirection. The “duplication” it eliminated was probably 20 lines across three simple, readable functions.

The Rule of Three provides guidance: tolerate duplication the first two times you encounter it. On the third occurrence, patterns become clear, and you can extract a well-designed abstraction. Premature extraction locks in assumptions before you understand the problem.

WET (Write Everything Twice) formalizes this as intentional practice. Duplication twice is acceptable; it’s cheap insurance against premature abstraction.

Balancing DRY with Pragmatism

Sandi Metz captured the tradeoff perfectly: “Duplication is far cheaper than the wrong abstraction.”

When you encounter duplication, ask:

Do these pieces of code represent the same knowledge, or do they just look similar?
If I change one, should the other always change too?
Have I seen this pattern enough times to understand what abstraction it needs?

If you’re uncertain, leave the duplication. Duplicated code is easy to find and consolidate later. A bad abstraction infects your codebase, creating coupling that’s expensive to unwind.

DRY is a principle, not a rule. Apply it when it reduces the cost of change. Ignore it when abstraction would increase complexity without proportional benefit. The goal isn’t eliminating all duplication—it’s building systems that are easy to understand and modify.