Refactoring: Improving Code Structure

Key Insights

Refactoring is the disciplined practice of restructuring existing code without changing its external behavior—it’s how you pay down technical debt incrementally rather than letting it compound into a rewrite.
Code smells like long methods, duplicate code, and feature envy are symptoms, not diseases. Learning to recognize them gives you a diagnostic framework for knowing when and where to refactor.
Tests aren’t optional for refactoring—they’re the safety net that lets you move fast with confidence. Without them, you’re just editing code and hoping for the best.

What Is Refactoring and Why It Matters

Refactoring is restructuring code without changing what it does. That definition sounds simple, but the discipline it implies is profound. You’re not adding features. You’re not fixing bugs. You’re improving the internal structure so the code becomes easier to understand, modify, and extend.

Every codebase accumulates technical debt. Shortcuts taken under deadline pressure, designs that made sense before requirements evolved, code written before you understood the domain—all of it adds friction. Technical debt isn’t inherently bad; it’s a tool, like financial debt. The problem is when you stop paying it down.

Refactoring is how you make those payments. Small, continuous improvements that keep your codebase healthy. The alternative is a rewrite, which is almost always more expensive, risky, and time-consuming than teams estimate. Refactoring lets you improve incrementally while continuing to deliver value.

The key insight is that refactoring and feature development are complementary activities. You refactor to make the next feature easier to build. Then you build the feature. Then you refactor to clean up what you learned. This rhythm keeps codebases sustainable over years, not months.

Code Smells: Recognizing When to Refactor

Martin Fowler popularized the term “code smell” to describe surface indicators of deeper problems. Smells aren’t bugs—the code works—but they signal structural issues that will cause pain later.

Long Methods are the most common smell. When a function exceeds 20-30 lines, it’s usually doing too much. You’ll find yourself scrolling to understand it, or reading the same lines multiple times.

Duplicate Code violates DRY (Don’t Repeat Yourself). When you fix a bug in one place and realize you need to fix it in three others, you’ve found duplication.

Large Classes accumulate responsibilities over time. A User class that handles authentication, profile management, email preferences, and billing has become a god object.

Feature Envy occurs when a method seems more interested in another class’s data than its own. If you’re constantly reaching into another object to get values for calculations, the calculation probably belongs in that other object.

Primitive Obsession means using basic types where a domain object would be clearer. Passing around strings for email addresses, integers for money, or arrays for coordinates obscures intent.

Here’s a method exhibiting multiple smells:

def process_order(order_data, user_id, items, discount_code, shipping_address):
    # Validate user
    user = db.query(f"SELECT * FROM users WHERE id = {user_id}")
    if not user:
        return {"error": "User not found"}
    if user["status"] == "suspended":
        return {"error": "User suspended"}
    
    # Calculate totals
    subtotal = 0
    for item in items:
        product = db.query(f"SELECT * FROM products WHERE id = {item['product_id']}")
        if product["stock"] < item["quantity"]:
            return {"error": f"Insufficient stock for {product['name']}"}
        price = product["price"] * item["quantity"]
        if product["category"] == "electronics":
            price = price * 0.95  # Electronics discount
        subtotal += price
    
    # Apply discount
    if discount_code:
        discount = db.query(f"SELECT * FROM discounts WHERE code = '{discount_code}'")
        if discount and discount["expires_at"] > datetime.now():
            if discount["type"] == "percentage":
                subtotal = subtotal * (1 - discount["value"] / 100)
            else:
                subtotal = subtotal - discount["value"]
    
    # Calculate shipping
    if shipping_address["country"] == "US":
        if subtotal > 100:
            shipping = 0
        else:
            shipping = 9.99
    else:
        shipping = 29.99
    
    total = subtotal + shipping
    
    # Create order
    order_id = db.insert("orders", {
        "user_id": user_id,
        "total": total,
        "status": "pending"
    })
    
    return {"order_id": order_id, "total": total}

This 50-line function validates users, calculates prices, applies discounts, determines shipping, and creates orders. It’s a long method with multiple responsibilities, making it hard to test, understand, or modify safely.

Essential Refactoring Techniques

Extract Method is your workhorse refactoring. When you see a block of code that does one thing, pull it into a named function. The name documents intent, and the smaller functions become testable in isolation.

Let’s refactor that order processing function:

class OrderProcessor:
    def __init__(self, user_repository, product_repository, discount_repository):
        self.users = user_repository
        self.products = product_repository
        self.discounts = discount_repository
    
    def process(self, user_id: int, items: list, discount_code: str, shipping_address: dict):
        user = self._validate_user(user_id)
        line_items = self._build_line_items(items)
        subtotal = self._calculate_subtotal(line_items)
        subtotal = self._apply_discount(subtotal, discount_code)
        shipping = self._calculate_shipping(subtotal, shipping_address)
        
        return self._create_order(user_id, subtotal + shipping)
    
    def _validate_user(self, user_id: int):
        user = self.users.find(user_id)
        if not user:
            raise OrderError("User not found")
        if user.is_suspended:
            raise OrderError("User suspended")
        return user
    
    def _build_line_items(self, items: list) -> list:
        line_items = []
        for item in items:
            product = self.products.find(item["product_id"])
            if product.stock < item["quantity"]:
                raise OrderError(f"Insufficient stock for {product.name}")
            line_items.append(LineItem(product, item["quantity"]))
        return line_items
    
    def _calculate_subtotal(self, line_items: list) -> Decimal:
        return sum(item.total_price for item in line_items)
    
    def _apply_discount(self, subtotal: Decimal, code: str) -> Decimal:
        if not code:
            return subtotal
        discount = self.discounts.find_valid(code)
        return discount.apply(subtotal) if discount else subtotal
    
    def _calculate_shipping(self, subtotal: Decimal, address: dict) -> Decimal:
        return ShippingCalculator.calculate(subtotal, address)
    
    def _create_order(self, user_id: int, total: Decimal):
        return Order.create(user_id=user_id, total=total, status="pending")

Rename Variable seems trivial but dramatically improves readability. d becomes discount, temp becomes accumulated_total.

Replace Conditional with Polymorphism eliminates complex switch statements by using objects that know how to handle their own cases:

# Before
def calculate_area(shape):
    if shape["type"] == "circle":
        return 3.14159 * shape["radius"] ** 2
    elif shape["type"] == "rectangle":
        return shape["width"] * shape["height"]
    elif shape["type"] == "triangle":
        return 0.5 * shape["base"] * shape["height"]

# After
class Circle:
    def __init__(self, radius):
        self.radius = radius
    
    def area(self):
        return 3.14159 * self.radius ** 2

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height
    
    def area(self):
        return self.width * self.height

Introduce Parameter Object consolidates related parameters that travel together. If you’re passing street, city, state, and zip_code to multiple functions, create an Address class.

Refactoring Safely: Tests as Your Safety Net

Here’s the uncomfortable truth: refactoring without tests is just editing code and hoping. You need automated verification that behavior remains unchanged.

Before refactoring, ensure you have tests covering the code you’re about to change. If tests don’t exist, write them first. For legacy code without tests, write “characterization tests”—tests that capture current behavior, even if that behavior includes bugs.

class TestOrderProcessor:
    def test_valid_order_returns_order_id(self):
        processor = OrderProcessor(
            user_repository=FakeUserRepo(valid_user),
            product_repository=FakeProductRepo(in_stock_product),
            discount_repository=FakeDiscountRepo()
        )
        
        result = processor.process(
            user_id=1,
            items=[{"product_id": 1, "quantity": 2}],
            discount_code=None,
            shipping_address={"country": "US"}
        )
        
        assert "order_id" in result
        assert result["total"] == Decimal("19.98")  # 2 * 9.99
    
    def test_suspended_user_raises_error(self):
        processor = OrderProcessor(
            user_repository=FakeUserRepo(suspended_user),
            product_repository=FakeProductRepo(),
            discount_repository=FakeDiscountRepo()
        )
        
        with pytest.raises(OrderError, match="User suspended"):
            processor.process(user_id=1, items=[], discount_code=None, shipping_address={})

Make small changes, run tests, commit. Repeat. If tests fail, you know exactly which change caused the problem because you only changed one thing. This discipline feels slow but actually accelerates overall progress by eliminating debugging sessions.

Real-World Refactoring Walkthrough

Let’s trace the progression of our order processing code through multiple refactoring stages.

Stage 1: Extract Methods — Pull each logical block into its own function. The main function becomes a readable summary of the process.

Stage 2: Introduce Domain Objects — Create LineItem, Discount, and ShippingCalculator classes that encapsulate their own logic.

Stage 3: Inject Dependencies — Replace direct database calls with repository interfaces. This enables testing and decouples the processor from storage details.

Stage 4: Extract Class — If OrderProcessor grows too large, split it. Perhaps PricingCalculator handles subtotals and discounts while OrderProcessor orchestrates the flow.

Each stage leaves the code working. Each stage makes the next improvement possible.

Tools and IDE Support

Modern IDEs make mechanical refactoring nearly effortless. IntelliJ and VS Code both offer:

Rename (F2 in most editors) — Updates all references automatically
Extract Method/Function — Select code, invoke the refactoring, name the new function
Extract Variable — Turn an expression into a named variable
Inline — The reverse of extract, for when abstraction hurts more than it helps

Linters like ESLint, Pylint, and RuboCop identify code smells automatically. SonarQube provides deeper static analysis, flagging complexity, duplication, and potential bugs.

Use these tools, but don’t let them think for you. They find candidates; you decide what actually needs attention.

When Not to Refactor

Refactoring isn’t always the right call. Consider skipping it when:

You’re against a hard deadline. Ship first, refactor after. Technical debt is a valid tool when used consciously.

The code is scheduled for deletion. Don’t polish code that’s being replaced next sprint.

Requirements are still unclear. Premature abstraction creates the wrong abstractions. Wait until patterns emerge from real usage.

You don’t have tests. Write tests first, or accept that you’re taking on risk.

The refactoring scope keeps expanding. Set a timebox. If you can’t finish the refactoring in that time, revert and try a smaller scope.

Pragmatism beats purity. The goal is sustainable software development, not perfect code. Refactor enough to keep velocity high, but don’t let refactoring become procrastination disguised as productivity.