Regular Expression Matching: DP Implementation

Regular expression matching with `.` (matches any single character) and `*` (matches zero or more of the preceding element) is a classic dynamic programming problem. Given a string `text` and a...

Key Insights

  • Regular expression matching with * creates overlapping subproblems because the wildcard can match zero, one, or many characters, leading to exponential recursive calls without memoization
  • The DP state dp[i][j] represents whether the suffix text[i:] matches pattern[j:], and transitions depend on whether the next pattern character is followed by *
  • Bottom-up DP achieves O(m×n) time complexity and can be space-optimized to O(n) using a rolling array, making it practical for production use

The Problem: Why Naive Recursion Fails

Regular expression matching with . (matches any single character) and * (matches zero or more of the preceding element) is a classic dynamic programming problem. Given a string text and a pattern pattern, determine if the pattern matches the entire string.

The naive recursive approach explores every possible way * can consume characters. For a pattern like a*a*a*a*b against a string of as, each * branches into multiple paths. The time complexity explodes to O(2^n) in pathological cases.

This is where dynamic programming transforms an intractable problem into an efficient one.

Understanding the Recursive Structure

Before jumping to DP, let’s understand the recursive logic. At any position in the text and pattern, we face two scenarios:

Case 1: No * following the current pattern character Simply check if the current characters match (either identical or pattern has .), then recurse on the remaining strings.

Case 2: * follows the current pattern character We have a choice:

  • Skip the x* entirely (match zero occurrences)
  • If current characters match, consume one text character and keep the x* pattern (allowing more matches)

Here’s the recursive solution:

def is_match_recursive(text: str, pattern: str) -> bool:
    def match(i: int, j: int) -> bool:
        # Base case: pattern exhausted
        if j == len(pattern):
            return i == len(text)
        
        # Check if current characters match
        first_match = i < len(text) and pattern[j] in {text[i], '.'}
        
        # Case: '*' follows current pattern character
        if j + 1 < len(pattern) and pattern[j + 1] == '*':
            # Skip 'x*' entirely OR consume one char and stay on 'x*'
            return match(i, j + 2) or (first_match and match(i + 1, j))
        
        # Case: no '*', must match current and advance both
        return first_match and match(i + 1, j + 1)
    
    return match(0, 0)

This solution is correct but catastrophically slow. The branching at each * creates a recursion tree with exponential nodes.

Identifying Overlapping Subproblems

The key insight for DP is recognizing that match(i, j) gets called repeatedly with the same arguments. Consider matching "aaa" against "a*a":

match(0, 0) → tries match(0, 2) and match(1, 0)
match(1, 0) → tries match(1, 2) and match(2, 0)
match(2, 0) → tries match(2, 2) and match(3, 0)

The state space is bounded: there are only (len(text) + 1) × (len(pattern) + 1) unique subproblems. We can cache results.

def is_match_memo(text: str, pattern: str) -> bool:
    memo = {}
    
    def match(i: int, j: int) -> bool:
        if (i, j) in memo:
            return memo[(i, j)]
        
        if j == len(pattern):
            result = i == len(text)
        else:
            first_match = i < len(text) and pattern[j] in {text[i], '.'}
            
            if j + 1 < len(pattern) and pattern[j + 1] == '*':
                result = match(i, j + 2) or (first_match and match(i + 1, j))
            else:
                result = first_match and match(i + 1, j + 1)
        
        memo[(i, j)] = result
        return result
    
    return match(0, 0)

This top-down approach with memoization achieves O(m×n) time complexity. Each subproblem is solved once.

Bottom-Up DP Implementation

The bottom-up approach builds the solution iteratively, filling a 2D table from the base cases upward. This eliminates recursion overhead and makes the logic explicit.

def is_match_dp(text: str, pattern: str) -> bool:
    m, n = len(text), len(pattern)
    
    # dp[i][j] = True if text[i:] matches pattern[j:]
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    
    # Base case: empty text matches empty pattern
    dp[m][n] = True
    
    # Fill table from bottom-right to top-left
    for i in range(m, -1, -1):
        for j in range(n - 1, -1, -1):
            # Check if current characters match
            first_match = i < m and pattern[j] in {text[i], '.'}
            
            if j + 1 < n and pattern[j + 1] == '*':
                # Option 1: skip 'x*' (match zero occurrences)
                # Option 2: use 'x*' to match current char, stay on pattern
                dp[i][j] = dp[i][j + 2] or (first_match and dp[i + 1][j])
            else:
                # No '*': must match current char and advance both
                dp[i][j] = first_match and dp[i + 1][j + 1]
    
    return dp[0][0]

The iteration order matters. We process from i = m down to 0 and j = n - 1 down to 0 because each cell depends on cells to its right and below.

Walkthrough with Examples

Let’s trace through text = "aab" and pattern = "c*a*b":

Pattern: c * a * b
Index:   0 1 2 3 4

Text: a a b
Index: 0 1 2

Building the DP table (True = T, False = F):

        j=0   j=1   j=2   j=3   j=4   j=5
        'c'   '*'   'a'   '*'   'b'   ''
i=3 ''   F     F     F     F     F     T
i=2 'b'  F     F     F     F     T     F
i=1 'a'  F     F     T     T     F     F
i=0 'a'  F     F     T     T     F     F

Let’s trace key cells:

  • dp[3][5] = True: empty matches empty
  • dp[3][4] = False: empty doesn’t match “b”
  • dp[3][3] = dp[3][5] = True: empty matches “” (skip “a”)
  • dp[2][4] = True: “b” matches “b”
  • dp[1][3] = dp[1][5] or (True and dp[2][3]) = True: “a” matches “a*”
  • dp[0][0] = dp[0][2] = True: “aab” matches “cab”

The answer is dp[0][0] = True.

Now consider a failing case: text = "mississippi", pattern = "mis*is*p*.":

The pattern requires exactly one character after the final p*, but “mississippi” ends with “ppi”—two characters after the p sequence. The DP correctly returns False.

Complexity Analysis and Space Optimization

Time Complexity: O(m × n) where m = len(text), n = len(pattern). We fill each cell once with O(1) work.

Space Complexity: O(m × n) for the DP table.

We can optimize space to O(n) by observing that each row only depends on the current row and the row below. We use two arrays and alternate:

def is_match_optimized(text: str, pattern: str) -> bool:
    m, n = len(text), len(pattern)
    
    # Current and previous rows
    curr = [False] * (n + 1)
    prev = [False] * (n + 1)
    
    # Base case: empty text, empty pattern
    prev[n] = True
    
    # Handle empty text with pattern (only 'x*' patterns can match)
    for j in range(n - 2, -1, -2):
        if pattern[j + 1] == '*':
            prev[j] = prev[j + 2]
    
    for i in range(m - 1, -1, -1):
        curr[n] = False  # Non-empty text can't match empty pattern
        
        for j in range(n - 1, -1, -1):
            first_match = pattern[j] in {text[i], '.'}
            
            if j + 1 < n and pattern[j + 1] == '*':
                # Skip 'x*' or consume char
                curr[j] = curr[j + 2] or (first_match and prev[j])
            else:
                curr[j] = first_match and prev[j + 1]
        
        prev, curr = curr, prev
    
    return prev[0]

This reduces space from O(m × n) to O(n) while maintaining O(m × n) time.

Practical Considerations

When to use DP vs. NFA-based approaches:

DP works well for this simplified regex with only . and *. Production regex engines (PCRE, RE2) use finite automata because they handle far more operators: +, ?, |, character classes, anchors, backreferences, and lookahead/lookbehind.

For interview problems and educational purposes, DP is the right tool. For production code, use your language’s regex library—it’s battle-tested and handles edge cases you haven’t considered.

Related problems worth studying:

  • Wildcard Matching (LeetCode 44): Uses ? (single char) and * (any sequence). Simpler because * is standalone, not a modifier.
  • Edit Distance: Same DP structure with different transitions.
  • Distinct Subsequences: Counting version of string matching.

Common implementation bugs:

  1. Off-by-one errors in the * lookahead check (j + 1 < n)
  2. Forgetting that * can match zero occurrences
  3. Incorrect base case handling for patterns ending in x*

The DP approach transforms an exponential problem into polynomial time by recognizing that the massive recursion tree contains only O(m × n) unique states. Master this pattern—it appears constantly in string processing, sequence alignment, and parsing problems.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.