Wildcard Pattern Matching: DP Solution

Wildcard pattern matching is everywhere. When you type `*.txt` in your terminal, use `SELECT * FROM` in SQL, or configure ignore patterns in `.gitignore`, you're using wildcard matching. The problem...

Key Insights

  • Wildcard pattern matching with * creates exponential branching in naive solutions, but overlapping subproblems make it a textbook dynamic programming candidate
  • The recurrence relation hinges on how * can match zero or more characters: dp[i][j] = dp[i-1][j] OR dp[i][j-1]
  • Space optimization from O(m×n) to O(n) requires careful handling of a single “diagonal” value when processing the * wildcard

The Problem: Wildcards in the Real World

Wildcard pattern matching is everywhere. When you type *.txt in your terminal, use SELECT * FROM in SQL, or configure ignore patterns in .gitignore, you’re using wildcard matching. The problem is deceptively simple: given a pattern containing wildcards and an input string, determine if they match.

The rules are straightforward:

  • ? matches exactly one character (any character)
  • * matches any sequence of characters, including an empty sequence
  • Regular characters must match exactly

For example:

  • Pattern a?c matches abc, adc, but not ac or abbc
  • Pattern a*c matches ac, abc, aXYZc, but not ab
  • Pattern * matches everything, including empty strings

The challenge? That innocent-looking * wildcard creates computational chaos.

Why Brute Force Fails

Your first instinct might be a recursive solution. Walk through the pattern and string simultaneously, handling each case:

def is_match_naive(pattern: str, text: str) -> bool:
    def match(p_idx: int, t_idx: int) -> bool:
        # Base case: pattern exhausted
        if p_idx == len(pattern):
            return t_idx == len(text)
        
        # Base case: text exhausted but pattern remains
        if t_idx == len(text):
            # Only valid if remaining pattern is all *
            return all(c == '*' for c in pattern[p_idx:])
        
        current_pattern = pattern[p_idx]
        
        if current_pattern == '?':
            # ? matches any single character
            return match(p_idx + 1, t_idx + 1)
        
        elif current_pattern == '*':
            # * matches zero characters OR one-or-more characters
            return match(p_idx + 1, t_idx) or match(p_idx, t_idx + 1)
        
        else:
            # Regular character must match exactly
            if current_pattern == text[t_idx]:
                return match(p_idx + 1, t_idx + 1)
            return False
    
    return match(0, 0)

This looks clean, but it’s a trap. The * case branches into two recursive calls: match zero characters (p_idx + 1, t_idx) or consume one character and try again (p_idx, t_idx + 1). With a pattern like *a*b*c*d* against a long string, you get exponential branching.

Consider matching **** against abcd. Each * can match 0 to n characters, creating O(n^k) combinations where k is the number of wildcards. For patterns with many * characters against long strings, this explodes.

But here’s the insight: we’re solving the same subproblems repeatedly. The call match(2, 5) might be reached through dozens of different paths. This overlap screams dynamic programming.

Identifying the DP Substructure

Let’s define our subproblem precisely. Let dp[i][j] represent whether pattern[0..i-1] matches text[0..j-1]. We use 1-based indexing for the DP table to handle empty prefixes cleanly.

The recurrence relation:

Base cases:
  dp[0][0] = True                    # Empty pattern matches empty text
  dp[0][j] = False for j > 0         # Empty pattern can't match non-empty text
  dp[i][0] = True if pattern[0..i-1] is all '*', else False

Transitions:
  If pattern[i-1] == text[j-1] or pattern[i-1] == '?':
      dp[i][j] = dp[i-1][j-1]        # Characters match, advance both
  
  If pattern[i-1] == '*':
      dp[i][j] = dp[i-1][j]          # * matches zero characters
               OR dp[i][j-1]          # * matches one more character
  
  Otherwise:
      dp[i][j] = False               # Mismatch

The * transition is the key insight. When we see a *:

  • dp[i-1][j]: The * matches nothing, so we check if the pattern without * matches the same text
  • dp[i][j-1]: The * consumes one character from text, and we check if the same pattern (with *) matches the remaining text

This “matches one more character” case propagates across the row, allowing * to consume arbitrarily many characters.

Bottom-Up DP Implementation

Now we build the solution systematically:

def is_match(pattern: str, text: str) -> bool:
    m, n = len(pattern), len(text)
    
    # dp[i][j] = does pattern[0..i-1] match text[0..j-1]?
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    
    # Base case: empty pattern matches empty text
    dp[0][0] = True
    
    # Base case: pattern with only * can match empty text
    for i in range(1, m + 1):
        if pattern[i - 1] == '*':
            dp[i][0] = dp[i - 1][0]
        else:
            break  # Once we hit a non-*, rest will be False
    
    # Fill the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p_char = pattern[i - 1]
            t_char = text[j - 1]
            
            if p_char == t_char or p_char == '?':
                # Direct match or single-char wildcard
                dp[i][j] = dp[i - 1][j - 1]
            
            elif p_char == '*':
                # * matches zero chars OR consumes one more char
                dp[i][j] = dp[i - 1][j] or dp[i][j - 1]
            
            # else: mismatch, dp[i][j] stays False
    
    return dp[m][n]

Let’s trace through an example. Pattern: a*b, Text: aXYb

        ""    a     X     Y     b
  ""    T     F     F     F     F
  a     F     T     F     F     F
  *     F     T     T     T     T
  b     F     F     F     F     T

The * row shows how the wildcard “spreads” True values across, allowing it to match X, XY, XYb, etc. The final cell dp[3][4] is True, confirming the match.

Space Optimization

We only reference the previous row (dp[i-1][...]) and earlier cells in the current row (dp[i][j-1]). This means we can reduce space from O(m×n) to O(n):

def is_match_optimized(pattern: str, text: str) -> bool:
    m, n = len(pattern), len(text)
    
    # Previous row
    prev = [False] * (n + 1)
    prev[0] = True
    
    for i in range(1, m + 1):
        p_char = pattern[i - 1]
        
        # Current row
        curr = [False] * (n + 1)
        
        # Handle empty text case
        if p_char == '*':
            curr[0] = prev[0]
        
        for j in range(1, n + 1):
            t_char = text[j - 1]
            
            if p_char == t_char or p_char == '?':
                curr[j] = prev[j - 1]
            elif p_char == '*':
                # prev[j] = dp[i-1][j] (zero match)
                # curr[j-1] = dp[i][j-1] (consume one more)
                curr[j] = prev[j] or curr[j - 1]
        
        prev = curr
    
    return prev[n]

The trick with * is that curr[j-1] was just computed in this iteration, representing the “consume one more character” case. This left-to-right propagation within the same row is what makes * work correctly.

Edge Cases & Testing

Wildcard matching has notorious edge cases. Here’s a comprehensive test suite:

import unittest

class TestWildcardMatching(unittest.TestCase):
    
    def test_exact_match(self):
        self.assertTrue(is_match("abc", "abc"))
        self.assertFalse(is_match("abc", "abd"))
    
    def test_single_char_wildcard(self):
        self.assertTrue(is_match("a?c", "abc"))
        self.assertTrue(is_match("???", "abc"))
        self.assertFalse(is_match("a?c", "ac"))
        self.assertFalse(is_match("a?c", "abbc"))
    
    def test_star_wildcard(self):
        self.assertTrue(is_match("a*", "a"))
        self.assertTrue(is_match("a*", "abcdef"))
        self.assertTrue(is_match("*a", "a"))
        self.assertTrue(is_match("*a", "xyza"))
        self.assertFalse(is_match("*a", "xyz"))
    
    def test_empty_inputs(self):
        self.assertTrue(is_match("", ""))
        self.assertFalse(is_match("", "a"))
        self.assertFalse(is_match("a", ""))
        self.assertTrue(is_match("*", ""))
        self.assertTrue(is_match("**", ""))
    
    def test_consecutive_stars(self):
        # Multiple * should behave like single *
        self.assertTrue(is_match("a**b", "ab"))
        self.assertTrue(is_match("a**b", "aXXXb"))
        self.assertTrue(is_match("***", "anything"))
    
    def test_complex_patterns(self):
        self.assertTrue(is_match("*a*b*c*", "abc"))
        self.assertTrue(is_match("*a*b*c*", "XXaYYbZZcWW"))
        self.assertFalse(is_match("*a*b*c*", "acb"))
        self.assertTrue(is_match("a*b?c*d", "aXXXbYcZZZd"))
    
    def test_all_stars(self):
        self.assertTrue(is_match("*****", ""))
        self.assertTrue(is_match("*****", "anything"))
    
    def test_pathological_case(self):
        # This would timeout with naive recursion
        pattern = "a" + "*a" * 20
        text = "a" * 21
        self.assertTrue(is_match(pattern, text))

if __name__ == "__main__":
    unittest.main()

The pathological case in the last test is critical. Patterns like a*a*a*a*... against strings of as are designed to break naive implementations. Our DP solution handles them in polynomial time.

Complexity Analysis & Wrap-Up

Time Complexity: O(m × n) where m is pattern length and n is text length. We fill each cell exactly once with O(1) work per cell.

Space Complexity: O(n) with the optimized solution, O(m × n) for the basic version.

This is a significant improvement over the exponential worst case of naive recursion. The solution is also practical—it’s essentially what simplified glob matching implementations use.

Comparison to Regex: Full regex matching (with . and * meaning “previous character zero or more times”) is a related but different problem. Regex * is a quantifier on the preceding element, while wildcard * is a standalone “match anything” token. Regex matching uses similar DP techniques but with different transition rules.

Related Problems:

  • LeetCode 10: Regular Expression Matching
  • LeetCode 44: Wildcard Matching (this problem)
  • Implementing glob patterns for file systems

The wildcard matching problem is a perfect example of how recognizing overlapping subproblems transforms an intractable recursive solution into an elegant polynomial-time algorithm. The key insight—that * either matches nothing or consumes one character and tries again—directly maps to the DP transition. Master this pattern, and you’ll recognize similar structures in string matching problems everywhere.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.