Wildcard Pattern Matching: DP Solution
Wildcard pattern matching is everywhere. When you type `*.txt` in your terminal, use `SELECT * FROM` in SQL, or configure ignore patterns in `.gitignore`, you're using wildcard matching. The problem...
Key Insights
- Wildcard pattern matching with
*creates exponential branching in naive solutions, but overlapping subproblems make it a textbook dynamic programming candidate - The recurrence relation hinges on how
*can match zero or more characters:dp[i][j] = dp[i-1][j] OR dp[i][j-1] - Space optimization from O(m×n) to O(n) requires careful handling of a single “diagonal” value when processing the
*wildcard
The Problem: Wildcards in the Real World
Wildcard pattern matching is everywhere. When you type *.txt in your terminal, use SELECT * FROM in SQL, or configure ignore patterns in .gitignore, you’re using wildcard matching. The problem is deceptively simple: given a pattern containing wildcards and an input string, determine if they match.
The rules are straightforward:
?matches exactly one character (any character)*matches any sequence of characters, including an empty sequence- Regular characters must match exactly
For example:
- Pattern
a?cmatchesabc,adc, but notacorabbc - Pattern
a*cmatchesac,abc,aXYZc, but notab - Pattern
*matches everything, including empty strings
The challenge? That innocent-looking * wildcard creates computational chaos.
Why Brute Force Fails
Your first instinct might be a recursive solution. Walk through the pattern and string simultaneously, handling each case:
def is_match_naive(pattern: str, text: str) -> bool:
def match(p_idx: int, t_idx: int) -> bool:
# Base case: pattern exhausted
if p_idx == len(pattern):
return t_idx == len(text)
# Base case: text exhausted but pattern remains
if t_idx == len(text):
# Only valid if remaining pattern is all *
return all(c == '*' for c in pattern[p_idx:])
current_pattern = pattern[p_idx]
if current_pattern == '?':
# ? matches any single character
return match(p_idx + 1, t_idx + 1)
elif current_pattern == '*':
# * matches zero characters OR one-or-more characters
return match(p_idx + 1, t_idx) or match(p_idx, t_idx + 1)
else:
# Regular character must match exactly
if current_pattern == text[t_idx]:
return match(p_idx + 1, t_idx + 1)
return False
return match(0, 0)
This looks clean, but it’s a trap. The * case branches into two recursive calls: match zero characters (p_idx + 1, t_idx) or consume one character and try again (p_idx, t_idx + 1). With a pattern like *a*b*c*d* against a long string, you get exponential branching.
Consider matching **** against abcd. Each * can match 0 to n characters, creating O(n^k) combinations where k is the number of wildcards. For patterns with many * characters against long strings, this explodes.
But here’s the insight: we’re solving the same subproblems repeatedly. The call match(2, 5) might be reached through dozens of different paths. This overlap screams dynamic programming.
Identifying the DP Substructure
Let’s define our subproblem precisely. Let dp[i][j] represent whether pattern[0..i-1] matches text[0..j-1]. We use 1-based indexing for the DP table to handle empty prefixes cleanly.
The recurrence relation:
Base cases:
dp[0][0] = True # Empty pattern matches empty text
dp[0][j] = False for j > 0 # Empty pattern can't match non-empty text
dp[i][0] = True if pattern[0..i-1] is all '*', else False
Transitions:
If pattern[i-1] == text[j-1] or pattern[i-1] == '?':
dp[i][j] = dp[i-1][j-1] # Characters match, advance both
If pattern[i-1] == '*':
dp[i][j] = dp[i-1][j] # * matches zero characters
OR dp[i][j-1] # * matches one more character
Otherwise:
dp[i][j] = False # Mismatch
The * transition is the key insight. When we see a *:
dp[i-1][j]: The*matches nothing, so we check if the pattern without*matches the same textdp[i][j-1]: The*consumes one character from text, and we check if the same pattern (with*) matches the remaining text
This “matches one more character” case propagates across the row, allowing * to consume arbitrarily many characters.
Bottom-Up DP Implementation
Now we build the solution systematically:
def is_match(pattern: str, text: str) -> bool:
m, n = len(pattern), len(text)
# dp[i][j] = does pattern[0..i-1] match text[0..j-1]?
dp = [[False] * (n + 1) for _ in range(m + 1)]
# Base case: empty pattern matches empty text
dp[0][0] = True
# Base case: pattern with only * can match empty text
for i in range(1, m + 1):
if pattern[i - 1] == '*':
dp[i][0] = dp[i - 1][0]
else:
break # Once we hit a non-*, rest will be False
# Fill the table
for i in range(1, m + 1):
for j in range(1, n + 1):
p_char = pattern[i - 1]
t_char = text[j - 1]
if p_char == t_char or p_char == '?':
# Direct match or single-char wildcard
dp[i][j] = dp[i - 1][j - 1]
elif p_char == '*':
# * matches zero chars OR consumes one more char
dp[i][j] = dp[i - 1][j] or dp[i][j - 1]
# else: mismatch, dp[i][j] stays False
return dp[m][n]
Let’s trace through an example. Pattern: a*b, Text: aXYb
"" a X Y b
"" T F F F F
a F T F F F
* F T T T T
b F F F F T
The * row shows how the wildcard “spreads” True values across, allowing it to match X, XY, XYb, etc. The final cell dp[3][4] is True, confirming the match.
Space Optimization
We only reference the previous row (dp[i-1][...]) and earlier cells in the current row (dp[i][j-1]). This means we can reduce space from O(m×n) to O(n):
def is_match_optimized(pattern: str, text: str) -> bool:
m, n = len(pattern), len(text)
# Previous row
prev = [False] * (n + 1)
prev[0] = True
for i in range(1, m + 1):
p_char = pattern[i - 1]
# Current row
curr = [False] * (n + 1)
# Handle empty text case
if p_char == '*':
curr[0] = prev[0]
for j in range(1, n + 1):
t_char = text[j - 1]
if p_char == t_char or p_char == '?':
curr[j] = prev[j - 1]
elif p_char == '*':
# prev[j] = dp[i-1][j] (zero match)
# curr[j-1] = dp[i][j-1] (consume one more)
curr[j] = prev[j] or curr[j - 1]
prev = curr
return prev[n]
The trick with * is that curr[j-1] was just computed in this iteration, representing the “consume one more character” case. This left-to-right propagation within the same row is what makes * work correctly.
Edge Cases & Testing
Wildcard matching has notorious edge cases. Here’s a comprehensive test suite:
import unittest
class TestWildcardMatching(unittest.TestCase):
def test_exact_match(self):
self.assertTrue(is_match("abc", "abc"))
self.assertFalse(is_match("abc", "abd"))
def test_single_char_wildcard(self):
self.assertTrue(is_match("a?c", "abc"))
self.assertTrue(is_match("???", "abc"))
self.assertFalse(is_match("a?c", "ac"))
self.assertFalse(is_match("a?c", "abbc"))
def test_star_wildcard(self):
self.assertTrue(is_match("a*", "a"))
self.assertTrue(is_match("a*", "abcdef"))
self.assertTrue(is_match("*a", "a"))
self.assertTrue(is_match("*a", "xyza"))
self.assertFalse(is_match("*a", "xyz"))
def test_empty_inputs(self):
self.assertTrue(is_match("", ""))
self.assertFalse(is_match("", "a"))
self.assertFalse(is_match("a", ""))
self.assertTrue(is_match("*", ""))
self.assertTrue(is_match("**", ""))
def test_consecutive_stars(self):
# Multiple * should behave like single *
self.assertTrue(is_match("a**b", "ab"))
self.assertTrue(is_match("a**b", "aXXXb"))
self.assertTrue(is_match("***", "anything"))
def test_complex_patterns(self):
self.assertTrue(is_match("*a*b*c*", "abc"))
self.assertTrue(is_match("*a*b*c*", "XXaYYbZZcWW"))
self.assertFalse(is_match("*a*b*c*", "acb"))
self.assertTrue(is_match("a*b?c*d", "aXXXbYcZZZd"))
def test_all_stars(self):
self.assertTrue(is_match("*****", ""))
self.assertTrue(is_match("*****", "anything"))
def test_pathological_case(self):
# This would timeout with naive recursion
pattern = "a" + "*a" * 20
text = "a" * 21
self.assertTrue(is_match(pattern, text))
if __name__ == "__main__":
unittest.main()
The pathological case in the last test is critical. Patterns like a*a*a*a*... against strings of as are designed to break naive implementations. Our DP solution handles them in polynomial time.
Complexity Analysis & Wrap-Up
Time Complexity: O(m × n) where m is pattern length and n is text length. We fill each cell exactly once with O(1) work per cell.
Space Complexity: O(n) with the optimized solution, O(m × n) for the basic version.
This is a significant improvement over the exponential worst case of naive recursion. The solution is also practical—it’s essentially what simplified glob matching implementations use.
Comparison to Regex: Full regex matching (with . and * meaning “previous character zero or more times”) is a related but different problem. Regex * is a quantifier on the preceding element, while wildcard * is a standalone “match anything” token. Regex matching uses similar DP techniques but with different transition rules.
Related Problems:
- LeetCode 10: Regular Expression Matching
- LeetCode 44: Wildcard Matching (this problem)
- Implementing glob patterns for file systems
The wildcard matching problem is a perfect example of how recognizing overlapping subproblems transforms an intractable recursive solution into an elegant polynomial-time algorithm. The key insight—that * either matches nothing or consumes one character and tries again—directly maps to the DP transition. Master this pattern, and you’ll recognize similar structures in string matching problems everywhere.