Distinct Subsequences: Counting Subsequences DP
Key Insights
- The distinct subsequences problem reveals a fundamental DP pattern: at each position, you either match and advance both pointers, or skip and advance only the source pointer—the total count is the sum of both choices.
- Space optimization from O(m×n) to O(n) works because each row of the table depends only on its neighboring row; the inner loop's direction must be chosen so you don't overwrite values you still need, a technique applicable to many 2D DP problems.
- This counting pattern appears frequently in bioinformatics, text matching, and combinatorics problems—mastering it unlocks solutions to an entire family of related challenges.
Introduction to the Problem
The Distinct Subsequences problem (LeetCode 115) asks a deceptively simple question: given a source string s and a target string t, count how many distinct subsequences of s equal t.
A subsequence differs from a substring in a critical way. A substring must be contiguous characters, while a subsequence can skip characters but must maintain relative order. For “abc”, valid subsequences include “a”, “ac”, “abc”, but not “ca” (wrong order) or “d” (not present).
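The definition can be checked mechanically with a two-pointer scan. As a minimal sketch (the helper name is ours, not part of the problem statement):

```python
def is_subsequence(t: str, s: str) -> bool:
    """Return True if t is a subsequence of s (characters in order, gaps allowed)."""
    it = iter(s)
    # `ch in it` advances the iterator, so relative order is enforced automatically
    return all(ch in it for ch in t)

print(is_subsequence("ac", "abc"))  # True: skip 'b'
print(is_subsequence("ca", "abc"))  # False: wrong order
```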
This problem appears in practical contexts more often than you might expect. Bioinformatics uses similar algorithms for DNA sequence alignment. Text editors leverage these patterns for fuzzy matching. Plagiarism detection systems count common subsequences to identify copied content.
Let’s start with the naive approach to understand why we need dynamic programming:
def num_distinct_brute_force(s: str, t: str) -> int:
    def recurse(i: int, j: int) -> int:
        # Base case: matched all of t
        if j == len(t):
            return 1
        # Base case: exhausted s without matching t
        if i == len(s):
            return 0
        # Always try skipping the current character in s
        count = recurse(i + 1, j)
        # If the characters match, also try using this character
        if s[i] == t[j]:
            count += recurse(i + 1, j + 1)
        return count
    return recurse(0, 0)
This solution has O(2^m) time complexity where m is the length of s. Each character in s presents a binary choice: use it or skip it. For large inputs, this explodes exponentially.
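To see the blowup concretely, we can instrument the recursion with a call counter (a diagnostic helper of ours, not part of the solution). On all-matching inputs, every position fires both branches and the call count grows exponentially:

```python
def count_calls(s: str, t: str) -> int:
    """Run the brute-force recursion and return how many calls it makes."""
    calls = 0

    def recurse(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        if j == len(t):
            return 1
        if i == len(s):
            return 0
        count = recurse(i + 1, j)
        if s[i] == t[j]:
            count += recurse(i + 1, j + 1)
        return count

    recurse(0, 0)
    return calls

# Every character matches, so both branches fire at every step
for k in (2, 4, 8):
    print(k, count_calls("a" * k, "a" * (k // 2)))
```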
Building Intuition with Small Examples
Consider the classic example: counting subsequences of “rabbbit” that equal “rabbit”. The answer is 3, but why?
The source has three ‘b’ characters at positions 2, 3, and 4. To form “rabbit”, we need exactly two ‘b’ characters. We can choose:
- Positions 2 and 3 (skip position 4)
- Positions 2 and 4 (skip position 3)
- Positions 3 and 4 (skip position 2)
Each selection produces a valid subsequence. The other characters have only one option each, so three total subsequences exist.
Here’s a helper to visualize all subsequences for debugging small inputs:
def enumerate_subsequences(s: str, t: str) -> list[str]:
    """Returns all subsequences of s that equal t, showing selected indices."""
    results = []
    def backtrack(i: int, j: int, path: list[int]):
        if j == len(t):
            results.append(f"indices: {path} -> {''.join(s[idx] for idx in path)}")
            return
        if i == len(s):
            return
        # Skip current character
        backtrack(i + 1, j, path)
        # Use current character if it matches
        if s[i] == t[j]:
            backtrack(i + 1, j + 1, path + [i])
    backtrack(0, 0, [])
    return results

# Example usage
for seq in enumerate_subsequences("rabbbit", "rabbit"):
    print(seq)
# Output:
# indices: [0, 1, 3, 4, 5, 6] -> rabbit
# indices: [0, 1, 2, 4, 5, 6] -> rabbit
# indices: [0, 1, 2, 3, 5, 6] -> rabbit
Notice how the recursive calls overlap. When we’re at position i=3 in s and j=2 in t, we compute the same subproblem regardless of how we arrived there. This overlapping substructure screams for memoization.
Recursive Solution with Memoization (Top-Down DP)
The recurrence relation captures our two choices at each step:
dp(i, j) = dp(i+1, j) # skip s[i]
+ dp(i+1, j+1) if s[i] == t[j] # use s[i] to match t[j]
Base cases:
- j == len(t): we’ve matched all of t, return 1
- i == len(s): we’ve exhausted s without matching t, return 0
The state space is O(m × n) since i ranges over s and j ranges over t. Each state computes in O(1), giving us O(m × n) time complexity.
from functools import lru_cache
def num_distinct_memo(s: str, t: str) -> int:
    @lru_cache(maxsize=None)
    def dp(i: int, j: int) -> int:
        # Successfully matched all of t
        if j == len(t):
            return 1
        # Ran out of characters in s
        if i == len(s):
            return 0
        # Remaining s is shorter than remaining t - impossible to match
        if len(s) - i < len(t) - j:
            return 0
        # Count ways by skipping s[i]
        result = dp(i + 1, j)
        # Add ways by matching s[i] with t[j]
        if s[i] == t[j]:
            result += dp(i + 1, j + 1)
        return result
    return dp(0, 0)
The pruning condition len(s) - i < len(t) - j prevents exploring impossible states where remaining source characters can’t possibly match remaining target characters.
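lru_cache also makes the bounded state space easy to observe through its cache_info() method. A quick check (the wrapper name is ours, for illustration):

```python
from functools import lru_cache

def num_distinct_with_stats(s: str, t: str) -> tuple[int, int]:
    """Solve with memoization and report how many distinct (i, j) states were cached."""
    @lru_cache(maxsize=None)
    def dp(i: int, j: int) -> int:
        if j == len(t):
            return 1
        if i == len(s) or len(s) - i < len(t) - j:
            return 0
        result = dp(i + 1, j)
        if s[i] == t[j]:
            result += dp(i + 1, j + 1)
        return result

    answer = dp(0, 0)
    return answer, dp.cache_info().currsize

ways, states = num_distinct_with_stats("rabbbit", "rabbit")
print(ways)    # 3
print(states)  # bounded by (7 + 1) * (6 + 1) = 56 possible states
```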
Tabulation Approach (Bottom-Up DP)
Converting to bottom-up iteration eliminates recursion overhead and makes the algorithm’s structure explicit. We build a 2D table where dp[i][j] represents the number of ways to match t[j:] using s[i:].
def num_distinct_tabulation(s: str, t: str) -> int:
    m, n = len(s), len(t)
    # dp[i][j] = ways to form t[j:] from s[i:]
    # Extra row/column for base cases
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Base case: empty t can be formed from any suffix of s (one way: select nothing)
    for i in range(m + 1):
        dp[i][n] = 1
    # Fill table from bottom-right to top-left
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            # Always can skip s[i]
            dp[i][j] = dp[i + 1][j]
            # If characters match, add ways from matching
            if s[i] == t[j]:
                dp[i][j] += dp[i + 1][j + 1]
    return dp[0][0]
The table fills right-to-left, bottom-to-top because each cell depends on cells below and to the right. Visualizing for s=“abb”, t=“ab”:
      a   b   ""
    +---+---+---+
  a | 2 | 2 | 1 |  dp[0][0]=2: two ways to form "ab" from "abb"
    +---+---+---+
  b | 0 | 2 | 1 |  dp[1][1]=2: two ways to form "b" from "bb"
    +---+---+---+
  b | 0 | 1 | 1 |  dp[2][1]=1: one way to form "b" from "b"
    +---+---+---+
 "" | 0 | 0 | 1 |  base case row
    +---+---+---+
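To double-check a trace like this by hand, it helps to print the whole table. A small inspection helper mirroring the tabulation code:

```python
def build_dp_table(s: str, t: str) -> list[list[int]]:
    """Full suffix table: dp[i][j] = ways to form t[j:] from s[i:]."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][n] = 1  # empty target: one way
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            dp[i][j] = dp[i + 1][j]
            if s[i] == t[j]:
                dp[i][j] += dp[i + 1][j + 1]
    return dp

for row in build_dp_table("abb", "ab"):
    print(row)
# [2, 2, 1]
# [0, 2, 1]
# [0, 1, 1]
# [0, 0, 1]
```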
Space Optimization Techniques
Observe that each row only depends on the row below it. We can collapse the 2D table into a single 1D array, reducing space from O(m × n) to O(n).
The critical insight: the inner loop's direction must be chosen so that each update reads values from the previous row, not values already overwritten in the current pass. In the suffix formulation below, dp[j] reads dp[j + 1], so j must move forward; dp[j + 1] is only updated after dp[j] is done with it.
def num_distinct_optimized(s: str, t: str) -> int:
    m, n = len(s), len(t)
    # dp[j] = ways to form t[j:] from the suffix of s processed so far
    dp = [0] * (n + 1)
    dp[n] = 1  # Base case: one way to form the empty string
    # Process each character of s from end to start
    for i in range(m - 1, -1, -1):
        # j moves left to right so dp[j + 1] still holds the previous
        # row's value when dp[j] reads it
        for j in range(n):
            if s[i] == t[j]:
                dp[j] += dp[j + 1]
        # Note: dp[n] stays 1 throughout
    return dp[0]
An equivalent formulation flips both directions: process s left to right and let dp[j] count the ways to form the prefix t[:j] from the characters of s seen so far. Now each update reads dp[j - 1], so the inner loop must run backward to avoid consuming a value already updated for the current character:
def num_distinct_space_optimized(s: str, t: str) -> int:
    n = len(t)
    dp = [0] * (n + 1)
    dp[0] = 1  # One way to form the empty prefix of t
    for char in s:
        # Must go backwards: dp[j] reads dp[j - 1] from BEFORE this pass
        for j in range(n, 0, -1):
            if char == t[j - 1]:
                dp[j] += dp[j - 1]
    return dp[n]
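To see concretely why the backward loop matters in the prefix formulation, compare it against a deliberately buggy forward version (both helper names are ours, for illustration). The forward loop lets a single character of s be counted at two positions of t in the same pass:

```python
def prefix_dp(s: str, t: str) -> int:
    """Correct prefix formulation: backward inner loop."""
    n = len(t)
    dp = [0] * (n + 1)
    dp[0] = 1
    for char in s:
        for j in range(n, 0, -1):  # backward: dp[j - 1] is still stale
            if char == t[j - 1]:
                dp[j] += dp[j - 1]
    return dp[n]

def prefix_dp_buggy(s: str, t: str) -> int:
    """Same recurrence, forward inner loop: reads dp[j - 1] already updated this pass."""
    n = len(t)
    dp = [0] * (n + 1)
    dp[0] = 1
    for char in s:
        for j in range(1, n + 1):  # forward: overcounts
            if char == t[j - 1]:
                dp[j] += dp[j - 1]
    return dp[n]

print(prefix_dp("aaa", "aa"))        # 3: choose 2 of the 3 a's
print(prefix_dp_buggy("aaa", "aa"))  # 6: reuses a single 'a' for both positions
```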
Edge Cases and Variations
Production code must handle edge cases gracefully:
def num_distinct_production(s: str, t: str, mod: int | None = None) -> int:
    """
    Count distinct subsequences of s that equal t.

    Args:
        s: Source string
        t: Target string
        mod: Optional modulus for large counts (e.g., 10**9 + 7)

    Returns:
        Count of distinct subsequences, optionally modulo mod
    """
    # Edge case: empty target matches any source exactly once
    if not t:
        return 1
    # Edge case: empty source can't match a non-empty target
    if not s:
        return 0
    # Edge case: source shorter than target - impossible
    if len(s) < len(t):
        return 0
    n = len(t)
    dp = [0] * (n + 1)
    dp[0] = 1
    for char in s:
        for j in range(n, 0, -1):
            if char == t[j - 1]:
                dp[j] += dp[j - 1]
                if mod:
                    dp[j] %= mod
    return dp[n]
Related problems worth practicing:
- Edit Distance (LeetCode 72): Similar 2D DP structure with different transitions
- Longest Common Subsequence (LeetCode 1143): Maximize instead of count
- Number of Ways to Form Target (LeetCode 1639): Counting with character frequency constraints
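For contrast with Longest Common Subsequence, here is a sketch of its suffix table: the table shape and fill order are identical to the tabulation above, but the combine step takes a max instead of a sum.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest Common Subsequence using the same suffix-table layout."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]  # extend the match
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])  # best of skipping either side
    return dp[0][0]

print(lcs_length("rabbbit", "rabbit"))  # 6: "rabbit" itself is a common subsequence
```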
Complexity Summary and Key Takeaways
| Approach | Time | Space | Readability |
|---|---|---|---|
| Brute Force | O(2^m) | O(m) | High |
| Memoization | O(m×n) | O(m×n) | High |
| Tabulation | O(m×n) | O(m×n) | Medium |
| Space-Optimized | O(m×n) | O(n) | Lower |
Choose memoization for interviews—it’s easier to derive correctly. Use space-optimized tabulation in production when memory matters.
The counting subsequences pattern applies whenever you need to count ways to select elements maintaining order. Key signals: “how many ways”, “subsequence”, “maintain order”. When you see these, consider whether the choice at each position is binary (use/skip) and whether subproblems overlap.
Practice these related problems to cement the pattern: Decode Ways, Unique Paths, and Interleaving String all use similar state transition logic with different recurrence relations.