Subset Sum Problem: DP and Backtracking Solutions
The subset sum problem asks a deceptively simple question: given a set of integers and a target sum, does any subset of those integers add up exactly to the target? Despite its straightforward...
Key Insights
- The subset sum problem is NP-complete, but dynamic programming provides a pseudo-polynomial O(n×sum) solution that works well when the target sum is reasonably bounded.
- Backtracking with intelligent pruning often outperforms DP in practice when you need the actual subset or when solutions are sparse and can be found early.
- Space optimization reduces the 2D DP table to a 1D array, but you must iterate backwards to avoid overwriting values you still need.
Introduction & Problem Definition
The subset sum problem asks a deceptively simple question: given a set of integers and a target sum, does any subset of those integers add up exactly to the target? Despite its straightforward definition, this problem is NP-complete and serves as a foundation for understanding computational complexity.
You’ll encounter subset sum in practical scenarios more often than you might expect. Resource allocation systems need to determine if a combination of tasks fits within a time budget. Financial applications check if a set of transactions can reconcile to a specific balance. Cryptographic systems, particularly knapsack-based encryption schemes, rely on the computational hardness of this problem.
The problem has two variants: the decision version (does a valid subset exist?) and the search version (what is that subset?). We’ll tackle both, since real applications usually need the actual subset, not just a boolean answer.
Naive Recursive Approach
The most intuitive solution explores every possible subset. For each element, you make a binary choice: include it or exclude it. This generates all 2^n possible subsets and checks each one.
def subset_sum_recursive(nums: list[int], target: int) -> bool:
def helper(index: int, remaining: int) -> bool:
# Base cases
if remaining == 0:
return True
if index >= len(nums) or remaining < 0:
return False
# Include current element or exclude it
include = helper(index + 1, remaining - nums[index])
exclude = helper(index + 1, remaining)
return include or exclude
return helper(0, target)
This solution has O(2^n) time complexity because each element doubles the number of recursive paths. For n=30, that’s over a billion operations. For n=50, you’re looking at heat death of the universe territory.
The recursion tree reveals massive redundancy. The subproblem “can we make sum 10 using elements from index 5 onwards?” might be computed hundreds of times through different paths. This overlap is exactly what dynamic programming exploits.
Dynamic Programming Solution (Tabulation)
The bottom-up DP approach builds a 2D boolean table where dp[i][j] answers: “Can we achieve sum j using the first i elements?”
The state transition logic is straightforward:
- If we could achieve sum
jwithout elementi, we can still achieve it (exclude the element) - If we could achieve sum
j - nums[i]without elementi, we can now achieve sumj(include the element)
def subset_sum_dp_2d(nums: list[int], target: int) -> bool:
n = len(nums)
# dp[i][j] = can we make sum j using first i elements?
dp = [[False] * (target + 1) for _ in range(n + 1)]
# Base case: sum of 0 is always achievable (empty subset)
for i in range(n + 1):
dp[i][0] = True
# Fill the table
for i in range(1, n + 1):
for j in range(1, target + 1):
# Exclude current element
dp[i][j] = dp[i - 1][j]
# Include current element (if it doesn't exceed target)
if j >= nums[i - 1]:
dp[i][j] = dp[i][j] or dp[i - 1][j - nums[i - 1]]
return dp[n][target]
This runs in O(n × target) time with O(n × target) space. Notice this is pseudo-polynomial—it’s polynomial in the numeric value of the target, not the input size. If your target is 10^9, this approach becomes impractical.
Since each row only depends on the previous row, we can collapse the 2D table into a 1D array:
def subset_sum_dp_1d(nums: list[int], target: int) -> bool:
dp = [False] * (target + 1)
dp[0] = True # Empty subset sums to 0
for num in nums:
# Iterate backwards to avoid using updated values
for j in range(target, num - 1, -1):
dp[j] = dp[j] or dp[j - num]
return dp[target]
The backwards iteration is critical. If we iterated forwards, dp[j - num] might already reflect the inclusion of the current element, allowing us to use the same element multiple times. This is actually the unbounded knapsack variant—useful for coin change problems, but wrong here.
Memoization Approach (Top-Down DP)
Memoization keeps the recursive structure but caches results. This approach often feels more natural and handles sparse state spaces efficiently—we only compute subproblems actually needed.
def subset_sum_memo(nums: list[int], target: int) -> bool:
memo = {}
def helper(index: int, remaining: int) -> bool:
# Check cache first
if (index, remaining) in memo:
return memo[(index, remaining)]
# Base cases
if remaining == 0:
return True
if index >= len(nums) or remaining < 0:
return False
# Recursive exploration with caching
result = (helper(index + 1, remaining - nums[index]) or
helper(index + 1, remaining))
memo[(index, remaining)] = result
return result
return helper(0, target)
Memoization and tabulation have the same asymptotic complexity, but their practical performance differs. Memoization avoids computing unreachable states and handles early termination naturally. Tabulation has better cache locality and no recursion overhead. For subset sum specifically, tabulation usually wins because most states are reachable.
Backtracking Solution
When you need the actual subset—not just whether one exists—backtracking becomes attractive. It systematically explores the solution space while pruning branches that can’t lead to valid solutions.
def subset_sum_backtrack(nums: list[int], target: int) -> list[int] | None:
nums_sorted = sorted(nums, reverse=True) # Larger elements first for better pruning
result = []
def backtrack(index: int, remaining: int) -> bool:
if remaining == 0:
return True
if index >= len(nums_sorted) or remaining < 0:
return False
# Pruning: if smallest remaining element exceeds remaining sum, stop
# (only valid if all elements are positive)
if nums_sorted[-1] > remaining:
return False
for i in range(index, len(nums_sorted)):
# Skip duplicates to avoid redundant work
if i > index and nums_sorted[i] == nums_sorted[i - 1]:
continue
if nums_sorted[i] > remaining:
continue
result.append(nums_sorted[i])
if backtrack(i + 1, remaining - nums_sorted[i]):
return True
result.pop() # Backtrack
return False
if backtrack(0, target):
return result
return None
Backtracking shines when solutions exist near the beginning of the search space or when aggressive pruning eliminates large subtrees. Sorting elements in descending order helps find solutions faster—larger elements reduce the remaining sum quickly, leading to earlier termination or pruning.
Complexity Analysis & Comparison
| Approach | Time Complexity | Space Complexity | Returns Subset? |
|---|---|---|---|
| Naive Recursive | O(2^n) | O(n) stack | Modifiable |
| DP Tabulation (2D) | O(n × target) | O(n × target) | With backtracing |
| DP Tabulation (1D) | O(n × target) | O(target) | No |
| Memoization | O(n × target) | O(n × target) | Modifiable |
| Backtracking | O(2^n) worst | O(n) | Yes |
Choose DP when:
- The target sum is bounded and reasonably small
- You need guaranteed polynomial-time performance
- You’re solving the decision problem or counting variants
Choose backtracking when:
- You need the actual subset elements
- Solutions are likely to exist and be found early
- The input has structure that enables effective pruning
Variations & Extensions
The subset sum framework extends to several related problems. Counting the number of valid subsets requires a simple modification—instead of boolean OR, we sum the counts:
def count_subsets(nums: list[int], target: int) -> int:
dp = [0] * (target + 1)
dp[0] = 1 # One way to make sum 0: empty subset
for num in nums:
for j in range(target, num - 1, -1):
dp[j] += dp[j - num]
return dp[target]
The partition equal subset sum problem asks whether an array can be split into two subsets with equal sums. This reduces to subset sum with target = total_sum / 2 (after checking that total_sum is even).
Finding the subset with sum closest to a target uses the same DP table but scans for the largest achievable sum not exceeding the target. This variant appears in load balancing and fair division problems.
These problems inherit subset sum’s NP-completeness, meaning no known polynomial-time algorithm exists for the general case. However, the pseudo-polynomial DP solutions remain practical for bounded inputs—which covers most real-world scenarios where you’re dealing with reasonable numeric ranges rather than cryptographic-scale integers.