Segment Tree: Range Query Data Structure
Key Insights
- Segment trees provide O(log n) complexity for both range queries and point updates, solving the fundamental trade-off between prefix sums (fast queries, slow updates) and naive arrays (slow queries, fast updates).
- Lazy propagation transforms range updates from O(n) to O(log n) by deferring work until absolutely necessary—a pattern applicable far beyond segment trees.
- The array-based representation using indices 2i and 2i+1 for children eliminates pointer overhead and improves cache performance, making segment trees practical for production systems.
The Range Query Problem
Consider a common scenario: you have an array of a million integers representing sensor readings, and you need to repeatedly answer questions like “what’s the sum of readings between index 50,000 and 75,000?” while also updating individual readings as new data arrives.
The naive approach scans the range for each query—O(n) time that becomes unacceptable at scale. Prefix sums flip this trade-off: O(1) queries but O(n) updates since every prefix must be recalculated. Neither solution handles the mixed workload well.
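The trade-off is concrete enough to sketch. A minimal prefix-sum illustration (example values are arbitrary):

```python
# Prefix sums: O(1) range queries, but O(n) to rebuild after any update.
def build_prefix(arr):
    prefix = [0]
    for x in arr:
        prefix.append(prefix[-1] + x)
    return prefix

def range_sum(prefix, left, right):
    """Sum of arr[left:right] in O(1) via two lookups."""
    return prefix[right] - prefix[left]

readings = [3, 1, 4, 1, 5, 9, 2, 6]
prefix = build_prefix(readings)
assert range_sum(prefix, 2, 5) == 4 + 1 + 5
# A single point update invalidates every later prefix: rebuild, O(n).
readings[0] = 10
prefix = build_prefix(readings)
assert range_sum(prefix, 0, 2) == 10 + 1
```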
Segment trees solve this elegantly. By organizing data into a binary tree where each node represents an array segment, both queries and updates complete in O(log n) time. This balanced approach makes segment trees essential for time-series databases, real-time analytics, and competitive programming.
Segment Tree Fundamentals
A segment tree is a binary tree where each node stores aggregate information about a contiguous segment of the underlying array. Leaf nodes represent individual elements. Internal nodes represent the combination (sum, minimum, maximum, etc.) of their children’s segments.
For an array of n elements, the tree has n leaves and n-1 internal nodes. We store this in a flat array using implicit indexing: node i has children at 2i and 2i+1, and its parent at i // 2 (integer division). This eliminates pointer overhead entirely.
class SegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        # Tree array: index 0 unused, 1 is root
        # Leaves occupy indices n to 2n-1
        self.tree = [0] * (2 * self.n)
        # Initialize leaves
        for i in range(self.n):
            self.tree[self.n + i] = arr[i]
        # Build internal nodes bottom-up
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
The construction runs in O(n) time. We first copy array elements to leaf positions (indices n through 2n-1), then build internal nodes by combining children. The root at index 1 contains the total sum of all elements.
This 2n space representation is tidiest when n is a power of two, but the bottom-up build above also works for arbitrary n as long as the combining operation is commutative (as sum, min, and max are): the leaf ordering inside the tree may no longer match a textbook layout, yet every query still combines exactly the elements in its range.
Query Operations
Range queries traverse the tree, collecting contributions from nodes whose segments fall within the query range. The logic handles three cases:
- Complete overlap: The node’s segment lies entirely within the query range. Return the node’s value.
- No overlap: The node’s segment lies entirely outside the query range. Return the identity element (0 for sum, ∞ for min).
- Partial overlap: Recurse into both children and combine results.
    def query(self, left, right):
        """Query sum of elements in [left, right) - half-open interval."""
        result = 0
        left += self.n  # Convert to tree indices
        right += self.n
        while left < right:
            if left % 2 == 1:  # left is a right child: take it, step past it
                result += self.tree[left]
                left += 1
            if right % 2 == 1:  # right - 1 is a left child: take it, step past it
                right -= 1
                result += self.tree[right]
            left //= 2
            right //= 2
        return result
This iterative version is elegant. We start at the leaves and move up. When the left pointer is a right child (odd index), its parent covers elements outside our range, so we take this node’s value and move right. Similarly for the right pointer. Then both pointers move to their parents.
The loop runs O(log n) iterations since we halve the range each step. Each iteration does O(1) work, giving us O(log n) total.
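For comparison, the three overlap cases from the case analysis above also map directly onto a recursive query. A minimal self-contained sketch using the recursive node layout (each node covers an inclusive [start, end], children split at the midpoint); the names here are illustrative:

```python
def build(arr, tree, node, start, end):
    """Recursively build a sum tree; node covers arr[start..end] inclusive."""
    if start == end:
        tree[node] = arr[start]
    else:
        mid = (start + end) // 2
        build(arr, tree, 2 * node, start, mid)
        build(arr, tree, 2 * node + 1, mid + 1, end)
        tree[node] = tree[2 * node] + tree[2 * node + 1]

def query(tree, node, start, end, left, right):
    """Sum over [left, right] inclusive."""
    if right < start or end < left:       # no overlap: identity element
        return 0
    if left <= start and end <= right:    # complete overlap: use node's value
        return tree[node]
    mid = (start + end) // 2              # partial overlap: recurse, combine
    return (query(tree, 2 * node, start, mid, left, right)
            + query(tree, 2 * node + 1, mid + 1, end, left, right))

arr = [2, 7, 1, 8, 2, 8]
tree = [0] * (4 * len(arr))
build(arr, tree, 1, 0, len(arr) - 1)
assert query(tree, 1, 0, len(arr) - 1, 1, 3) == 7 + 1 + 8
```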
Update Operations
Point updates modify a leaf and propagate changes upward. Every ancestor of the modified leaf must be recalculated since their aggregate values depend on the changed element.
    def update(self, index, value):
        """Set element at index to value."""
        index += self.n  # Convert to tree index
        self.tree[index] = value
        # Propagate changes up to root
        while index > 1:
            index //= 2
            self.tree[index] = self.tree[2 * index] + self.tree[2 * index + 1]
Starting from the leaf, we move to the parent and recalculate its value from both children. This continues until we reach the root. The path length is O(log n), and each step does O(1) work.
The update and query operations share the same logarithmic complexity, making segment trees ideal for workloads with interleaved reads and writes.
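A short round trip ties the two operations together. This self-contained sketch restates the iterative tree in function form (a convenience for the demo, not a different algorithm) and checks it against a plain list:

```python
def build(arr):
    n = len(arr)
    tree = [0] * (2 * n)
    tree[n:] = arr                         # leaves at indices n..2n-1
    for i in range(n - 1, 0, -1):
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    return tree

def query(tree, n, left, right):
    """Sum over the half-open range [left, right)."""
    result = 0
    left += n
    right += n
    while left < right:
        if left % 2 == 1:
            result += tree[left]
            left += 1
        if right % 2 == 1:
            right -= 1
            result += tree[right]
        left //= 2
        right //= 2
    return result

def update(tree, n, index, value):
    index += n
    tree[index] = value
    while index > 1:
        index //= 2
        tree[index] = tree[2 * index] + tree[2 * index + 1]

data = [5, 2, 9, 4, 7, 1]
tree = build(data)
assert query(tree, len(data), 1, 4) == sum(data[1:4])  # 2 + 9 + 4
update(tree, len(data), 2, 10)                         # data[2]: 9 -> 10
data[2] = 10
assert query(tree, len(data), 1, 4) == sum(data[1:4])  # 2 + 10 + 4
```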
Lazy Propagation for Range Updates
What if you need to add 5 to every element between indices 1000 and 2000? Without optimization, you’d call the point update function once per element, paying O(log n) each time—O(k log n) total for a range of k elements.
Lazy propagation solves this by storing pending updates at internal nodes and only pushing them down when necessary. The key insight: if a query doesn’t need to look inside a node, we don’t need to update its children yet.
class LazySegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (4 * self.n)  # Extra space for safety
        self.lazy = [0] * (4 * self.n)  # Pending additions
        self._build(arr, 1, 0, self.n - 1)

    def _build(self, arr, node, start, end):
        if start == end:
            self.tree[node] = arr[start]
        else:
            mid = (start + end) // 2
            self._build(arr, 2 * node, start, mid)
            self._build(arr, 2 * node + 1, mid + 1, end)
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def _push_down(self, node, start, end):
        """Push pending updates to children."""
        if self.lazy[node] != 0:
            mid = (start + end) // 2
            # Update children's values
            self.tree[2 * node] += self.lazy[node] * (mid - start + 1)
            self.tree[2 * node + 1] += self.lazy[node] * (end - mid)
            # Transfer lazy values
            self.lazy[2 * node] += self.lazy[node]
            self.lazy[2 * node + 1] += self.lazy[node]
            self.lazy[node] = 0

    def range_update(self, left, right, delta, node=1, start=0, end=None):
        """Add delta to all elements in [left, right]."""
        if end is None:
            end = self.n - 1
        if right < start or end < left:  # No overlap
            return
        if left <= start and end <= right:  # Complete overlap
            self.tree[node] += delta * (end - start + 1)
            self.lazy[node] += delta
            return
        # Partial overlap: push down and recurse
        self._push_down(node, start, end)
        mid = (start + end) // 2
        self.range_update(left, right, delta, 2 * node, start, mid)
        self.range_update(left, right, delta, 2 * node + 1, mid + 1, end)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
The lazy array stores pending additions. When we update a range that completely covers a node’s segment, we update the node’s value and mark the lazy value—but don’t touch children. Only when a query or update needs to examine children do we push the lazy value down.
This achieves O(log n) for range updates, a massive improvement over the naive O(n log n).
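The class above shows only the update path; a matching range query must push lazy values down before descending into a partially overlapping node. Here is a compact, self-contained sketch (the class name LazyTree and the build-by-update shortcut are illustrative choices, trimmed from the fuller class above):

```python
class LazyTree:
    """Sum segment tree with range-add updates and range-sum queries."""
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)
        for i, x in enumerate(arr):        # build via single-element updates
            self.update(i, i, x)

    def _push_down(self, node, start, end):
        if self.lazy[node]:
            mid = (start + end) // 2
            for child, size in ((2 * node, mid - start + 1),
                                (2 * node + 1, end - mid)):
                self.tree[child] += self.lazy[node] * size
                self.lazy[child] += self.lazy[node]
            self.lazy[node] = 0

    def update(self, left, right, delta, node=1, start=0, end=None):
        """Add delta to every element in the inclusive range [left, right]."""
        if end is None:
            end = self.n - 1
        if right < start or end < left:
            return
        if left <= start and end <= right:
            self.tree[node] += delta * (end - start + 1)
            self.lazy[node] += delta
            return
        self._push_down(node, start, end)
        mid = (start + end) // 2
        self.update(left, right, delta, 2 * node, start, mid)
        self.update(left, right, delta, 2 * node + 1, mid + 1, end)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, left, right, node=1, start=0, end=None):
        """Sum over the inclusive range [left, right]."""
        if end is None:
            end = self.n - 1
        if right < start or end < left:
            return 0
        if left <= start and end <= right:
            return self.tree[node]
        self._push_down(node, start, end)  # resolve pending adds before descending
        mid = (start + end) // 2
        return (self.query(left, right, 2 * node, start, mid)
                + self.query(left, right, 2 * node + 1, mid + 1, end))

t = LazyTree([1, 2, 3, 4, 5])
t.update(1, 3, 10)                 # add 10 to elements 1..3
assert t.query(0, 4) == 15 + 30    # original total 15, plus 3 * 10
assert t.query(2, 2) == 13         # single element: 3 + 10
```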
Practical Applications
Segment trees shine in several domains:
Time-series analytics: Finding the maximum temperature in any date range while continuously ingesting new readings. Traditional databases struggle with this; segment trees handle it naturally.
Game leaderboards: Answering “how many players have scores between X and Y?” with real-time score updates. The merge function counts elements in range.
Computational geometry: Counting intersections, finding closest points, and processing sweep line algorithms all leverage segment tree variants.
Here’s a practical example—range maximum query with updates:
class MaxSegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [float('-inf')] * (2 * self.n)
        for i in range(self.n):
            self.tree[self.n + i] = arr[i]
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, index, value):
        index += self.n
        self.tree[index] = value
        while index > 1:
            index //= 2
            self.tree[index] = max(self.tree[2 * index], self.tree[2 * index + 1])

    def query_max(self, left, right):
        result = float('-inf')
        left += self.n
        right += self.n
        while left < right:
            if left % 2 == 1:
                result = max(result, self.tree[left])
                left += 1
            if right % 2 == 1:
                right -= 1
                result = max(result, self.tree[right])
            left //= 2
            right //= 2
        return result
The only changes from the sum version: replace addition with max, and use negative infinity as the identity element.
Implementation Considerations
Iterative vs. recursive: The iterative version shown here is faster due to no function call overhead and better cache behavior. Use it for production code. The recursive version is easier to understand and extend for lazy propagation.
When to use alternatives: Fenwick trees (Binary Indexed Trees) use less memory and are faster for prefix sums specifically. Sparse tables achieve O(1) query time for idempotent operations (min, max, gcd) when updates aren’t needed. Choose segment trees when you need both updates and arbitrary range queries.
Common pitfalls: Off-by-one errors plague segment tree implementations. Decide early whether ranges are inclusive or half-open, and stick with it—note that the iterative classes above use half-open intervals while the lazy tree uses inclusive ones, a mix that is fine within each class but dangerous to blur. The merge function must be associative—addition, max, min, and gcd work; median does not.
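Associativity is the only real requirement on the merge. For instance, swapping addition for math.gcd (associative, with identity 0, since gcd(0, x) = x) reuses the same 2n skeleton; a sketch with arbitrary example values:

```python
import math

def build_gcd(arr):
    n = len(arr)
    tree = [0] * (2 * n)
    tree[n:] = arr                         # leaves at indices n..2n-1
    for i in range(n - 1, 0, -1):
        tree[i] = math.gcd(tree[2 * i], tree[2 * i + 1])
    return tree

def query_gcd(tree, n, left, right):
    """GCD over the half-open range [left, right); 0 is the identity."""
    result = 0
    left += n
    right += n
    while left < right:
        if left % 2 == 1:
            result = math.gcd(result, tree[left])
            left += 1
        if right % 2 == 1:
            right -= 1
            result = math.gcd(result, tree[right])
        left //= 2
        right //= 2
    return result

values = [12, 18, 24, 8, 30]
tree = build_gcd(values)
assert query_gcd(tree, 5, 0, 3) == 6   # gcd(12, 18, 24)
assert query_gcd(tree, 5, 2, 5) == 2   # gcd(24, 8, 30)
```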
Segment trees are a fundamental data structure that every systems programmer should understand. They solve a real problem—efficient range queries with updates—and the underlying principles of lazy evaluation and hierarchical aggregation appear throughout computer science.