Interval Tree: Overlapping Interval Queries

An interval tree is a specialized data structure for storing intervals and efficiently answering the question: 'Which intervals overlap with this point or range?' This seemingly simple query appears...

Key Insights

  • Interval trees augment binary search trees with a “max endpoint” value at each node, enabling O(log n + k) overlap queries by pruning entire subtrees that cannot contain matches.
  • The critical insight is that if a subtree’s max endpoint is less than your query’s start point, no interval in that subtree can possibly overlap—skip it entirely.
  • For most production systems, use an existing implementation (like Python’s intervaltree package or Java’s IntervalTree from Guava) rather than rolling your own, but understanding the internals helps you debug performance issues.

Introduction to Interval Trees

An interval tree is a specialized data structure for storing intervals and efficiently answering the question: “Which intervals overlap with this point or range?” This seemingly simple query appears everywhere in software systems.

Consider calendar applications checking for meeting conflicts. Genomics tools finding overlapping gene sequences. Network routers matching packets against IP address ranges. Database engines managing range locks. Game engines detecting collisions between objects. All of these need fast interval overlap queries.

The interval tree solves this by augmenting a binary search tree with additional information that enables aggressive pruning during searches. Instead of checking every interval, you can skip entire subtrees that provably contain no overlapping intervals.

The Problem with Naive Approaches

The straightforward approach stores intervals in a list and checks each one:

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Interval:
    start: int
    end: int
    data: any = None  # Optional payload

def find_overlaps_naive(intervals: List[Interval], query_start: int, query_end: int) -> List[Interval]:
    """O(n) brute-force overlap detection."""
    results = []
    for interval in intervals:
        # Two intervals overlap if neither is completely before or after the other
        if interval.start <= query_end and interval.end >= query_start:
            results.append(interval)
    return results

This works fine for small datasets. With 100 intervals and occasional queries, the simplicity wins. But consider a calendar system with 10 million events, or a genomics database with billions of annotated regions. Scanning every interval for every query becomes prohibitively expensive.

The complexity breakdown tells the story:

Operation Naive List Interval Tree
Build O(n) O(n log n)
Query O(n) O(log n + k)
Insert O(1) O(log n)

Here, k is the number of results returned. The interval tree trades slightly more expensive insertions for dramatically faster queries. When you’re querying frequently against a large dataset, this tradeoff pays off handsomely.

Interval Tree Structure and Properties

An interval tree is a binary search tree ordered by interval start points, with one crucial augmentation: each node stores the maximum endpoint value found anywhere in its subtree.

This max value is the key insight. When searching for overlaps, if a subtree’s max endpoint is less than your query’s start point, no interval in that subtree can overlap with your query. You skip the entire subtree in O(1) time instead of examining its contents.

from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class Interval:
    start: int
    end: int
    data: any = None
    
    def overlaps(self, other_start: int, other_end: int) -> bool:
        """Check if this interval overlaps with the given range."""
        return self.start <= other_end and self.end >= other_start

@dataclass
class IntervalTreeNode:
    interval: Interval
    max_end: int = field(init=False)  # Max endpoint in this subtree
    left: Optional['IntervalTreeNode'] = None
    right: Optional['IntervalTreeNode'] = None
    
    def __post_init__(self):
        # Initially, max_end is just this interval's endpoint
        self.max_end = self.interval.end
    
    def update_max(self):
        """Recalculate max_end based on children."""
        self.max_end = self.interval.end
        if self.left:
            self.max_end = max(self.max_end, self.left.max_end)
        if self.right:
            self.max_end = max(self.max_end, self.right.max_end)

The max_end field propagates upward through the tree. A node’s max_end equals the maximum of: its own interval’s endpoint, its left child’s max_end, and its right child’s max_end. This invariant must be maintained during every insertion and deletion.

Insertion follows standard BST logic based on interval start points, but propagates max values back up the tree:

class IntervalTree:
    def __init__(self):
        self.root: Optional[IntervalTreeNode] = None
    
    def insert(self, interval: Interval):
        """Insert an interval into the tree."""
        self.root = self._insert_recursive(self.root, interval)
    
    def _insert_recursive(self, node: Optional[IntervalTreeNode], 
                          interval: Interval) -> IntervalTreeNode:
        if node is None:
            return IntervalTreeNode(interval)
        
        # BST insertion based on start point
        if interval.start < node.interval.start:
            node.left = self._insert_recursive(node.left, interval)
        else:
            node.right = self._insert_recursive(node.right, interval)
        
        # Critical: update max_end after insertion
        node.update_max()
        return node
    
    def query_overlaps(self, start: int, end: int) -> List[Interval]:
        """Find all intervals overlapping with [start, end]."""
        results = []
        self._query_recursive(self.root, start, end, results)
        return results
    
    def _query_recursive(self, node: Optional[IntervalTreeNode],
                         start: int, end: int, results: List[Interval]):
        if node is None:
            return
        
        # Check if current node's interval overlaps
        if node.interval.overlaps(start, end):
            results.append(node.interval)
        
        # Pruning decision for left subtree:
        # Only search left if left.max_end >= start
        # (meaning some interval in left subtree might overlap)
        if node.left and node.left.max_end >= start:
            self._query_recursive(node.left, start, end, results)
        
        # Pruning decision for right subtree:
        # Only search right if our query end >= the start of intervals there
        # Since right subtree has intervals with start >= node.interval.start,
        # we search if end >= node.interval.start (conservative) or always search
        # A tighter bound: search if start <= node's interval start or there could be overlap
        if node.right and node.interval.start <= end:
            self._query_recursive(node.right, start, end, results)
    
    def query_point(self, point: int) -> List[Interval]:
        """Find all intervals containing the given point."""
        return self.query_overlaps(point, point)

The pruning logic in _query_recursive is where the magic happens. When deciding whether to search the left subtree, we check if left.max_end >= start. If the maximum endpoint in the entire left subtree is less than our query’s start, no interval there can possibly overlap—we skip it entirely.

Balancing Considerations

An unbalanced interval tree degrades to a linked list, making queries O(n) again. In production, you need a self-balancing variant.

The standard approach uses a red-black tree or AVL tree as the underlying structure. The complication is maintaining the max_end invariant during rotations:

def _rotate_left(self, node: IntervalTreeNode) -> IntervalTreeNode:
    """Left rotation preserving max_end invariant."""
    new_root = node.right
    node.right = new_root.left
    new_root.left = node
    
    # Order matters: update the lower node first
    node.update_max()      # node is now lower in tree
    new_root.update_max()  # new_root depends on node's updated max
    
    return new_root

def _rotate_right(self, node: IntervalTreeNode) -> IntervalTreeNode:
    """Right rotation preserving max_end invariant."""
    new_root = node.left
    node.left = new_root.right
    new_root.right = node
    
    node.update_max()
    new_root.update_max()
    
    return new_root

After any rotation, you must recalculate max_end from the bottom up. The node that moved down gets updated first, then its new parent. Missing this step is a common bug that causes incorrect query results.

Practical Applications and Variations

Here’s a complete meeting scheduler that detects conflicts:

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Meeting:
    title: str
    start: datetime
    end: datetime
    
    def to_interval(self) -> Interval:
        # Convert to timestamps for tree storage
        return Interval(
            start=int(self.start.timestamp()),
            end=int(self.end.timestamp()),
            data=self
        )

class MeetingScheduler:
    def __init__(self):
        self.tree = IntervalTree()
        self.meetings: List[Meeting] = []
    
    def schedule(self, meeting: Meeting) -> tuple[bool, List[Meeting]]:
        """
        Attempt to schedule a meeting.
        Returns (success, conflicts) tuple.
        """
        interval = meeting.to_interval()
        conflicts = self.tree.query_overlaps(interval.start, interval.end)
        
        if conflicts:
            # Return the actual Meeting objects, not intervals
            return False, [c.data for c in conflicts]
        
        self.tree.insert(interval)
        self.meetings.append(meeting)
        return True, []
    
    def find_meetings_at(self, time: datetime) -> List[Meeting]:
        """Find all meetings occurring at a specific time."""
        timestamp = int(time.timestamp())
        intervals = self.tree.query_point(timestamp)
        return [i.data for i in intervals]

# Usage example
scheduler = MeetingScheduler()
base = datetime(2024, 1, 15, 9, 0)

meetings = [
    Meeting("Standup", base, base + timedelta(minutes=30)),
    Meeting("Design Review", base + timedelta(hours=1), base + timedelta(hours=2)),
    Meeting("Conflict!", base + timedelta(minutes=15), base + timedelta(hours=1)),
]

for meeting in meetings:
    success, conflicts = scheduler.schedule(meeting)
    if not success:
        print(f"Cannot schedule '{meeting.title}' - conflicts with: "
              f"{[c.title for c in conflicts]}")

Variations worth knowing:

  • Centered interval trees split intervals at a center point, storing intervals that span the center separately. Better for static datasets.
  • Segment trees solve a different problem: aggregate queries over ranges (sum, min, max) rather than finding overlapping intervals.
  • R-trees extend the concept to multiple dimensions for spatial queries.

Performance Analysis and When to Use

Time complexity:

  • Build: O(n log n) for n intervals
  • Insert: O(log n) amortized with balancing
  • Query: O(log n + k) where k is result count
  • Delete: O(log n) amortized with balancing

Space complexity: O(n)

When interval trees are overkill:

  • Fewer than ~1000 intervals with infrequent queries
  • Static data where you can sort once and binary search
  • Single-point queries only (a sorted list with binary search suffices)

When to use interval trees:

  • Large datasets with frequent overlap queries
  • Dynamic data with insertions and deletions
  • Range-to-range overlap queries

Production implementations:

  • Python: intervaltree package (pip install intervaltree)
  • Java: Guava’s RangeSet or RangeMap
  • C++: Boost.Icl (Interval Container Library)
  • Go: github.com/biogo/store/interval

These battle-tested libraries handle balancing, edge cases, and optimizations you don’t want to reimplement. Use them unless you have specific requirements they don’t meet.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.