Binary Search: Divide and Conquer Search

Key Insights

  • Binary search achieves O(log n) time complexity by eliminating half the search space with each comparison, making it exponentially faster than linear search for large datasets.
  • The algorithm’s simplicity hides subtle pitfalls—integer overflow in midpoint calculation and off-by-one errors cause the majority of binary search bugs in production code.
  • Beyond basic searching, binary search variations solve problems like finding insertion points, locating boundaries in sorted data, and debugging with git bisect.

Binary search is the canonical divide and conquer algorithm. Given a sorted collection, it finds a target value by repeatedly dividing the search space in half. Each comparison eliminates 50% of remaining candidates.

The performance difference is dramatic. Linear search through a million elements requires up to one million comparisons. Binary search needs at most 20. That’s the power of O(log n) versus O(n).
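The “at most 20” figure is just the base-2 logarithm; a quick check:

```python
import math

n = 1_000_000

# Each comparison halves the search space, so the worst case is
# floor(log2(n)) + 1 probes before the space is empty.
worst_case = math.floor(math.log2(n)) + 1
print(worst_case)  # 20
```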

There’s one non-negotiable prerequisite: your data must be sorted. Binary search on unsorted data produces garbage results. If you’re searching repeatedly, the O(n log n) cost of sorting pays for itself quickly. If you’re searching once, linear search might be the better choice.

How Binary Search Works

The algorithm maintains two pointers—low and high—that define the current search space. On each iteration:

  1. Calculate the midpoint between low and high
  2. Compare the target with the element at midpoint
  3. If equal, return the position
  4. If target is smaller, search the left half (move high)
  5. If target is larger, search the right half (move low)
  6. Repeat until found or search space is empty

Here’s a clean iterative implementation:

def binary_search(arr: list[int], target: int) -> int:
    """
    Returns index of target in arr, or -1 if not found.
    """
    low, high = 0, len(arr) - 1
    
    while low <= high:
        mid = low + (high - low) // 2
        
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    
    return -1

Consider searching for 23 in [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]:

  • First iteration: mid=4, arr[4]=16 < 23, search right half
  • Second iteration: mid=7, arr[7]=56 > 23, search left half
  • Third iteration: mid=5, arr[5]=23, found at index 5

Three comparisons instead of six with linear search. The savings compound as data grows.
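You can confirm the trace by instrumenting the loop to record each midpoint it probes (a hypothetical binary_search_trace helper, for illustration):

```python
def binary_search_trace(arr: list[int], target: int) -> tuple[int, list[int]]:
    """Like binary_search, but also returns the list of midpoints probed."""
    probes = []
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = low + (high - low) // 2
        probes.append(mid)
        if arr[mid] == target:
            return mid, probes
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes

index, probes = binary_search_trace([2, 5, 8, 12, 16, 23, 38, 56, 72, 91], 23)
print(index, probes)  # 5 [4, 7, 5]
```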

Iterative vs Recursive Implementations

The recursive version maps directly to the algorithm’s definition:

def binary_search_recursive(arr: list[int], target: int,
                            low: int = 0, high: int | None = None) -> int:
    """
    Recursive binary search implementation.
    """
    if high is None:
        high = len(arr) - 1
    
    # Base case: search space exhausted
    if low > high:
        return -1
    
    mid = low + (high - low) // 2
    
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_search_recursive(arr, target, mid + 1, high)
    else:
        return binary_search_recursive(arr, target, low, mid - 1)

The trade-offs are straightforward:

Aspect               Iterative              Recursive
Space complexity     O(1)                   O(log n) stack frames
Readability          Slightly more verbose  Maps to mathematical definition
Performance          Marginally faster      Function call overhead
Stack overflow risk  None                   Possible with deep recursion

For production code, prefer iterative. The recursive version is elegant for teaching but wastes stack space. Python’s default recursion limit of 1000 means recursive binary search fails on arrays larger than 2^1000 elements—not a practical concern, but the principle matters.

Edge Cases and Common Pitfalls

Binary search looks simple. It isn’t. In Programming Pearls, Jon Bentley reported that roughly 90% of professional programmers couldn’t write a correct binary search in two hours. Here’s what trips people up.

Integer overflow in midpoint calculation

The naive calculation (low + high) / 2 overflows when low + high exceeds the maximum integer value. This bug lurked in Java’s Arrays.binarySearch() for nine years before being discovered.

// WRONG: overflows for large arrays
int mid = (low + high) / 2;

// CORRECT: safe calculation
int mid = low + (high - low) / 2;

// ALSO CORRECT (Java only): unsigned right shift ignores the carry into the sign bit
int mid = (low + high) >>> 1;

In Python, integers have arbitrary precision, so overflow isn’t a concern. In languages with fixed-width integers, always use the safe formula.

Off-by-one errors

The boundary conditions are subtle. Using low < high instead of low <= high misses single-element arrays. Using high = mid instead of high = mid - 1 creates infinite loops.

# Common mistake: wrong loop condition
while low < high:  # WRONG: misses when low == high
    # ...

# Common mistake: wrong boundary update
high = mid  # WRONG: can cause infinite loop when mid == low

Empty and single-element arrays

The standard loop already handles both correctly (an empty array gives high = -1, so the loop never runs and -1 is returned), but an explicit guard makes the intent obvious:

def binary_search_safe(arr: list[int], target: int) -> int:
    if not arr:
        return -1
    
    low, high = 0, len(arr) - 1
    
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    
    return -1

Variations and Extensions

The basic algorithm finds any occurrence of a target. Real problems often need more specific positions.

Finding the insertion point

Where should a new element go to maintain sorted order? This is what Python’s bisect module provides:

def find_insertion_point(arr: list[int], target: int) -> int:
    """
    Returns the index where target should be inserted.
    Equivalent to bisect.bisect_left().
    """
    low, high = 0, len(arr)
    
    while low < high:
        mid = low + (high - low) // 2
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid
    
    return low

Note the differences from standard binary search: high starts at len(arr) (not len(arr) - 1), the loop condition is < (not <=), and we don’t check for equality.
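In practice, reach for the standard library rather than hand-rolling this; a short demonstration of the bisect module doing the same job:

```python
import bisect

arr = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]

# bisect_left returns the leftmost index that keeps arr sorted
print(bisect.bisect_left(arr, 23))  # 5  (23 is present: index of the element)
print(bisect.bisect_left(arr, 24))  # 6  (24 is absent: where it would go)

# insort_left inserts at that position in place
bisect.insort_left(arr, 24)
print(arr[5:8])  # [23, 24, 38]
```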

Finding the leftmost occurrence

When duplicates exist, find the first one:

def find_leftmost(arr: list[int], target: int) -> int:
    """
    Returns index of first occurrence of target, or -1 if not found.
    """
    low, high = 0, len(arr) - 1
    result = -1
    
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            result = mid
            high = mid - 1  # Keep searching left
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    
    return result

The key insight: when you find the target, don’t stop. Record the position and keep searching left for an earlier occurrence.
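The mirrored variant, not shown above but built on the same idea, finds the last occurrence by continuing right instead of left:

```python
def find_rightmost(arr: list[int], target: int) -> int:
    """Returns index of last occurrence of target, or -1 if not found."""
    low, high = 0, len(arr) - 1
    result = -1
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            result = mid
            low = mid + 1  # Keep searching right
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return result

print(find_rightmost([1, 2, 2, 2, 3], 2))  # 3
```

Combined with find_leftmost, this counts duplicates in O(log n): the count is rightmost - leftmost + 1.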

Real-World Applications

Binary search principles appear throughout computing.

Database indexing uses B-trees, which are essentially multi-way binary search structures. Each node comparison eliminates a fraction of the data, enabling O(log n) lookups in tables with billions of rows.

Git bisect automates debugging by binary searching through commit history. Given a known good commit and a known bad commit, it finds the regression in O(log n) test runs.
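Stripped of git itself, the bisect idea is a leftmost-boundary search over commit indices; a sketch with a hypothetical is_bad predicate standing in for "run the tests at this commit":

```python
from typing import Callable

def first_bad_commit(n_commits: int, is_bad: Callable[[int], bool]) -> int:
    """Index of the first bad commit, assuming all good commits precede all bad ones.
    Returns n_commits if no commit is bad."""
    low, high = 0, n_commits
    while low < high:
        mid = low + (high - low) // 2
        if is_bad(mid):
            high = mid        # first bad commit is at mid or earlier
        else:
            low = mid + 1     # first bad commit is after mid
    return low

# Suppose commit 7 of 20 introduced the regression:
print(first_bad_commit(20, lambda c: c >= 7))  # 7
```

Twenty commits are resolved in about five test runs, which is the entire appeal of git bisect on long histories.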

Search autocomplete uses binary search to find the range of completions matching a prefix in a sorted dictionary.

Here’s a practical variation—searching a rotated sorted array (a common interview question that appears in real systems like circular buffers):

def search_rotated(arr: list[int], target: int) -> int:
    """
    Search in a sorted array that's been rotated.
    Example: [4, 5, 6, 7, 0, 1, 2] is [0,1,2,4,5,6,7] rotated.
    """
    low, high = 0, len(arr) - 1
    
    while low <= high:
        mid = low + (high - low) // 2
        
        if arr[mid] == target:
            return mid
        
        # Determine which half is sorted
        if arr[low] <= arr[mid]:
            # Left half is sorted
            if arr[low] <= target < arr[mid]:
                high = mid - 1
            else:
                low = mid + 1
        else:
            # Right half is sorted
            if arr[mid] < target <= arr[high]:
                low = mid + 1
            else:
                high = mid - 1
    
    return -1

The trick is recognizing that at least one half is always sorted, then determining which half could contain the target.
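That claim, that at least one half around any midpoint is sorted in any rotation, can be brute-force checked over every rotation and every (low, high) range:

```python
def sorted_half_exists(arr: list[int]) -> bool:
    """True if every (low, high) range in arr has a sorted half around its midpoint."""
    for low in range(len(arr)):
        for high in range(low, len(arr)):
            mid = low + (high - low) // 2
            left, right = arr[low:mid + 1], arr[mid:high + 1]
            if left != sorted(left) and right != sorted(right):
                return False
    return True

base = [0, 1, 2, 4, 5, 6, 7]
# Every rotation of the sorted base array satisfies the invariant
assert all(sorted_half_exists(base[k:] + base[:k]) for k in range(len(base)))
print("invariant holds for all rotations")
```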

Performance Analysis and When to Use

Time complexity: O(log n) for all cases. Each comparison halves the search space.

Space complexity: O(1) for iterative, O(log n) for recursive due to call stack.

When binary search is the wrong choice:

  • Unsorted data where sorting cost exceeds search savings
  • Small arrays (linear search has less overhead for n < 10-20)
  • Linked lists (no O(1) random access means no O(log n) search)
  • Frequently changing data where maintaining sort order is expensive
  • When you only need key lookups, not ordering (hash tables give O(1) average-case lookup)

Interpolation search is worth mentioning as an alternative. Instead of always checking the midpoint, it estimates position based on the target’s value relative to the range. For uniformly distributed data, it achieves O(log log n) average case. For non-uniform data, it degrades to O(n). Binary search’s consistent O(log n) makes it the safer default.
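For comparison, here is one way interpolation search might look (my sketch, assuming integer keys and guarding the equal-endpoints case):

```python
def interpolation_search(arr: list[int], target: int) -> int:
    """Probe where target should sit, estimated linearly from the range's values."""
    low, high = 0, len(arr) - 1
    while low <= high and arr[low] <= target <= arr[high]:
        if arr[low] == arr[high]:  # avoid division by zero on a flat range
            return low if arr[low] == target else -1
        # Linear estimate of target's position between arr[low] and arr[high]
        pos = low + (target - arr[low]) * (high - low) // (arr[high] - arr[low])
        if arr[pos] == target:
            return pos
        elif arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1

# Uniformly spaced data: the first probe lands exactly on the answer
print(interpolation_search(list(range(0, 100, 2)), 42))  # 21
```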

Binary search is a fundamental tool. Master the basic implementation, understand the edge cases, and recognize the variations. The pattern of eliminating half the problem space appears constantly in algorithm design—from merge sort to tree traversals to distributed systems. Get this one right.
