Tree Sort: BST-Based Sorting Method

Tree sort is one of those algorithms that seems elegant in theory but rarely gets recommended in practice. The concept is straightforward: insert all elements into a Binary Search Tree (BST), then...

Key Insights

  • Tree sort leverages Binary Search Tree properties to achieve O(n log n) average-case sorting, but degrades to O(n²) with sorted input unless you use self-balancing trees
  • The algorithm shines when you need to maintain a sorted collection with ongoing insertions and deletions, not as a one-shot sorting mechanism
  • For pure sorting tasks, quicksort and mergesort outperform tree sort due to better cache locality and lower constant factors—use tree sort when you need the tree structure itself

Introduction to Tree Sort

Tree sort is one of those algorithms that seems elegant in theory but rarely gets recommended in practice. The concept is straightforward: insert all elements into a Binary Search Tree (BST), then walk the tree in-order to extract elements in sorted order. The BST property guarantees that an in-order traversal visits nodes in ascending order.

The complexity profile tells an important story. Average case runs in O(n log n) time—competitive with the best comparison-based sorts. But the worst case degrades to O(n²), which happens more often than you’d like. Space complexity sits at O(n) for storing the tree nodes, plus O(h) stack space for traversal where h is the tree height.

Understanding tree sort matters not because you’ll use it as your go-to sorting algorithm, but because it illuminates the relationship between data structures and algorithms. The same BST that powers database indexes and in-memory sorted collections can sort data—it’s all about how you use the structure.

How Tree Sort Works

The algorithm proceeds in two distinct phases:

Phase 1: Build the BST Iterate through the input array and insert each element into the BST. Each insertion follows the standard BST rule: if the new value is less than the current node, go left; otherwise, go right. Continue until you find an empty spot.

Phase 2: In-Order Traversal Walk the tree using in-order traversal (left subtree → node → right subtree). This visits every node exactly once, outputting values in sorted order.

Here’s the fundamental BST node structure and insertion logic:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    if root is None:
        return TreeNode(value)
    
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    
    return root

The insertion function is recursive, but you can easily convert it to iterative if stack depth concerns you. Each insertion takes O(h) time where h is the current tree height—O(log n) for balanced trees, O(n) for degenerate cases.

Implementation

Let’s build a complete, working tree sort implementation:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

class TreeSort:
    def __init__(self):
        self.root = None
    
    def insert(self, value):
        """Insert a value into the BST."""
        if self.root is None:
            self.root = TreeNode(value)
            return
        
        current = self.root
        while True:
            if value < current.value:
                if current.left is None:
                    current.left = TreeNode(value)
                    return
                current = current.left
            else:
                if current.right is None:
                    current.right = TreeNode(value)
                    return
                current = current.right
    
    def inorder_traversal(self, node, result):
        """Collect values in sorted order via in-order traversal."""
        if node is None:
            return
        
        self.inorder_traversal(node.left, result)
        result.append(node.value)
        self.inorder_traversal(node.right, result)
    
    def sort(self, arr):
        """Sort an array using tree sort."""
        self.root = None
        
        # Phase 1: Build BST
        for value in arr:
            self.insert(value)
        
        # Phase 2: Extract sorted values
        result = []
        self.inorder_traversal(self.root, result)
        return result

# Usage
sorter = TreeSort()
data = [64, 34, 25, 12, 22, 11, 90]
sorted_data = sorter.sort(data)
print(sorted_data)  # [11, 12, 22, 25, 34, 64, 90]

Here’s the equivalent in JavaScript for those working in that ecosystem:

class TreeNode {
    constructor(value) {
        this.value = value;
        this.left = null;
        this.right = null;
    }
}

function treeSort(arr) {
    if (arr.length === 0) return [];
    
    let root = null;
    
    // Insert into BST
    function insert(node, value) {
        if (node === null) return new TreeNode(value);
        
        if (value < node.value) {
            node.left = insert(node.left, value);
        } else {
            node.right = insert(node.right, value);
        }
        return node;
    }
    
    // In-order traversal
    function inorder(node, result) {
        if (node === null) return;
        inorder(node.left, result);
        result.push(node.value);
        inorder(node.right, result);
    }
    
    // Build tree
    for (const value of arr) {
        root = insert(root, value);
    }
    
    // Extract sorted values
    const result = [];
    inorder(root, result);
    return result;
}

// Usage
const data = [64, 34, 25, 12, 22, 11, 90];
console.log(treeSort(data)); // [11, 12, 22, 25, 34, 64, 90]

Handling the Worst Case with Self-Balancing Trees

The elephant in the room: what happens when you feed sorted data into tree sort?

# Sorted input creates a degenerate tree
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Each insertion goes to the right child, creating a linked list instead of a tree. The height becomes n instead of log n, and every insertion takes O(n) time. Total complexity: O(n²).

This isn’t a theoretical concern—real-world data often has patterns. Log timestamps, sequential IDs, and alphabetically sorted names all trigger this behavior.

The solution: self-balancing BSTs. AVL trees and Red-Black trees automatically rebalance after insertions, guaranteeing O(log n) height.

# Using Python's sortedcontainers library (Red-Black tree based)
from sortedcontainers import SortedList

def balanced_tree_sort(arr):
    sorted_list = SortedList(arr)
    return list(sorted_list)

# This maintains O(n log n) even for sorted input
data = list(range(10000))  # Worst case for naive BST
result = balanced_tree_sort(data)  # Still O(n log n)

If you’re implementing from scratch, here’s a simplified AVL rotation concept:

def get_height(node):
    return node.height if node else 0

def get_balance(node):
    return get_height(node.left) - get_height(node.right) if node else 0

def rotate_right(y):
    x = y.left
    T2 = x.right
    
    x.right = y
    y.left = T2
    
    y.height = 1 + max(get_height(y.left), get_height(y.right))
    x.height = 1 + max(get_height(x.left), get_height(x.right))
    
    return x

For production code, use established libraries rather than rolling your own balanced tree implementation.

Performance Analysis and Benchmarks

Let’s see how tree sort stacks up against the competition:

import time
import random
from sortedcontainers import SortedList

def benchmark_sorts(sizes):
    results = {}
    
    for size in sizes:
        data = [random.randint(0, 1000000) for _ in range(size)]
        
        # Naive tree sort
        sorter = TreeSort()
        start = time.perf_counter()
        sorter.sort(data.copy())
        naive_time = time.perf_counter() - start
        
        # Balanced tree sort
        start = time.perf_counter()
        list(SortedList(data.copy()))
        balanced_time = time.perf_counter() - start
        
        # Built-in sort (Timsort)
        start = time.perf_counter()
        sorted(data.copy())
        builtin_time = time.perf_counter() - start
        
        results[size] = {
            'naive_tree': naive_time,
            'balanced_tree': balanced_time,
            'timsort': builtin_time
        }
        
        print(f"Size {size:,}:")
        print(f"  Naive Tree Sort:    {naive_time:.4f}s")
        print(f"  Balanced Tree Sort: {balanced_time:.4f}s")
        print(f"  Timsort:            {builtin_time:.4f}s")
        print()

benchmark_sorts([1000, 10000, 100000])

Typical results show Timsort winning by 5-10x on random data. The gap widens further on partially sorted data where Timsort’s adaptive nature shines.

Tree sort loses on several fronts:

  • Cache locality: Arrays are contiguous in memory; trees scatter nodes across the heap
  • Constant factors: Pointer chasing and node allocation add overhead
  • Memory overhead: Each element needs a node with two pointers

Practical Applications and Limitations

So when does tree sort actually make sense?

Good use cases:

  • Streaming sorted data: When elements arrive continuously and you need to maintain sorted order, a BST (especially self-balancing) handles insertions efficiently
  • Frequent insertions and deletions: Unlike arrays, BSTs don’t require shifting elements
  • Range queries: Need all elements between X and Y? BSTs handle this naturally
  • Database indexing concepts: B-trees (multi-way BSTs) power database indexes for exactly these reasons

Poor use cases:

  • One-shot sorting: Just use your language’s built-in sort
  • Memory-constrained environments: The pointer overhead adds up
  • Performance-critical hot paths: Cache misses hurt
# Good use case: maintaining sorted data with updates
class SortedEventLog:
    def __init__(self):
        self.events = SortedList(key=lambda e: e['timestamp'])
    
    def add_event(self, event):
        self.events.add(event)  # O(log n) insertion, stays sorted
    
    def get_events_in_range(self, start_time, end_time):
        # Efficient range query
        start_idx = self.events.bisect_left({'timestamp': start_time})
        end_idx = self.events.bisect_right({'timestamp': end_time})
        return list(self.events[start_idx:end_idx])

Conclusion

Tree sort teaches us that the right data structure can solve multiple problems simultaneously. A BST gives you sorting, searching, range queries, and dynamic updates—all in one package.

But for pure sorting? Stick with quicksort, mergesort, or your language’s built-in sort. They’re faster, use less memory, and handle edge cases gracefully.

Use tree sort’s underlying concept—maintaining sorted data in a tree structure—when you need the tree’s other capabilities. That’s where it earns its keep. The algorithm itself is a stepping stone to understanding why databases use B-trees and why std::map and TreeMap exist. Learn it for the insight, not for the sorting.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.