Red-Black Tree: Balanced BST with Color Properties
Binary search trees promise O(log n) search, insertion, and deletion. They deliver that promise only when balanced. Insert sorted data into a naive BST and you get a linked list with O(n) operations....
Key Insights
- Red-black trees guarantee O(log n) operations through five color-based invariants that ensure no path is more than twice as long as any other, trading perfect balance for fewer rotations during modifications.
- Insertion requires at most two rotations and O(log n) recoloring operations, making red-black trees more efficient for write-heavy workloads compared to AVL trees.
- Real-world systems like the Linux kernel scheduler, Java’s TreeMap, and C++ std::map chose red-black trees specifically because they optimize for modification performance while maintaining acceptable search times.
Introduction to Red-Black Trees
Binary search trees promise O(log n) search, insertion, and deletion. They deliver that promise only when balanced. Insert sorted data into a naive BST and you get a linked list with O(n) operations. Self-balancing trees solve this problem by automatically restructuring after modifications.
Red-black trees take an elegant approach to balance. Instead of tracking exact heights like AVL trees, they assign colors to nodes and enforce invariants that guarantee approximate balance. The result: no root-to-leaf path can be more than twice as long as any other. This looser constraint means fewer structural changes during modifications.
The trade-off is straightforward. AVL trees maintain stricter balance, giving slightly faster lookups. Red-black trees perform fewer rotations, giving faster insertions and deletions. For most applications, especially those with mixed read-write workloads, red-black trees win.
The Five Red-Black Properties
Every valid red-black tree satisfies these five invariants:
- Every node is either red or black. This is the fundamental coloring rule.
- The root is black. Simplifies reasoning about the tree structure.
- Every leaf (NIL) is black. We use sentinel NIL nodes, not null pointers.
- Red nodes have black children. No two red nodes can be adjacent on any path.
- All paths from a node to descendant leaves contain the same number of black nodes. This is the “black-height” property.
Property 4 and 5 together guarantee balance. If red nodes can’t be adjacent, and every path has equal black nodes, then the longest possible path (alternating red-black) is at most twice the shortest (all black). This bounds the tree height at 2 log(n+1).
typedef enum { RED, BLACK } Color;
typedef struct Node {
int key;
Color color;
struct Node *parent;
struct Node *left;
struct Node *right;
} Node;
typedef struct {
Node *root;
Node *nil; // Sentinel node for leaves
} RBTree;
RBTree* create_tree() {
RBTree *tree = malloc(sizeof(RBTree));
tree->nil = malloc(sizeof(Node));
tree->nil->color = BLACK;
tree->nil->left = tree->nil->right = tree->nil->parent = NULL;
tree->root = tree->nil;
return tree;
}
The sentinel NIL node eliminates null-pointer checks throughout the code. Every leaf points to this single black sentinel instead of NULL. This simplifies the implementation significantly.
Search and Traversal Operations
Color properties exist purely for balance maintenance. They don’t affect the BST ordering invariant. Search works exactly like standard BST search: compare the target with the current node, go left if smaller, right if larger, repeat until found or hitting NIL.
Node* search(RBTree *tree, int key) {
Node *current = tree->root;
while (current != tree->nil && current->key != key) {
if (key < current->key) {
current = current->left;
} else {
current = current->right;
}
}
return current != tree->nil ? current : NULL;
}
// In-order traversal also unchanged
void inorder(RBTree *tree, Node *node) {
if (node == tree->nil) return;
inorder(tree, node->left);
printf("%d(%s) ", node->key, node->color == RED ? "R" : "B");
inorder(tree, node->right);
}
Time complexity remains O(log n) because the tree height is bounded. The color information adds constant space per node but doesn’t affect traversal logic.
Insertion and Rebalancing
Insertion follows a two-phase approach: first insert the node using standard BST insertion, then fix any red-black violations. New nodes are always inserted as red. Why red? Adding a red node can’t violate the black-height property (property 5). It might violate property 4 if the parent is also red, but that’s easier to fix than a black-height imbalance.
void insert(RBTree *tree, int key) {
Node *z = malloc(sizeof(Node));
z->key = key;
z->left = z->right = tree->nil;
z->color = RED;
// Standard BST insertion
Node *y = tree->nil;
Node *x = tree->root;
while (x != tree->nil) {
y = x;
if (z->key < x->key) {
x = x->left;
} else {
x = x->right;
}
}
z->parent = y;
if (y == tree->nil) {
tree->root = z;
} else if (z->key < y->key) {
y->left = z;
} else {
y->right = z;
}
insert_fixup(tree, z);
}
The fix-up procedure handles three cases based on the uncle’s color (the sibling of the parent):
void insert_fixup(RBTree *tree, Node *z) {
while (z->parent->color == RED) {
if (z->parent == z->parent->parent->left) {
Node *uncle = z->parent->parent->right;
if (uncle->color == RED) {
// Case 1: Uncle is red - recolor
z->parent->color = BLACK;
uncle->color = BLACK;
z->parent->parent->color = RED;
z = z->parent->parent;
} else {
if (z == z->parent->right) {
// Case 2: Uncle black, z is right child
z = z->parent;
left_rotate(tree, z);
}
// Case 3: Uncle black, z is left child
z->parent->color = BLACK;
z->parent->parent->color = RED;
right_rotate(tree, z->parent->parent);
}
} else {
// Symmetric cases for right side
Node *uncle = z->parent->parent->left;
if (uncle->color == RED) {
z->parent->color = BLACK;
uncle->color = BLACK;
z->parent->parent->color = RED;
z = z->parent->parent;
} else {
if (z == z->parent->left) {
z = z->parent;
right_rotate(tree, z);
}
z->parent->color = BLACK;
z->parent->parent->color = RED;
left_rotate(tree, z->parent->parent);
}
}
}
tree->root->color = BLACK;
}
Case 1 (red uncle) only recolors and moves up the tree. Cases 2 and 3 perform at most two rotations and terminate. This guarantees O(log n) time with at most two rotations per insertion.
Rotations: Left and Right
Rotations restructure the tree while preserving the BST property. A left rotation pivots a node down and to the left, promoting its right child. A right rotation does the opposite.
void left_rotate(RBTree *tree, Node *x) {
Node *y = x->right;
// Turn y's left subtree into x's right subtree
x->right = y->left;
if (y->left != tree->nil) {
y->left->parent = x;
}
// Link x's parent to y
y->parent = x->parent;
if (x->parent == tree->nil) {
tree->root = y;
} else if (x == x->parent->left) {
x->parent->left = y;
} else {
x->parent->right = y;
}
// Put x on y's left
y->left = x;
x->parent = y;
}
void right_rotate(RBTree *tree, Node *y) {
Node *x = y->left;
y->left = x->right;
if (x->right != tree->nil) {
x->right->parent = y;
}
x->parent = y->parent;
if (y->parent == tree->nil) {
tree->root = x;
} else if (y == y->parent->right) {
y->parent->right = x;
} else {
y->parent->left = x;
}
x->right = y;
y->parent = x;
}
Each rotation runs in O(1) time—just pointer manipulation. The BST property is preserved because rotations only change the structural relationship between a node and its child, not the relative ordering of keys.
Deletion and Double-Black Resolution
Deletion is more complex than insertion. When we remove a black node, we create a “double-black” situation where a node conceptually carries an extra black to maintain the black-height property. The fix-up procedure resolves this through four cases.
void rb_transplant(RBTree *tree, Node *u, Node *v) {
if (u->parent == tree->nil) {
tree->root = v;
} else if (u == u->parent->left) {
u->parent->left = v;
} else {
u->parent->right = v;
}
v->parent = u->parent;
}
void delete(RBTree *tree, Node *z) {
Node *y = z;
Node *x;
Color y_original_color = y->color;
if (z->left == tree->nil) {
x = z->right;
rb_transplant(tree, z, z->right);
} else if (z->right == tree->nil) {
x = z->left;
rb_transplant(tree, z, z->left);
} else {
y = tree_minimum(tree, z->right);
y_original_color = y->color;
x = y->right;
if (y->parent == z) {
x->parent = y;
} else {
rb_transplant(tree, y, y->right);
y->right = z->right;
y->right->parent = y;
}
rb_transplant(tree, z, y);
y->left = z->left;
y->left->parent = y;
y->color = z->color;
}
if (y_original_color == BLACK) {
delete_fixup(tree, x);
}
free(z);
}
The delete fix-up handles four cases involving the sibling’s color and the sibling’s children’s colors. It’s the most complex part of red-black tree implementation, potentially requiring O(log n) recolorings and up to three rotations.
Practical Applications and Trade-offs
Red-black trees power critical systems. The Linux Completely Fair Scheduler uses them to manage process scheduling with O(log n) insertion and extraction of the next process. Java’s TreeMap and TreeSet are red-black trees. C++’s std::map, std::set, and their multi-variants use red-black trees in most standard library implementations.
Red-black vs. AVL trees:
- AVL trees are more strictly balanced (height difference ≤ 1 vs. ≤ 2x)
- AVL trees perform better for read-heavy workloads
- Red-black trees perform fewer rotations on modifications
- Red-black trees are preferred for mixed or write-heavy workloads
When to use red-black trees:
- You need ordered data with frequent modifications
- Standard library implementations suffice (use
std::map,TreeMap) - You’re building a scheduler or memory allocator
When to consider alternatives:
- Read-heavy workloads: consider AVL trees
- Range queries and disk storage: consider B-trees
- Simple use cases: consider hash tables if ordering isn’t needed
For most applications, don’t implement your own. Use your language’s standard library ordered map. When you need custom behavior or are building systems software, understanding the mechanics helps you make informed decisions and debug performance issues.