Persistent Data Structures: Immutable with History
Persistent data structures preserve their previous versions when modified. Instead of changing data in place, every 'modification' produces a new version while keeping the old one intact and...
Key Insights
- Persistent data structures preserve all previous versions through structural sharing, making them memory-efficient rather than wasteful—most operations share 95%+ of their data with previous versions.
- The real power isn’t immutability alone but the ability to treat any historical state as a first-class value, enabling trivial undo systems, concurrent access without locks, and simplified debugging.
- Modern persistent data structure implementations (HAMTs, 32-way tries) achieve near-parity with mutable counterparts for most operations, with O(log32 n) effectively behaving as O(1) for practical dataset sizes.
What Are Persistent Data Structures?
Persistent data structures preserve their previous versions when modified. Instead of changing data in place, every “modification” produces a new version while keeping the old one intact and accessible.
This isn’t about disk persistence or databases. The term comes from the idea that old versions persist after updates. Contrast this with ephemeral (mutable) structures where modifications destroy previous state.
// Ephemeral: previous state is gone
const mutableArray = [1, 2, 3];
mutableArray.push(4);
// Original [1, 2, 3] no longer exists
// Persistent: both versions coexist
const persistentList = List([1, 2, 3]);
const newList = persistentList.push(4);
// persistentList still equals [1, 2, 3]
// newList equals [1, 2, 3, 4]
Functional programming languages like Lisp pioneered these structures in the 1950s. Clojure brought them mainstream in 2007 with its high-performance persistent collections. Today, they’re everywhere—from React’s state management to database internals.
The naive objection is obvious: “Copying everything on every change sounds expensive.” It would be, if that’s what happened. The key insight is that persistent structures share most of their data between versions.
Types of Persistence
Not all persistence is created equal. Understanding the three types helps you choose the right approach.
Partial persistence allows accessing any historical version but only modifying the newest one. Think of it as a linear history—you can look back but can’t branch.
Full persistence lets you modify any version, creating a branching tree of histories. Each modification of any version creates a new branch.
Confluent persistence goes further, allowing you to combine two historical versions into a new one, creating a DAG (directed acyclic graph) of versions.
Partial Persistence (linear):
v1 → v2 → v3 → v4 (current)
↑
can read, can't modify
Full Persistence (tree):
v1 → v2 → v3
↓
v2a → v2b
Confluent Persistence (DAG):
v1 → v2 → v3
↓ ↘
v2a → merge(v3, v2a)
Most practical applications need only partial or full persistence. Confluent persistence appears in specialized scenarios like version control merge operations.
Structural Sharing: The Key Technique
The magic behind efficient persistence is structural sharing. When you modify a persistent structure, you only copy the parts that change—everything else is shared with the original.
Consider a binary tree. When you update a leaf node, you only need to create new nodes along the path from root to that leaf. All other subtrees are shared.
class PersistentNode:
def __init__(self, value, left=None, right=None):
self.value = value
self.left = left
self.right = right
def set_left(self, new_left):
# Create new node sharing the right subtree
return PersistentNode(self.value, new_left, self.right)
def set_right(self, new_right):
# Create new node sharing the left subtree
return PersistentNode(self.value, self.left, new_right)
def update_value(self, new_value):
# Create new node sharing both subtrees
return PersistentNode(new_value, self.left, self.right)
def update_path(root, path, new_value):
"""Update a value at a path, returning new root with path copying."""
if not path:
return root.update_value(new_value)
direction, *rest = path
if direction == 'left':
new_left = update_path(root.left, rest, new_value)
return root.set_left(new_left)
else:
new_right = update_path(root.right, rest, new_value)
return root.set_right(new_right)
For a balanced tree with n nodes, updating any element requires O(log n) new nodes while sharing the remaining O(n) nodes. Memory overhead is logarithmic, not linear.
Common Persistent Data Structures
Different structures suit different access patterns. Here’s what you’ll encounter most often.
Persistent Lists (cons lists) offer O(1) prepend and O(n) random access. They’re the simplest persistent structure—prepending creates a new head pointing to the existing list.
Persistent Vectors use wide, shallow tries (typically 32-way branching) to achieve effectively O(1) random access and updates. A tree with branching factor 32 needs only 7 levels to hold 34 billion elements.
Persistent Hash Maps (HAMTs—Hash Array Mapped Tries) combine hashing with trie structure. They offer O(log32 n) operations, which for practical purposes is constant time.
// Simplified persistent vector using a trie structure
class PersistentVector {
constructor(root = null, count = 0, shift = 5) {
this.root = root;
this.count = count;
this.shift = shift; // bits per level (5 = 32-way branching)
}
get(index) {
if (index < 0 || index >= this.count) {
throw new RangeError('Index out of bounds');
}
return this._getNode(this.root, this.shift, index);
}
_getNode(node, shift, index) {
if (shift === 0) {
return node[index & 0x1f]; // bottom 5 bits
}
const subIndex = (index >> shift) & 0x1f;
return this._getNode(node[subIndex], shift - 5, index);
}
set(index, value) {
if (index < 0 || index >= this.count) {
throw new RangeError('Index out of bounds');
}
const newRoot = this._setNode(this.root, this.shift, index, value);
return new PersistentVector(newRoot, this.count, this.shift);
}
_setNode(node, shift, index, value) {
// Clone the current node (structural sharing happens here)
const newNode = [...node];
if (shift === 0) {
newNode[index & 0x1f] = value;
} else {
const subIndex = (index >> shift) & 0x1f;
newNode[subIndex] = this._setNode(
node[subIndex],
shift - 5,
index,
value
);
}
return newNode;
}
}
| Structure | Access | Update | Insert | Memory per Version |
|---|---|---|---|---|
| Persistent List | O(n) | O(n) | O(1) head | O(1) |
| Persistent Vector | O(log32 n) | O(log32 n) | O(log32 n) | O(log32 n) |
| Persistent HashMap | O(log32 n) | O(log32 n) | O(log32 n) | O(log32 n) |
Real-World Applications
Persistent data structures shine in several common scenarios.
Undo/Redo Systems become trivial. Instead of implementing command patterns or maintaining diff stacks, just keep references to previous states.
class Editor:
def __init__(self):
self.history = []
self.current_index = -1
self.state = PersistentDict()
def apply_change(self, key, value):
# Truncate any redo history
self.history = self.history[:self.current_index + 1]
# Create new state (old state preserved automatically)
self.state = self.state.set(key, value)
self.history.append(self.state)
self.current_index += 1
def undo(self):
if self.current_index > 0:
self.current_index -= 1
self.state = self.history[self.current_index]
def redo(self):
if self.current_index < len(self.history) - 1:
self.current_index += 1
self.state = self.history[self.current_index]
Database MVCC (Multi-Version Concurrency Control) uses persistent structure concepts. PostgreSQL, for instance, keeps old row versions to serve concurrent readers without blocking writers.
Redux and State Management libraries leverage immutability for predictable state updates and time-travel debugging. Each action produces a new state tree, with structural sharing keeping memory reasonable.
Concurrent Programming benefits enormously. When data can’t change, you don’t need locks. Multiple threads can safely read any version while one thread creates a new version.
Performance Considerations
Persistent structures aren’t free, but they’re cheaper than you’d expect.
The overhead comes from pointer indirection and memory allocation. Accessing an element in a persistent vector requires traversing tree nodes rather than computing an offset. Creating new versions allocates memory.
However, several factors work in your favor:
- Cache efficiency: Wide branching (32-way) keeps trees shallow, minimizing cache misses.
- GC friendliness: Structural sharing means less garbage when old versions are discarded.
- Avoiding defensive copies: Mutable code often copies data “just in case.” Persistent structures eliminate this.
// Benchmark: 10,000 updates
const Immutable = require('immutable');
// Mutable approach
console.time('mutable');
let mutableMap = new Map();
for (let i = 0; i < 10000; i++) {
mutableMap.set(`key${i}`, i);
}
console.timeEnd('mutable'); // ~3ms
// Persistent approach
console.time('persistent');
let persistentMap = Immutable.Map();
for (let i = 0; i < 10000; i++) {
persistentMap = persistentMap.set(`key${i}`, i);
}
console.timeEnd('persistent'); // ~15ms
// But: persistent version kept all 10,000 intermediate states
// Mutable version would need explicit copying to match
The persistent version is slower per operation, but it preserved history that would require explicit work to maintain with mutable structures.
Libraries and Language Support
Your language choice affects how natural persistent structures feel.
Clojure uses persistent structures by default. All core collections are persistent and optimized over 15+ years.
(def v1 [1 2 3])
(def v2 (conj v1 4))
;; v1 is still [1 2 3], v2 is [1 2 3 4]
Immutable.js brings Clojure-style collections to JavaScript with a familiar API.
const { Map } = require('immutable');
const map1 = Map({ a: 1, b: 2 });
const map2 = map1.set('c', 3);
// map1.get('c') === undefined
// map2.get('c') === 3
Immer takes a different approach, letting you write mutable-style code that produces persistent updates.
import { produce } from 'immer';
const state = { users: [{ name: 'Alice' }] };
const newState = produce(state, draft => {
draft.users.push({ name: 'Bob' });
});
// state.users.length === 1
// newState.users.length === 2
For most JavaScript projects, I recommend Immer for its ergonomics. If you’re building something performance-critical with complex nested updates, Immutable.js’s explicit structures give you more control.
For JVM languages, Clojure’s collections are available via clojure.lang.PersistentVector and friends, or use libraries like Paguro for Java or scala.collection.immutable for Scala.
The choice between explicit persistent collections and copy-on-write wrappers depends on your performance requirements and team familiarity. Start with the simpler option (Immer, Scala’s defaults) and optimize if profiling reveals issues.