Hash Map: Implementation from Scratch
Key Insights
- A hash map achieves O(1) average-case lookups by converting keys into array indices through a hash function, but collision handling and load factor management are what separate a toy implementation from a production-ready one.
- Separate chaining with linked lists offers simpler implementation and graceful degradation under high load, while open addressing provides better cache locality but requires more careful deletion handling.
- Dynamic resizing at a 0.75 load factor threshold prevents performance degradation, but the rehashing cost means you should pre-size your hash map when you know the approximate number of entries upfront.
What is a Hash Map?
A hash map is a data structure that stores key-value pairs and provides near-instant lookups, insertions, and deletions. Unlike arrays where you access elements by numeric index, hash maps let you use arbitrary keys—strings, objects, or any hashable type—to store and retrieve values.
The magic happens through a hash function that converts keys into array indices. When you store {"user_123": "Alice"}, the hash function transforms "user_123" into an index like 7, and the value gets stored at position 7 in an internal array. Retrieval works the same way: hash the key, jump directly to that index, grab the value.
Real-world applications are everywhere: database indexes, caching layers, request deduplication, symbol tables in compilers, and counting word frequencies. Any time you need fast lookups by a non-numeric key, a hash map is usually the answer.
Core Components: Hash Functions
The hash function is the heart of any hash map. It must satisfy three properties:
- Deterministic: The same key must always produce the same hash value
- Uniform distribution: Keys should spread evenly across available buckets to minimize collisions
- Fast computation: Hashing happens on every operation, so it can’t be a bottleneck
Here’s a simple but effective hash function for strings:
function hash(key, capacity) {
  let hashValue = 0;
  const PRIME = 31;
  for (let i = 0; i < key.length; i++) {
    hashValue = (hashValue * PRIME + key.charCodeAt(i)) % capacity;
  }
  return hashValue;
}
The prime multiplier (31) helps spread each character's contribution more uniformly, and the modulo keeps the result within the array bounds. This polynomial rolling hash is essentially what Java's String.hashCode() computes, though Java lets the 32-bit integer overflow rather than taking a modulo inside the loop, and maps the hash to a bucket index in a separate step.
For production systems, you’d want something more robust like MurmurHash or xxHash, which provide better distribution and resistance to adversarial inputs. But for understanding the fundamentals, this simple function works well.
Handling Collisions: Chaining vs Open Addressing
Collisions are inevitable. With a finite array and infinite possible keys, multiple keys will eventually hash to the same index. The pigeonhole principle guarantees it.
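You can see a collision concretely with the hash function from earlier. The strings "Aa" and "BB" are a well-known colliding pair for the multiply-by-31 scheme, since 'A'*31 + 'a' and 'B'*31 + 'B' both equal 2112, so they land in the same bucket at any capacity:

```javascript
// Same hash function as above, repeated so the snippet runs standalone.
function hash(key, capacity) {
  let hashValue = 0;
  const PRIME = 31;
  for (let i = 0; i < key.length; i++) {
    hashValue = (hashValue * PRIME + key.charCodeAt(i)) % capacity;
  }
  return hashValue;
}

const capacity = 16;
const a = hash("Aa", capacity);
const b = hash("BB", capacity);
console.log(a === b); // true — both keys map to the same bucket index
```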
Separate chaining stores a linked list at each bucket. When two keys collide, both entries live in the same bucket’s list:
class HashNode {
  constructor(key, value) {
    this.key = key;
    this.value = value;
    this.next = null;
  }
}
// When inserting at bucket index 5:
// bucket[5] -> {key: "foo", value: 1} -> {key: "bar", value: 2} -> null
Open addressing keeps everything in the main array. When a collision occurs, you probe for the next available slot:
// Linear probing: try index, index+1, index+2, ...
function findSlot(key, buckets, capacity) {
  let index = hash(key, capacity);
  // Assumes at least one empty slot exists; a completely full table would
  // make this loop spin forever, which is why open addressing must resize
  // well before capacity is reached.
  while (buckets[index] !== null && buckets[index].key !== key) {
    index = (index + 1) % capacity;
  }
  return index;
}
I prefer separate chaining for most implementations. It’s simpler to reason about, handles deletions cleanly (open addressing requires tombstone markers), and degrades gracefully when the load factor spikes. Open addressing shines when memory locality matters and you can guarantee a low load factor.
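The tombstone requirement mentioned above deserves a quick sketch. Simply nulling out a deleted slot would break lookups for any key that probed past it at insertion time; instead, the slot gets a sentinel that says "deleted, but keep probing." The names here (TOMBSTONE, probeDelete, probeGet, toyHash) are illustrative, not part of the implementation in this article:

```javascript
// Sentinel marking "deleted — keep probing past me" under open addressing.
const TOMBSTONE = Symbol("tombstone");

function probeDelete(key, buckets, capacity, hashFn) {
  let index = hashFn(key, capacity);
  while (buckets[index] !== null) {
    const slot = buckets[index];
    if (slot !== TOMBSTONE && slot.key === key) {
      buckets[index] = TOMBSTONE; // mark, don't null out
      return true;
    }
    index = (index + 1) % capacity;
  }
  return false; // hit a truly empty slot: key was never present
}

function probeGet(key, buckets, capacity, hashFn) {
  let index = hashFn(key, capacity);
  while (buckets[index] !== null) {
    const slot = buckets[index];
    if (slot !== TOMBSTONE && slot.key === key) return slot.value;
    index = (index + 1) % capacity; // walk past tombstones and mismatches
  }
  return undefined;
}

// Toy hash for the demo only: "cat" and "dog" both hash to slot 3.
const toyHash = (key, capacity) => key.length % capacity;

const capacity = 8;
const buckets = new Array(capacity).fill(null);
// Manually place two colliding entries as linear probing would have:
buckets[3] = { key: "cat", value: 1 }; // hashed to 3
buckets[4] = { key: "dog", value: 2 }; // hashed to 3, probed to 4

probeDelete("cat", buckets, capacity, toyHash);
// Slot 3 is now a tombstone, so a probe for "dog" still walks past it
// instead of stopping early at an "empty" slot.
console.log(probeGet("dog", buckets, capacity, toyHash)); // 2
```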
Building the Hash Map Class
Let’s build a complete hash map using separate chaining:
class HashMap {
  constructor(initialCapacity = 16) {
    this.capacity = initialCapacity;
    this.size = 0;
    this.buckets = new Array(this.capacity).fill(null);
    this.LOAD_FACTOR_THRESHOLD = 0.75;
  }

  _hash(key) {
    let hashValue = 0;
    const keyString = String(key);
    const PRIME = 31;
    for (let i = 0; i < keyString.length; i++) {
      hashValue = (hashValue * PRIME + keyString.charCodeAt(i)) % this.capacity;
    }
    return hashValue;
  }

  set(key, value) {
    if (this.size / this.capacity >= this.LOAD_FACTOR_THRESHOLD) {
      this._resize();
    }
    const index = this._hash(key);
    let current = this.buckets[index];
    // Check if key already exists
    while (current !== null) {
      if (current.key === key) {
        current.value = value; // Update existing
        return;
      }
      current = current.next;
    }
    // Insert new node at head of chain
    const newNode = new HashNode(key, value);
    newNode.next = this.buckets[index];
    this.buckets[index] = newNode;
    this.size++;
  }

  get(key) {
    const index = this._hash(key);
    let current = this.buckets[index];
    while (current !== null) {
      if (current.key === key) {
        return current.value;
      }
      current = current.next;
    }
    return undefined;
  }

  delete(key) {
    const index = this._hash(key);
    let current = this.buckets[index];
    let previous = null;
    while (current !== null) {
      if (current.key === key) {
        if (previous === null) {
          this.buckets[index] = current.next; // unlink head node
        } else {
          previous.next = current.next; // stitch chain around the node
        }
        this.size--;
        return true;
      }
      previous = current;
      current = current.next;
    }
    return false;
  }

  has(key) {
    // Caveat: cannot distinguish a missing key from a key whose stored
    // value is undefined; walk the chain directly if that distinction matters.
    return this.get(key) !== undefined;
  }
}
The set method handles both insertions and updates. We traverse the chain looking for an existing key first—this ensures we don’t create duplicates. Inserting at the head of the chain (rather than the tail) is an O(1) operation.
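To see the class in action, here is a quick usage sketch. For self-containment it re-declares a condensed HashNode/HashMap with the same chaining logic as above (resizing, delete, and has omitted); with the full class in scope, only the last six lines are needed:

```javascript
class HashNode {
  constructor(key, value) {
    this.key = key;
    this.value = value;
    this.next = null;
  }
}

class HashMap {
  constructor(initialCapacity = 16) {
    this.capacity = initialCapacity;
    this.size = 0;
    this.buckets = new Array(this.capacity).fill(null);
  }
  _hash(key) {
    let h = 0;
    const s = String(key);
    for (let i = 0; i < s.length; i++) {
      h = (h * 31 + s.charCodeAt(i)) % this.capacity;
    }
    return h;
  }
  set(key, value) {
    const index = this._hash(key);
    for (let n = this.buckets[index]; n !== null; n = n.next) {
      if (n.key === key) { n.value = value; return; } // update in place
    }
    const node = new HashNode(key, value);
    node.next = this.buckets[index]; // head insertion: O(1)
    this.buckets[index] = node;
    this.size++;
  }
  get(key) {
    for (let n = this.buckets[this._hash(key)]; n !== null; n = n.next) {
      if (n.key === key) return n.value;
    }
    return undefined;
  }
}

const users = new HashMap();
users.set("user_123", "Alice");
users.set("user_456", "Bob");
users.set("user_123", "Alicia");    // update, not a duplicate insert
console.log(users.get("user_123")); // "Alicia"
console.log(users.size);            // 2 — the update did not grow the map
```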
Dynamic Resizing and Load Factor
The load factor is the ratio of stored entries to total capacity. As it increases, chains get longer, and O(1) operations degrade toward O(n). The standard threshold is 0.75—a balance between space efficiency and lookup speed.
When we hit the threshold, we double the capacity and rehash every entry:
_resize() {
  const oldBuckets = this.buckets;
  const oldCapacity = this.capacity;
  this.capacity *= 2;
  this.buckets = new Array(this.capacity).fill(null);
  this.size = 0;
  // Rehash all existing entries
  for (let i = 0; i < oldCapacity; i++) {
    let current = oldBuckets[i];
    while (current !== null) {
      this.set(current.key, current.value);
      current = current.next;
    }
  }
}
Resizing is expensive: O(n) to rehash everything. But it happens infrequently enough that the amortized cost per operation remains O(1). If you know you'll store 10,000 entries, initialize with a capacity of at least 13,334 (10,000 / 0.75, rounded up) to avoid any resizing.
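That arithmetic can be wrapped in a small helper. The name presizedCapacity is hypothetical, not part of the class above:

```javascript
// Smallest initial capacity that keeps n entries strictly under the
// load factor threshold, so set() never triggers a resize.
function presizedCapacity(n, loadFactor = 0.75) {
  return Math.ceil(n / loadFactor);
}

console.log(presizedCapacity(10000)); // 13334
// Usage: const map = new HashMap(presizedCapacity(10000));
```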
Some implementations also shrink when the load factor drops below 0.25, but this adds complexity and is often unnecessary.
Performance Analysis
Time Complexity:
| Operation | Average Case | Worst Case |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
The worst case occurs when all keys hash to the same bucket, creating a single long chain. With a good hash function and reasonable load factor, this is rare in practice.
Space Complexity: O(n) for storing n entries, plus overhead for the bucket array and node pointers.
When hash maps aren’t ideal:
- When you need ordered iteration (use a tree map instead)
- When memory is extremely constrained (the overhead per entry is significant)
- When keys aren’t hashable or equality comparison is expensive
- When you need range queries (“find all keys between A and B”)
Enhancements and Production Considerations
Real-world hash map implementations include several refinements our basic version lacks.
Iterator support lets you traverse all entries:
keys() {
  const result = [];
  for (let i = 0; i < this.capacity; i++) {
    let current = this.buckets[i];
    while (current !== null) {
      result.push(current.key);
      current = current.next;
    }
  }
  return result;
}

*[Symbol.iterator]() {
  for (let i = 0; i < this.capacity; i++) {
    let current = this.buckets[i];
    while (current !== null) {
      yield [current.key, current.value];
      current = current.next;
    }
  }
}
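The traversal pattern is easier to test in isolation. Here is the same walk as a standalone generator over a hand-built bucket array (entries and the manual chain are illustrative), so it runs without the full class:

```javascript
class HashNode {
  constructor(key, value) {
    this.key = key;
    this.value = value;
    this.next = null;
  }
}

// Two entries chained in bucket 0 (head insertion order), one in bucket 2.
const buckets = new Array(4).fill(null);
const first = new HashNode("foo", 1);
const second = new HashNode("bar", 2);
second.next = first;   // "bar" was inserted last, so it sits at the head
buckets[0] = second;
buckets[2] = new HashNode("baz", 3);

function* entries(buckets) {
  for (const head of buckets) {
    for (let node = head; node !== null; node = node.next) {
      yield [node.key, node.value];
    }
  }
}

console.log([...entries(buckets)]);
// [["bar", 2], ["foo", 1], ["baz", 3]] — bucket order, not insertion order
```

Note the order: iteration follows bucket and chain layout, which is one reason ordered iteration is listed above as a weakness of hash maps.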
Other production considerations:
- Null/undefined keys: Decide whether to allow them. Java’s HashMap permits one null key; our implementation would hash it to a consistent bucket.
- Thread safety: Our implementation isn’t thread-safe. ConcurrentHashMap in Java uses lock striping—separate locks per bucket segment—for concurrent access.
- Hash flooding protection: Randomize the hash seed at startup to prevent attackers from crafting inputs that all collide.
Python’s dict still uses open addressing, but CPython 3.6 moved it to a compact, insertion-ordered layout. Java’s HashMap uses chaining but converts long chains (8+ entries) into balanced red-black trees for O(log n) worst-case lookups.
Understanding these fundamentals helps you make informed decisions about when to use hash maps, how to configure them, and how to debug performance issues when your “O(1)” lookups aren’t behaving as expected.