Hash Table Collision Resolution: Chaining vs Open Addressing
Understand how hash tables handle collisions with separate chaining and open addressing. Compare linear probing, quadratic probing, and double hashing strategies with examples.
Detailed Explanation
Hash Table Collision Resolution
A hash collision occurs when two different keys produce the same hash value (map to the same bucket). Since the number of possible keys is typically much larger than the number of buckets, collisions are inevitable. How they are handled determines the hash table's performance.
Separate Chaining
Each bucket stores a linked list (or other collection) of all entries that hash to that bucket:
Bucket 0: -> ("apple", 5) -> ("grape", 8)
Bucket 1: -> ("banana", 3)
Bucket 2: (empty)
Bucket 3: -> ("cherry", 7) -> ("date", 2) -> ("fig", 9)
Advantages:
- Simple to implement
- Graceful degradation: performance degrades linearly with load factor
- Deletion is straightforward (just remove from the list)
- Can store more elements than the number of buckets
Disadvantages:
- Extra memory for pointers
- Poor cache performance (pointer chasing through linked lists)
Open Addressing
All entries are stored directly in the bucket array. When a collision occurs, the algorithm probes for the next available slot.
Linear probing: Check bucket h, then h+1, h+2, ...
Insert "apple" -> hash=3 -> bucket 3 (empty, placed)
Insert "grape" -> hash=3 -> bucket 3 (occupied)
-> bucket 4 (empty, placed)
Quadratic probing: Check bucket h, then h+1, h+4, h+9, ...
This reduces primary clustering where consecutive occupied slots form long runs.
Double hashing: Use a second hash function to determine the probe step:
probe(i) = (h1(key) + i * h2(key)) % size
This provides the best distribution but is more complex to implement.
Performance Comparison
| Metric | Chaining | Linear Probing | Double Hashing |
|---|---|---|---|
| Average search (low load) | O(1) | O(1) | O(1) |
| Average search (high load) | O(1 + n/m) | Degrades faster | Better than linear |
| Worst case | O(n) | O(n) | O(n) |
| Cache performance | Poor | Excellent | Good |
| Deletion | Easy | Requires tombstones | Requires tombstones |
Tombstone Deletion
In open addressing, you cannot simply empty a slot on deletion — it would break the probe sequence for other keys. Instead, deleted slots are marked as tombstones: treated as occupied during search but available for insertion.
Use Case
Every hash table implementation must choose a collision resolution strategy. Language standard libraries vary: Java's HashMap uses chaining (with tree upgrade for long chains), Python's dict uses open addressing with pseudo-random probing, and Go's map uses a variant of chaining with buckets. Understanding these strategies is crucial for system design and choosing the right data structure.