Consistent Hashing
Distribute data across nodes while minimizing redistribution when nodes change.
1. The Clock Analogy

Picture the hash space as a clock face: servers and keys both hash to positions on a ring, and each key belongs to the first server clockwise from its position.

Add a server? Only the keys between the new server and its predecessor move to it.
Remove a server? Only that server's keys move to the next server clockwise.
Contrast this with modulo hashing, where adding a single node reshuffles almost every key.
2. The Problem with Modulo

Simple modulo assigns each key with hash(key) % N, where N is the number of servers. Add or remove a single server and almost all keys get remapped, which is a disaster for caches: nearly every lookup misses until the cache is repopulated.
Example: 3 → 4 Servers
~75% of keys move when going from 3 to 4 servers!
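This is easy to verify empirically. A minimal Python sketch (the key names and the choice of MD5 are illustrative, not prescriptive):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Map a key to one of n servers using modulo hashing."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key-{i}" for i in range(10_000)]
# Count keys whose assigned server changes when going from 3 to 4 servers.
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75%
```

A key stays put only when hash % 3 == hash % 4, which happens for just 3 of every 12 residues, so about 75% of keys move.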
3. How Consistent Hashing Works

Servers and keys hash onto the same ring. When a server is added or removed, only the keys between it and its predecessor on the ring are affected: typically 1/N of all keys, where N is the number of servers.
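A compact hash ring can be built from a sorted list plus binary search. A sketch, assuming MD5 and a 32-bit ring (both arbitrary choices for illustration):

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Map a string to a point on the ring (0 .. 2**32 - 1).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self, servers):
        self._ring = sorted((_hash(s), s) for s in servers)

    def add(self, server: str) -> None:
        bisect.insort(self._ring, (_hash(server), server))

    def lookup(self, key: str) -> str:
        # Walk clockwise: the first server at or past the key's point owns it;
        # wrap to the start of the sorted list if we fall off the end.
        idx = bisect.bisect(self._ring, (_hash(key),)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["s1", "s2", "s3"])
before = {f"key-{i}": ring.lookup(f"key-{i}") for i in range(10_000)}
ring.add("s4")
moved = sum(1 for k, owner in before.items() if ring.lookup(k) != owner)
print(f"{moved / len(before):.0%} of keys moved, and only to the new server")
```

Every key that changed owner now maps to s4: adding a node never shuffles keys between existing nodes, which is exactly the property modulo hashing lacks.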
4. Virtual Nodes
Virtual Nodes: Each physical server gets multiple positions on the ring (e.g., 100-200 virtual nodes). This ensures even distribution and handles heterogeneous servers.
Even Distribution
Without virtual nodes, random placement can cause imbalance. VNodes spread load evenly.
Gradual Scaling
Servers with more capacity can be assigned more vnodes, which makes heterogeneous hardware straightforward to handle.
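The balancing effect of virtual nodes can be sketched by hashing each server at many points on the ring (the vnode count of 150 and the "server#i" naming are assumptions for illustration):

```python
import bisect
import hashlib
from collections import Counter

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

def build_ring(servers, vnodes=150):
    # Each physical server appears at `vnodes` points: "s1#0", "s1#1", ...
    return sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

def lookup(ring, key):
    # First point clockwise from the key's hash, wrapping at the end.
    idx = bisect.bisect(ring, (_hash(key),)) % len(ring)
    return ring[idx][1]

ring = build_ring(["s1", "s2", "s3", "s4"])
load = Counter(lookup(ring, f"key-{i}") for i in range(40_000))
print(load)  # each server gets roughly 10,000 keys
```

To weight a more powerful server, give it a larger vnode count; its share of keys grows in proportion.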
5. Real-World Usage
| System | Use Case |
|---|---|
| Amazon DynamoDB | Partition data across storage nodes |
| Apache Cassandra | Distribute data across ring of nodes |
| Memcached/Redis Cluster | Distribute cache keys across instances |
| CDN Edge Servers | Route requests to appropriate edge |
6. Key Takeaways
- Modulo hashing remaps nearly all keys when the server count changes; consistent hashing remaps only about 1/N.
- On the ring, each key belongs to the first server clockwise from its hash position, so adding or removing a node only affects one arc.
- Virtual nodes even out load and let heterogeneous servers carry shares proportional to their capacity.
Quiz
1. What percentage of keys typically move when adding 1 node with consistent hashing?
2. What problem do virtual nodes solve?