Module 2 - Traffic & Load

Consistent Hashing

Distribute data across nodes while minimizing redistribution when nodes change.

1The Clock Analogy

Simple Analogy
Imagine a clock face with servers at different hour positions. Each key hashes to a position on the clock and gets assigned to the next server clockwise.

Add a server? Only keys between the new server and its predecessor move.
Remove a server? Only that server's keys move to the next one.

Unlike modulo hashing where adding a node reshuffles almost everything.

2The Problem with Modulo

Simple modulo: hash(key) % N servers. Add or remove one server? Almost ALL keys get remapped. Disaster for caches.

Example: 3 → 4 Servers

Before (3 servers)
hash=0 → Server 0
hash=1 → Server 1
hash=2 → Server 2
hash=3 → Server 0
After (4 servers)
hash=0 → Server 0 ✓
hash=1 → Server 1 ✓
hash=2 → Server 2 ✓
hash=3 → Server 3 (moved!)

~75% of keys move when going from 3 to 4 servers!

3How Consistent Hashing Works

1
Create a Hash Ring
Imagine a circle with positions 0 to 2^32
2
Place Servers
Hash each server ID to a position on the ring
3
Place Keys
Hash each key to a position on the ring
4
Find Server
Walk clockwise from key's position to first server

When a server is added/removed, only keys between it and its predecessor are affected-typically 1/N of all keys.

4Virtual Nodes

Virtual Nodes: Each physical server gets multiple positions on the ring (e.g., 100-200 virtual nodes). This ensures even distribution and handles heterogeneous servers.

Even Distribution

Without virtual nodes, random placement can cause imbalance. VNodes spread load evenly.

Gradual Scaling

Bigger servers can have more VNodes. Easier to handle heterogeneous hardware.

5Real-World Usage

SystemUse Case
Amazon DynamoDBPartition data across storage nodes
Apache CassandraDistribute data across ring of nodes
Memcached/Redis ClusterDistribute cache keys across instances
CDN Edge ServersRoute requests to appropriate edge

6Key Takeaways

1Consistent hashing minimizes key redistribution when nodes change
2Only 1/N of keys move when adding/removing a node (vs nearly all with modulo)
3Virtual nodes ensure even distribution across the ring
4Essential for distributed caches, databases, and load balancers
5Used by DynamoDB, Cassandra, Memcached and many distributed systems

?Quiz

1. What percentage of keys typically move when adding 1 node with consistent hashing?

2. What problem do virtual nodes solve?