Consistent Hashing
Distribute data across nodes while minimizing redistribution when nodes change.
1. The Clock Analogy

Picture the hash space as a clock face: servers and keys both hash to positions on a ring, and each key belongs to the first server clockwise from its position.

Add a server? Only the keys between the new server and its predecessor move to it.
Remove a server? Only that server's keys move to the next server clockwise.
Contrast this with modulo hashing, where adding a single node reshuffles almost every key.
2. The Problem with Modulo

Simple modulo assigns each key with hash(key) % N, where N is the number of servers. Add or remove a single server and almost all keys get remapped, which is a disaster for caches: nearly every lookup misses until the cache is repopulated.
Example: 3 → 4 Servers
~75% of keys move when going from 3 to 4 servers!
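This is easy to verify empirically. A minimal Python sketch (the key names and the choice of MD5 are illustrative, not prescriptive):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Map a key to one of n servers using modulo hashing."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key-{i}" for i in range(10_000)]
# Count keys whose assigned server changes when going from 3 to 4 servers.
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75%
```

A key stays put only when hash % 3 == hash % 4, which happens for just 3 of every 12 residues, so about 75% of keys move.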
3. How Consistent Hashing Works

Servers and keys hash onto the same ring. When a server is added or removed, only the keys between it and its predecessor on the ring are affected: typically 1/N of all keys, where N is the number of servers.
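A compact hash ring can be built from a sorted list plus binary search. A sketch, assuming MD5 and a 32-bit ring (both arbitrary choices for illustration):

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Map a string to a point on the ring (0 .. 2**32 - 1).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self, servers):
        self._ring = sorted((_hash(s), s) for s in servers)

    def add(self, server: str) -> None:
        bisect.insort(self._ring, (_hash(server), server))

    def lookup(self, key: str) -> str:
        # Walk clockwise: the first server at or past the key's point owns it;
        # wrap to the start of the sorted list if we fall off the end.
        idx = bisect.bisect(self._ring, (_hash(key),)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["s1", "s2", "s3"])
before = {f"key-{i}": ring.lookup(f"key-{i}") for i in range(10_000)}
ring.add("s4")
moved = sum(1 for k, owner in before.items() if ring.lookup(k) != owner)
print(f"{moved / len(before):.0%} of keys moved, and only to the new server")
```

Every key that changed owner now maps to s4: adding a node never shuffles keys between existing nodes, which is exactly the property modulo hashing lacks.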
4. Virtual Nodes
Virtual Nodes: Each physical server gets multiple positions on the ring (e.g., 100-200 virtual nodes). This ensures even distribution and handles heterogeneous servers.
Even Distribution
Without virtual nodes, random placement can cause imbalance. VNodes spread load evenly.
Gradual Scaling
Servers with more capacity can be assigned more vnodes, which makes heterogeneous hardware straightforward to handle.
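The balancing effect of virtual nodes can be sketched by hashing each server at many points on the ring (the vnode count of 150 and the "server#i" naming are assumptions for illustration):

```python
import bisect
import hashlib
from collections import Counter

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

def build_ring(servers, vnodes=150):
    # Each physical server appears at `vnodes` points: "s1#0", "s1#1", ...
    return sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

def lookup(ring, key):
    # First point clockwise from the key's hash, wrapping at the end.
    idx = bisect.bisect(ring, (_hash(key),)) % len(ring)
    return ring[idx][1]

ring = build_ring(["s1", "s2", "s3", "s4"])
load = Counter(lookup(ring, f"key-{i}") for i in range(40_000))
print(load)  # each server gets roughly 10,000 keys
```

To weight a more powerful server, give it a larger vnode count; its share of keys grows in proportion.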
5. Real-World Usage
| System | Use Case |
|---|---|
| Amazon DynamoDB | Partition data across storage nodes |
| Apache Cassandra | Distribute data across ring of nodes |
| Memcached/Redis Cluster | Distribute cache keys across instances |
| CDN Edge Servers | Route requests to appropriate edge |
6. Key Takeaways
- Modulo hashing remaps nearly all keys when the server count changes; consistent hashing remaps only about 1/N.
- On the ring, each key belongs to the first server clockwise from its hash position, so adding or removing a node only affects one arc.
- Virtual nodes even out load and let heterogeneous servers carry shares proportional to their capacity.
Quiz
1. What percentage of keys typically move when adding 1 node with consistent hashing?
2. What problem do virtual nodes solve?