Module 4 — Scaling

Sharding & Partitioning

Split data across multiple databases when one machine can't hold it all.

1. The Pizza Shop Analogy

One pizza shop handles 100 orders/day, but now you're getting 1,000 orders. Two options:

Bigger Oven (Vertical)
Has limits. You can't make the oven infinitely big.

More Shops (Sharding)
Shop A handles addresses A-M, Shop B handles N-Z. Each shop has its own kitchen, staff, and ingredients.

2. Visual: How Sharding Works

Before sharding: a single 10TB database is the bottleneck.

After sharding, the data is distributed:

- Shard 1: Users A-G (3TB)
- Shard 2: Users H-N (3.5TB)
- Shard 3: Users O-Z (3.5TB)

3. Sharding Strategies

1. Range-Based Sharding

Split by ranges of the shard key: user_id 1-1M → shard 1, 1M-2M → shard 2, and so on.

- Shard 1: IDs 1 - 1M
- Shard 2: IDs 1M - 2M
- Shard 3: IDs 2M - 3M

Pros:
- Range queries are efficient (e.g., get users 500K-600K from a single shard)
- Simple to understand and implement

Cons:
- Hot spots (recent users all land on the newest shard)
- Uneven distribution over time
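A minimal range-routing sketch in Python (the boundaries and shard names here are illustrative, not any specific database's API):

```python
import bisect

# Upper-bound user_id for each shard, in ascending order (illustrative).
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARDS = ["shard_1", "shard_2", "shard_3"]

def route_by_range(user_id: int) -> str:
    """Return the shard whose ID range contains user_id."""
    idx = bisect.bisect_left(RANGE_BOUNDS, user_id)
    if idx >= len(SHARDS):
        raise ValueError(f"user_id {user_id} is beyond the last shard")
    return SHARDS[idx]

print(route_by_range(550_000))    # shard_1: the 500K-600K range query stays local
print(route_by_range(1_500_000))  # shard_2
```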

2. Hash-Based Sharding

Hash the shard key, then take the result modulo the number of shards. This distributes data evenly.

user_id = 12345 → hash(12345) = 8472910 → 8472910 % 4 = 2 → route to Shard 2

Pros:
- Even distribution (no hot spots)
- Works well for point lookups by key

Cons:
- Range queries must hit every shard
- Adding shards means rehashing nearly everything
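A hedged sketch of hash-based routing in Python. It uses an MD5 digest because Python's built-in hash() is randomized per process; the shard count and key range are illustrative. The last two lines also demonstrate the rehashing problem:

```python
import hashlib

NUM_SHARDS = 4

def stable_hash(key) -> int:
    # Built-in hash() changes between processes, so use a deterministic digest.
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

def route_by_hash(user_id: int) -> int:
    """Map a key to a shard index in [0, NUM_SHARDS)."""
    return stable_hash(user_id) % NUM_SHARDS

print(route_by_hash(12345))  # same shard index on every run

# The downside: changing the shard count remaps most keys.
moved = sum(1 for uid in range(10_000)
            if stable_hash(uid) % 4 != stable_hash(uid) % 5)
print(f"~{moved / 10_000:.0%} of keys move going from 4 to 5 shards")  # ~80%
```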

3. Directory-Based Sharding

A lookup table maps each key to its shard. Maximum flexibility.

Lookup service:
- user_1 → Shard A
- user_2 → Shard B
- user_3 → Shard A

Pros:
- Complete flexibility in placement
- Easy to rebalance

Cons:
- The lookup service is a single point of failure (SPOF)
- Extra latency on every lookup
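A minimal directory-based router in Python. The dict stands in for what would be a replicated metadata store in production; all names here are illustrative:

```python
# key -> shard mapping; in production this lives in a replicated
# lookup service, not a local dict.
DIRECTORY = {
    "user_1": "shard_a",
    "user_2": "shard_b",
    "user_3": "shard_a",
}
DEFAULT_SHARD = "shard_a"  # placement policy for brand-new keys (illustrative)

def route_by_directory(key: str) -> str:
    return DIRECTORY.get(key, DEFAULT_SHARD)

def rebalance(key: str, new_shard: str) -> None:
    """Moving a key is just a directory update (after copying its data)."""
    DIRECTORY[key] = new_shard

print(route_by_directory("user_2"))  # shard_b
rebalance("user_2", "shard_c")
print(route_by_directory("user_2"))  # shard_c
```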

4. Choosing a Shard Key

The shard key determines which shard holds which data. Choose wisely—changing it later is very painful and usually requires downtime.

Good Shard Keys

- user_id: Most queries are by user, and each user's data stays together. Excellent for user-centric apps.
- tenant_id: Multi-tenant SaaS, with each tenant's data isolated. Perfect for B2B SaaS.
- region: Geographic distribution and data locality. Good for global apps.

Bad Shard Keys

- timestamp: All recent writes go to one shard. Hot spot! Never use for time-series data without care.
- country: The USA shard ends up 100x bigger than the Vatican shard. Uneven distribution.
- status: Only 3-4 distinct values, so data can't be spread well. Low cardinality = bad.

5. Challenges with Sharding

Cross-Shard Queries
Problem: Data you need lives on multiple shards, so you must query them all and merge the results.
Solution: Design to minimize this. Keep related data together; use scatter-gather only when necessary (see the sketch after this list).
Example: "Get all orders for user_123" is easy (one shard). "Get all orders > $100" is hard (query every shard).

Cross-Shard Transactions
Problem: A transaction spans multiple shards, so no single database can guarantee ACID.
Solution: Design to avoid it. Use the saga pattern, or accept eventual consistency.
Example: Transferring money between users on different shards requires a distributed transaction.

Rebalancing / Resharding
Problem: Adding shards means moving data, which can be a massive operation.
Solution: Use consistent hashing. Plan capacity ahead. Do it during low-traffic windows.
Example: Going from 4 to 8 shards with modulo hashing means moving ~50% of the data.

Hot Spots
Problem: One shard gets far more traffic than the others.
Solution: Pick a better shard key, split the hot shard further, or add a caching layer.
Example: A celebrity's data sits on one shard and millions of requests hit it.
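Here is a minimal scatter-gather sketch in Python for the "all orders > $100" case. The in-memory lists stand in for real shard connections, and the schema is made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative in-memory "shards"; in practice these would be
# separate database connections.
SHARDS = [
    [{"user": "u1", "amount": 250}, {"user": "u2", "amount": 40}],
    [{"user": "u3", "amount": 120}],
    [{"user": "u4", "amount": 90}, {"user": "u5", "amount": 700}],
]

def query_shard(shard, min_amount):
    return [order for order in shard if order["amount"] > min_amount]

def scatter_gather(min_amount):
    # Scatter: query every shard in parallel. Gather: merge the results.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda s: query_shard(s, min_amount), SHARDS))
    return [order for part in parts for order in part]

print(scatter_gather(100))  # orders over $100, merged from all three shards
```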

6. Consistent Hashing

Regular (modulo) hashing: adding a shard means rehashing, so nearly all data moves. Consistent hashing: adding a shard moves only about 1/n of the data, where n is the number of shards.

How Consistent Hashing Works

[Ring diagram: shards S1-S4 placed around the hash ring; a data key D hashes to a point on the ring and is served by the next shard clockwise.]
1. Imagine a ring of all possible hash values (0 to 2^32)
2. Place shards at positions on the ring
3. For a key: hash it, walk clockwise to first shard
4. Adding shard 5? Only data between S4 and S5 moves
Virtual Nodes

Place each physical shard at multiple positions on the ring (virtual nodes). Using ~100 virtual nodes per shard gives better distribution and smoother rebalancing.
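A compact sketch of a consistent-hash ring with virtual nodes in Python. MD5 stands in for whatever stable hash a real system uses; shard names and the vnode count are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, shard)
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _pos(key: str) -> int:
        # Stable position in the 0..2^32-1 ring space.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._pos(f"{shard}#{i}"), shard))

    def route(self, key: str) -> str:
        # Walk clockwise: first virtual node at or after the key's position.
        idx = bisect.bisect_right(self.ring, (self._pos(key), ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]

ring = ConsistentHashRing(["s1", "s2", "s3", "s4"])
before = {k: ring.route(k) for k in map(str, range(10_000))}
ring.add_shard("s5")
moved = sum(1 for k, s in before.items() if ring.route(k) != s)
print(f"~{moved / len(before):.0%} of keys moved")  # roughly 1/5, not ~100%
```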

7. Key Takeaways

1. Sharding splits data across databases when one can't handle the load.
2. The shard key is critical: pick one used in most queries, because it's hard to change later.
3. Use hash-based sharding for even distribution, range-based for range queries.
4. Cross-shard operations are expensive; design to minimize them.
5. Consistent hashing minimizes data movement when adding or removing shards.
6. Sharding adds significant complexity; delay it until truly needed.