Module 4 - Scaling
Sharding & Partitioning
Splitting data across multiple databases to scale writes and storage.
1The Library Analogy
Simple Analogy
A library has too many books for one building. Solution: split by genre-fiction in Building A, non-fiction in Building B. Each building operates independently. That's sharding.
Sharding (horizontal partitioning) splits data across multiple database instances. Each shard holds a subset of the data.
2Sharding Strategies
Range-Based
Shard by ranges (users 1-1M on shard 1, 1M-2M on shard 2).
+ Simple queries- Hot spots if ranges uneven
Hash-Based
Hash the key, mod by shard count. Even distribution.
+ Even load- Hard to add shards
Directory-Based
Lookup table maps keys to shards. Flexible but adds hop.
+ Flexible- Lookup overhead
Geographic
Shard by region (US data on US shard).
+ Low latency- Uneven if regions differ
4Challenges
Cross-Shard Queries
JOINs across shards are expensive or impossible.
Resharding
Adding shards requires data migration. Plan ahead.
Hot Spots
One shard gets more traffic. Use consistent hashing.
Transactions
ACID across shards is very complex.
5Key Takeaways
1Sharding splits data across databases to scale writes
2Choose shard key carefully-high cardinality, even distribution
3Hash-based distributes evenly; range-based enables range queries
4Cross-shard queries are expensive-design to avoid them
5Use consistent hashing to minimize resharding pain
?Quiz
1. You shard by user_id. Query all orders for one user hits:
2. Sharding by timestamp causes what problem?