Module 4: Scaling
Database Replication
Keep copies of data on multiple machines for availability and read scaling.
1. The Library Branch Analogy
💡 Simple Analogy
A city's main library (primary) holds every book. It makes copies for branch libraries (replicas) across town.
People borrow books from their nearest branch (read from replica). New books arrive at the main library first and are then distributed to the branches (write to primary).
If the main library burns down, a branch can become the new main library (failover).
2. Replication Topologies
Single Leader (Master-Slave)
One primary handles writes, replicas serve reads.
Primary (accepts writes) → replicates to → Replica 1, Replica 2
Pros: Simple to understand, No write conflicts, Read scaling
Cons: Single point of failure for writes, Replication lag
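The single-leader rule ("one primary handles writes, replicas serve reads") usually lives in a routing layer in front of the database. Below is a minimal sketch of such a router. The class name `SingleLeaderRouter` and the rule "a statement is a read only if it starts with SELECT" are hypothetical simplifications, not a real proxy's behavior.

```python
import itertools


class SingleLeaderRouter:
    """Send writes to the primary; spread reads across replicas round-robin."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def node_for(self, query: str) -> str:
        # Hypothetical heuristic: only statements starting with SELECT count as reads.
        is_read = query.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return next(self._next_replica)
        # Writes (and reads when there are no replicas) go to the single leader.
        return self.primary
```

For example, `SingleLeaderRouter("primary", ["replica-1", "replica-2"])` sends an `INSERT` to `"primary"` and alternates `SELECT`s between the two replicas. A production router would also have to keep a transaction on one node and account for replication lag. This sketch ignores both.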
Multi-Leader (Master-Master)
Multiple nodes accept writes, sync with each other.
Leader 1 ↔ sync ↔ Leader 2
Pros: Write scaling, Geographic distribution, No single point of failure
Cons: Conflict resolution complex, Eventually consistent
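To make "conflict resolution" concrete, here is a sketch of one common strategy, last-write-wins (LWW). It is not the only option, since others include merging values or asking the application to decide. Each leader is assumed to tag writes with a timestamp and its own ID. When two leaders sync, both keep the write with the larger `(timestamp, leader_id)` pair. `VersionedValue`, `lww_merge`, and `sync` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedValue:
    value: str
    timestamp: float  # clock time at the accepting leader (assumption)
    leader_id: str    # tie-breaker so every leader picks the same winner


def lww_merge(local: VersionedValue, remote: VersionedValue) -> VersionedValue:
    """Last-write-wins: keep the write with the larger (timestamp, leader_id)."""
    return max(local, remote, key=lambda v: (v.timestamp, v.leader_id))


def sync(store_a: dict, store_b: dict) -> None:
    """Exchange every key between two leaders, resolving conflicts with LWW."""
    for key in set(store_a) | set(store_b):
        candidates = [s[key] for s in (store_a, store_b) if key in s]
        winner = candidates[0] if len(candidates) == 1 else lww_merge(*candidates)
        store_a[key] = store_b[key] = winner
```

LWW makes the leaders converge, but it silently discards the losing concurrent write. It also depends on reasonably synchronized clocks. This is why the text calls multi-leader conflict resolution complex.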
Leaderless
All nodes are equal. Writes go to several nodes, and reads query several nodes.
Node 1 · Node 2 · Node 3 (peers, no leader)
Pros: High availability, No leader election, Partition tolerant
Cons: Read/write quorums needed, Conflict resolution
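The "read/write quorums" requirement is usually stated as w + r > n. With n replicas, a write must be acknowledged by w nodes and a read must query r nodes, so every read set overlaps every write set in at least one node. The toy store below is a sketch of that idea, not any real database's API. Callers pass an explicit integer `version` (an assumption standing in for real versioning schemes), and a read returns the highest-versioned reply.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # key -> (version, value); version is a caller-supplied integer (assumption)
    data: dict = field(default_factory=dict)
    up: bool = True


class LeaderlessStore:
    """Toy quorum store: n nodes, a write needs w acks, a read asks r nodes."""

    def __init__(self, n: int, w: int, r: int):
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("w and r must be between 1 and n")
        if w + r <= n:
            raise ValueError("need w + r > n so read and write quorums overlap")
        self.nodes = [Node() for _ in range(n)]
        self.w, self.r = w, r

    def _live(self):
        return [node for node in self.nodes if node.up]

    def put(self, key, value, version: int) -> None:
        live = self._live()
        if len(live) < self.w:
            raise RuntimeError("write quorum not reached")
        # Only the first w live nodes get the write; the rest stay stale on purpose.
        for node in live[: self.w]:
            node.data[key] = (version, value)

    def get(self, key):
        live = self._live()
        if len(live) < self.r:
            raise RuntimeError("read quorum not reached")
        # Ask the last r live nodes, so a correct answer relies on quorum overlap.
        replies = [node.data.get(key, (0, None)) for node in live[-self.r:]]
        # The reply with the highest version is the most recent write.
        return max(replies, key=lambda reply: reply[0])[1]
```

With `n=3, w=2, r=2`, a write that reaches nodes 0 and 1 is still seen by a read of nodes 1 and 2, because node 1 is in both sets. With one node down, both quorums can still be met, which illustrates the high availability listed above.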
3. Synchronous vs Asynchronous
Synchronous Replication
The primary waits until the replica confirms a write before acknowledging it to the client. This gives strong consistency.
✓ No data loss on primary failure
✓ Read-your-writes guaranteed
✗ Higher write latency
✗ Replica failure blocks writes
Asynchronous Replication
The primary acknowledges the write immediately, and the replica catches up later.
✓ Fast writes
✓ Replica failure doesn't block writes
✗ Possible data loss if the primary fails
✗ Replication lag (stale reads)
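The durability difference between the two modes shows up clearly in a small simulation. The sketch below assumes one primary and one replica, and models a synchronous "wait for the replica" as an immediate apply. The names `Primary`, `ship_pending`, and `promote_after_crash` are hypothetical. The point is that after a crash, the promoted replica has only what was replicated.

```python
from collections import deque


class Primary:
    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.log = []            # writes committed on the primary
        self.replica_log = []    # writes the replica has applied
        self.pending = deque()   # async writes acknowledged but not yet shipped

    def write(self, entry: str) -> str:
        self.log.append(entry)
        if self.synchronous:
            # Synchronous: the client is answered only after the replica has the write.
            self.replica_log.append(entry)
        else:
            # Asynchronous: answer now; the replica catches up later.
            self.pending.append(entry)
        return "ok"

    def ship_pending(self, limit: int) -> None:
        """Background replication step: send up to `limit` queued writes."""
        for _ in range(min(limit, len(self.pending))):
            self.replica_log.append(self.pending.popleft())


def promote_after_crash(primary: Primary) -> list:
    """The primary dies; the replica becomes leader with whatever it has applied."""
    return list(primary.replica_log)
```

In asynchronous mode, if three writes are acknowledged but only one has been shipped when the primary crashes, the new primary starts with just that one write. The other two were acknowledged to clients and are lost.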
4. Replication Lag
Replication lag is the delay between a write on the primary and that write appearing on the replicas. It can range from milliseconds (same datacenter) to seconds (cross-region).
Handling Replication Lag
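Read-your-writes: after a user writes, route that user's reads to the primary for X seconds.
Monotonic reads: each user always reads from the same replica, so they never see data go backwards in time.
Causal consistency: track dependencies between writes so that a read sees every write it depends on.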
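The first two strategies can be combined in a single read router. This is a sketch under stated assumptions: `LagAwareRouter` is a hypothetical name, and the 5-second default stands in for the text's unspecified "X seconds". Users who wrote recently read from the primary. Everyone else is pinned to one replica by hashing their user ID. `hashlib` is used rather than Python's built-in `hash()` because string hashing is randomized per process, which would break the pinning across restarts.

```python
import hashlib
import time


class LagAwareRouter:
    def __init__(self, primary, replicas, ryw_window_s=5.0, clock=time.monotonic):
        self.primary = primary
        self.replicas = list(replicas)
        self.ryw_window_s = ryw_window_s  # the "X seconds" window (assumed value)
        self.clock = clock
        self.last_write = {}  # user_id -> time of that user's most recent write

    def record_write(self, user_id: str) -> str:
        self.last_write[user_id] = self.clock()
        return self.primary  # all writes go to the primary

    def node_for_read(self, user_id: str) -> str:
        # Read-your-writes: a user who wrote recently reads from the primary.
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.ryw_window_s:
            return self.primary
        # Monotonic reads: a stable hash pins each user to one replica.
        digest = hashlib.sha256(user_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.replicas)
        return self.replicas[index]
```

The fixed window is a blunt tool. If lag ever exceeds X seconds, read-your-writes can still fail. Pinning a user to one replica also means that user's reads must move elsewhere if that replica goes down.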
5. Failover
When primary fails, a replica must be promoted. This is called failover.
1. Detect failure: heartbeat timeout (e.g., 30s)
2. Choose new leader: the most up-to-date replica, or one chosen by consensus
3. Reconfigure: update routing to point at the new primary
4. Clients reconnect: they may need to retry in-flight requests
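Steps 1 to 3 can be sketched in a few functions. The sketch makes three assumptions. Each node's progress is a single integer `log_position`. The 30-second timeout mirrors the example value above. The routing table is a plain dict. It implements the "most up-to-date replica" option only, and consensus-based election is not shown. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    name: str
    last_heartbeat: float  # when this node was last heard from
    log_position: int      # how much of the primary's log it has applied (assumption)


def is_down(last_heartbeat: float, now: float, timeout_s: float = 30.0) -> bool:
    """Step 1: a node is considered failed if no heartbeat arrived within the timeout."""
    return now - last_heartbeat > timeout_s


def choose_new_primary(replicas, now: float, timeout_s: float = 30.0) -> ReplicaState:
    """Step 2: among live replicas, pick the one with the most of the log applied."""
    live = [r for r in replicas if not is_down(r.last_heartbeat, now, timeout_s)]
    if not live:
        raise RuntimeError("no live replica to promote")
    return max(live, key=lambda r: r.log_position)


def failover(routing: dict, replicas, primary_heartbeat: float, now: float) -> dict:
    """Step 3: repoint routing at the new primary; clients then reconnect (step 4)."""
    if not is_down(primary_heartbeat, now):
        return routing
    new_primary = choose_new_primary(replicas, now)
    return {**routing, "primary": new_primary.name}
```

The timeout is a trade-off. A short one fails over on a brief network hiccup, while a long one leaves writes unavailable for longer.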
Split Brain
If the old primary comes back online without knowing it is no longer the leader, you end up with two primaries (split brain). Use fencing to ensure that only one primary can accept writes.
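One fencing technique is a fencing token. Every promotion receives a strictly increasing number, and the storage layer rejects any write carrying a token lower than the highest it has already seen. The sketch below assumes a trustworthy `Coordinator` that issues these tokens, such as a consensus-backed service. Both class names are hypothetical. Other fencing approaches exist, such as forcibly powering off the old node.

```python
class Coordinator:
    """Issues a strictly increasing token each time a new primary is promoted."""

    def __init__(self):
        self.epoch = 0

    def promote(self) -> int:
        self.epoch += 1
        return self.epoch


class FencedStorage:
    """Accepts a write only if its token is at least the highest token seen so far."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # a stale primary from an earlier epoch: reject the write
        self.highest_token = token
        self.data[key] = value
        return True
```

When the old primary (token 1) returns after a new primary (token 2) has written, its writes are refused. The split brain therefore cannot corrupt data, even though both nodes briefly believe they lead.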
6. Key Takeaways
1. Replication copies data to multiple nodes for availability and read scaling.
2. Single-leader is simplest: one primary for writes, replicas for reads.
3. Sync replication = strong consistency, higher latency.
4. Async replication = low latency, eventual consistency.
5. Replication lag causes stale reads, so design your app to handle it.
6. Failover promotes a replica when the primary dies. Watch for split brain.