Module 4: Scaling

Database Replication

Keep copies of data on multiple machines for availability and read scaling.

1. The Library Branch Analogy

💡 Simple Analogy
A city's main library (the primary) holds every book. It sends copies to branch libraries (the replicas) across town.

People borrow books from their nearest branch (reads go to a replica). New books arrive at the main library first and are then distributed to the branches (writes go to the primary).

If the main library burns down, a branch can become the new main library (failover).

2. Replication Topologies

Single Leader (Master-Slave)

One primary handles all writes; replicas serve reads.

Writes → Primary → (replicate) → Replica 1, Replica 2
Pros: Simple to understand, No write conflicts, Read scaling
Cons: Single point of failure for writes, Replication lag
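
The routing rule above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical in-memory `Node` and `SingleLeaderCluster` classes (neither comes from a real database library); replication is shown as an immediate copy, whereas a real system ships a change log and replicas may lag.

```python
import itertools


class Node:
    """Hypothetical in-memory stand-in for a database server."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class SingleLeaderCluster:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        # Rotate through replicas so reads are spread evenly.
        self._next_replica = itertools.cycle(replicas)

    def write(self, key, value):
        # All writes go to the primary, so there is a single order of updates.
        self.primary.data[key] = value
        # Simplification: copy to every replica right away (no lag modeled).
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key):
        # Reads are served by replicas in round-robin order.
        replica = next(self._next_replica)
        return replica.name, replica.data.get(key)


cluster = SingleLeaderCluster(Node("primary"), [Node("replica-1"), Node("replica-2")])
cluster.write("book:42", "Dune")
first = cluster.read("book:42")
second = cluster.read("book:42")
```

Adding more replicas increases read capacity here, but every write still passes through the one primary, which is the trade-off listed in the cons.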

Multi-Leader (Master-Master)

Multiple nodes accept writes and sync their changes with each other.

Leader 1 ↔ sync ↔ Leader 2
Pros: Write scaling, Geographic distribution, No single point of failure
Cons: Conflict resolution complex, Eventually consistent
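
To make the conflict-resolution cost concrete, here is a sketch of one common strategy, last-write-wins (LWW). It is only one option among several, and the `Version` type, its fields, and `resolve_lww` are hypothetical names chosen for illustration. It assumes leader clocks are roughly synchronized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # wall-clock time of the write (assumed roughly synced)
    leader_id: str    # breaks ties when timestamps are equal


def resolve_lww(a: Version, b: Version) -> Version:
    """Pick the same winner on every leader so all copies converge."""
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))


# Two leaders accepted different writes to the same record.
from_leader_1 = Version("title = 'Draft A'", timestamp=100.0, leader_id="leader-1")
from_leader_2 = Version("title = 'Draft B'", timestamp=100.5, leader_id="leader-2")

winner = resolve_lww(from_leader_1, from_leader_2)
```

Note that LWW silently discards the losing write, which is part of why conflict resolution counts as a cost of multi-leader setups.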

Leaderless

All nodes are equal. Each write is sent to multiple nodes, and each read queries multiple nodes.

Node 1, Node 2, Node 3
Pros: High availability, No leader election, Partition tolerant
Cons: Read/write quorums needed, Conflict resolution
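
The quorum requirement is usually stated as w + r > n: with n nodes, a write must reach w of them and a read must query r of them, so that every read set overlaps every write set. The sketch below illustrates this with a hypothetical `QuorumStore` class; explicit version numbers stand in for whatever versioning scheme a real system uses.

```python
class QuorumStore:
    def __init__(self, n, w, r):
        # w + r > n guarantees every read set overlaps every write set.
        if w + r <= n:
            raise ValueError("need w + r > n for overlapping quorums")
        self.n, self.w, self.r = n, w, r
        # Each node maps key -> (version, value).
        self.nodes = [{} for _ in range(n)]

    def write(self, key, value, version, reachable):
        """Store on the reachable nodes; fail if fewer than w are reachable."""
        if len(reachable) < self.w:
            return False
        for i in reachable:
            self.nodes[i][key] = (version, value)
        return True

    def read(self, key, reachable):
        """Query r reachable nodes and return the highest-versioned value."""
        if len(reachable) < self.r:
            raise RuntimeError("not enough nodes for a read quorum")
        replies = [self.nodes[i].get(key, (0, None)) for i in reachable[: self.r]]
        return max(replies, key=lambda reply: reply[0])[1]


store = QuorumStore(n=3, w=2, r=2)
store.write("k", "v1", version=1, reachable=[0, 1, 2])
store.write("k", "v2", version=2, reachable=[0, 1])  # node 2 misses this write
value = store.read("k", reachable=[1, 2])            # node 2 is stale, node 1 is not
```

Because the read set {1, 2} overlaps the write set {0, 1} at node 1, the read still returns the newest value even though node 2 is stale.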

3. Synchronous vs Asynchronous

Synchronous Replication

Write waits until replica confirms. Strong consistency.

✓ No loss of acknowledged writes on primary failure
✓ Read-your-writes from the confirming replica
✗ Higher latency
✗ Replica failure blocks writes

Asynchronous Replication

Write returns immediately. Replica catches up later.

✓ Fast writes
✓ Replica failure doesn't block
✗ Possible data loss on failure
✗ Replication lag (stale reads)
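
The difference between the two modes comes down to when the primary acknowledges a write. A minimal sketch, assuming hypothetical `Primary` and `Replica` classes with a single replica; `ship_pending` stands in for the background replication stream of a real system.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # changes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: do not acknowledge until the replica has the change.
            self.replica.apply(key, value)
        else:
            # Asynchronous: acknowledge now, ship the change later.
            self.pending.append((key, value))
        return "ack"

    def ship_pending(self):
        # Stand-in for the background replication stream.
        while self.pending:
            self.replica.apply(*self.pending.popleft())


sync_replica = Replica()
Primary(sync_replica, synchronous=True).write("x", 1)

async_replica = Replica()
async_primary = Primary(async_replica, synchronous=False)
async_primary.write("x", 1)
stale_read = async_replica.data.get("x")  # replica has not caught up yet
async_primary.ship_pending()
```

If the asynchronous primary crashed before `ship_pending` ran, the acknowledged write would exist nowhere else, which is the data-loss risk listed above.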

4. Replication Lag

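Replication lag is the delay between a write being committed on the primary and that write appearing on the replicas. It typically ranges from milliseconds (same datacenter) to seconds (cross-region), and it can grow larger when replicas are overloaded or the network is degraded.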

Handling Replication Lag

Read-your-writes: After writing, read from primary for X seconds
Monotonic reads: User always reads from same replica
Causal consistency: Track dependencies, ensure read sees prior writes
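
The first two strategies can be sketched as routing decisions. This is an illustration under stated assumptions: `ReadYourWritesRouter`, `replica_for`, and the 5-second window are hypothetical choices, not values taken from any particular database.

```python
import hashlib
import time


class ReadYourWritesRouter:
    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # the "X seconds" window (assumed value)
        self.clock = clock
        self._last_write = {}  # user_id -> time of that user's latest write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def choose_target(self, user_id):
        """Return 'primary' shortly after the user's own write, else 'replica'."""
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


def replica_for(user_id, replicas):
    """Monotonic reads: a stable hash always maps a user to the same replica."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]


# A fake clock makes the time window easy to demonstrate.
now = [0.0]
router = ReadYourWritesRouter(pin_seconds=5.0, clock=lambda: now[0])
router.record_write("alice")
right_after = router.choose_target("alice")  # within the window
other_user = router.choose_target("bob")     # bob has not written anything
now[0] = 6.0
later = router.choose_target("alice")        # window has expired
```

The window should be longer than the replication lag you expect; otherwise a user can still see a replica that has not received their write.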

5. Failover

When the primary fails, a replica must be promoted to take its place. This is called failover.

1. Detect failure: Heartbeat timeout (configurable; for example, 30s)
2. Choose new leader: Most up-to-date replica, or consensus
3. Reconfigure: Update routing to new primary
4. Clients reconnect: May need to retry in-flight requests
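
The first two steps above can be sketched as follows. `ReplicaStatus`, `primary_is_down`, and `choose_new_primary` are hypothetical names, and the sketch picks the most up-to-date replica rather than running a consensus protocol.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    last_applied_position: int  # how far into the primary's log it has replicated
    healthy: bool = True


def primary_is_down(last_heartbeat, now, timeout=30.0):
    """Step 1: declare failure once no heartbeat has arrived within the timeout."""
    return now - last_heartbeat > timeout


def choose_new_primary(replicas):
    """Step 2: promote the healthy replica that has replicated the most."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=lambda r: r.last_applied_position)


replicas = [
    ReplicaStatus("replica-1", last_applied_position=980),
    ReplicaStatus("replica-2", last_applied_position=1000),
    ReplicaStatus("replica-3", last_applied_position=1200, healthy=False),
]

down = primary_is_down(last_heartbeat=100.0, now=131.0)
new_primary = choose_new_primary(replicas) if down else None
```

A shorter timeout detects failures faster but risks unnecessary failovers during brief network hiccups, so the value is a tuning decision.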
Split Brain

If the old primary comes back online without knowing it is no longer the leader, you end up with two primaries accepting writes (split brain). Use fencing to ensure that only one primary can accept writes at a time.
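
One way to implement fencing is an epoch number (often called a fencing token) that increases with every promotion. The sketch below assumes a hypothetical `FencedStorage` layer that checks the epoch on every write; real systems may instead fence by powering off or isolating the old primary.

```python
class FencedStorage:
    """Hypothetical storage layer that rejects writes from stale primaries."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        # A write carrying an older epoch comes from a demoted primary.
        if epoch < self.highest_epoch:
            return False
        self.highest_epoch = epoch
        self.data[key] = value
        return True


storage = FencedStorage()
old_ok = storage.write(epoch=1, key="k", value="from old primary")
new_ok = storage.write(epoch=2, key="k", value="from new primary")     # after failover
stale_ok = storage.write(epoch=1, key="k", value="late write from old primary")
```

The late write from the old primary is rejected because its epoch is lower than the one already seen, so only the current primary can change the data.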

6. Key Takeaways

1. Replication copies data to multiple nodes for availability and read scaling.
2. Single-leader is simplest: one primary for writes, replicas for reads.
3. Sync replication = strong consistency, higher latency.
4. Async replication = low latency, eventual consistency.
5. Replication lag causes stale reads, so design your app to handle it.
6. Failover promotes a replica when the primary dies. Watch for split brain.