Module 6 — Reliability

Consistency Models

How distributed systems agree on the "truth". From strict to eventual consistency.

1. What is Consistency?

Consistency in distributed systems refers to the guarantee that all nodes see the same data at the same time. Different consistency models offer different guarantees about when and how this happens.
Simple Analogy
Think of a shared Google Doc with 3 people editing:
Strong Consistency
Everyone sees the same text instantly. If Alice types, Bob and Carol see it immediately.
Eventual Consistency
Changes sync "eventually". Alice types, Bob sees it 2 seconds later. All converge eventually.
No Consistency
Changes may never sync. Everyone sees different versions forever. (This is bad!)

2. Why Does This Matter?

The Problem: Single Database Doesn't Scale

[Diagram: Users → Single DB (easy!)]

One database = one source of truth. Reads and writes are always consistent. But what happens at scale?

[Diagram: 10M users → DB 1 (US), DB 2 (EU), DB 3 (Asia)]

Multiple databases = better performance but... how do we keep them in sync?

Without Consistency Guarantees

  • User changes password in US, EU still has old password
  • Bank shows $100 in NYC, $0 in LA for same account
  • Two users buy the "last" item simultaneously
  • Message sent at 2pm appears before message sent at 1pm

With Proper Consistency

  • Password change propagates to all regions
  • Bank balance is accurate everywhere (or clearly pending)
  • Only one user can buy the last item
  • Messages appear in correct order

3. Consistency Levels (Spectrum)

Consistency exists on a spectrum from strongest to weakest. Stronger = safer but slower.

Strong Consistency (Linearizability)

strongest

Every read returns the most recent write. All operations appear to execute in a single, total order.

Write(A=1) → Read(A) → 1
Read always returns the latest write, even if the write happened on a different node
Guarantee: If write completes before read starts → read sees write
+ Intuitive behavior
+ No stale reads
+ Safe for critical data
- Slowest (must wait for coordination)
- Lowest availability (nodes cut off by a partition can't serve requests)
Examples: Google Spanner, CockroachDB, single-node databases
Use cases: Financial transactions, inventory counts, leader election
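
One way to picture the cost of this guarantee is a register that acknowledges a write only after every replica has applied it. The sketch below is a single-process toy: the class name `SyncReplicatedRegister` and its methods are hypothetical, and a local lock stands in for the coordination that real systems achieve with consensus protocols. It only illustrates why a read that starts after a completed write cannot be stale, whichever replica serves it.

```python
import threading


class SyncReplicatedRegister:
    """Toy register: a write returns only after ALL replicas applied it."""

    def __init__(self, replica_count: int, initial=None):
        self._replicas = [initial] * replica_count
        # One lock stands in for the coordination protocol (an assumption
        # of this sketch; real systems use consensus, not a local lock).
        self._lock = threading.Lock()

    def write(self, value) -> None:
        with self._lock:
            for i in range(len(self._replicas)):
                self._replicas[i] = value  # "wait" for every replica
        # Only now is the write acknowledged to the caller.

    def read(self, replica_index: int):
        with self._lock:
            return self._replicas[replica_index]


register = SyncReplicatedRegister(replica_count=3, initial=0)
register.write(1)
values_seen = [register.read(i) for i in range(3)]  # every replica returns 1
```

The price is visible in `write`: it cannot return until the slowest replica has been updated, which is where the extra latency and the reduced availability come from.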

Sequential Consistency

strong

All operations appear in one sequential order that every process observes, and each process's own operations keep their program order. That order does not have to match real-time order.

Real time: A writes at T1, B reads at T2 where T2 > T1
Observed order: B:Read, then A:Write
Order preserved per-process, but may not match wall-clock time
Guarantee: All processes see operations in same order
+ Easier to implement than linearizable
+ Still strong guarantee
- Operations may appear out of real-time order
- Still coordination overhead
Examples: ZooKeeper (for some operations)
Use cases: Distributed locks, consensus protocols
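
The per-process ordering rule can be sketched as a small check. The function name `respects_program_order` and the data shapes are assumptions made for illustration, and the check covers only half of the definition: it verifies that the agreed order keeps each process's own operations in issue order, not that every read returns a legal value.

```python
def respects_program_order(global_order, program_orders):
    """Return True if `global_order` keeps each process's own order.

    global_order:   list of (process, op) pairs, the single order all
                    processes agree on.
    program_orders: dict mapping process -> list of ops in issue order.
    """
    for process, ops in program_orders.items():
        seen = [op for p, op in global_order if p == process]
        if seen != ops:
            return False
    return True


program_orders = {"A": ["write(x=1)"], "B": ["read(x)->0"]}

# A wrote before B read in wall-clock time, yet this order still passes:
# B's read is simply placed before A's write in the agreed order.
observed = [("B", "read(x)->0"), ("A", "write(x=1)")]
ok = respects_program_order(observed, program_orders)

# Reordering one process's own operations is never allowed.
bad_programs = {"A": ["write(x=1)", "write(x=2)"]}
bad_observed = [("A", "write(x=2)"), ("A", "write(x=1)")]
bad = respects_program_order(bad_observed, bad_programs)
```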

Causal Consistency

medium

Operations that are causally related must be seen in order. Concurrent operations can be seen in any order.

Post: "Hi!"
→ causes →
Reply: "Hello!"
Everyone sees Post before Reply (causal)
But two independent posts can appear in any order
Guarantee: If A causes B, everyone sees A before B
+ Good performance
+ Intuitive for social apps
+ Works across partitions
- Concurrent writes may have different order per node
- More complex to implement
Examples: MongoDB (causally consistent sessions with majority read and write concern), COPS
Use cases: Social media feeds, chat applications, collaborative editing
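
A minimal sketch of how a replica might enforce the Post-before-Reply rule: each message lists the messages it depends on, and the replica holds a message back until those have been delivered. The `CausalInbox` class and the message ids are hypothetical, and production systems usually track dependencies with vector clocks or similar metadata rather than explicit id lists.

```python
class CausalInbox:
    """Buffers messages until everything they depend on was delivered."""

    def __init__(self):
        self.delivered = []   # message ids in delivery order
        self._pending = []    # (msg_id, deps) waiting on dependencies

    def receive(self, msg_id, deps=()):
        self._pending.append((msg_id, frozenset(deps)))
        self._flush()

    def _flush(self):
        # Keep scanning until a full pass delivers nothing new.
        progressed = True
        while progressed:
            progressed = False
            for item in list(self._pending):
                msg_id, deps = item
                if deps <= set(self.delivered):
                    self.delivered.append(msg_id)
                    self._pending.remove(item)
                    progressed = True


inbox = CausalInbox()
inbox.receive("reply:Hello!", deps=["post:Hi!"])  # arrives first, is held
held_back = list(inbox.delivered)                 # nothing delivered yet
inbox.receive("post:Hi!")                         # unblocks the reply
```

Messages with no dependencies are delivered as soon as they arrive, which is why two independent posts can show up in a different order on different replicas.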

Eventual Consistency

weak

If no new writes arrive, all replicas eventually converge to the same value. There is no guarantee on how long that takes.

T=0: Write A=1 → replicas hold [1, 0, 0]
T=5s: Propagating... → replicas hold [1, 1, 0]
T=10s: Converged! → replicas hold [1, 1, 1]
Guarantee: Given enough time with no updates, all replicas converge
+ Highest availability
+ Best performance
+ Works during partitions
- Can read stale data
- Conflicting writes need resolution
- Harder to reason about
Examples: DynamoDB, Cassandra, DNS, CDN caches
Use cases: Social media likes, view counts, shopping cart, DNS
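
The convergence timeline above can be reproduced with a toy anti-entropy loop, in which pairs of replicas exchange state and both keep the newer version. The `sync_pair` helper and the dictionary layout are assumptions for this sketch; real stores use gossip, hinted handoff, or read repair, with far more bookkeeping.

```python
def sync_pair(a, b):
    """Anti-entropy between two replicas: both keep the newer version."""
    newer = max(a, b, key=lambda replica: replica["version"])
    a.update(newer)
    b.update(newer)


replicas = [
    {"version": 1, "value": 1},  # this replica received the write A=1
    {"version": 0, "value": 0},
    {"version": 0, "value": 0},
]

history = [[r["value"] for r in replicas]]
# Each round, one neighbouring pair exchanges state.
for left, right in [(0, 1), (1, 2)]:
    sync_pair(replicas[left], replicas[right])
    history.append([r["value"] for r in replicas])
```

A read served by the third replica before the last round still returns 0, which is exactly the stale read listed among the drawbacks.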

Comparison Table

Model    | Guarantee           | Availability | Latency | Use When
Strong   | Latest value always | Low          | High    | Correctness critical
Causal   | Ordered if related  | Medium       | Medium  | Replies, threads
Eventual | Eventually same     | High         | Low     | High scale, staleness OK

4. Read Your Own Writes

Read-Your-Writes Consistency: A user always sees their own updates immediately, even if other users see a stale version temporarily.

The Problem Without It

1. User updates profile picture (writes to primary in US)
2. User refreshes page (reads from replica in EU - stale)
3. User sees OLD profile picture. "My update didn't work!?"

Solutions

1. Read from Primary After Write
For N seconds after a write, route that user's reads to primary.
Trade-off: More primary load
2. Sticky Sessions
Route user to same replica for entire session.
Trade-off: Uneven load distribution
3. Version Vectors
Client tracks last write version, only accepts reads ≥ that version.
Trade-off: Complexity
4. Client-Side Caching
Client optimistically shows their own writes while server syncs.
Trade-off: Can show inconsistent state
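
As an illustration of solution 3, the sketch below simplifies the idea to a single integer version instead of a full version vector. The client remembers the version of its last write and falls back to the primary whenever the replica is behind. `Store`, `write`, and `read_your_writes` are hypothetical names, not an API from any particular database.

```python
class Store:
    """A node holding data plus the version of the last write it applied."""

    def __init__(self):
        self.version = 0
        self.data = {}


def write(primary, key, value):
    """Apply a write on the primary and return its new version."""
    primary.version += 1
    primary.data[key] = value
    return primary.version


def read_your_writes(key, replica, primary, min_version):
    """Read from the replica unless it is behind the client's last write."""
    source = replica if replica.version >= min_version else primary
    return source.data.get(key)


primary, replica = Store(), Store()
last_written = write(primary, "avatar", "new.png")  # replica not synced yet

stale_read = replica.data.get("avatar")  # plain replica read misses the write
safe_read = read_your_writes("avatar", replica, primary, last_written)
```

Once the replica catches up to `last_written`, the same call is served by the replica again, so the extra primary load lasts only as long as the replication lag.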

5. Conflict Resolution

With eventual consistency, two users might update the same data simultaneously. Who wins?

The Conflict Scenario

Node US: User sets name = "Alice" at T=100
Network Partition: Nodes can't communicate
Node EU: Admin sets name = "Alicia" at T=101
When partition heals: What should name be?

Last Write Wins (LWW)

Highest timestamp wins. Simple but can lose data.

Example: T=101 > T=100 → name = "Alicia" wins, "Alice" is lost forever
+ Simple to implement
+ Deterministic
- Data loss
- Clock sync issues
- Later ≠ more important
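
A minimal LWW merge, assuming each write carries a timestamp and the id of the node that accepted it. The node id as tie-breaker is an addition for this sketch (not something the scenario above specifies) so that equal timestamps still resolve the same way on every replica. `Stamped` and `lww_merge` are hypothetical names.

```python
from typing import NamedTuple


class Stamped(NamedTuple):
    timestamp: int
    node: str   # tie-breaker when timestamps are equal
    value: str


def lww_merge(a: Stamped, b: Stamped) -> Stamped:
    """Keep the write with the highest (timestamp, node) pair."""
    return max(a, b, key=lambda w: (w.timestamp, w.node))


us = Stamped(100, "US", "Alice")
eu = Stamped(101, "EU", "Alicia")
winner = lww_merge(us, eu)  # the T=101 write wins; "Alice" is discarded
```

The merge gives the same answer regardless of argument order, which is what makes it deterministic. It is only as trustworthy as the clocks that produced the timestamps.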
First Write Wins

Earliest timestamp wins. Preserves original.

Example: T=100 < T=101 → name = "Alice" wins
+ Preserves original intent
+ Deterministic
- Later updates never apply
- Same clock issues
Merge / CRDT

Automatically merge without conflict. Works for certain data types.

Example: Counter: US=5, EU=3, merged=8 (not 5 or 3)
+ No data loss
+ Automatic resolution
- Only works for certain data types
- Complex to implement
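
The counter example corresponds to a grow-only counter (G-Counter), one of the simplest CRDTs. Each node increments only its own slot, a merge takes the per-slot maximum, and the value is the sum of all slots. The class below is a sketch with hypothetical names, not a production implementation.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged with per-slot max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


us, eu = GCounter("US"), GCounter("EU")
us.increment(5)
eu.increment(3)
us.merge(eu)
eu.merge(us)
us.merge(eu)  # merging again changes nothing (idempotent)
```

After the merges both replicas report 8. Because the merge is commutative, associative, and idempotent, replicas can exchange state in any order and any number of times and still agree.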
Application-Level Resolution

Keep both versions, let application or user resolve.

Example: Git merge conflicts: keep both, human resolves
+ No data loss
+ Human intelligence for complex cases
- Requires user action
- Bad UX
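
Before the application can resolve anything, the store has to notice that two versions really are concurrent. One common approach, sketched here with version vectors, keeps a single version when one write has seen the other and keeps both as siblings otherwise. The function names and the `(version_vector, value)` pair layout are assumptions for illustration.

```python
def dominates(a, b):
    """True if version vector `a` has seen everything `b` has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(left, right):
    """Return the surviving versions; two results mean a real conflict.

    Each argument is a (version_vector, value) pair.
    """
    if dominates(left[0], right[0]):
        return [left]
    if dominates(right[0], left[0]):
        return [right]
    return [left, right]  # concurrent: keep both for the app or user


us_write = ({"US": 1}, "Alice")
eu_write = ({"EU": 1}, "Alicia")
siblings = reconcile(us_write, eu_write)  # neither saw the other: conflict

# A later write that has seen both sides replaces the siblings.
later_write = ({"US": 1, "EU": 1}, "Alicia A.")
resolved = reconcile(siblings[0], later_write)
```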

6. Real-World Examples

Banking Systems
Strong Consistency

Can't have $100 in one branch and $0 in another. Account balance must be accurate everywhere. Worth the latency hit.

Social Media Likes
Eventual Consistency

If like count shows 999 instead of 1000 for a few seconds, no one cares. Scale and availability matter more.

Shopping Cart
Eventual Consistency

Items might briefly appear/disappear as replicas sync. Acceptable trade-off for availability.

Inventory / Tickets
Strong for Checkout

The stock count shown while browsing can be eventually consistent, but the actual purchase must be strongly consistent to avoid overselling.
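
The checkout step boils down to an atomic check-and-decrement against one authoritative count. In the sketch below a local lock stands in for whatever strongly consistent primitive the real system provides, such as a transaction or a conditional update. The `Inventory` class is hypothetical.

```python
import threading


class Inventory:
    """Single authoritative stock count guarded by a lock."""

    def __init__(self, stock: int):
        self._stock = stock
        self._lock = threading.Lock()

    def try_purchase(self) -> bool:
        # Check and decrement happen as one atomic step, so two buyers
        # can never both observe the same "last" item.
        with self._lock:
            if self._stock > 0:
                self._stock -= 1
                return True
            return False


inventory = Inventory(stock=1)
results = []


def buy():
    results.append(inventory.try_purchase())


buyers = [threading.Thread(target=buy) for _ in range(2)]
for t in buyers:
    t.start()
for t in buyers:
    t.join()
# Exactly one of the two concurrent buyers succeeds.
```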

Chat Messages
Causal Consistency

Replies must appear after the message they reply to. But order of unrelated messages is flexible.

7. Key Takeaways

1. Consistency = agreement on data across distributed nodes. Stronger = safer, slower.
2. Strong consistency: all reads return latest write. Use for financial data, inventory.
3. Eventual consistency: replicas converge over time. Use for high scale, where staleness is OK.
4. Causal consistency: middle ground. Related operations ordered, others flexible.
5. Read-your-writes: users should always see their own updates immediately.
6. Conflicts need resolution: LWW, merge, or application-level handling.