Module 1 - Data Storage

Cache Invalidation

One of the two hard things in computer science: knowing when to throw away cached data.

1. The Menu Board Analogy

Simple Analogy
A restaurant has a menu board (the cache) and the actual menu in the kitchen (the database). When prices change, someone must update the board. Forget to update it? Customers see wrong prices. Update it too often? You waste time rewriting.

Cache Invalidation: The process of removing or updating cached data when the source data changes, ensuring users don't see stale information.

2. Invalidation Strategies

Time-Based (TTL)

Data expires after a set time. Simple but can serve stale data until expiry.

+ Simple to implement
- Stale data until TTL expires
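
To make the expiry behaviour concrete, here is a minimal in-memory sketch. The class name TTLCache and the injectable clock parameter are assumptions made for illustration; they are not part of any particular cache library.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be exercised in tests
        self._store = {}        # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: drop it and report a miss
            return None
        return value
```

Until the expiry time is reached, get keeps returning the stored value even if the source data has changed, which is exactly the staleness window listed as the downside.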

Event-Based

Invalidate cache entries as soon as the underlying data changes. The code that writes the data must know which cache keys to invalidate.

+ Always fresh
- Complex, tight coupling
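
One way this can look in application code is sketched below. ProductService is a hypothetical name, and plain dicts stand in for the database and the cache.

```python
class ProductService:
    """Write path that invalidates the cache whenever the data changes."""

    def __init__(self, db, cache):
        self.db = db        # dict standing in for the database
        self.cache = cache  # dict standing in for the cache

    def get_price(self, product_id):
        if product_id not in self.cache:
            self.cache[product_id] = self.db[product_id]
        return self.cache[product_id]

    def update_price(self, product_id, price):
        self.db[product_id] = price
        self.cache.pop(product_id, None)  # invalidate on the change event
```

The coupling is visible here: update_price has to know both that a cache exists and which key to remove.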

Write-Through

Every write updates the cache and the database in the same operation, so the cache always holds the latest written value.

+ Strong consistency
- Write latency, complexity
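
A minimal sketch of the idea, assuming a dict stands in for the database; WriteThroughCache is an illustrative name, not a library class.

```python
class WriteThroughCache:
    """Every write goes to the database and the cache in the same call."""

    def __init__(self, db):
        self.db = db        # dict standing in for the database
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value      # database first: if this raises, the cache is untouched
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]      # miss: load and remember
        self.cache[key] = value
        return value
```

The caller waits for both stores on every write, which is where the extra write latency comes from.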

Write-Behind

Write to cache immediately, sync to DB asynchronously.

+ Fast writes
- Risk of data loss
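
The sketch below shows the shape of this approach under simplifying assumptions: a dict stands in for the database, and flush is called by hand where a real system would likely run it from a timer or background worker. WriteBehindCache is a hypothetical name.

```python
from collections import OrderedDict


class WriteBehindCache:
    """Writes land in the cache at once and reach the database on flush()."""

    def __init__(self, db):
        self.db = db                 # dict standing in for the database
        self.cache = {}
        self.dirty = OrderedDict()   # pending writes; repeated writes to a key coalesce

    def write(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value

    def flush(self):
        """Persist pending writes, oldest key first."""
        while self.dirty:
            key, value = self.dirty.popitem(last=False)
            self.db[key] = value
```

Anything still in dirty when the process dies never reaches the database, which is the data-loss risk noted above.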

3. Common Patterns

Versioned Keys

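Put a version number in the key, for example user:123:v5, and increment it on every update. Readers always build the key from the current version, so old entries are never read again and age out through TTL or eviction.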
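
A small sketch of how the key might be built. VersionedCache, key_for, and bump are hypothetical names, and the version counter is kept in a local dict here, whereas a real deployment would need to store it somewhere all readers can see.

```python
class VersionedCache:
    """Cache keys embed a per-entity version; bumping it orphans old entries."""

    def __init__(self):
        self.versions = {}   # entity id -> current version number
        self.cache = {}      # versioned key -> value

    def key_for(self, user_id):
        return f"user:{user_id}:v{self.versions.get(user_id, 1)}"

    def bump(self, user_id):
        """Call on every update; readers then build a new key and miss."""
        self.versions[user_id] = self.versions.get(user_id, 1) + 1
```

After bump, the old entry is still physically present but unreachable through key_for, so nothing has to be deleted explicitly.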

Tag-Based

Attach tags to related cache entries, then invalidate every entry carrying a given tag, such as 'product:123', in one operation.
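
A minimal in-memory sketch of tag tracking follows. TaggedCache and its method names are assumptions for illustration; some cache libraries offer tagging built in, with their own APIs.

```python
from collections import defaultdict


class TaggedCache:
    """Cache that remembers which keys carry which tags."""

    def __init__(self):
        self.cache = {}
        self.keys_by_tag = defaultdict(set)

    def set(self, key, value, tags=()):
        self.cache[key] = value
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def get(self, key):
        return self.cache.get(key)

    def invalidate_tag(self, tag):
        """Remove every entry that was stored with this tag."""
        for key in self.keys_by_tag.pop(tag, set()):
            self.cache.pop(key, None)
```

For example, a page that shows two products can be stored with both product tags, so a change to either product clears it.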

Pub/Sub

Publish an invalidation event when data changes. Every cache node subscribes and clears the affected entry from its local cache.
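
The flow can be sketched in a single process as below. InvalidationBus is a stand-in for a real message broker and CacheNode for a server with a local cache; both names are hypothetical.

```python
class InvalidationBus:
    """In-process stand-in for a message broker topic."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        for callback in self.subscribers:
            callback(key)


class CacheNode:
    """A node with a local cache that clears entries when told to."""

    def __init__(self, bus):
        self.local = {}
        bus.subscribe(self.on_invalidate)

    def on_invalidate(self, key):
        self.local.pop(key, None)
```

With a real broker, delivery is asynchronous, so nodes may serve the old value for a short time after the event is published.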

Cache-Aside + TTL

Combine lazy loading with short TTL for eventual consistency.
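
As a rough sketch of the combination, the function below loads a value only on a miss and keeps it for a fixed TTL. make_cache_aside, load_from_db, and the clock parameter are illustrative names, not an existing API.

```python
import time


def make_cache_aside(load_from_db, ttl_seconds, clock=time.monotonic):
    """Return a get(key) function that lazily loads values and caches them with a TTL."""
    cache = {}  # key -> (value, expires_at)

    def get(key):
        entry = cache.get(key)
        if entry is not None and clock() < entry[1]:
            return entry[0]                        # fresh hit
        value = load_from_db(key)                  # miss or expired: lazy load
        cache[key] = (value, clock() + ttl_seconds)
        return value

    return get
```

Readers may see a value that is up to ttl_seconds old, and the short TTL is what bounds that staleness.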

4. Challenges

Distributed Invalidation
How do you tell all 50 cache nodes to invalidate? Use pub/sub or accept eventual consistency.
Thundering Herd
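A popular key's TTL expires and 1000 requests hit the database simultaneously to reload it. Use locking so that only one request reloads the value, or stagger TTLs so that keys do not all expire together.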
Race Conditions
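A reader loads the old value from the database; before it stores that value in the cache, a writer updates the database and invalidates the cache. The reader then caches the old value, so the cache holds stale data. Use versioning.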
Partial Updates
Update user name but cache has old avatar. Invalidate entire object or use fine-grained keys.
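
For the Thundering Herd item above, the two suggested mitigations can be sketched as follows. SingleFlightCache and jittered_ttl are hypothetical names, and a single global lock is used to keep the sketch short where a real system would more likely lock per key.

```python
import random
import threading


def jittered_ttl(base_seconds, spread=0.1):
    """Vary base_seconds by up to +/- spread so keys do not all expire together."""
    return base_seconds * (1 + random.uniform(-spread, spread))


class SingleFlightCache:
    """On a miss, only one thread calls the loader; the others wait and reuse its result."""

    def __init__(self, loader):
        self.loader = loader
        self.cache = {}
        self.lock = threading.Lock()   # one global lock keeps the sketch short

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        with self.lock:
            if key in self.cache:      # another thread filled it while we waited
                return self.cache[key]
            value = self.loader(key)
            self.cache[key] = value
            return value
```

The second check inside the lock is what prevents the waiting threads from repeating the load once the first one has finished.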

5. Key Takeaways

1. TTL is simplest but allows stale data until expiry
2. Event-based is freshest but requires more complexity
3. Use versioned keys to avoid race conditions
4. Thundering herd can be mitigated with locking or staggered TTL
5. There is no perfect solution; choose based on staleness tolerance

Quiz

1. Cache expires, 1000 requests hit DB at once. This is called:

2. What is the best way to avoid stale data in the cache after an update?