Module 1 - Data Storage

Denormalization

Trading data redundancy for read performance in high-scale systems.

1The Library Analogy

Simple Analogy

Normalized (Central Card Catalog): One master catalog. To find a book's author info, check the author card. Clean and organized, but more trips.

Denormalized (Author Info on Every Book): Print author bio on each book's cover. Redundant, but no trips to the catalog. Faster to read, harder to update.

2What is Denormalization?

Denormalization: Intentionally adding redundant data to reduce expensive JOINs and speed up reads. The opposite of database normalization.

Benefits

✓ Faster reads (no JOINs)
✓ Simpler queries
✓ Better scalability
✓ Reduced database load

Trade-offs

✗ Data redundancy
✗ Harder to update
✗ Potential inconsistency
✗ More storage needed

3Common Techniques

Embed Related Data

Store user's name directly in orders table instead of user_id only

orders.user_name instead of JOIN users

Pre-compute Aggregates

Store calculated totals instead of computing on every read

Store product.review_count instead of COUNT(*)

Duplicate Across Tables

Copy frequently accessed fields to avoid JOINs

Store author_name in posts table

Materialized Views

Pre-built query results stored as a table

CREATE MATERIALIZED VIEW sales_summary AS ...

4When to Denormalize

✓Read-heavy workload (90%+ reads)

✓Slow JOINs across large tables

✓Data changes infrequently

✓Consistency can be eventual

✓Query patterns are well-known

Don't denormalize prematurely. Start normalized, measure performance, then denormalize specific hot paths.

5Key Takeaways

1Denormalization trades write complexity for read performance

2Reduces JOINs by duplicating data across tables

3Best for read-heavy, write-light workloads

4Requires careful update logic to maintain consistency

5Measure first-only denormalize proven bottlenecks

?Quiz

1. A social media feed shows user name on every post. To avoid JOINs, you would:

2. Which workload benefits most from denormalization?