Module 1 - Data Storage

Denormalization

Trading data redundancy for read performance in high-scale systems.

1. The Library Analogy

Simple Analogy
Normalized (Central Card Catalog): One master catalog. To find a book's author info, check the author card. Clean and organized, but more trips.

Denormalized (Author Info on Every Book): Print author bio on each book's cover. Redundant, but no trips to the catalog. Faster to read, harder to update.

2. What is Denormalization?

Denormalization: Intentionally adding redundant data to reduce expensive JOINs and speed up reads. The opposite of database normalization.
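To make the trade concrete, here is a minimal sketch comparing the two read paths (hypothetical orders/users tables, PostgreSQL-flavored SQL):

  -- Normalized: every read of an order's customer name pays for a JOIN
  SELECT o.id, o.total, u.name
  FROM orders o
  JOIN users u ON u.id = o.user_id
  WHERE o.id = 42;

  -- Denormalized: the name was copied into orders at write time,
  -- so the hot read path touches a single table
  SELECT id, total, user_name
  FROM orders
  WHERE id = 42;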

Benefits

  • ✓ Faster reads (no JOINs)
  • ✓ Simpler queries
  • ✓ Better scalability
  • ✓ Reduced database load

Trade-offs

  • ✗ Data redundancy
  • ✗ Harder to update
  • ✗ Potential inconsistency
  • ✗ More storage needed

3. Common Techniques

Embed Related Data

Store the user's name directly in the orders table instead of only a user_id

orders.user_name instead of JOIN users
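A minimal sketch of the write side of this technique (hypothetical orders/users schema, PostgreSQL-flavored SQL); the redundant column is filled in when the order is created:

  -- user_name duplicates users.name so reads can skip the JOIN
  CREATE TABLE orders (
      id        BIGSERIAL PRIMARY KEY,
      user_id   BIGINT NOT NULL REFERENCES users(id),
      user_name TEXT NOT NULL,              -- redundant copy of users.name
      total     NUMERIC(10, 2) NOT NULL
  );

  -- Write path: copy the name at insert time
  INSERT INTO orders (user_id, user_name, total)
  SELECT id, name, 59.99 FROM users WHERE id = 42;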

Pre-compute Aggregates

Store calculated totals instead of computing on every read

Store product.review_count instead of COUNT(*)
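A sketch of the same idea for counters, assuming hypothetical products and reviews tables (PostgreSQL-flavored SQL):

  -- review_count is maintained at write time instead of counted on every read
  ALTER TABLE products ADD COLUMN review_count INTEGER NOT NULL DEFAULT 0;

  -- Write path: insert the review and bump the counter in one transaction
  BEGIN;
  INSERT INTO reviews (product_id, rating, body) VALUES (7, 5, 'Great bag');
  UPDATE products SET review_count = review_count + 1 WHERE id = 7;
  COMMIT;

  -- Read path: no COUNT(*) over the reviews table
  SELECT name, review_count FROM products WHERE id = 7;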

Duplicate Across Tables

Copy frequently accessed fields to avoid JOINs

Store author_name in posts table
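A sketch of both sides of this trade (hypothetical posts/authors tables): the feed read stays on one table, but renaming an author now fans out to every copied row.

  -- Read path: the feed comes from posts alone
  SELECT title, author_name, created_at
  FROM posts
  ORDER BY created_at DESC
  LIMIT 20;

  -- Update path: the extra write needed to keep the copies consistent
  UPDATE authors SET name = 'New Name' WHERE id = 3;
  UPDATE posts SET author_name = 'New Name' WHERE author_id = 3;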

Materialized Views

Pre-built query results stored as a table

CREATE MATERIALIZED VIEW sales_summary AS ...
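A fuller version of the one-liner above, as a sketch (PostgreSQL syntax; the orders columns are hypothetical):

  CREATE MATERIALIZED VIEW sales_summary AS
  SELECT product_id,
         SUM(quantity) AS units_sold,
         SUM(total)    AS revenue
  FROM orders
  GROUP BY product_id;

  -- Reads hit the pre-built table; refresh on a schedule or after batch loads
  REFRESH MATERIALIZED VIEW sales_summary;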

4. When to Denormalize

  • Read-heavy workload (90%+ reads)
  • Slow JOINs across large tables
  • Data changes infrequently
  • Consistency can be eventual
  • Query patterns are well-known

Don't denormalize prematurely. Start normalized, measure performance, then denormalize specific hot paths.
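One way to "measure first" is to look at the actual cost of the hot JOIN before duplicating anything. This sketch uses PostgreSQL's EXPLAIN ANALYZE and the hypothetical tables from the examples above:

  EXPLAIN ANALYZE
  SELECT o.id, o.total, u.name
  FROM orders o
  JOIN users u ON u.id = o.user_id
  WHERE o.created_at > now() - interval '1 day';

  -- Denormalize the user_name column only if this shows up as a proven bottleneck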

5. Key Takeaways

1. Denormalization trades write complexity for read performance
2. Reduces JOINs by duplicating data across tables
3. Best for read-heavy, write-light workloads
4. Requires careful update logic to maintain consistency
5. Measure first: only denormalize proven bottlenecks

Quiz

1. A social media feed shows user name on every post. To avoid JOINs, you would:

2. Which workload benefits most from denormalization?