Module 1 - Data Storage
Denormalization
Trading data redundancy for read performance in high-scale systems.
1The Library Analogy
Simple Analogy
Normalized (Central Card Catalog): One master catalog. To find a book's author info, check the author card. Clean and organized, but more trips.
Denormalized (Author Info on Every Book): Print author bio on each book's cover. Redundant, but no trips to the catalog. Faster to read, harder to update.
Denormalized (Author Info on Every Book): Print author bio on each book's cover. Redundant, but no trips to the catalog. Faster to read, harder to update.
2What is Denormalization?
Denormalization: Intentionally adding redundant data to reduce expensive JOINs and speed up reads. The opposite of database normalization.
Benefits
- ✓ Faster reads (no JOINs)
- ✓ Simpler queries
- ✓ Better scalability
- ✓ Reduced database load
Trade-offs
- ✗ Data redundancy
- ✗ Harder to update
- ✗ Potential inconsistency
- ✗ More storage needed
3Common Techniques
Embed Related Data
Store user's name directly in orders table instead of user_id only
orders.user_name instead of JOIN usersPre-compute Aggregates
Store calculated totals instead of computing on every read
Store product.review_count instead of COUNT(*)Duplicate Across Tables
Copy frequently accessed fields to avoid JOINs
Store author_name in posts tableMaterialized Views
Pre-built query results stored as a table
CREATE MATERIALIZED VIEW sales_summary AS ...4When to Denormalize
✓Read-heavy workload (90%+ reads)
✓Slow JOINs across large tables
✓Data changes infrequently
✓Consistency can be eventual
✓Query patterns are well-known
Don't denormalize prematurely. Start normalized, measure performance, then denormalize specific hot paths.
5Key Takeaways
1Denormalization trades write complexity for read performance
2Reduces JOINs by duplicating data across tables
3Best for read-heavy, write-light workloads
4Requires careful update logic to maintain consistency
5Measure first-only denormalize proven bottlenecks
?Quiz
1. A social media feed shows user name on every post. To avoid JOINs, you would:
2. Which workload benefits most from denormalization?