Module 11 - Interview Prep

Estimation Techniques

Back-of-envelope math to size your system: quick and confident.

1. Why Estimation Matters

Simple Analogy
Before building a bridge, engineers estimate the load it needs to carry. You wouldn't design a footbridge and then learn it needs to carry 18-wheelers. System design is the same: estimate traffic, storage, and bandwidth before choosing an architecture.

Back-of-envelope estimation uses rough approximations to determine system scale. Precision isn't the goal; order-of-magnitude accuracy (within about 10x) is.

2. Numbers to Memorize

Power of 2 | Exact Value                       | Approximation
2^10       | 1,024                             | ~1 Thousand (1 KB)
2^20       | 1,048,576                         | ~1 Million (1 MB)
2^30       | 1,073,741,824                     | ~1 Billion (1 GB)
2^40       | 1,099,511,627,776 (~1.1 trillion) | ~1 Trillion (1 TB)

Time    | Seconds
1 day   | 86,400 ≈ 100,000 (10^5)
1 month | ~2.6M ≈ 2.5 × 10^6
1 year  | ~31.5M ≈ 30 × 10^6
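
To get a feel for how much error these round numbers introduce, the short Python sketch below compares each approximation with its exact value. The names (`relative_error`, `POWERS_OF_TWO`) are illustrative, and the 30-day month is an assumption made for the sketch.

```python
# Compare each memorized approximation against its exact value.
POWERS_OF_TWO = {10: 1e3, 20: 1e6, 30: 1e9, 40: 1e12}  # exponent -> rounded value

SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000 (assumes a 30-day month)
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000


def relative_error(exact: float, approx: float) -> float:
    """Fractional error of the approximation relative to the exact value."""
    return abs(exact - approx) / exact


for exponent, approx in POWERS_OF_TWO.items():
    exact = 2 ** exponent
    error = relative_error(exact, approx)
    print(f"2^{exponent} = {exact:,} (rounding is off by {error:.1%})")

day_error = relative_error(SECONDS_PER_DAY, 1e5)
print(f"1 day = {SECONDS_PER_DAY:,} s (10^5 is off by {day_error:.1%})")
```

Every error stays well inside a factor of 10, which is why the rounded values are safe for order-of-magnitude work.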

3. The Estimation Framework

1. Start with Users: DAU (daily active users) is your foundation. Everything else derives from it.
2. Actions per User: How many reads and writes does each user make per day? Be specific.
3. Calculate QPS: QPS = (DAU × actions/user) / 86,400. Peak = 2-3x average.
4. Storage per Action: How much data per tweet, image, or message? Include metadata.
5. Total Storage: Storage/year = actions/day × 365 × size/action.
6. Bandwidth: Bandwidth = QPS × response size (for reads).
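
The framework's formulas can be sketched as a few small Python functions. This is a minimal illustration, not a standard library: the function names and the sample inputs (10M DAU, 20 reads per user, 2 KB responses) are hypothetical, and the default peak factor of 3 is taken from the upper end of the 2-3x rule.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user: float) -> float:
    """QPS = (DAU x actions/user) / 86,400."""
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 3.0) -> float:
    """Peak = 2-3x average; 3x is used here as the default."""
    return avg_qps * peak_factor


def storage_per_year_bytes(actions_per_day: float, bytes_per_action: float) -> float:
    """Storage/year = actions/day x 365 x size/action."""
    return actions_per_day * 365 * bytes_per_action


def bandwidth_bytes_per_sec(qps: float, response_bytes: float) -> float:
    """Bandwidth = QPS x response size (for reads)."""
    return qps * response_bytes


# Hypothetical inputs, chosen only to exercise the functions.
dau = 10_000_000
reads_per_user = 20
response_bytes = 2_000

avg = average_qps(dau, reads_per_user)
print(f"average read QPS: {avg:,.0f}")
print(f"peak read QPS:    {peak_qps(avg):,.0f}")
print(f"read bandwidth:   {bandwidth_bytes_per_sec(avg, response_bytes) / 1e6:,.1f} MB/s")
```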

4. Worked Example: Twitter

Given: 500M DAU, 300M tweets/day

  • Write QPS (tweets): 300M / 86,400 ≈ 3,500 QPS. Peak: ~10K QPS.
  • Read QPS (feed views): 500M users × 10 views/day / 86,400 ≈ 60K QPS. Peak: ~180K QPS.
  • Tweet storage: 280 chars + metadata ≈ 500 bytes/tweet.
  • Daily storage: 300M tweets × 500 bytes ≈ 150 GB/day.
  • Yearly storage (text): 150 GB × 365 ≈ 55 TB/year.
  • With media (images): 50% of tweets have images, avg 500 KB each, so 150M × 500 KB × 365 ≈ 27 PB/year on top of the text.
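
The arithmetic in this example can be reproduced with a short Python script. It is a sketch using the figures given above; the constant names are illustrative, and decimal units (1 GB = 10^9 bytes) are assumed to match the rounded numbers.

```python
SECONDS_PER_DAY = 86_400
GB, TB, PB = 10**9, 10**12, 10**15  # decimal units, assumed for round numbers

DAU = 500_000_000
TWEETS_PER_DAY = 300_000_000
FEED_VIEWS_PER_USER = 10
BYTES_PER_TWEET = 500      # 280 chars + metadata
IMAGE_FRACTION = 0.5       # 50% of tweets carry an image
BYTES_PER_IMAGE = 500_000  # 500 KB average
PEAK_FACTOR = 3

write_qps = TWEETS_PER_DAY / SECONDS_PER_DAY
read_qps = DAU * FEED_VIEWS_PER_USER / SECONDS_PER_DAY

daily_text_bytes = TWEETS_PER_DAY * BYTES_PER_TWEET
yearly_text_bytes = daily_text_bytes * 365
yearly_media_bytes = TWEETS_PER_DAY * IMAGE_FRACTION * BYTES_PER_IMAGE * 365

print(f"write QPS: {write_qps:,.0f} (peak {write_qps * PEAK_FACTOR:,.0f})")
print(f"read QPS:  {read_qps:,.0f} (peak {read_qps * PEAK_FACTOR:,.0f})")
print(f"text:  {daily_text_bytes / GB:,.0f} GB/day, {yearly_text_bytes / TB:,.2f} TB/year")
print(f"media: {yearly_media_bytes / PB:,.2f} PB/year")
```

The exact results (about 3,472 write QPS and 57,870 read QPS) round to the ~3,500 and ~60K figures quoted above; in an interview the rounded values are the ones to say aloud.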

5. Common Gotchas

Forgetting Peak Load

Sizing for average load is not enough. Peak is commonly 2-3x average and can reach 10x for spiky workloads.

Ignoring Read/Write Ratio

Most systems are read-heavy, often around 100:1 reads to writes. Design for the dominant pattern.

Forgetting Replication

3x replication = 3x storage. Factor this into estimates.

Not Including Metadata

A 'tweet' is not just text. Include user info, timestamps, IDs.

Ignoring Growth

Design for 3-5 years of growth. Sizing for current scale is not enough.
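
As a rough illustration of how replication and growth compound, the Python sketch below scales a first-year storage estimate over several years. The function name `provisioned_storage_bytes` and the 50% annual growth rate are assumptions made for the example, not figures from this module; substitute whatever growth rate fits your system.

```python
def provisioned_storage_bytes(
    first_year_bytes: float,
    replication_factor: int = 3,   # 3x replication = 3x storage
    annual_growth: float = 0.5,    # ASSUMED: 50% more data written each year
    years: int = 3,                # lower end of the 3-5 year horizon
) -> float:
    """Total raw storage needed to hold every year's data, with replicas."""
    total = 0.0
    yearly = first_year_bytes
    for _ in range(years):
        total += yearly            # data written during this year
        yearly *= 1 + annual_growth
    return total * replication_factor


TB = 10**12
first_year = 55 * TB  # e.g. a text-only estimate of ~55 TB/year
needed = provisioned_storage_bytes(first_year)
print(f"raw storage after 3 years: {needed / TB:,.0f} TB")
```

Under these assumptions a 55 TB/year estimate grows to several hundred terabytes of raw capacity, more than ten times the single-year figure, which is why skipping replication or growth can push an estimate off by an order of magnitude.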

6. Quick Reference Cheat Sheet

Text

  • Tweet: ~500 bytes
  • Message: ~1 KB
  • Email: ~50 KB
  • Article: ~100 KB

Media

  • Profile pic: ~50 KB
  • Photo: ~500 KB - 2 MB
  • Video: ~50 MB/min
  • Video (compressed): ~5 MB/min

Server Capacity

  • Web server: ~10K QPS
  • DB (writes): ~1K QPS
  • DB (reads): ~10K QPS
  • Redis: ~100K QPS

Network

  • Intra-datacenter: 1 Gbps+
  • Internet: ~100 Mbps per user
  • CDN: unlimited (effectively)
  • Mobile: 10-50 Mbps
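
The server capacity figures above can turn a peak QPS estimate into a rough machine count. The Python sketch below shows one way to do it; the function name `instances_needed` and the 70% target utilization (headroom so machines are not run at their limit) are assumptions made for illustration.

```python
import math

# Per-instance capacities taken from the cheat sheet above.
CAPACITY_QPS = {
    "web_server": 10_000,
    "db_write": 1_000,
    "db_read": 10_000,
    "redis": 100_000,
}


def instances_needed(peak_qps: float, component: str,
                     target_utilization: float = 0.7) -> int:
    """Instances required to serve peak_qps, leaving headroom.

    target_utilization is an ASSUMED safety margin: each instance is
    planned to run at no more than this fraction of its rated capacity.
    """
    usable_qps = CAPACITY_QPS[component] * target_utilization
    return math.ceil(peak_qps / usable_qps)


peak_read_qps = 180_000  # e.g. a peak read estimate
for component in ("web_server", "db_read", "redis"):
    count = instances_needed(peak_read_qps, component)
    print(f"{component}: {count} instance(s)")
```

Treat the result as a starting point for discussion: real capacity depends on hardware, query complexity, and payload size.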

7. Key Takeaways

1. Memorize powers of 2 and time conversions (86,400 sec/day).
2. Start with DAU → actions/user → QPS → storage → bandwidth.
3. Peak = 2-3x average. Design for peak, not average.
4. Show your work. Process matters more than exact numbers.
5. Aim for order-of-magnitude accuracy, not precision.

Quiz

1. 500M DAU, each user sends 2 messages/day. Average QPS?

2. 100M tweets/day, 500 bytes each. Daily storage?