Design Netflix
Design a video streaming platform with content delivery, adaptive streaming, personalized recommendations, and offline downloads at global scale.
1. Requirements Gathering
Functional Requirements
- Browse and search content catalog
- Stream video in multiple qualities
- Adaptive bitrate streaming
- User profiles (multiple per account)
- Watchlist and viewing history
- Personalized recommendations
- Offline download (mobile)
- Resume playback across devices
- Subtitles and multiple audio tracks
Non-Functional Requirements
- High availability (99.99%)
- Low latency playback start (< 2s)
- Support 200M+ subscribers
- Handle peak load (millions concurrent)
- Global content delivery
- Handle 15% of global internet traffic
- Minimize buffering
2. Capacity Estimation
Scale Numbers
Subscribers: 230M+
Titles: 15,000+
Global Internet Traffic: 15%
Hours Watched/Day: 200M+
Storage Estimates
Average size per title (all encodes and qualities): ~10 TB
Total content storage (15K titles × ~10 TB): ~150 PB
CDN cache (hot content per region): ~10 PB per region
Bandwidth Estimates
Peak concurrent viewers: ~10 million
Average bitrate: 5 Mbps
Peak bandwidth: ~50 Tbps globally
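The bandwidth figures above follow from simple multiplication. The sketch below reproduces that arithmetic and, as a sanity check, derives the average concurrency implied by the hours-watched figure; the function names are illustrative only.

```python
# Back-of-the-envelope check of the bandwidth estimate.
PEAK_CONCURRENT_VIEWERS = 10_000_000   # ~10 million (from the estimate above)
AVERAGE_BITRATE_MBPS = 5               # 5 Mbps (from the estimate above)
HOURS_WATCHED_PER_DAY = 200_000_000    # 200M+ hours/day (from Scale Numbers)


def peak_bandwidth_tbps(viewers: int, bitrate_mbps: float) -> float:
    """Aggregate egress in Tbps (1 Tbps = 1,000,000 Mbps)."""
    return viewers * bitrate_mbps / 1_000_000


def average_concurrent_viewers(hours_per_day: float) -> float:
    """Viewer-hours spread evenly over 24 hours give the mean concurrency."""
    return hours_per_day / 24


print(peak_bandwidth_tbps(PEAK_CONCURRENT_VIEWERS, AVERAGE_BITRATE_MBPS))  # 50.0
print(round(average_concurrent_viewers(HOURS_WATCHED_PER_DAY)))
```

An average of roughly 8 million concurrent viewers is in line with the assumed peak of about 10 million, so the two estimates are mutually consistent.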
3. High-Level Architecture
System Architecture
Clients: Smart TV, Mobile, Web, Gaming Console
↓
Open Connect CDN
1000s of servers in ISPs worldwide
Video content cached at edge
↓
Control Plane (AWS)
API Gateway: Client requests
Playback Service: Stream orchestration
User Service: Auth, profiles
Catalog Service: Content metadata
Recommendation: ML predictions
Search Service: Elasticsearch
Billing Service: Subscriptions
Download Service: Offline content
↓
Cassandra: Viewing history
MySQL: User accounts
Redis: Session, cache
S3: Video masters
Open Connect: Netflix's Secret Weapon
Netflix built its own CDN called Open Connect. It places servers (Open Connect Appliances, or OCAs) directly inside ISP data centers. These serve 95%+ of video traffic from within the ISP's own network rather than over internet transit links, which reduces bandwidth costs and improves quality.
4. Video Processing Pipeline
4.1 Video Encoding
1. Ingest: Receive master video file (4K, ProRes)
2. Analysis: Shot detection and complexity analysis for encoding
3. Encode: Encode to multiple bitrates/resolutions (1000+ encodes per title)
4. Package: Create chunks (4-second segments) for adaptive streaming
5. Encrypt: DRM encryption (Widevine, PlayReady, FairPlay)
6. Distribute: Push to S3, then to Open Connect OCAs globally
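To get a feel for what the Package step produces, the sketch below counts the 4-second segments for one rendition of a title and generates segment file names. The `chunk_NNNN.m4s` naming mirrors the simplified layout used later in this document; the helper names are hypothetical.

```python
import math
from typing import List

SEGMENT_SECONDS = 4  # segment length from the Package step


def segment_count(duration_seconds: float, segment_seconds: int = SEGMENT_SECONDS) -> int:
    """Number of segments needed to cover the title; the last one may be shorter."""
    return math.ceil(duration_seconds / segment_seconds)


def segment_names(duration_seconds: float) -> List[str]:
    """File names for one rendition, numbered from 1 (illustrative naming)."""
    return [f"chunk_{i:04d}.m4s" for i in range(1, segment_count(duration_seconds) + 1)]


two_hour_movie = 2 * 60 * 60
print(segment_count(two_hour_movie))      # 1800
print(segment_names(two_hour_movie)[:2])  # ['chunk_0001.m4s', 'chunk_0002.m4s']
```

A two-hour title therefore yields 1,800 segments per rendition, and that count multiplies by the number of renditions in the ladder.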
4.2 Encoding Ladder
| Resolution | Bitrate | Use Case |
|---|---|---|
| 4K (2160p) | 15-25 Mbps | Premium TVs |
| 1080p | 5-8 Mbps | HD displays |
| 720p | 3-5 Mbps | Tablets, laptops |
| 480p | 1-2 Mbps | Mobile (cellular) |
| 240p | 0.3-0.5 Mbps | Low bandwidth |
Netflix uses per-title encoding: each title gets an encoding ladder optimized for its visual complexity. Animation, for example, typically needs less bitrate than an action movie at the same perceived quality.
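The idea behind per-title encoding can be illustrated with a deliberately simple model: pick a bitrate inside each resolution's range from the table above, based on a complexity score between 0 (simple, such as flat animation) and 1 (complex, such as fast action). The linear interpolation and the `complexity` score are assumptions made for illustration; this is not Netflix's actual method.

```python
from typing import Dict, Tuple

# Bitrate ranges in Mbps, taken from the encoding ladder table.
LADDER_RANGES_MBPS: Dict[str, Tuple[float, float]] = {
    "2160p": (15.0, 25.0),
    "1080p": (5.0, 8.0),
    "720p": (3.0, 5.0),
    "480p": (1.0, 2.0),
    "240p": (0.3, 0.5),
}


def per_title_ladder(complexity: float) -> Dict[str, float]:
    """Interpolate a bitrate per resolution from a 0..1 complexity score (hypothetical model)."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    return {
        resolution: round(low + complexity * (high - low), 2)
        for resolution, (low, high) in LADDER_RANGES_MBPS.items()
    }


print(per_title_ladder(0.0)["1080p"])  # 5.0 (simple content sits at the low end)
print(per_title_ladder(1.0)["1080p"])  # 8.0 (complex content sits at the high end)
```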
5. Adaptive Bitrate Streaming
How ABR Works
1. Video split into small chunks (4 seconds each)
2. Each chunk available in multiple qualities
3. Client monitors buffer and bandwidth
4. Client requests appropriate quality for next chunk
5. Quality can change every 4 seconds based on conditions
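Steps 3 and 4 can be sketched as a small selection function that the client runs before each chunk request. This is a minimal throughput-plus-buffer heuristic; the rendition bitrates, the safety factor, and the low-buffer threshold are all illustrative assumptions, and production players use more elaborate logic.

```python
from typing import Sequence

# Rendition bitrates in Mbps, lowest first (illustrative values within the ladder ranges).
RENDITIONS_MBPS = (0.4, 1.5, 4.0, 6.5, 20.0)


def choose_bitrate(
    throughput_mbps: float,
    buffer_seconds: float,
    renditions: Sequence[float] = RENDITIONS_MBPS,
    safety_factor: float = 0.8,        # assumed headroom below measured throughput
    low_buffer_seconds: float = 8.0,   # assumed threshold: two 4-second chunks
) -> float:
    """Pick the highest rendition that fits the bandwidth budget for the next chunk."""
    budget = throughput_mbps * safety_factor
    if buffer_seconds < low_buffer_seconds:
        # A nearly empty buffer risks a stall, so halve the budget to refill faster.
        budget *= 0.5
    affordable = [rate for rate in renditions if rate <= budget]
    # Fall back to the lowest rendition when nothing fits the budget.
    return max(affordable) if affordable else min(renditions)


print(choose_bitrate(throughput_mbps=10, buffer_seconds=20))   # 6.5
print(choose_bitrate(throughput_mbps=10, buffer_seconds=4))    # 4.0
print(choose_bitrate(throughput_mbps=0.2, buffer_seconds=20))  # 0.4
```

Because the decision is made per chunk, quality can move up or down at every 4-second boundary as the two inputs change.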
DASH (Dynamic Adaptive Streaming over HTTP)
- Industry standard
- Works on Android, browsers
- MPD manifest file
HLS (HTTP Live Streaming)
- Apple's protocol
- Required for iOS/Safari
- m3u8 playlist file
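To make the playlist idea concrete, the sketch below generates a minimal HLS master playlist that advertises one variant stream per quality level. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags come from the HLS specification; the bandwidth values and the per-variant playlist paths are assumptions chosen to match the encoding ladder above.

```python
from typing import Iterable, Tuple

# (width, height, bandwidth in bits per second); illustrative values only.
VARIANTS = (
    (426, 240, 400_000),
    (854, 480, 1_500_000),
    (1280, 720, 4_000_000),
    (1920, 1080, 6_500_000),
    (3840, 2160, 20_000_000),
)


def build_master_playlist(variants: Iterable[Tuple[int, int, int]]) -> str:
    """Return the text of an HLS master playlist listing each variant stream."""
    lines = ["#EXTM3U"]
    for width, height, bandwidth in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"{height}p/playlist.m3u8")  # hypothetical media playlist path
    return "\n".join(lines) + "\n"


print(build_master_playlist(VARIANTS))
```

The player reads this file once, then fetches the media playlist of whichever variant its bitrate logic selects. A DASH MPD carries the same information in XML form.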
Manifest File Structure (simplified):
video/
├── manifest.mpd
├── 240p/
│   ├── chunk_0001.m4s
│   ├── chunk_0002.m4s
│   └── ...
├── 480p/
│   └── ...
├── 720p/
│   └── ...
├── 1080p/
│   └── ...
└── 4k/
    └── ...
6. Recommendation System
Recommendation Signals
Explicit Signals:
- Thumbs up/down ratings
- Add to watchlist
- Search queries
Implicit Signals:
- Watch history and duration
- Time of day watching
- Device used
- Pause/rewind behavior
Collaborative Filtering
Users who watched X also watched Y
Content-Based
Similar genre, actors, directors
Matrix Factorization
SVD to find latent factors
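The "users who watched X also watched Y" idea can be sketched with plain co-occurrence counts, which is the simplest form of item-based collaborative filtering. This example swaps in counting for matrix factorization to stay dependency-free; the user IDs, title IDs, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Set


def build_co_watch_counts(histories: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """For each title, count how often every other title shares a viewer with it."""
    co_counts: Dict[str, Counter] = defaultdict(Counter)
    for titles in histories.values():
        for a, b in permutations(sorted(titles), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(user: str, histories: Dict[str, Set[str]],
              co_counts: Dict[str, Counter], top_n: int = 3) -> List[str]:
    """Score unseen titles by how often they co-occur with the user's watched titles."""
    seen = histories[user]
    scores: Counter = Counter()
    for title in seen:
        for other, count in co_counts.get(title, Counter()).items():
            if other not in seen:
                scores[other] += count
    # Sort by score (descending), then by title ID so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


histories = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"A", "C", "D"},
    "u4": {"A"},
}
co_counts = build_co_watch_counts(histories)
print(recommend("u4", histories, co_counts))  # ['B', 'C', 'D']
```

Matrix factorization generalizes this by learning a low-dimensional vector per user and per title, so that titles can be scored even when they never co-occur directly.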
7. Scaling Strategies
Open Connect CDN
- Servers inside ISP networks
- 95%+ traffic served from edge
- Proactive content caching
- Netflix controls hardware
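Proactive caching amounts to deciding, ahead of demand, which titles each appliance should hold. A minimal sketch follows, assuming a per-region popularity forecast is available and using a greedy "predicted plays per GB" ranking. The `Title` fields and the heuristic are assumptions for illustration, not Netflix's actual placement algorithm.

```python
from typing import List, NamedTuple


class Title(NamedTuple):
    title_id: str
    size_gb: float
    predicted_plays: int  # hypothetical regional popularity forecast


def plan_prepositioning(catalog: List[Title], capacity_gb: float) -> List[str]:
    """Greedily fill the cache with the titles that yield the most plays per GB."""
    ranked = sorted(catalog, key=lambda t: t.predicted_plays / t.size_gb, reverse=True)
    chosen: List[str] = []
    used_gb = 0.0
    for title in ranked:
        if used_gb + title.size_gb <= capacity_gb:
            chosen.append(title.title_id)
            used_gb += title.size_gb
    return chosen


catalog = [
    Title("a", size_gb=100, predicted_plays=1000),  # 10 plays per GB
    Title("b", size_gb=50, predicted_plays=800),    # 16 plays per GB
    Title("c", size_gb=80, predicted_plays=400),    # 5 plays per GB
    Title("d", size_gb=30, predicted_plays=90),     # 3 plays per GB
]
print(plan_prepositioning(catalog, capacity_gb=180))  # ['b', 'a', 'd']
```

Title "c" is skipped because it no longer fits after "b" and "a" are placed, while the smaller "d" still does.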
Microservices on AWS
- 500+ microservices
- Zuul for API gateway
- Eureka for service discovery
- Hystrix for circuit breaking
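The circuit-breaker pattern that Hystrix implements can be sketched in a few lines. The class below is a simplified illustration of the pattern and not Hystrix's API (Hystrix is a Java library); the thresholds and the names are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period and use a fallback."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the timeout elapses, report closed so one trial call is allowed (half-open).
        return self._clock() - self._opened_at < self.reset_timeout_s

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            return fallback()  # fail fast without touching the dependency
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_recommendations, lambda: default_rows)`, where both names are hypothetical. While the circuit is open the user sees the fallback immediately instead of waiting on a struggling service.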
Chaos Engineering
- Chaos Monkey kills instances
- Tests resilience continuously
- Chaos Kong simulates AWS region failure
- Built-in fault tolerance
Caching Strategy
- EVCache (memcached) clusters
- Cache video metadata
- Session caching
- 30M+ requests/sec from cache
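Metadata caching typically follows the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache. The sketch below uses an in-process dictionary with a TTL as a stand-in for a memcached-style cluster; it does not use the EVCache client API, and all names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcached-style cache with per-entry expiry."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, self._clock() + self._ttl_s)


def get_title_metadata(title_id: str, cache: TTLCache,
                       load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache first, then the database, then fill the cache."""
    cached = cache.get(title_id)
    if cached is not None:
        return cached
    value = load_from_db(title_id)
    cache.set(title_id, value)
    return value
```

With this pattern the database only sees one read per title per TTL window, which is how a cache tier can absorb the bulk of read traffic.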
8. Key Takeaways
1. Open Connect CDN - servers inside ISPs serve 95%+ of traffic.
2. Adaptive bitrate streaming - quality changes based on bandwidth.
3. Per-title encoding - optimized encoding for each content type.
4. Microservices + Chaos - 500+ services, tested with Chaos Monkey.
5. Pre-positioning content - popular content cached before peak hours.
6. Control plane on AWS, data plane on Open Connect.