
Design Netflix

Design a video streaming platform with content delivery, adaptive streaming, personalized recommendations, and offline downloads at global scale.

50 min read | Hard

1. Requirements Gathering

Functional Requirements
  • Browse and search content catalog
  • Stream video in multiple qualities
  • Adaptive bitrate streaming
  • User profiles (multiple per account)
  • Watchlist and viewing history
  • Personalized recommendations
  • Offline download (mobile)
  • Resume playback across devices
  • Subtitles and multiple audio tracks
Non-Functional Requirements
  • High availability (99.99%)
  • Low latency playback start (< 2s)
  • Support 200M+ subscribers
  • Handle peak load (millions concurrent)
  • Global content delivery
  • Handle 15% of global internet traffic
  • Minimize buffering

2. Capacity Estimation

Scale Numbers

  • 230M+ subscribers
  • 15,000+ titles
  • 15% of global internet traffic
  • 200M+ hours watched per day

Storage Estimates

  • Average video size (all qualities): ~10 GB per title
  • Total content storage (15,000 titles × 10 GB): ~150 TB
  • CDN cache (hot content per region): ~10 PB per region
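The storage total is plain multiplication, and a tiny helper makes the units explicit. This sketch assumes decimal units (1 TB = 1,000 GB) and counts one stored copy of every rendition; replication across S3 and the OCAs is not modeled. The function name is illustrative.

```python
GB_PER_TB = 1_000  # decimal (SI) units, as storage capacity is usually quoted

def catalog_storage_tb(titles: int, avg_gb_per_title: float) -> float:
    """One copy of every rendition of every title, in TB (no replication)."""
    return titles * avg_gb_per_title / GB_PER_TB

print(f"{catalog_storage_tb(15_000, 10):,.0f} TB")  # 150 TB
```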

Bandwidth Estimates

  • Peak concurrent viewers: ~10 million
  • Average bitrate: 5 Mbps
  • Peak bandwidth: ~10M × 5 Mbps ≈ 50 Tbps globally
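The bandwidth figure can be cross-checked against the viewing-hours number. Spreading 200M hours/day evenly over 24 hours gives roughly 8.3M simultaneous streams on average, so a ~10M peak is in a plausible range. This is a back-of-envelope sketch; function names are illustrative.

```python
MBPS_PER_TBPS = 1_000_000

def peak_bandwidth_tbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    """Aggregate egress if every concurrent viewer streams at the average bitrate."""
    return concurrent_viewers * avg_bitrate_mbps / MBPS_PER_TBPS

def average_concurrent_viewers(hours_watched_per_day: float) -> float:
    # Total viewing hours spread evenly over 24 hours of wall-clock time.
    return hours_watched_per_day / 24

print(peak_bandwidth_tbps(10_000_000, 5))               # 50.0
print(round(average_concurrent_viewers(200_000_000)))   # 8333333
```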

3. High-Level Architecture

System Architecture

Clients
  • Smart TV, Mobile, Web, Gaming Console
Open Connect CDN (data plane)
  • 1000s of servers inside ISPs worldwide
  • Video content cached at the edge
Control Plane (AWS)
  • API Gateway: client requests
  • Playback Service: stream orchestration
  • User Service: auth, profiles
  • Catalog Service: content metadata
  • Recommendation: ML predictions
  • Search Service: Elasticsearch
  • Billing Service: subscriptions
  • Download Service: offline content
Data Stores
  • Cassandra: viewing history
  • MySQL: user accounts
  • Redis: session, cache
  • S3: video masters
Open Connect: Netflix's Secret Weapon
Netflix built its own CDN, called Open Connect. It places servers, called Open Connect Appliances (OCAs), directly inside ISP data centers, so 95%+ of video traffic is served from close to the viewer instead of crossing long-haul internet links. This reduces bandwidth costs and improves playback quality.
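The control-plane/data-plane split means the Playback Service decides *where* a client should fetch video, and the OCAs only serve bytes. Below is a minimal steering sketch under stated assumptions: the `Oca` type, its fields, and the ranking rule (healthy OCAs holding the title, same-ISP first, then least loaded) are hypothetical and not Netflix's actual steering logic.

```python
from dataclasses import dataclass, field

@dataclass
class Oca:
    host: str
    isp: str                      # network the appliance is embedded in
    load: float                   # 0.0 (idle) .. 1.0 (saturated); hypothetical metric
    titles: set[str] = field(default_factory=set)  # titles cached on this OCA
    healthy: bool = True

def rank_ocas(ocas: list[Oca], title_id: str, client_isp: str, max_results: int = 3) -> list[str]:
    """Return hosts able to serve title_id, best first (hypothetical ranking rule)."""
    candidates = [o for o in ocas if o.healthy and title_id in o.titles]
    # False sorts before True, so OCAs inside the client's ISP come first;
    # ties are broken by current load.
    candidates.sort(key=lambda o: (o.isp != client_isp, o.load))
    return [o.host for o in candidates[:max_results]]
```

Returning several hosts rather than one lets the client fall back to the next OCA if the first is slow, without another round trip to the control plane.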

4. Video Processing Pipeline

4.1 Video Encoding

1. Ingest: Receive master video file (4K, ProRes)
2. Analysis: Shot detection, complexity analysis for encoding
3. Encode: Encode to multiple bitrates/resolutions (1000+ encodes per title)
4. Package: Create chunks (4-second segments) for adaptive streaming
5. Encrypt: DRM encryption (Widevine, PlayReady, FairPlay)
6. Distribute: Push to S3, then to Open Connect OCAs globally
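The Package step turns every encoded rendition into a run of fixed-length segments. The sketch below plans the output paths, assuming the 4-second segment length from the steps above and the `<rendition>/chunk_NNNN.m4s` naming used in the manifest layout later in this section; the function names are illustrative, and encryption is not modeled.

```python
import math

SEGMENT_SECONDS = 4  # segment length from the Package step

def plan_segments(duration_s: float, renditions: list[str]) -> dict[str, list[str]]:
    """Map each rendition to the ordered segment paths the packager would write."""
    n_segments = math.ceil(duration_s / SEGMENT_SECONDS)  # last segment may be short
    return {
        r: [f"{r}/chunk_{i:04d}.m4s" for i in range(1, n_segments + 1)]
        for r in renditions
    }

def count_outputs(plan: dict[str, list[str]]) -> int:
    """Total number of segment files across all renditions."""
    return sum(len(paths) for paths in plan.values())
```

For a 2-hour film, `plan_segments(7200, ["240p", "480p", "720p", "1080p", "4k"])` yields 1,800 segments per rendition, 9,000 files in total, before multiplying by codecs, audio tracks, and DRM variants.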

4.2 Encoding Ladder

Resolution   | Bitrate        | Use case
-------------|----------------|------------------
4K (2160p)   | 15-25 Mbps     | Premium TVs
1080p        | 5-8 Mbps       | HD displays
720p         | 3-5 Mbps       | Tablets, laptops
480p         | 1-2 Mbps       | Mobile (cellular)
240p         | 0.3-0.5 Mbps   | Low bandwidth
Netflix uses per-title encoding: each title gets an encoding ladder optimized for its visual complexity. Animation, for example, typically needs less bitrate than an action movie to reach the same visual quality.
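Per-title encoding can be sketched as: run trial encodes of one title at several resolution/bitrate points, score each with a quality metric (for example a VMAF-like 0-100 score), then keep, for each target bitrate, the trial with the best quality that fits. This is a simplified stand-in in the spirit of a rate-quality convex hull, not Netflix's production algorithm; the `TrialEncode` type and all numbers are hypothetical.

```python
from typing import NamedTuple

class TrialEncode(NamedTuple):
    resolution: str
    kbps: int
    quality: float  # hypothetical VMAF-like score, 0-100

def per_title_ladder(trials: list[TrialEncode], targets_kbps: list[int]) -> list[TrialEncode]:
    """For each target bitrate, keep the best-quality trial encode at or below it."""
    ladder: list[TrialEncode] = []
    for target in sorted(targets_kbps):
        affordable = [t for t in trials if t.kbps <= target]
        if not affordable:
            continue  # nothing encodes cheaply enough for this rung
        best = max(affordable, key=lambda t: t.quality)
        if best not in ladder:  # drop rungs that add no quality
            ladder.append(best)
    return ladder

# Hypothetical trial encodes for a simple animated title.
trials = [
    TrialEncode("480p", 1_000, 80.0), TrialEncode("720p", 1_000, 78.0),
    TrialEncode("720p", 2_000, 88.0), TrialEncode("1080p", 2_000, 86.0),
    TrialEncode("1080p", 4_000, 94.0),
]
print([t.resolution for t in per_title_ladder(trials, [1_000, 2_000, 4_000])])
```

The point of the selection is that the "right" resolution for a given bitrate depends on the content: here 720p beats 1080p at 2 Mbps, which a fixed ladder would miss.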

5. Adaptive Bitrate Streaming

How ABR Works

1. Video split into small chunks (4 seconds each)

2. Each chunk available in multiple qualities

3. Client monitors buffer and bandwidth

4. Client requests appropriate quality for next chunk

5. Quality can change every 4 seconds based on conditions
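The client-side decision in steps 3-5 can be sketched as a simple hybrid rule: if the buffer is nearly empty, fall to the lowest rung to avoid a stall; otherwise request the highest rung that fits within a safety margin of measured throughput. The ladder values (rough mid-range picks from the encoding-ladder table), the 80% margin, and the 8-second panic threshold are all assumptions, and real players, Netflix's included, use more sophisticated logic.

```python
# Hypothetical ladder in kbps, roughly one rung per row of the encoding-ladder
# table, sorted from lowest to highest.
LADDER_KBPS = [400, 1_500, 4_000, 6_500, 20_000]
SAFETY = 0.8           # spend only 80% of measured throughput (assumed margin)
PANIC_BUFFER_S = 8.0   # below this much buffered video, drop to the lowest rung (assumed)

def choose_bitrate(throughput_kbps: float, buffer_s: float) -> int:
    """Pick the bitrate (kbps) to request for the next 4-second chunk."""
    if buffer_s < PANIC_BUFFER_S:
        return LADDER_KBPS[0]  # protect against a rebuffer first
    budget = throughput_kbps * SAFETY
    affordable = [b for b in LADDER_KBPS if b <= budget]
    return affordable[-1] if affordable else LADDER_KBPS[0]
```

For example, with 10 Mbps measured and 20 s buffered, the budget is 8 Mbps and the client requests the 6,500 kbps rung; if the buffer drains to 2 s, it drops to 400 kbps until the buffer recovers.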

DASH (Dynamic Adaptive Streaming over HTTP)
  • Open industry standard (MPEG-DASH)
  • Works on Android and in browsers
  • MPD (Media Presentation Description) manifest file
HLS (HTTP Live Streaming)
  • Apple's protocol
  • Native (and historically required) on iOS/Safari
  • m3u8 playlist file
Manifest File Structure (simplified):
video/
├── manifest.mpd
├── 240p/
│   ├── chunk_0001.m4s
│   ├── chunk_0002.m4s
│   └── ...
├── 480p/
│   └── ...
├── 720p/
│   └── ...
├── 1080p/
│   └── ...
└── 4k/
    └── ...
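A DASH manifest for the layout above does not need to list every chunk: a `SegmentTemplate` lets the player compute segment URLs from the rendition id and segment number. The sketch below emits a simplified MPD with Python's standard `xml.etree.ElementTree`. The rendition sizes and bandwidths are hypothetical, the `init.mp4` initialization segment is an assumption (it is not in the tree above), and attributes such as `codecs` are omitted for brevity, so this is illustrative rather than a player-ready manifest.

```python
import xml.etree.ElementTree as ET

SEGMENT_SECONDS = 4

# Hypothetical renditions: (id matching the directory name, width, height, bits/s)
RENDITIONS = [
    ("240p", 426, 240, 400_000),
    ("480p", 854, 480, 1_500_000),
    ("720p", 1280, 720, 4_000_000),
    ("1080p", 1920, 1080, 6_500_000),
    ("4k", 3840, 2160, 20_000_000),
]

def build_mpd(duration_s: int) -> str:
    """Return a simplified static (on-demand) MPD using templated segment URLs."""
    mpd = ET.Element("MPD", {
        "xmlns": "urn:mpeg:dash:schema:mpd:2011",
        "type": "static",
        "profiles": "urn:mpeg:dash:profile:isoff-live:2011",  # profile for templated segments
        "mediaPresentationDuration": f"PT{duration_s}S",
        "minBufferTime": f"PT{SEGMENT_SECONDS}S",
    })
    period = ET.SubElement(mpd, "Period", {"id": "0"})
    aset = ET.SubElement(period, "AdaptationSet",
                         {"mimeType": "video/mp4", "segmentAlignment": "true"})
    # $RepresentationID$ and $Number%04d$ expand to e.g. "720p/chunk_0001.m4s".
    ET.SubElement(aset, "SegmentTemplate", {
        "timescale": "1",
        "duration": str(SEGMENT_SECONDS),
        "startNumber": "1",
        "initialization": "$RepresentationID$/init.mp4",  # assumed init segment
        "media": "$RepresentationID$/chunk_$Number%04d$.m4s",
    })
    for rid, width, height, bps in RENDITIONS:
        ET.SubElement(aset, "Representation", {
            "id": rid, "width": str(width), "height": str(height), "bandwidth": str(bps),
        })
    return ET.tostring(mpd, encoding="unicode")
```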

6. Recommendation System

Recommendation Signals

Explicit Signals:
  • Thumbs up/down ratings
  • Add to watchlist
  • Search queries
Implicit Signals:
  • Watch history and duration
  • Time of day watching
  • Device used
  • Pause/rewind behavior
Modeling Approaches:
  • Collaborative filtering: users who watched X also watched Y
  • Content-based: similar genre, actors, directors
  • Matrix factorization: SVD-style decomposition of the user-title matrix into latent factors
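Two of the approaches above fit in a few lines each. The sketch shows (1) item-to-item collaborative filtering by co-watch counts and (2) matrix factorization trained with stochastic gradient descent, a common way to learn SVD-style latent factors from a sparse ratings matrix. It is a toy under stated assumptions: function names, hyperparameters, and data are hypothetical, and production recommenders are far more elaborate.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

def co_watch_counts(histories: dict[str, list[str]]) -> dict[str, Counter]:
    """For every pair of titles, count how many users watched both."""
    co: dict[str, Counter] = defaultdict(Counter)
    for titles in histories.values():
        for a, b in combinations(sorted(set(titles)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_watched(co: dict[str, Counter], title: str, k: int = 3) -> list[str]:
    """'Users who watched X also watched Y': the k most co-watched titles."""
    return [t for t, _ in co.get(title, Counter()).most_common(k)]

def predict(P: dict, Q: dict, u: str, i: str) -> float:
    """Predicted rating: dot product of user and title latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

def factorize(ratings, n_factors=2, epochs=1000, lr=0.02, reg=0.02, seed=0):
    """Learn latent vectors so that predict(P, Q, u, i) approximates each rating."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})  # sorted for reproducible init
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]  # use pre-update values for both steps
                P[u][f] += lr * (err * qi - reg * pu)  # L2-regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

Once trained, `predict` fills in ratings for titles a user has not seen, and the highest predictions become candidate recommendations.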

7. Scaling Strategies

Open Connect CDN
  • Servers inside ISP networks
  • 95%+ traffic served from edge
  • Proactive content caching
  • Netflix controls hardware
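Proactive caching means deciding, ahead of peak hours, which titles to pre-position on each OCA's limited disk. A minimal sketch, assuming a hypothetical per-title popularity forecast: greedily fill the disk with titles that have the most predicted views per GB. Greedy-by-density is a standard heuristic for this knapsack-style problem, not Netflix's published fill algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Title:
    title_id: str
    size_gb: float
    predicted_views: float  # hypothetical forecast for the next peak window

def plan_fill(titles: list[Title], capacity_gb: float) -> list[str]:
    """Greedy fill: highest predicted views per GB first; skip titles that don't fit."""
    chosen: list[str] = []
    used = 0.0
    for t in sorted(titles, key=lambda t: t.predicted_views / t.size_gb, reverse=True):
        if used + t.size_gb <= capacity_gb:
            chosen.append(t.title_id)
            used += t.size_gb
    return chosen
```

Running the fill off-peak (for example overnight) moves the bulk transfer out of the busy evening window, which is what makes pre-positioning cheap.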
Microservices on AWS
  • 500+ microservices
  • Zuul for API gateway
  • Eureka for service discovery
  • Hystrix for circuit breaking (now in maintenance mode)
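The circuit-breaking pattern Hystrix popularized can be sketched as a three-state breaker: closed (calls pass through), open (calls short-circuit to a fallback after repeated failures), and half-open (one trial call after a cool-down). This is not Hystrix's API; the class, its parameters, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and no fallback was supplied."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # cool-down elapsed: allow one trial call
        return "open"

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "open":
            if fallback is not None:
                return fallback()  # fail fast without touching the sick dependency
            raise CircuitOpenError("circuit open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            if fallback is not None:
                return fallback()
            raise
        self.failures = 0       # success closes the circuit
        self.opened_at = None
        return result
```

A typical fallback for a recommendation call is a cached or non-personalized row, so the home page still renders when that service is degraded.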
Chaos Engineering
  • Chaos Monkey kills instances
  • Tests resilience continuously
  • Chaos Kong simulates a full AWS region failure
  • Built-in fault tolerance
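The core of a Chaos Monkey-style tool is choosing which instances to terminate. The sketch below only picks victims; actually terminating them through a cloud API is out of scope. The policy (one random instance per eligible group, honoring opt-outs, never touching a group's only instance) and all names are hypothetical, not Chaos Monkey's real configuration.

```python
import random
from typing import AbstractSet

def pick_victims(groups: dict[str, list[str]], rng: random.Random,
                 opted_out: AbstractSet[str] = frozenset()) -> dict[str, str]:
    """Pick one instance to terminate from each eligible group (hypothetical policy)."""
    return {
        name: rng.choice(instances)
        for name, instances in sorted(groups.items())  # sorted: reproducible with a seeded rng
        if name not in opted_out and len(instances) > 1  # skip single-instance groups
    }
```

Passing the random generator in explicitly makes a run reproducible with a fixed seed, which helps when replaying an experiment that exposed a bug.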
Caching Strategy
  • EVCache (memcached-based) clusters
  • Cache video metadata
  • Session caching
  • 30M+ requests/sec from cache
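Metadata and session caching typically follow the look-aside (cache-aside) pattern: read from the cache, and on a miss load from the backing store and populate the cache with a TTL. In the sketch below, a local dict stands in for a memcached-based cluster such as EVCache; it is not EVCache's client API, and the class name, TTL, and loader are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

class LookAsideCache:
    """Cache-aside with per-entry TTL; the dict stands in for a remote cache cluster."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)  # e.g. read title metadata from the database
        self._data[key] = (now + self.ttl_s, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self._data.pop(key, None)
```

At tens of millions of requests per second, the hit rate is what protects the databases: every miss falls through to the backing store.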

8. Key Takeaways

1. Open Connect CDN: servers inside ISPs serve 95%+ of traffic.
2. Adaptive bitrate streaming: quality changes based on bandwidth.
3. Per-title encoding: encoding optimized for each title's complexity.
4. Microservices + chaos engineering: 500+ services, tested with Chaos Monkey.
5. Pre-positioning content: popular content cached before peak hours.
6. Control plane on AWS, data plane on Open Connect.