HLD Problem
Design YouTube
Design a video sharing platform with uploads, transcoding, streaming, recommendations, comments, and live streaming at global scale.
50 min read · Hard
1. Requirements Gathering
Functional Requirements
- Upload videos (up to several hours long)
- Stream/watch videos
- Search videos by title and tags
- Like, comment, subscribe
- Video recommendations
- View count and analytics
- Multiple video qualities
- Thumbnails and previews
- Live streaming
Non-Functional Requirements
- High availability (99.99%)
- Low-latency streaming (< 200 ms to start playback)
- Support 2B+ users, 1B hours watched/day
- Handle 500+ hours uploaded/minute
- Global content delivery
- Scalable video processing
- Cost-effective storage
2. Capacity Estimation
Scale Numbers
Monthly users: 2B+
Hours watched per day: 1B
Hours uploaded per minute: 500
Total videos: 800M
Storage Estimates
Video uploaded per day: 720,000 hours (500 hours/min × 1,440 min)
Average video size (all qualities): ~2 GB per hour
Daily storage increase: ~1.5 PB/day
Total storage (estimated): ~1 EB (exabyte)
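The storage figures above follow from simple arithmetic. The sketch below reproduces them, assuming decimal units (1 PB = 1,000,000 GB); the function names are illustrative only.

```python
# Back-of-the-envelope storage estimate using the figures quoted above.
HOURS_UPLOADED_PER_MIN = 500
GB_PER_VIDEO_HOUR = 2  # all qualities combined, per the estimate above


def daily_upload_hours(hours_per_min):
    """Hours of video uploaded in one day."""
    return hours_per_min * 60 * 24


def daily_storage_pb(hours_per_min, gb_per_hour):
    """Storage added per day in PB (assumes 1 PB = 1,000,000 GB)."""
    return daily_upload_hours(hours_per_min) * gb_per_hour / 1_000_000


hours = daily_upload_hours(HOURS_UPLOADED_PER_MIN)
pb = daily_storage_pb(HOURS_UPLOADED_PER_MIN, GB_PER_VIDEO_HOUR)
print(f"{hours:,} hours/day, {pb:.2f} PB/day")
```

The result is 1.44 PB/day, which the estimate rounds to ~1.5 PB/day. At that rate, one exabyte (1,000 PB) accumulates in roughly 700 days.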
Bandwidth Estimates
Peak concurrent viewers: ~100 million
Average bitrate: 5 Mbps
Peak egress: ~500 Tbps globally
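The egress figure is the product of the two numbers before it. As a sanity check, the sketch below also derives the average number of concurrent viewers implied by 1B hours watched per day; the function names are illustrative only.

```python
def peak_egress_tbps(concurrent_viewers, avg_bitrate_mbps):
    """Aggregate egress in Tbps (1 Tbps = 1,000,000 Mbps)."""
    return concurrent_viewers * avg_bitrate_mbps / 1_000_000


def average_concurrent_viewers(hours_watched_per_day):
    """Viewer-hours per day spread evenly across 24 hours."""
    return hours_watched_per_day / 24


peak = peak_egress_tbps(100_000_000, 5)
avg_viewers = average_concurrent_viewers(1_000_000_000)
print(f"peak egress: {peak:.0f} Tbps")
print(f"average concurrent viewers: {avg_viewers / 1e6:.1f} million")
```

The average works out to about 42 million concurrent viewers, so a peak of ~100 million corresponds to a peak-to-average ratio of roughly 2.4.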
3. High-Level Architecture
System Architecture
Clients: Web, Mobile, Smart TV, Gaming
↓
Global CDN: Google's edge network, video streaming
↓
API Gateway / Load Balancer
↓
Services:
- Upload Service: video ingestion
- Transcoding: video processing
- Video Service: metadata, CRUD
- User Service: auth, channels
- Search Service: video discovery
- Recommendation: ML suggestions
- Comment Service: discussions
- Analytics: views, metrics
↓
Data stores:
- Cloud Storage: video files
- Bigtable: video metadata
- MySQL: users, channels
- Elasticsearch: search index
4. Video Upload Pipeline
1. Upload starts: the client uploads to the Upload Service via a resumable protocol.
2. Store original: the raw video is stored in cloud storage (GCS/S3).
3. Queue for processing: a message is sent to the transcoding queue.
4. Transcoding: the video is encoded in parallel to multiple resolutions (144p to 4K).
5. Generate assets: thumbnails, captions, and chapters are created.
6. Store outputs: the encoded videos are stored and the CDN is notified.
7. Video live: metadata is updated, and the video becomes searchable and playable.
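The pipeline above can be sketched as a small in-memory simulation. This is a minimal illustration, not a real implementation: the dict `object_store` stands in for GCS/S3, `queue.Queue` stands in for the transcoding queue, the "encode" step is a placeholder, and every name here is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from queue import Queue


class Status(Enum):
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    LIVE = "live"


RESOLUTIONS = ["144p", "360p", "480p", "720p", "1080p", "2160p"]


@dataclass
class Video:
    video_id: str
    status: Status = Status.UPLOADED
    renditions: list = field(default_factory=list)
    assets: list = field(default_factory=list)


object_store = {}          # stand-in for cloud storage (assumption)
transcode_queue = Queue()  # stand-in for the transcoding message queue


def handle_upload(video_id, raw_bytes):
    """Steps 1-3: accept the upload, store the original, enqueue a job."""
    object_store[f"{video_id}/original"] = raw_bytes
    video = Video(video_id)
    transcode_queue.put(video)
    return video


def run_worker_once():
    """Steps 4-7: transcode, generate assets, store outputs, go live."""
    video = transcode_queue.get()
    video.status = Status.PROCESSING
    raw = object_store[f"{video.video_id}/original"]
    for res in RESOLUTIONS:
        # Placeholder for a real encode; a real worker would fan these out.
        object_store[f"{video.video_id}/{res}"] = b"encoded:" + raw
        video.renditions.append(res)
    video.assets = ["thumbnail", "captions", "chapters"]
    video.status = Status.LIVE  # metadata update makes the video playable
    return video


video = handle_upload("abc123", b"raw-bytes")
run_worker_once()
print(video.status, video.renditions)
```

The key property is that `handle_upload` returns as soon as the original is stored and the job is queued; the expensive work happens later in the worker, which is why the upload request never waits on transcoding.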
4.1 Video Transcoding
| Resolution | Bitrate | Codec |
|---|---|---|
| 4K (2160p) | 20-50 Mbps | VP9, AV1 |
| 1080p | 5-10 Mbps | H.264, VP9 |
| 720p | 2.5-5 Mbps | H.264, VP9 |
| 480p | 1-2 Mbps | H.264 |
| 360p | 0.5-1 Mbps | H.264 |
| 144p | 0.1-0.3 Mbps | H.264 |
YouTube uses VP9 and AV1 for better compression than H.264. AV1 typically achieves roughly 30% better compression than VP9 at comparable quality, at the cost of longer encoding time.
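A transcoder needs to decide which rungs of this ladder to produce for a given upload. The sketch below encodes the table as data and selects every rung at or below the source height. The no-upscaling rule and the function name are assumptions for illustration, not something the table itself states.

```python
# (height in pixels, bitrate range, codecs), copied from the table above.
LADDER = [
    (144, "0.1-0.3 Mbps", ["H.264"]),
    (360, "0.5-1 Mbps", ["H.264"]),
    (480, "1-2 Mbps", ["H.264"]),
    (720, "2.5-5 Mbps", ["H.264", "VP9"]),
    (1080, "5-10 Mbps", ["H.264", "VP9"]),
    (2160, "20-50 Mbps", ["VP9", "AV1"]),
]


def renditions_for(source_height):
    """Return the ladder rungs to encode for a source of the given height.

    Assumption: never upscale, so only rungs at or below the source
    height are produced. The lowest rung is always kept as a fallback.
    """
    fitting = [rung for rung in LADDER if rung[0] <= source_height]
    return fitting or [LADDER[0]]


for height, bitrate, codecs in renditions_for(1080):
    print(f"{height}p  {bitrate}  {', '.join(codecs)}")
```

Each selected rung can then become an independent encode job, which is what makes the parallel transcoding step possible.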
5. Video Streaming
DASH (Dynamic Adaptive Streaming over HTTP)
- Industry standard
- Segment-based streaming
- MPD manifest file
- Works on most platforms
Adaptive Bitrate
- Client monitors buffer
- Switches quality dynamically
- Minimizes buffering
- Optimizes for bandwidth
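The adaptive bitrate behavior listed above can be sketched as a simple selection rule. This is a minimal heuristic under stated assumptions, not the algorithm any particular player uses: the bitrates take the upper end of each range in the transcoding table, and the `safety` and `low_buffer` parameters are hypothetical tuning knobs.

```python
# Bitrate per quality in kbps (upper end of each range in the table above).
BITRATES_KBPS = {
    "144p": 300,
    "360p": 1_000,
    "480p": 2_000,
    "720p": 5_000,
    "1080p": 10_000,
    "2160p": 50_000,
}


def choose_quality(throughput_kbps, buffer_seconds, safety=0.8, low_buffer=5.0):
    """Pick the highest quality whose bitrate fits the bandwidth budget.

    The budget is the measured throughput scaled by a safety margin, and
    halved when the playback buffer is nearly empty so the player can
    refill it quickly. Falls back to the lowest quality if nothing fits.
    """
    budget = throughput_kbps * safety
    if buffer_seconds < low_buffer:
        budget *= 0.5
    best = min(BITRATES_KBPS, key=BITRATES_KBPS.get)
    for name, kbps in sorted(BITRATES_KBPS.items(), key=lambda item: item[1]):
        if kbps <= budget:
            best = name
    return best


print(choose_quality(8_000, buffer_seconds=20))  # healthy buffer
print(choose_quality(8_000, buffer_seconds=2))   # buffer nearly empty
```

With the same measured throughput, the second call selects a lower quality than the first, which is the "minimizes buffering" trade-off: the player gives up resolution temporarily to avoid a stall.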
Segment Structure:
video_abc123/
├── manifest.mpd
├── init.mp4        # initialization segment
├── 144p/
│   ├── seg_001.m4s
│   ├── seg_002.m4s
│   └── ...
├── 360p/
├── 720p/
├── 1080p/
└── 4k/
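Given this layout, a client can map a playback position to a segment file. The sketch below assumes fixed-length segments of 4 seconds and 1-based, zero-padded numbering as in the listing; the segment duration and the function name are assumptions, and a real DASH client would read the addressing scheme from the MPD manifest instead of hard-coding it.

```python
SEGMENT_SECONDS = 4.0  # assumed segment duration, not stated in the layout


def segment_path(video_id, quality, position_seconds,
                 segment_seconds=SEGMENT_SECONDS):
    """Return the path of the media segment covering a playback position."""
    if position_seconds < 0:
        raise ValueError("position_seconds must be non-negative")
    index = int(position_seconds // segment_seconds) + 1  # segments start at 001
    return f"video_{video_id}/{quality}/seg_{index:03d}.m4s"


print(segment_path("abc123", "720p", 0.0))
print(segment_path("abc123", "720p", 9.5))
```

Because every quality directory uses the same segment boundaries, the player can switch quality between segments without re-seeking: it requests the next index from a different directory.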
6. Recommendation System
Recommendation Signals
Engagement:
- Watch time
- Likes/dislikes
- Comments
- Shares
Content:
- Video metadata
- Transcripts
- Visual features
- Category/tags
User Context:
- Watch history
- Search queries
- Subscriptions
- Demographics
Candidate Generation
Deep neural networks filter millions of videos to thousands of candidates
Ranking
Rank candidates by predicted watch time, CTR, and engagement
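The two stages can be sketched as follows. This is a toy stand-in under stated assumptions: candidate generation uses a plain dot product between hand-written embedding vectors rather than a deep neural network, and ranking scores each candidate by CTR multiplied by expected watch minutes, which is one plausible objective rather than a confirmed one. All names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, video_vecs, k):
    """Stage 1: cheap similarity search that narrows the corpus to k videos."""
    by_similarity = sorted(
        video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]), reverse=True
    )
    return by_similarity[:k]


def rank(candidates, predictions):
    """Stage 2: order candidates by expected watch time per impression.

    predictions[vid] = (expected_watch_minutes_if_clicked, predicted_ctr)
    """
    return sorted(
        candidates,
        key=lambda vid: predictions[vid][0] * predictions[vid][1],
        reverse=True,
    )


user = [1.0, 0.0]
videos = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.8, 0.2], "d": [0.5, 0.5]}
predictions = {"a": (2.0, 0.10), "b": (8.0, 0.20), "c": (10.0, 0.05), "d": (4.0, 0.10)}

candidates = generate_candidates(user, videos, k=3)
print(rank(candidates, predictions))
```

Video "b" has the best predicted engagement but is filtered out in stage 1 because it is far from the user's vector. That illustrates the division of labor: the first stage trades precision for speed over the whole corpus, and the expensive ranking model only sees what survives.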
7. Scaling Strategies
Video Storage
- Distributed storage (GCS/Colossus)
- Erasure coding for redundancy
- Tiered storage (hot/cold)
- Delete rarely-watched encodings
Transcoding
- Distributed job queue (Pub/Sub)
- Horizontal scaling of workers
- Priority queues (premium vs free)
- GPU acceleration for encoding
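The priority-queue idea can be sketched with Python's `heapq`. This is a minimal single-process illustration, assuming two tiers (premium and free) with first-in, first-out order inside each tier; a production system would use a distributed queue, and the class name is hypothetical.

```python
import heapq
import itertools


class TranscodeQueue:
    """Premium jobs are served before free jobs; FIFO within each tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, video_id, premium=False):
        priority = 0 if premium else 1  # lower number is served first
        heapq.heappush(self._heap, (priority, next(self._counter), video_id))

    def next_job(self):
        """Return the next video ID to transcode, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


jobs = TranscodeQueue()
jobs.submit("free-1")
jobs.submit("premium-1", premium=True)
jobs.submit("free-2")
print([jobs.next_job() for _ in range(3)])
```

Strict priority like this can starve the free tier under sustained premium load, so real schedulers often add aging or per-tier worker quotas.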
CDN & Delivery
- Google's global edge network
- Cache popular videos at edge
- Predictive pre-caching
- Regional origin servers
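Edge caching of popular videos can be illustrated with a least-recently-used (LRU) cache in front of an origin fetch. LRU is an assumption here, chosen because it is simple; the section does not say which eviction policy the CDN uses, and the class and parameter names are hypothetical.

```python
from collections import OrderedDict


class EdgeCache:
    """Tiny LRU cache: serves hits locally, falls back to the origin on a miss."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self.fetch_from_origin(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


origin_requests = []


def origin(key):
    origin_requests.append(key)
    return f"bytes-of-{key}"


cache = EdgeCache(capacity=2, fetch_from_origin=origin)
for segment in ["a", "b", "a", "c", "b"]:
    cache.get(segment)
print(cache.hits, cache.misses, origin_requests)
```

Every hit is a request that never reaches the origin, which is why cache hit rate on popular videos dominates delivery cost. Predictive pre-caching amounts to calling `get` for segments before any viewer asks for them.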
View Counting
- Batch counting (not real-time)
- Approximate counting of unique viewers (HyperLogLog)
- Deduplicate by user session
- Anti-fraud for fake views
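Batched, deduplicated counting can be sketched as below. For simplicity this uses an exact in-memory set for deduplication rather than HyperLogLog, so it illustrates the batching and per-session dedup ideas but not the approximate-counting one. The class name and the flush-driven batch window are assumptions.

```python
from collections import defaultdict


class ViewCounter:
    """Counts at most one view per (video, session) per batch window."""

    def __init__(self):
        self._seen = set()                # (video_id, session_id) in this window
        self._pending = defaultdict(int)  # views not yet written to totals
        self.totals = defaultdict(int)    # the eventually consistent public count

    def record(self, video_id, session_id):
        """Buffer a view event; returns False if it was a duplicate."""
        key = (video_id, session_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending[video_id] += 1
        return True

    def flush(self):
        """Apply the buffered counts in one batch and start a new window."""
        for video_id, count in self._pending.items():
            self.totals[video_id] += count
        self._pending.clear()
        self._seen.clear()


counter = ViewCounter()
counter.record("abc123", "session-1")
counter.record("abc123", "session-1")  # duplicate, ignored
counter.record("abc123", "session-2")
print(counter.totals["abc123"])  # still 0: nothing has been flushed yet
counter.flush()
print(counter.totals["abc123"])
```

Between flushes the public count lags behind reality, which is the eventual-consistency trade-off: one batched write per video per window instead of one write per view.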
8. Key Takeaways
1. Async video processing: transcoding happens in the background after upload.
2. Multiple resolutions: encode to 6+ quality levels for adaptive streaming.
3. CDN is critical: serve videos from edge locations globally.
4. DASH/HLS for adaptive bitrate streaming with segment-based delivery.
5. View counting is eventually consistent, batched, and deduplicated.
6. Recommendation uses deep learning with candidate generation + ranking.