Design YouTube

Design a video sharing platform with uploads, transcoding, streaming, recommendations, comments, and live streaming at global scale.

1. Requirements Gathering

Functional Requirements
  • Upload videos (up to several hours long)
  • Stream/watch videos
  • Search videos by title and tags
  • Like, comment, subscribe
  • Video recommendations
  • View count and analytics
  • Multiple video qualities
  • Thumbnails and previews
  • Live streaming
Non-Functional Requirements
  • High availability (99.99%)
  • Low-latency streaming (playback start < 200 ms)
  • Support 2B+ users and 1B hours watched per day
  • Handle 500+ hours uploaded/minute
  • Global content delivery
  • Scalable video processing
  • Cost-effective storage

2. Capacity Estimation

Scale Numbers

Monthly users: 2B+
Hours watched per day: 1B
Hours uploaded per minute: 500
Total videos: 800M

Storage Estimates

Videos uploaded per day: 720,000 hours
Average video size (all qualities): ~2 GB per hour
Daily storage increase: ~1.5 PB/day
Total storage (estimated): ~1 EB (exabyte)

Bandwidth Estimates

Peak concurrent viewers: ~100 million

Average bitrate: 5 Mbps

Peak egress: ~500 Tbps globally
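
The estimates above follow from simple arithmetic on the quoted scale numbers. A minimal sketch, using decimal units (1 PB = 1,000,000 GB) and hypothetical function names:

```python
# Back-of-envelope capacity math using the figures quoted in this section.
HOURS_UPLOADED_PER_MIN = 500
GB_PER_VIDEO_HOUR = 2                  # all qualities combined
PEAK_CONCURRENT_VIEWERS = 100_000_000
AVG_BITRATE_MBPS = 5


def daily_upload_hours(hours_per_min: int) -> int:
    """Hours of video uploaded per day."""
    return hours_per_min * 60 * 24


def daily_storage_pb(upload_hours: int, gb_per_hour: float) -> float:
    """New storage per day in petabytes (decimal: 1 PB = 1e6 GB)."""
    return upload_hours * gb_per_hour / 1_000_000


def peak_egress_tbps(viewers: int, mbps: float) -> float:
    """Peak egress in Tbps (1 Tbps = 1e6 Mbps)."""
    return viewers * mbps / 1_000_000


upload_hours = daily_upload_hours(HOURS_UPLOADED_PER_MIN)
storage_pb = daily_storage_pb(upload_hours, GB_PER_VIDEO_HOUR)
egress_tbps = peak_egress_tbps(PEAK_CONCURRENT_VIEWERS, AVG_BITRATE_MBPS)

print(f"{upload_hours:,} hours/day, {storage_pb:.2f} PB/day, {egress_tbps:.0f} Tbps")
```

The storage figure comes out at 1.44 PB/day, which the table rounds to ~1.5 PB/day.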

3. High-Level Architecture

System Architecture

Clients
  • Web
  • Mobile
  • Smart TV
  • Gaming
Edge and Entry
  • Global CDN: Google's edge network, video streaming
  • API Gateway: load balancer
Services
  • Upload Service: video ingestion
  • Transcoding: video processing
  • Video Service: metadata, CRUD
  • User Service: auth, channels
  • Search Service: video discovery
  • Recommendation: ML suggestions
  • Comment Service: discussions
  • Analytics: views, metrics
Data Stores
  • Cloud Storage: video files
  • Bigtable: video metadata
  • MySQL: users, channels
  • Elasticsearch: search index

4. Video Upload Pipeline

  1. Upload starts: the client uploads to the Upload Service via a resumable protocol.
  2. Store original: the raw video is stored in cloud storage (GCS/S3).
  3. Queue for processing: a message is sent to the transcoding queue.
  4. Transcoding: parallel encode to multiple resolutions (144p to 4K).
  5. Generate assets: create thumbnails, captions, and chapters.
  6. Store outputs: encoded videos are stored and the CDN is notified.
  7. Video live: metadata is updated; the video is searchable and playable.
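
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the asynchronous hand-off between upload and transcoding: the `Video` class, the status names, and the `deque` standing in for a real message queue are all assumptions, and no actual encoding happens.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical rendition names covering the 144p-to-4K range.
RESOLUTIONS = ["144p", "360p", "480p", "720p", "1080p", "2160p"]


@dataclass
class Video:
    video_id: str
    status: str = "uploading"
    renditions: list = field(default_factory=list)
    assets: list = field(default_factory=list)


def handle_upload_complete(video: Video, transcode_queue: deque) -> None:
    """Steps 2-3: the original is stored, then a job is queued."""
    video.status = "queued"
    transcode_queue.append(video.video_id)


def run_worker(videos: dict, transcode_queue: deque) -> None:
    """Steps 4-7: drain the queue and mark each video live."""
    while transcode_queue:
        video = videos[transcode_queue.popleft()]
        video.status = "transcoding"
        video.renditions = list(RESOLUTIONS)          # stand-in for real encodes
        video.assets = ["thumbnail", "captions", "chapters"]
        video.status = "live"                         # metadata updated


videos = {"abc123": Video("abc123")}
jobs = deque()
handle_upload_complete(videos["abc123"], jobs)
status_after_upload = videos["abc123"].status
run_worker(videos, jobs)
print(status_after_upload, "->", videos["abc123"].status)
```

The upload handler returns as soon as the job is queued, so the client never waits on transcoding.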

4.1 Video Transcoding

Resolution    Bitrate         Codec
4K (2160p)    20-50 Mbps      VP9, AV1
1080p         5-10 Mbps       H.264, VP9
720p          2.5-5 Mbps      H.264, VP9
480p          1-2 Mbps        H.264
360p          0.5-1 Mbps      H.264
144p          0.1-0.3 Mbps    H.264
YouTube uses VP9 and AV1 because they compress better than H.264. AV1 achieves roughly 30% better compression than VP9 at similar quality, but encoding takes considerably longer.
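
One way to turn a bitrate ladder into encode jobs is to generate one `ffmpeg` command per rendition. The sketch below only builds the argument lists and does not run them. The ladder uses the lower bound of each bitrate range from the table, and the codec picked per rung and the output file names are assumptions for illustration, not YouTube's actual settings.

```python
# (height, target kbps, ffmpeg encoder name) per rendition.
# Bitrates are the lower bounds of the ranges in the table above.
LADDER = [
    (2160, 20_000, "libvpx-vp9"),
    (1080, 5_000, "libx264"),
    (720, 2_500, "libx264"),
    (480, 1_000, "libx264"),
    (360, 500, "libx264"),
    (144, 100, "libx264"),
]


def ffmpeg_command(src: str, height: int, kbps: int, encoder: str) -> list:
    """Build the ffmpeg argument list for a single rendition."""
    ext = "webm" if encoder == "libvpx-vp9" else "mp4"
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",   # -2 keeps aspect ratio with an even width
        "-c:v", encoder,
        "-b:v", f"{kbps}k",
        f"out_{height}p.{ext}",
    ]


commands = [ffmpeg_command("original.mp4", h, k, enc) for h, k, enc in LADDER]
for cmd in commands:
    print(" ".join(cmd))
```

Each command is independent, so a worker pool can run the renditions in parallel.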

5. Video Streaming

DASH (Dynamic Adaptive Streaming over HTTP)
  • Industry standard
  • Segment-based streaming
  • MPD manifest file
  • Works on most platforms
Adaptive Bitrate
  • Client monitors buffer
  • Switches quality dynamically
  • Minimizes buffering
  • Optimizes for bandwidth
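
The client-side decision described above can be sketched as a simple rule: drop to the lowest quality when the buffer is nearly empty, otherwise pick the highest rendition whose bitrate fits within a fraction of measured throughput. The ladder values (upper bounds of the bitrate ranges in section 4.1), the 0.8 safety factor, and the 5-second buffer threshold are illustrative assumptions; production players use more elaborate heuristics.

```python
# Rendition name -> bitrate in kbps (upper bounds from the transcoding table).
LADDER_KBPS = {
    "144p": 300,
    "360p": 1_000,
    "480p": 2_000,
    "720p": 5_000,
    "1080p": 10_000,
    "2160p": 50_000,
}


def choose_rendition(throughput_kbps: float, buffer_seconds: float,
                     safety: float = 0.8, low_buffer: float = 5.0) -> str:
    """Pick a rendition from measured throughput and current buffer level."""
    ordered = sorted(LADDER_KBPS.items(), key=lambda item: item[1])
    lowest = ordered[0][0]
    if buffer_seconds < low_buffer:
        return lowest                      # about to stall: play it safe
    budget = throughput_kbps * safety      # leave headroom for variance
    best = lowest
    for name, kbps in ordered:
        if kbps <= budget:
            best = name
    return best


print(choose_rendition(8_000, buffer_seconds=20))   # healthy buffer
print(choose_rendition(8_000, buffer_seconds=2))    # nearly empty buffer
```

The player would re-run this check before requesting each segment, which is what lets quality change mid-playback.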
Segment Structure:
video_abc123/
├── manifest.mpd
├── init.mp4           # initialization segment
├── 144p/
│   ├── seg_001.m4s
│   ├── seg_002.m4s
│   └── ...
├── 360p/
├── 720p/
├── 1080p/
└── 4k/
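
Given a layout like the one above, a player can compute which segment files cover a playback range without asking the server. A minimal sketch, assuming fixed-length segments numbered from 1; the 4-second segment duration and the helper names are hypothetical.

```python
import math


def segment_path(video_id: str, quality: str, index: int) -> str:
    """Path of one media segment, following the directory layout above."""
    return f"video_{video_id}/{quality}/seg_{index:03d}.m4s"


def segments_for_range(video_id: str, quality: str, start_s: float,
                       end_s: float, segment_seconds: float = 4.0) -> list:
    """Segments needed to play [start_s, end_s) at the given quality."""
    first = int(start_s // segment_seconds) + 1
    last = max(first, math.ceil(end_s / segment_seconds))
    return [segment_path(video_id, quality, i) for i in range(first, last + 1)]


paths = segments_for_range("abc123", "720p", start_s=0, end_s=10)
print(paths)
```

Because segment boundaries line up across qualities, switching from 720p to 360p only changes the directory in the next request.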

6. Recommendation System

Recommendation Signals

Engagement:
  • Watch time
  • Likes/dislikes
  • Comments
  • Shares
Content:
  • Video metadata
  • Transcripts
  • Visual features
  • Category/tags
User Context:
  • Watch history
  • Search queries
  • Subscriptions
  • Demographics
Candidate Generation: deep neural networks narrow millions of videos down to thousands of candidates.
Ranking: candidates are ordered by predicted watch time, CTR, and engagement.
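
The two-stage funnel can be illustrated with a toy example. Here a dot product between a user vector and video vectors stands in for the deep neural network in candidate generation, and a fixed weighted sum stands in for the learned ranking model. The embeddings, predictions, and weights are all made-up values for illustration.

```python
def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, video_vecs: dict, k: int) -> list:
    """Stage 1: cheap similarity search over the whole catalog."""
    by_similarity = sorted(video_vecs,
                           key=lambda vid: dot(user_vec, video_vecs[vid]),
                           reverse=True)
    return by_similarity[:k]


def rank(candidates: list, predictions: dict,
         weights=(0.6, 0.3, 0.1)) -> list:
    """Stage 2: score each candidate on (watch time, CTR, engagement)."""
    return sorted(candidates,
                  key=lambda vid: dot(weights, predictions[vid]),
                  reverse=True)


user = [1.0, 0.0]
catalog = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.8, 0.3], "d": [0.5, 0.5]}
# Hypothetical model outputs in [0, 1]: (watch time, CTR, engagement).
predicted = {"a": (0.2, 0.9, 0.1), "c": (0.9, 0.3, 0.5), "d": (0.5, 0.5, 0.5)}

candidates = generate_candidates(user, catalog, k=3)
ranked = rank(candidates, predicted)
print(candidates, ranked)
```

Video "a" is the closest match in stage 1 but ranks last in stage 2 because its predicted watch time is low. The expensive scoring runs only on the shortlist, which is the point of splitting the work into two stages.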

7. Scaling Strategies

Video Storage
  • Distributed storage (GCS/Colossus)
  • Erasure coding for redundancy
  • Tiered storage (hot/cold)
  • Delete rarely-watched encodings
Transcoding
  • Distributed job queue (Pub/Sub)
  • Horizontal scaling of workers
  • Priority queues (premium vs free)
  • GPU acceleration for encoding
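
Priority handling for transcoding jobs can be sketched with a heap. This in-process queue is only a stand-in for a distributed queue such as Pub/Sub; the tier names and the `TranscodeQueue` class are hypothetical.

```python
import heapq
import itertools

# Lower number = served first.
PRIORITY = {"premium": 0, "free": 1}


class TranscodeQueue:
    """Priority queue that keeps FIFO order within each tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker preserves arrival order

    def submit(self, video_id: str, tier: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[tier], next(self._counter), video_id))

    def next_job(self):
        """Return the next video ID, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = TranscodeQueue()
for video_id, tier in [("v1", "free"), ("v2", "premium"),
                       ("v3", "free"), ("v4", "premium")]:
    queue.submit(video_id, tier)

order = [queue.next_job() for _ in range(4)]
print(order)
```

A strict priority scheme like this can starve the free tier under sustained premium load, so a real system would likely add aging or per-tier worker quotas.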
CDN & Delivery
  • Google's global edge network
  • Cache popular videos at edge
  • Predictive pre-caching
  • Regional origin servers
View Counting
  • Batch counting (not real-time)
  • Approximate counting (HyperLogLog)
  • Deduplicate by user session
  • Anti-fraud for fake views
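
Session-level deduplication combined with batched writes can be sketched as follows. An exact in-memory set is used here for clarity; at scale it would be replaced by an approximate structure such as HyperLogLog, which this sketch does not implement. The `ViewBatcher` class and its method names are hypothetical.

```python
from collections import Counter


class ViewBatcher:
    """Deduplicates views per (video, session) and flushes counts in batches."""

    def __init__(self):
        self._seen = set()          # exact dedup; stand-in for an approximate sketch
        self._pending = Counter()   # counts not yet written to the store
        self.totals = Counter()     # stand-in for the persistent view counts

    def record(self, video_id: str, session_id: str) -> bool:
        """Return True if the view was counted, False if it was a duplicate."""
        key = (video_id, session_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending[video_id] += 1
        return True

    def flush(self) -> dict:
        """Apply pending counts in one batch and return what was written."""
        flushed = dict(self._pending)
        self.totals.update(self._pending)
        self._pending.clear()
        return flushed


batcher = ViewBatcher()
batcher.record("abc123", "session-1")
batcher.record("abc123", "session-1")   # replay in the same session: ignored
batcher.record("abc123", "session-2")
batcher.record("xyz789", "session-1")
written = batcher.flush()
print(written)
```

Displayed counts lag behind real views by up to one flush interval, which is the eventual consistency trade-off this approach accepts.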

8. Key Takeaways

  1. Async video processing: transcoding happens in the background after upload.
  2. Multiple resolutions: encode to 6+ quality levels for adaptive streaming.
  3. CDN is critical: serve videos from edge locations globally.
  4. DASH/HLS for adaptive bitrate streaming with segment-based delivery.
  5. View counting is eventually consistent, batched, and deduplicated.
  6. Recommendation uses deep learning with candidate generation + ranking.