HLD Problem

Design WhatsApp

Design a messaging platform with real-time chat, group messaging, media sharing, voice/video calls, and end-to-end encryption at massive scale.

45 min read · Hard

1. Requirements Gathering

Functional Requirements
  • One-on-one messaging
  • Group chats (up to 1024 members)
  • Online/last seen status
  • Read receipts (delivered, read)
  • Media sharing (images, videos, docs)
  • End-to-end encryption
  • Voice and video calls
  • Message search
  • Push notifications
Non-Functional Requirements
  • High availability (99.99%)
  • Low latency (<100ms message delivery)
  • Support 2B+ users
  • Handle 100B+ messages/day
  • Message ordering guaranteed
  • Reliable delivery (at-least-once)
  • Offline message sync

2. Capacity Estimation

Scale Numbers

2B+
Users
100B+
Messages/Day
1M+
Messages/Sec Peak
65B
Messages Stored

Storage Estimates

Average message size: 100 bytes
Daily message storage (100B messages × 100 bytes): 10 TB/day
Media storage (20% of messages carry media): 500 TB/day
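
The estimates above follow from simple arithmetic. The sketch below reproduces them in Python, assuming decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 86,400 seconds. The average media size is not stated in this section; it is back-derived here from the 500 TB/day figure and should be read as an implied assumption.

```python
# Back-of-the-envelope capacity check using the figures from this section.
MESSAGES_PER_DAY = 100e9      # 100B messages/day
AVG_MESSAGE_BYTES = 100       # average text message size
MEDIA_FRACTION = 0.20         # 20% of messages carry media
MEDIA_BYTES_PER_DAY = 500e12  # 500 TB/day (decimal terabytes assumed)
SECONDS_PER_DAY = 86_400

text_tb_per_day = MESSAGES_PER_DAY * AVG_MESSAGE_BYTES / 1e12
avg_messages_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
# Implied average media object size; derived, not given in the source.
implied_media_kb = MEDIA_BYTES_PER_DAY / (MESSAGES_PER_DAY * MEDIA_FRACTION) / 1e3

print(f"text storage: {text_tb_per_day:.0f} TB/day")
print(f"average rate: {avg_messages_per_sec / 1e6:.2f}M messages/sec")
print(f"implied media size: {implied_media_kb:.0f} KB per media message")
```

The average rate alone comes to roughly 1.16M messages/sec, so real peak traffic would sit above that figure.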

3. High-Level Architecture

System Architecture

Clients: iOS App, Android App, Web App
↓ WebSocket (persistent)
WebSocket Gateway: millions of persistent connections
Services
  • Chat Service: message routing
  • User Service: auth, profiles
  • Group Service: group management
  • Presence Service: online status
  • Media Service: file uploads
  • Notification: push/offline
  • Sync Service: offline sync
  • Encryption: E2E keys
Data Stores
  • Cassandra: messages
  • Redis: sessions, presence
  • MySQL: users, groups
  • S3/Blob: media files

4. Core Components Deep Dive

4.1 Message Delivery Flow

1. User A sends message: the message is sent via WebSocket to the gateway.
2. Gateway routes to Chat Service: the Chat Service validates the message and stores it in the DB.
3. Check recipient status: query the Presence Service to see whether User B is online.
4. If online: push via WebSocket to User B's device.
5. If offline: store in a pending queue and send a push notification.
6. Acknowledgment: User B's device acknowledges delivery, and a read receipt is relayed back to User A once User B opens the chat.
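
The flow above can be sketched as a toy in-memory model. Everything here is hypothetical and only for illustration: the `ChatService` class, its fields, and the use of plain lists as stand-ins for WebSocket connections, the database, and the push provider. A real deployment would spread these across separate services.

```python
from dataclasses import dataclass, field

@dataclass
class ChatService:
    """Toy in-memory model of the delivery flow; every name is hypothetical."""
    store: dict = field(default_factory=dict)    # message_id -> message (stands in for the DB)
    online: dict = field(default_factory=dict)   # user_id -> list acting as a WebSocket
    pending: dict = field(default_factory=dict)  # user_id -> queued message_ids
    pushes: list = field(default_factory=list)   # users who were sent a push notification

    def send(self, message_id, sender, recipient, ciphertext):
        # Step 2: validate and persist before attempting delivery.
        if not ciphertext:
            raise ValueError("empty message")
        self.store[message_id] = (sender, recipient, ciphertext)
        # Step 3: presence check.
        socket = self.online.get(recipient)
        if socket is not None:
            # Step 4: recipient is online, push over its connection.
            socket.append(message_id)
            return "delivered"
        # Step 5: recipient is offline, queue the message and notify.
        self.pending.setdefault(recipient, []).append(message_id)
        self.pushes.append(recipient)
        return "queued"

    def connect(self, user_id):
        # On (re)connect, drain the pending queue in arrival order.
        socket = self.online.setdefault(user_id, [])
        socket.extend(self.pending.pop(user_id, []))
        return socket

svc = ChatService()
svc.connect("bob")
online_result = svc.send("m1", "alice", "bob", b"ciphertext")     # bob is connected
offline_result = svc.send("m2", "alice", "carol", b"ciphertext")  # carol is not
carol_socket = svc.connect("carol")                                # queued message arrives
```

Persisting before routing is what lets the offline branch work: the pending queue only needs to hold message IDs, since the message body is already stored.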

4.2 Message States

  • Sending: uploading to server
  • Sent: server received
  • Delivered (✓✓, gray): recipient's device got it
  • Read (✓✓, blue): recipient opened chat
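
One way to model these states is as an ordered enum that only moves forward. The monotonic rule below is an assumption for handling late or duplicate receipts; it is not described in this section, and the names `MessageState` and `advance` are hypothetical.

```python
from enum import IntEnum

class MessageState(IntEnum):
    SENDING = 0    # uploading to server
    SENT = 1       # server received
    DELIVERED = 2  # recipient's device got it
    READ = 3       # recipient opened chat

def advance(current: MessageState, incoming: MessageState) -> MessageState:
    """Apply a receipt; states only move forward, so stale receipts are ignored."""
    return max(current, incoming)

state = MessageState.SENT
state = advance(state, MessageState.READ)       # read receipt arrives first
state = advance(state, MessageState.DELIVERED)  # late delivery receipt is ignored
```

With at-least-once delivery, receipts can arrive twice or out of order, so taking the maximum keeps the displayed state from regressing.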

4.3 Group Messaging

Small Groups (<100)
  • Fan-out on write to all members
  • Store message once, send to each
  • Acceptable latency overhead
Large Groups (100-1024)
  • Store message once
  • Members pull on demand
  • Hybrid push for active members
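
The split between the two strategies can be sketched as a single planning function. The threshold of 100 comes from this section; the function name, the `active` set, and the return shape are hypothetical choices made for illustration.

```python
FANOUT_THRESHOLD = 100  # small/large group boundary from this section

def plan_group_delivery(members, active, sender):
    """Return (push_to, pull_later) for one group message.

    members: all member ids; active: set of ids currently online (hypothetical input).
    """
    recipients = [m for m in members if m != sender]
    if len(members) < FANOUT_THRESHOLD:
        # Small group: fan-out on write to every member.
        return recipients, []
    # Large group: push only to active members; the rest pull on demand.
    push_to = [m for m in recipients if m in active]
    pull_later = [m for m in recipients if m not in active]
    return push_to, pull_later

small_push, small_pull = plan_group_delivery(["a", "b", "c"], {"a"}, sender="a")
big_group = [f"u{i}" for i in range(200)]
big_push, big_pull = plan_group_delivery(big_group, {"u0", "u1", "u2"}, sender="u0")
```

In both branches the message itself is stored once; only the delivery work differs.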

4.4 End-to-End Encryption

WhatsApp uses Signal Protocol for E2E encryption. Server cannot read messages.
Key Exchange: X3DH (Extended Triple Diffie-Hellman) establishes the initial shared secret; the Double Ratchet Algorithm then derives a new key per message
Identity Keys: Long-term public key stored on server
Pre-Keys: One-time keys for offline key exchange
Session Key: Derived shared secret, rotates frequently
Why E2E Matters
With E2E encryption, even WhatsApp servers cannot read messages. Only sender and recipient devices have the keys. This adds complexity but ensures privacy.
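
To make "new key per message" concrete, the sketch below shows only the symmetric chain step of a ratchet: each step derives a message key and a replacement chain key with HMAC-SHA256, using distinct one-byte constants as inputs. It is a simplified illustration under stated assumptions, not the full protocol: it omits the Diffie-Hellman ratchet, the initial key agreement, and the actual message encryption, and the starting secret is a placeholder constant.

```python
import hashlib
import hmac

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (next_chain_key, message_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key

# Both sides would start from the same shared secret. This constant is a
# placeholder; in a real session it comes from the key agreement.
chain_key = hashlib.sha256(b"demo shared secret").digest()
message_keys = []
for _ in range(3):
    chain_key, message_key = ratchet_step(chain_key)
    message_keys.append(message_key)
```

Because HMAC is one-way, holding a later chain key does not reveal earlier message keys, which is why old keys can be deleted after use.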

5. Database Design

Messages Table (Cassandra)
messages
├── chat_id (partition key)
├── message_id (clustering key)
├── sender_id
├── content (encrypted)
├── media_url
├── message_type
├── created_at
└── status
Partition by chat_id for fast retrieval of conversation history.
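
The effect of this key layout can be illustrated with an in-memory stand-in. This is not a Cassandra client; the `MessageStore` class and its methods are hypothetical, and it assumes integer message IDs that increase over time.

```python
import bisect
from collections import defaultdict

class MessageStore:
    """In-memory stand-in for the messages table; not a Cassandra client."""

    def __init__(self):
        self._ids = defaultdict(list)  # chat_id (partition) -> sorted message_ids
        self._rows = {}                # (chat_id, message_id) -> content

    def insert(self, chat_id, message_id, content):
        # Keep ids sorted within the partition, like a clustering key.
        if (chat_id, message_id) not in self._rows:
            bisect.insort(self._ids[chat_id], message_id)
        self._rows[(chat_id, message_id)] = content

    def history(self, chat_id, after_id=0, limit=50):
        """Return up to `limit` rows with message_id > after_id, oldest first."""
        ids = self._ids.get(chat_id, [])
        start = bisect.bisect_right(ids, after_id)
        return [(i, self._rows[(chat_id, i)]) for i in ids[start:start + limit]]

store = MessageStore()
for message_id in (3, 1, 2):  # inserted out of order
    store.insert("chat-a", message_id, f"msg {message_id}")
store.insert("chat-b", 1, "other chat")
page = store.history("chat-a", after_id=1, limit=1)
```

A history read touches one partition and scans a contiguous, already-sorted range, which is the property the partitioning choice is meant to provide.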
User Sessions (Redis)
user:123:sessions → { "device1": "gateway-server-5", "device2": "gateway-server-12" }
user:123:presence → { "status": "online", "last_seen": 1705234567 }

6. Scaling Strategies

WebSocket Connections
  • Each server handles ~1M connections
  • Thousands of gateway servers
  • Sticky sessions via consistent hashing
  • Fallback to long-polling if needed
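
A minimal consistent-hash ring shows how a user can be mapped to a gateway server so that most users keep their server when one is removed. The `HashRing` class, the server names, and the choice of 100 virtual nodes per server are assumptions made for this sketch.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Each server appears at `vnodes` pseudo-random points on the ring.
        self._ring = sorted(
            (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def server_for(self, user_id: str) -> str:
        # First virtual node clockwise from the user's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(user_id)) % len(self._keys)
        return self._ring[idx][1]

ring = HashRing(["gw-1", "gw-2", "gw-3", "gw-4"])
smaller = HashRing(["gw-1", "gw-2", "gw-3"])  # gw-4 removed
users = [f"user:{i}" for i in range(1000)]
moved = [u for u in users if ring.server_for(u) != smaller.server_for(u)]
```

Only users that were on the removed server appear in `moved`; everyone else reconnects to the same gateway as before.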
Message Storage
  • Cassandra for horizontal scaling
  • Partition by chat_id
  • Time-series compaction
  • Messages deleted after sync to all devices
Presence Service
  • Redis cluster for low latency
  • Heartbeat every 30 seconds
  • Pub/sub for status changes
  • Eventual consistency acceptable
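
Heartbeat-based presence reduces to a timestamp check. The sketch below uses an in-memory dictionary in place of Redis. The 30-second interval is from this section; the grace period of two intervals, the `PresenceTracker` class, and the injectable clock are assumptions made for illustration.

```python
import time

HEARTBEAT_INTERVAL = 30                # seconds, from this section
PRESENCE_TTL = 2 * HEARTBEAT_INTERVAL  # assumed grace period, not from the source

class PresenceTracker:
    """In-memory stand-in for the Redis presence keys."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_seen = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def status(self, user_id):
        last_seen = self._last_seen.get(user_id)
        if last_seen is None:
            return {"status": "offline", "last_seen": None}
        online = self._clock() - last_seen <= PRESENCE_TTL
        return {"status": "online" if online else "offline", "last_seen": last_seen}

now = [1000.0]  # fake clock so the example is deterministic
presence = PresenceTracker(clock=lambda: now[0])
presence.heartbeat("user:123")
fresh = presence.status("user:123")
now[0] += 61    # more than two missed heartbeats
stale = presence.status("user:123")
```

Allowing more than one missed heartbeat before flipping to offline avoids flapping on brief network hiccups, at the cost of a slightly stale status, which fits the eventual-consistency stance above.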
Media Delivery
  • Upload to blob storage (S3)
  • CDN for downloads
  • Media encrypted client-side
  • Links expire after download

7. Key Takeaways

1. WebSocket for persistent, bi-directional real-time communication.
2. Cassandra for message storage, partitioned by chat_id.
3. E2E encryption using Signal Protocol; the server cannot read messages.
4. Presence via Redis with heartbeats and pub/sub.
5. Fan-out for small groups, pull for large groups.
6. At-least-once delivery with deduplication on the client.
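
Client-side deduplication for at-least-once delivery can be as small as a set of seen message IDs. The `Inbox` class below is a hypothetical sketch; a real client would bound or persist the seen set rather than let it grow without limit.

```python
class Inbox:
    """Client-side receiver that tolerates redelivery (hypothetical names)."""

    def __init__(self):
        self._seen = set()  # message_ids already processed
        self.messages = []  # messages shown to the user, in arrival order

    def receive(self, message_id, body):
        """Process a delivery; always return True so the server gets an ack."""
        if message_id not in self._seen:
            self._seen.add(message_id)
            self.messages.append(body)
        return True  # ack duplicates too, so the server stops retrying

inbox = Inbox()
inbox.receive("m1", "hi")
inbox.receive("m1", "hi")  # redelivered after a lost ack
inbox.receive("m2", "yo")
```

Acknowledging duplicates matters as much as dropping them: if the client stayed silent on a repeat, the server would keep retrying.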