Design a URL Shortener
Like bit.ly or TinyURL - convert long URLs to short, memorable links with analytics.
1Problem Statement
- Converts long URLs to short, unique URLs
- Redirects short URLs to original URLs
- Tracks click analytics (count, location, device)
- Supports custom short URLs
- Handles URL expiration
- Scales to billions of URLs
2Requirements
Functional Requirements
- ✓Shorten a long URL to a short URL
- ✓Redirect short URL to original URL
- ✓Custom short URL aliases
- ✓URL expiration (optional TTL)
- ✓Click analytics and statistics
- ✓API access for programmatic use
Non-Functional Requirements
- ✓High availability (99.99%)
- ✓Low latency redirects (< 100ms)
- ✓Scalable to 100M URLs/day
- ✓URL should be unpredictable
- ✓Minimal storage footprint
- ✓Globally distributed
3Back-of-Envelope Estimation
Traffic Estimates
- New URLs/day: 100M
- Read:Write ratio: 100:1
- Redirects/day: 10B
- Redirects/second: ~115K QPS
- New URLs/second: ~1.15K QPS
Storage Estimates
- URL retention: 5 years
- Total URLs: 100M × 365 × 5 = 182.5B
- Avg URL size: 500 bytes
- Total storage: ~90 TB
- With replication (3x): ~270 TB
Short URL Length
Using Base62 (a-z, A-Z, 0-9): 62^7 = 3.5 trillion unique URLs
7 characters is sufficient for 182.5B URLs with room to spare
4High-Level Design
System Architecture
┌─────────────────┐
│ CDN / Edge │
│ (Cloudflare) │
└────────┬────────┘
│
▼
┌─────────────┐ ┌─────────────────────────┐
│ Client │─────────────►│ Load Balancer │
│ (Browser) │ │ (AWS ALB/Nginx) │
└─────────────┘ └────────────┬────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ API Server │ │ API Server │ │ API Server │
│ (Node.js) │ │ (Node.js) │ │ (Node.js) │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└──────────────┬──────┴──────────────┬──────┘
│ │
┌─────────────▼─────────────┐ │
│ Redis Cache │ │
│ (Hot URLs + Rate Limit) │ │
└─────────────┬─────────────┘ │
│ │
┌─────────────▼─────────────────────▼─────────────┐
│ Database Cluster │
│ (PostgreSQL / Cassandra / DynamoDB) │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Primary │ │ Replica │ │ Replica │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└──────────────────────────────────────────────────┘
│
▼
┌─────────────────────┐
│ Analytics Queue │
│ (Kafka) │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Analytics Service │
│ (ClickHouse) │
└─────────────────────┘
5Key Flows
1. URL Shortening Flow
2. URL Redirect Flow
6Short URL Generation
Approach 1: Base62 Encoding of Counter
// Use distributed counter (Redis INCR or DB sequence)
long counter = getNextCounter(); // e.g., 123456789
String shortCode = base62Encode(counter); // e.g., "8M0kX"
// Base62 encoding
private static final String ALPHABET =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
public String base62Encode(long num) {
StringBuilder sb = new StringBuilder();
while (num > 0) {
sb.append(ALPHABET.charAt((int)(num % 62)));
num /= 62;
}
return sb.reverse().toString();
}Approach 2: MD5/SHA Hash + Base62
// Hash the long URL, take first 7 chars of Base62 encoding
String hash = md5(longUrl); // 128-bit hash
String shortCode = base62Encode(hash.substring(0, 12)); // First 48 bits
shortCode = shortCode.substring(0, 7); // Take 7 characters
// Handle collision: append counter or regenerate
if (exists(shortCode)) {
shortCode = base62Encode(hash + counter++).substring(0, 7);
}Approach 3: Pre-generated Keys (Key Generation Service)
// Separate KGS generates keys offline and stores in DB
// Two tables: unused_keys and used_keys
// KGS pre-generates millions of unique 7-char keys
// API servers request batch of keys (e.g., 1000 at a time)
// Keys moved from unused to used on assignment
List<String> keyBatch = kgs.getKeyBatch(1000);
// Use from local cache, request new batch when low7Database Schema
CREATE TABLE urls (
id BIGSERIAL PRIMARY KEY,
short_code VARCHAR(10) UNIQUE NOT NULL,
long_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP,
click_count BIGINT DEFAULT 0,
is_active BOOLEAN DEFAULT TRUE,
INDEX idx_short_code (short_code),
INDEX idx_user_id (user_id),
INDEX idx_expires_at (expires_at)
);
-- For high-scale: Partition by created_at or short_code hash
-- Consider NoSQL (DynamoDB, Cassandra) for better horizontal scalingCREATE TABLE url_clicks (
click_id UUID,
short_code String,
clicked_at DateTime,
ip_address String,
user_agent String,
country String,
city String,
device_type String,
referrer String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(clicked_at)
ORDER BY (short_code, clicked_at);
-- Aggregated view for fast queries
CREATE MATERIALIZED VIEW url_stats AS
SELECT
short_code,
toDate(clicked_at) as date,
count() as clicks,
uniqExact(ip_address) as unique_visitors
FROM url_clicks
GROUP BY short_code, date;8API Design
Core APIs
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/shorten | Create short URL from long URL |
| GET | /{shortCode} | Redirect to original URL |
| GET | /api/v1/urls/{shortCode} | Get URL details and stats |
| DELETE | /api/v1/urls/{shortCode} | Delete/deactivate short URL |
| GET | /api/v1/urls/{shortCode}/analytics | Get click analytics |
// POST /api/v1/shorten
{
"longUrl": "https://example.com/very/long/path?param=value",
"customAlias": "mylink", // optional
"expiresAt": "2025-12-31T23:59:59Z" // optional
}
// Response
{
"shortUrl": "https://short.ly/mylink",
"shortCode": "mylink",
"longUrl": "https://example.com/very/long/path?param=value",
"createdAt": "2024-01-15T10:30:00Z",
"expiresAt": "2025-12-31T23:59:59Z"
}
// GET /api/v1/urls/mylink/analytics
{
"shortCode": "mylink",
"totalClicks": 15234,
"uniqueVisitors": 8921,
"clicksByDay": [
{ "date": "2024-01-15", "clicks": 1200 },
{ "date": "2024-01-14", "clicks": 980 }
],
"topCountries": [
{ "country": "US", "clicks": 5000 },
{ "country": "IN", "clicks": 3500 }
],
"topDevices": [
{ "device": "mobile", "clicks": 9000 },
{ "device": "desktop", "clicks": 6234 }
]
}9Caching Strategy
Redis Cache
Cache hot URLs for fast redirects
- • Key: short_code
- • Value: long_url + metadata
- • TTL: 24 hours or until expiry
- • Eviction: LRU
- • Expected hit rate: 80%+
Local Cache (Application)
In-memory cache for ultra-hot URLs
- • Top 1000 URLs in Caffeine/Guava cache
- • TTL: 5 minutes
- • Refresh async on access
- • Reduces Redis calls by 30%
Cache Flow
Request ──► Local Cache ──[HIT]──► Return URL
│
[MISS]
│
▼
Redis Cache ──[HIT]──► Return URL (update local)
│
[MISS]
│
▼
Database ──► Return URL (update Redis + local)
10Scaling Strategies
Database Sharding
- →Shard by first 2 chars of short_code (62² = 3844 shards)
- →Or hash-based sharding for even distribution
- →Each shard handles ~50M URLs
- →Use Vitess or CockroachDB for auto-sharding
Read Replicas
- →Redirect reads go to replicas (100:1 read:write)
- →Async replication with < 100ms lag
- →Region-specific replicas for low latency
- →Writes only to primary, reads from nearest replica
CDN / Edge Caching
- →Cache redirects at edge (Cloudflare Workers)
- →~50% of requests served from edge
- →TTL: 5 minutes for popular URLs
- →Invalidate on URL update/delete
Rate Limiting
- →100 shortens/hour per user (free tier)
- →1000 shortens/hour per API key (paid)
- →Use Redis sliding window counter
- →Return 429 with Retry-After header
11301 vs 302 Redirect
| Aspect | 301 (Permanent) | 302 (Temporary) |
|---|---|---|
| Browser Caching | Cached indefinitely | Not cached |
| Server Load | Lower (browser skips server) | Higher (every click hits server) |
| Analytics | Misses repeat visits | Captures all clicks |
| URL Updates | Problematic (stale cache) | Works instantly |
| SEO | Passes link juice | Doesn't pass link juice |
12Security Considerations
Malicious URL Prevention
- ⚠Integrate with Google Safe Browsing API
- ⚠Block known phishing domains
- ⚠Scan URLs before shortening
- ⚠Allow users to report malicious links
Abuse Prevention
- ⚠Rate limiting per IP and user
- ⚠CAPTCHA for anonymous users
- ⚠Block disposable email domains
- ⚠Honeypot fields in forms
URL Validation
- ⚠Validate URL format and protocol
- ⚠Block localhost and internal IPs
- ⚠Prevent redirect loops
- ⚠Sanitize URL parameters
Access Control
- ⚠Private URLs with password protection
- ⚠User authentication for analytics
- ⚠API key rotation and scoping
- ⚠Audit logs for all operations
13Interview Follow-up Questions
How to handle expired URLs?
Background job runs hourly to soft-delete expired URLs. Redis TTL auto-evicts. Return 410 Gone for expired links.
Same long URL = same short URL?
Optional. Hash the long URL to check if already exists. Trade-off: extra DB lookup vs storage savings.
How to ensure uniqueness at scale?
Use distributed counter (Redis INCR) or pre-generated keys (KGS). Avoid hashing alone due to collisions.
Multi-region deployment?
DNS-based routing to nearest region. Each region has full stack. Async replication for URL data.
Analytics at scale?
Async processing via Kafka. Use ClickHouse for OLAP queries. Pre-aggregate common metrics.
Custom domain support?
Allow users to bring their own domain. CNAME to our servers. SSL certificate provisioning via Let's Encrypt.
14Key Takeaways
?Quiz
1. How many unique URLs can 7 Base62 characters represent?
2. Which redirect status code is better for analytics?
3. What's the best approach to avoid collisions in short URL generation?