Rate Limiting
Control how many requests a client can make. Protect your APIs from abuse and overload.
1. What is Rate Limiting?
Why Rate Limiting?
- Stop bad actors from overwhelming your service with requests
- One user shouldn't monopolize resources at others' expense
- Prevent cascading failures when demand exceeds capacity
- Limit expensive operations like AI calls or third-party APIs
2. Rate Limiting Algorithms
There are several algorithms to implement rate limiting, each with different trade-offs:
1. Token Bucket
1. Bucket holds tokens (e.g., max 10 tokens)
2. Tokens are added at a constant rate (e.g., 1 per second)
3. Each request takes 1 token from the bucket
4. No tokens? Request rejected (429)
5. Bucket never exceeds max capacity
```python
import time

class TokenBucket:
    def __init__(self, capacity=10, refill_rate=1):
        self.capacity = capacity        # Max tokens
        self.tokens = capacity          # Current tokens
        self.refill_rate = refill_rate  # Tokens per second
        self.last_refill = time.monotonic()

    def allow_request(self):
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # Request allowed
        return False      # Rate limited (429)

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
```

2. Sliding Window Log
1. Store the timestamp of every request
2. On a new request, remove timestamps older than the window
3. Count the remaining timestamps
4. If count < limit, allow; else reject
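The steps above can be sketched in a few lines; the class name and default limits here are illustrative, not part of any standard API:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: one stored timestamp per request (sketch)."""

    def __init__(self, limit=10, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # request times, oldest first

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The trade-off from the comparison table is visible here: accuracy is perfect because every request is remembered, but memory grows with the request rate.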
3. Fixed Window Counter
1. Divide time into fixed windows (e.g., 1 minute)
2. Keep a counter for each window
3. Increment the counter on each request
4. If the counter exceeds the limit, reject until the window ends
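A minimal sketch of these steps (class name and defaults are illustrative):

```python
import time

class FixedWindowCounter:
    """Fixed window counter: one counter per fixed time window (sketch)."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None  # identifier of the active window
        self.count = 0

    def allow_request(self, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window)  # which fixed window we are in
        if window != self.current_window:
            # Entered a new window: reset the counter
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The boundary-burst weakness from the comparison table shows up here: a client can use the full limit at the end of one window and again at the start of the next.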
4. Sliding Window Counter
1. Track counts for the current and previous window
2. Calculate a weighted sum based on position in the current window
3. e.g., with a 1-minute window at 10:00:15 (25% into the window): prev_count × 0.75 + curr_count × 0.25
4. Compare the weighted sum to the limit
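The weighting above can be sketched as follows (a simplified single-client version; names and defaults are illustrative):

```python
import time

class SlidingWindowCounter:
    """Sliding window counter: weighted blend of the previous and
    current fixed-window counts (sketch)."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window)
        if self.current_window is None:
            self.current_window = window
        elif window == self.current_window + 1:
            # Rolled into the next window: current becomes previous
            self.previous_count = self.current_count
            self.current_count = 0
            self.current_window = window
        elif window > self.current_window + 1:
            # Skipped at least one whole window: no recent traffic
            self.previous_count = 0
            self.current_count = 0
            self.current_window = window
        # Fraction of the current window already elapsed
        elapsed = (now % self.window) / self.window
        weighted = self.previous_count * (1 - elapsed) + self.current_count
        if weighted < self.limit:
            self.current_count += 1
            return True
        return False
```

At 15 seconds into a 1-minute window, `elapsed` is 0.25, so the previous window contributes with weight 0.75, matching the example above.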
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Token Bucket | Low | High | Allows bursts | Medium |
| Sliding Window Log | High | Perfect | Smooth | Medium |
| Fixed Window | Very Low | Low | Boundary burst | Simple |
| Sliding Window Counter | Low | Good | Smooth | Medium |
3. What to Rate Limit By
- Each authenticated user gets their own limit: `user:12345 → 100 requests/hour`
- Each API key (usually per application) gets a limit: `api_key:sk_live_xxx → 1000 requests/hour`
- Limit by client IP address: `ip:192.168.1.1 → 50 requests/minute`
- Different limits for different API endpoints: `/search → 10/min, /login → 5/min, /profile → 100/min`

Use multiple rate limit keys together: `user_id + endpoint` or `api_key + IP`. This prevents a single user from hammering one expensive endpoint.
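One way to combine keys is to fold both dimensions into the counter key itself. The helper below is a hypothetical sketch; the key format and limit values are illustrative (the limits mirror the per-endpoint examples above):

```python
# Per-endpoint limits (illustrative values)
ENDPOINT_LIMITS = {"/search": 10, "/login": 5, "/profile": 100}

def rate_limit_key(user_id: str, endpoint: str) -> str:
    # Composite key: one independent counter per (user, endpoint) pair,
    # so one user hammering /search does not consume their /profile budget
    return f"ratelimit:{user_id}:{endpoint}"
```

The resulting string would be used as the key in whatever store backs the limiter (e.g., the Redis sorted set in the next section).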
4. Where to Implement
- Rate limit before traffic reaches your servers
- Rate limit within your application code
- Centralized rate limiting service with Redis/Memcached
Distributed Rate Limiting with Redis
```lua
-- Sliding window log in Redis (Lua script for atomicity)
local key = KEYS[1]
local window = tonumber(ARGV[1])  -- window size in seconds
local limit = tonumber(ARGV[2])   -- max requests
local now = tonumber(ARGV[3])     -- current timestamp

-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count requests still inside the window
local count = redis.call('ZCARD', key)
if count < limit then
  -- Record the current request (random suffix avoids member collisions)
  redis.call('ZADD', key, now, now .. '-' .. math.random())
  redis.call('EXPIRE', key, window)
  return 1  -- Allowed
else
  return 0  -- Rate limited
end
```

5. Rate Limit Response
Standard HTTP Response Headers
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please retry after 30 seconds.",
  "retry_after": 30
}
```

Be careful about information disclosure. Detailed rate limit info helps legitimate clients but also helps attackers know exactly how fast they can abuse your API.
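Building such a response can be sketched as a small helper; the function name is hypothetical, and the `X-RateLimit-*` headers follow the common convention shown above rather than any formal standard:

```python
import json

def rate_limit_response(limit: int, reset_epoch: int, now_epoch: int):
    """Assemble a 429 status, headers, and JSON body (sketch)."""
    retry_after = max(0, reset_epoch - now_epoch)
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Too many requests. Please retry after {retry_after} seconds.",
        "retry_after": retry_after,
    })
    return 429, headers, body
```

Deriving `Retry-After` from the reset timestamp keeps the header and the body's `retry_after` field consistent, so well-behaved clients can back off using either.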