Module 2 — Traffic & Load Management
Throttling & Load Shedding
When your system is overloaded, gracefully degrade service instead of crashing completely.
1. The Emergency Room Analogy
Imagine an emergency room during a crisis. When overwhelmed with patients, they don't try to treat everyone at once (that would kill everyone). Instead, they triage: critical patients first, less urgent cases wait, and some might be redirected to other hospitals.
This is exactly what throttling (slowing down) and load shedding (rejecting some requests) do for your servers.
2. Throttling vs Rate Limiting vs Load Shedding
| Technique | What It Does | When Used |
|---|---|---|
| Rate Limiting | Limits requests per client | Always (per-client fairness) |
| Throttling | Slows down request processing | When system is stressed |
| Load Shedding | Rejects requests entirely | Emergency: prevent total failure |
System Load Response

| Load Level | Utilization | Behavior |
|---|---|---|
| Normal | 40% | All requests served |
| High | 70% | Throttling kicks in |
| Critical | 90% | Load shedding active |
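To make the thresholds concrete, here is a minimal sketch of a load-aware request path. The helper names, the thresholds, and the load-average signal are illustrative, not from any particular framework:

```python
import os
import time

# Illustrative thresholds matching the table above.
THROTTLE_AT = 0.70   # start slowing responses
SHED_AT = 0.90       # start rejecting requests to survive

def get_system_load() -> float:
    # Stand-in load signal: 1-minute load average normalized by core count
    # (Unix-only; swap in your real CPU / queue-depth / latency metric).
    return os.getloadavg()[0] / os.cpu_count()

def handle(request, process):
    load = get_system_load()
    if load >= SHED_AT:
        # Load shedding: reject outright so the system stays up.
        return {"status": 503, "retry_after": 5}
    if load >= THROTTLE_AT:
        # Throttling: keep serving, but slow down to relieve pressure.
        time.sleep(0.1)
    return process(request)
```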
3. Throttling Strategies
Response Delay: Add artificial delay to responses when load is high. Reduces throughput smoothly. `if (load > 70%) delay(100ms)` (see the sketch after this list)
Queue Requests: Put excess requests in a queue instead of processing them immediately. `queue.add(request); process_later()`
Reduce Quality: Return cached or stale data, or lower-resolution images, when stressed. `if (stressed) return cached_response`
Disable Features: Temporarily disable non-critical features (recommendations, analytics). `if (overloaded) skip_recommendations()`
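As a sketch of the first and third strategies together (response delay plus a cached fallback), assuming a hypothetical `fetch` callable and an in-process cache:

```python
import time

CACHE = {}  # hypothetical in-process cache, filled on the happy path

def throttled_lookup(key, fetch, load):
    # Reduce quality: under heavy load, serve possibly-stale cached data
    # instead of doing the expensive fetch.
    if load > 0.9 and key in CACHE:
        return CACHE[key]
    # Response delay: above 70% load, slow every response by 100 ms
    # to smooth throughput.
    if load > 0.7:
        time.sleep(0.1)
    result = fetch(key)
    CACHE[key] = result
    return result
```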
4. Load Shedding Strategies
When throttling isn't enough, you need to reject requests to survive:
1. Random Shedding: Randomly reject X% of requests. Simple, but doesn't consider request importance.
2. Priority-Based Shedding: Reject low-priority requests first (analytics before checkout). Keep critical paths alive.
3. LIFO (Last In, First Out): Reject the newest requests; older requests have already waited, so you might as well finish them (see the bounded-queue sketch below).
4. Client-Based Shedding: Shed requests from clients already over their quota, or from non-paying users first.
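Strategy 3 reduces to a bounded queue that turns away new arrivals. A minimal sketch, with illustrative capacity and names:

```python
from collections import deque

MAX_QUEUE = 100   # illustrative capacity
queue = deque()

def enqueue(request) -> bool:
    # When the queue is full, the newest request is the one shed;
    # requests already queued keep their place and eventually finish.
    if len(queue) >= MAX_QUEUE:
        return False   # caller should translate this into a 503
    queue.append(request)
    return True
```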
Priority-Based Load Shedding

```python
def handle_request(req):
    # get_system_load() (utilization as a percentage) and process()
    # are assumed helpers; service_unavailable() stands in for
    # building a 503 response.
    current_load = get_system_load()

    if current_load > 95:
        # Critical: only process essential requests.
        if req.priority != "critical":
            return service_unavailable()
    elif current_load > 80:
        # High load: shed low-priority requests.
        if req.priority == "low":
            return service_unavailable()

    # Healthy enough: process the request.
    return process(req)
```

5. Graceful Degradation
Instead of complete failure, provide reduced functionality:
| Service | Normal | Degraded |
|---|---|---|
| Netflix | HD streaming with recommendations | SD streaming, no recommendations |
| Amazon | Real-time inventory, personalized results | Cached inventory, generic results |
| Twitter | Full timeline with media | Text-only timeline, cached data |
| Uber | Precise ETA, surge pricing | Estimated ETA, fixed pricing |
Design Principle
Design your system with degradation in mind from the start. Know which features are essential vs nice-to-have.
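One way to encode that decision up front is a static feature classification consulted at request time. A sketch, with hypothetical feature names and threshold:

```python
# Hypothetical design-time classification of features.
FEATURE_TIERS = {
    "checkout": "critical",
    "search": "critical",
    "recommendations": "nice_to_have",
    "analytics": "nice_to_have",
}

def is_feature_enabled(feature: str, load: float) -> bool:
    # Below 80% load everything runs; above it, only critical features stay on.
    if load < 0.8:
        return True
    return FEATURE_TIERS.get(feature, "nice_to_have") == "critical"
```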
6. Implementation Checklist
✓ Monitor system load (CPU, memory, queue depth, latency)
✓ Define load thresholds (e.g., 70% = throttle, 90% = shed)
✓ Classify requests by priority (critical, normal, low)
✓ Implement circuit breakers for downstream services
✓ Return a proper 503 with a Retry-After header (see the sketch after this list)
✓ Alert on-call when shedding is active
✓ Test load shedding regularly (chaos engineering)
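For the 503/Retry-After item, a minimal sketch using Flask; the route and the `currently_shedding` hook are assumptions, not a prescribed API:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def currently_shedding() -> bool:
    # Assumed hook into your load monitor; replace with a real signal.
    return False

@app.route("/orders", methods=["POST"])
def create_order():
    if currently_shedding():
        # Reject with 503 plus Retry-After so well-behaved clients back off.
        resp = jsonify(error="overloaded, try again shortly")
        resp.status_code = 503
        resp.headers["Retry-After"] = "5"   # seconds
        return resp
    return jsonify(status="accepted")
```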
7. Key Takeaways
1. Throttling slows down; load shedding rejects. Both prevent total failure.
2. Use priority-based shedding: protect critical paths (checkout) over nice-to-haves (recommendations).
3. Design for graceful degradation from day one. Know what to sacrifice.
4. Return 503 with Retry-After so clients know to back off.
5. Monitor and alert when shedding is active; it's a sign you need to scale.
6. Test your shedding logic regularly with chaos engineering.