Module 2 — Traffic & Load Management

Throttling & Load Shedding

When your system is overloaded, gracefully degrade service instead of crashing completely.


1. The Emergency Room Analogy

Simple Analogy
Imagine an emergency room during a crisis. When overwhelmed with patients, they don't try to treat everyone at once (that would kill everyone). Instead, they triage: critical patients first, less urgent cases wait, and some might be redirected to other hospitals.

This is exactly what throttling (slow down) and load shedding (reject some) do for your servers.

2. Throttling vs. Rate Limiting vs. Load Shedding

Technique     | What It Does                   | When Used
Rate Limiting | Limits requests per client     | Always (per-client fairness)
Throttling    | Slows down request processing  | When the system is stressed
Load Shedding | Rejects requests entirely      | Emergency: prevent total failure

System load response:
- Normal (~40% load): all requests served
- High (~70% load): throttling kicks in
- Critical (~90% load): load shedding active
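
These levels map naturally onto code. Here is a minimal Python sketch; the thresholds mirror the levels above, and the operating_mode name is purely illustrative:

def operating_mode(load: float) -> str:
    """Map a utilization fraction (0.0-1.0) to an operating mode."""
    if load >= 0.90:
        return "shed"      # Critical: reject some requests outright
    if load >= 0.70:
        return "throttle"  # High: slow down or degrade responses
    return "normal"        # All requests served normally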

3. Throttling Strategies

Response Delay

Add artificial delay to responses when load is high. Reduces throughput smoothly.

if (load > 70%) delay(100ms)
Queue Requests

Put excess requests in a queue instead of processing immediately.

queue.add(request); process_later()
Reduce Quality

Return cached/stale data or lower-resolution images when stressed.

if (stressed) return cached_response
Disable Features

Temporarily disable non-critical features (recommendations, analytics).

if (overloaded) skip_recommendations()
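
These four tactics compose naturally into a single throttling gate. Below is a minimal Python sketch; get_system_load(), fetch_from_cache(), the include_recommendations attribute, and the thresholds are hypothetical stand-ins for your own monitoring and caching layers:

import time

def throttled_handler(request):
    load = get_system_load()  # hypothetical: utilization as a 0.0-1.0 fraction

    if load > 0.85:
        # Reduce quality: serve cached/stale data instead of computing fresh results
        cached = fetch_from_cache(request)  # hypothetical cache lookup
        if cached is not None:
            return cached

    if load > 0.70:
        # Response delay: slow clients down to smooth out demand
        time.sleep(0.1)
        # Disable features: skip non-critical work like recommendations
        request.include_recommendations = False

    return process(request)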

4. Load Shedding Strategies

When throttling isn't enough, you need to reject requests to survive:

1. Random Shedding: Randomly reject X% of requests. Simple, but it ignores request importance (see the sketch after the priority-based example below).
2. Priority-Based Shedding: Reject low-priority requests first (analytics before checkout) to keep critical paths alive.
3. LIFO (Last In, First Out): Reject the newest requests. Older requests have already waited, so you might as well finish them.
4. Client-Based Shedding: Shed requests from clients already over their quota, or from non-paying users first.
Priority-Based Load Shedding
# get_system_load(), reject(), and process() stand in for your framework's equivalents
def handle_request(req):
    current_load = get_system_load()  # utilization as a fraction, 0.0-1.0

    if current_load > 0.95:
        # Critical: only process essential requests
        if req.priority != "critical":
            return reject(503)  # Service Unavailable

    elif current_load > 0.80:
        # High load: shed low-priority requests
        if req.priority == "low":
            return reject(503)  # Service Unavailable

    # Load is acceptable, or the request matters enough: process it
    return process(req)
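
For comparison, random shedding (strategy 1 above) can be just a few lines. A minimal sketch, assuming load is a 0.0-1.0 utilization fraction; the linear ramp between the threshold and full load is an illustrative choice:

import random

def should_shed(load: float, threshold: float = 0.80) -> bool:
    """Probabilistically shed requests once load crosses the threshold."""
    if load <= threshold:
        return False
    # Shed probability ramps linearly from 0 at the threshold to 1 at 100% load
    return random.random() < (load - threshold) / (1.0 - threshold)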

5. Graceful Degradation

Instead of complete failure, provide reduced functionality:

Service | Normal                                     | Degraded
Netflix | HD streaming with recommendations          | SD streaming, no recommendations
Amazon  | Real-time inventory, personalized results  | Cached inventory, generic results
Twitter | Full timeline with media                   | Text-only timeline, cached data
Uber    | Precise ETA, surge pricing                 | Estimated ETA, fixed pricing
Design Principle

Design your system with degradation in mind from the start. Know which features are essential and which are merely nice-to-have.
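
One way to bake this in is an explicit feature-tier map that the request path consults. A sketch, assuming hypothetical feature names and an operating_mode() helper like the one sketched in section 2:

# Features grouped by how expendable they are; the names are illustrative
FEATURE_TIERS = {
    "critical": {"checkout", "login"},
    "normal":   {"search", "inventory"},
    "optional": {"recommendations", "analytics"},
}

def enabled_features(mode: str) -> set:
    """Drop optional features under throttling, everything but critical when shedding."""
    if mode == "shed":
        return FEATURE_TIERS["critical"]
    if mode == "throttle":
        return FEATURE_TIERS["critical"] | FEATURE_TIERS["normal"]
    return set().union(*FEATURE_TIERS.values())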

6. Implementation Checklist

- Monitor system load (CPU, memory, queue depth, latency)
- Define load thresholds (e.g., 70% = throttle, 90% = shed)
- Classify requests by priority (critical, normal, low)
- Implement circuit breakers for downstream services
- Return a proper 503 with a Retry-After header
- Alert on-call when shedding is active
- Test load shedding regularly (chaos engineering)
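
The 503 and alerting items pair naturally: a shed request should come back as a 503 that tells clients when to retry. A minimal WSGI sketch; is_shedding() and the 30-second hint are illustrative assumptions:

def app(environ, start_response):
    if is_shedding():  # hypothetical: checks load against your shed threshold
        # 503 signals a temporary condition; Retry-After tells clients to back off
        start_response("503 Service Unavailable", [("Retry-After", "30")])
        return [b"Service temporarily overloaded, please retry later.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK\n"]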

7. Key Takeaways

1. Throttling slows down; load shedding rejects. Both prevent total failure.
2. Use priority-based shedding: protect critical paths (checkout) over nice-to-haves (recommendations).
3. Design for graceful degradation from day one. Know what to sacrifice.
4. Return 503 with a Retry-After header so clients know to back off.
5. Monitor and alert when shedding is active; it's a sign you need to scale.
6. Test your shedding logic regularly with chaos engineering.