Module 2 — Traffic & Load Management

Throttling & Load Shedding

When your system is overloaded, gracefully degrade service instead of crashing completely.


1. The Emergency Room Analogy

Simple Analogy
Imagine an emergency room during a crisis. When overwhelmed with patients, they don't try to treat everyone at once (that would kill everyone). Instead, they triage: critical patients first, less urgent cases wait, and some might be redirected to other hospitals.

This is exactly what throttling (slow down) and load shedding (reject some) do for your servers.

2. Throttling vs. Rate Limiting vs. Load Shedding

Technique     | What It Does                   | When Used
Rate Limiting | Limits requests per client     | Always (per-client fairness)
Throttling    | Slows down request processing  | When the system is stressed
Load Shedding | Rejects requests entirely      | Emergency: prevent total failure

System load response:
- Normal (~40% load): all requests served
- High (~70% load): throttling kicks in
- Critical (~90% load): load shedding active
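
These levels map naturally onto code. Here is a minimal Python sketch; the thresholds mirror the levels above, and the operating_mode name is purely illustrative:

def operating_mode(load: float) -> str:
    """Map a utilization fraction (0.0-1.0) to an operating mode."""
    if load >= 0.90:
        return "shed"      # Critical: reject some requests outright
    if load >= 0.70:
        return "throttle"  # High: slow down or degrade responses
    return "normal"        # All requests served normally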

3. Throttling Strategies

Response Delay

Add artificial delay to responses when load is high. Reduces throughput smoothly.

if (load > 70%) delay(100ms)
Queue Requests

Put excess requests in a queue instead of processing immediately.

queue.add(request); process_later()
Reduce Quality

Return cached/stale data or lower-resolution images when stressed.

if (stressed) return cached_response
Disable Features

Temporarily disable non-critical features (recommendations, analytics).

if (overloaded) skip_recommendations()
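
These four tactics compose naturally into a single throttling gate. Below is a minimal Python sketch; get_system_load(), fetch_from_cache(), the include_recommendations attribute, and the thresholds are hypothetical stand-ins for your own monitoring and caching layers:

import time

def throttled_handler(request):
    load = get_system_load()  # hypothetical: utilization as a 0.0-1.0 fraction

    if load > 0.85:
        # Reduce quality: serve cached/stale data instead of computing fresh results
        cached = fetch_from_cache(request)  # hypothetical cache lookup
        if cached is not None:
            return cached

    if load > 0.70:
        # Response delay: slow clients down to smooth out demand
        time.sleep(0.1)
        # Disable features: skip non-critical work like recommendations
        request.include_recommendations = False

    return process(request)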

4. Load Shedding Strategies

When throttling isn't enough, you need to reject requests to survive:

1. Random Shedding: Randomly reject X% of requests. Simple, but it ignores request importance (see the sketch after the priority-based example below).
2. Priority-Based Shedding: Reject low-priority requests first (analytics before checkout) to keep critical paths alive.
3. LIFO (Last In, First Out): Reject the newest requests. Older requests have already waited, so you might as well finish them.
4. Client-Based Shedding: Shed requests from clients already over their quota, or from non-paying users first.
Priority-Based Load Shedding
# get_system_load(), reject(), and process() stand in for your framework's equivalents
def handle_request(req):
    current_load = get_system_load()  # utilization as a fraction, 0.0-1.0

    if current_load > 0.95:
        # Critical: only process essential requests
        if req.priority != "critical":
            return reject(503)  # Service Unavailable

    elif current_load > 0.80:
        # High load: shed low-priority requests
        if req.priority == "low":
            return reject(503)  # Service Unavailable

    # Load is acceptable, or the request matters enough: process it
    return process(req)
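
For comparison, random shedding (strategy 1 above) can be just a few lines. A minimal sketch, assuming load is a 0.0-1.0 utilization fraction; the linear ramp between the threshold and full load is an illustrative choice:

import random

def should_shed(load: float, threshold: float = 0.80) -> bool:
    """Probabilistically shed requests once load crosses the threshold."""
    if load <= threshold:
        return False
    # Shed probability ramps linearly from 0 at the threshold to 1 at 100% load
    return random.random() < (load - threshold) / (1.0 - threshold)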

5. Graceful Degradation

Instead of complete failure, provide reduced functionality:

Service | Normal                                     | Degraded
Netflix | HD streaming with recommendations          | SD streaming, no recommendations
Amazon  | Real-time inventory, personalized results  | Cached inventory, generic results
Twitter | Full timeline with media                   | Text-only timeline, cached data
Uber    | Precise ETA, surge pricing                 | Estimated ETA, fixed pricing
Design Principle

Design your system with degradation in mind from the start. Know which features are essential and which are merely nice-to-have.
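
One way to bake this in is an explicit feature-tier map that the request path consults. A sketch, assuming hypothetical feature names and an operating_mode() helper like the one sketched in section 2:

# Features grouped by how expendable they are; the names are illustrative
FEATURE_TIERS = {
    "critical": {"checkout", "login"},
    "normal":   {"search", "inventory"},
    "optional": {"recommendations", "analytics"},
}

def enabled_features(mode: str) -> set:
    """Drop optional features under throttling, everything but critical when shedding."""
    if mode == "shed":
        return FEATURE_TIERS["critical"]
    if mode == "throttle":
        return FEATURE_TIERS["critical"] | FEATURE_TIERS["normal"]
    return set().union(*FEATURE_TIERS.values())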

6. Implementation Checklist

- Monitor system load (CPU, memory, queue depth, latency)
- Define load thresholds (e.g., 70% = throttle, 90% = shed)
- Classify requests by priority (critical, normal, low)
- Implement circuit breakers for downstream services
- Return a proper 503 with a Retry-After header
- Alert on-call when shedding is active
- Test load shedding regularly (chaos engineering)
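
The 503 and alerting items pair naturally: a shed request should come back as a 503 that tells clients when to retry. A minimal WSGI sketch; is_shedding() and the 30-second hint are illustrative assumptions:

def app(environ, start_response):
    if is_shedding():  # hypothetical: checks load against your shed threshold
        # 503 signals a temporary condition; Retry-After tells clients to back off
        start_response("503 Service Unavailable", [("Retry-After", "30")])
        return [b"Service temporarily overloaded, please retry later.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK\n"]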

7. Key Takeaways

1. Throttling slows down; load shedding rejects. Both prevent total failure.
2. Use priority-based shedding: protect critical paths (checkout) over nice-to-haves (recommendations).
3. Design for graceful degradation from day one. Know what to sacrifice.
4. Return 503 with a Retry-After header so clients know to back off.
5. Monitor and alert when shedding is active; it's a sign you need to scale.
6. Test your shedding logic regularly with chaos engineering.