Module 8 - Networking and APIs

Retries & Timeouts

Handle transient failures gracefully, without making things worse.

1. The Phone Call Analogy

Simple Analogy
Call drops? You redial. But if the person's phone is off, redialing 100 times won't help; it just wastes your time. And if you wait forever for them to pick up, you're stuck. Retries handle temporary issues; timeouts prevent waiting forever.

Retries automatically repeat failed requests for transient errors. Timeouts limit how long you wait, preventing indefinite hangs.

2. When to Retry

Retry These
  • Network timeout
  • 503 Service Unavailable
  • 429 Too Many Requests
  • Connection refused
  • DNS resolution failed

Don't Retry These
  • 400 Bad Request
  • 401 Unauthorized
  • 403 Forbidden
  • 404 Not Found
  • 422 Unprocessable Entity (validation error)

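Rule of thumb: Retry 5xx errors (server issues). Don't retry 4xx errors (client issues): the request itself is wrong and will fail again. The one exception in the lists above is 429 Too Many Requests, which signals rate limiting rather than a malformed request, so the same request can succeed later.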
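
The rule of thumb above can be sketched as a small predicate. This is a minimal illustration, not a definitive policy: the name `isRetryableStatus` is hypothetical, and it only covers HTTP status codes. Network-level failures (timeouts, connection refused, DNS errors) never produce a status and have to be handled where the request throws.

```javascript
// Hypothetical helper (not part of any library): decides whether an HTTP
// status code is worth retrying, following the lists above.
function isRetryableStatus(status) {
  // 429 Too Many Requests: rate limiting, the same request may succeed later.
  if (status === 429) return true;
  // 5xx: server-side problem, possibly transient.
  if (status >= 500 && status <= 599) return true;
  // Everything else (2xx, 3xx, other 4xx): retrying will not change the outcome.
  return false;
}

console.log(isRetryableStatus(503)); // true
console.log(isRetryableStatus(404)); // false
```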

3. Retry Strategies

Immediate Retry

Wait 0, retry

Problem: Can overwhelm an already struggling server

When: Almost never. Possibly after a local cache miss.

Fixed Delay

Wait 1s, retry

Problem: All clients retry at the same time (thundering herd)

When: Simple cases, low traffic

Exponential Backoff

Wait 1s, 2s, 4s, 8s...

Problem: Waits become very long after only a few attempts

When: Standard approach. Use with max retries.

Exponential + Jitter

Wait (1s, 2s, 4s...) + random(0-1s)

Problem: Slightly more complex

When: Best practice. Prevents synchronized retries.
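
The "Exponential + Jitter" strategy above can be sketched as a pure function that returns the wait before a given retry. The function name `backoffDelay` and its option names are hypothetical. The `maxMs` cap is an assumption added here to bound the long waits noted above; the strategy descriptions themselves do not include one.

```javascript
// Hypothetical helper: milliseconds to wait before retry number `attempt`
// (0-based). `random` is injectable so the result can be made deterministic.
function backoffDelay(
  attempt,
  { baseMs = 1000, maxMs = 30000, jitterMs = 1000, random = Math.random } = {}
) {
  // Exponential part: baseMs * 2^attempt gives 1s, 2s, 4s, 8s, ...
  // capped at maxMs (the cap is an assumption, not part of the text above).
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  // Jitter part: a random extra wait in [0, jitterMs) so that clients that
  // failed together do not all retry at the same instant.
  return exponential + random() * jitterMs;
}

for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`retry ${attempt}: wait ~${Math.round(backoffDelay(attempt))} ms`);
}
```

Setting `jitterMs` to 0 reduces this to plain exponential backoff, which makes it easy to compare the two strategies.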

4. Timeout Types

Connection Timeout

How long to wait to establish the TCP connection

Typical: 1-5 seconds

Read Timeout

How long to wait for a response after the request is sent

Typical: 5-30 seconds (depends on operation)

Request Timeout

Total time for entire request (connect + read)

Typical: 10-60 seconds

Idle Timeout

How long to keep connection open when unused

Typical: 30-120 seconds

No Timeout = Danger

Without timeouts, a slow downstream service can hang your entire application. Always set timeouts.
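
As an illustration of a request timeout (a limit on the total time), the sketch below wraps any promise in a deadline. The helper name `withTimeout` is hypothetical. It only stops the caller from waiting; it does not cancel the underlying work, so for `fetch` it would normally be combined with an `AbortController`.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`
// milliseconds. It does NOT cancel the underlying operation.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive or fire late.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage: a simulated slow call (300 ms) with a 50 ms limit.
const slowCall = new Promise((resolve) => setTimeout(() => resolve("done"), 300));
withTimeout(slowCall, 50, "slowCall").catch((err) => console.log(err.message));
```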

5. Implementation Example

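// Resolve after `ms` milliseconds; used to wait between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, options = {}) {
  const maxRetries = 3;
  const baseDelay = 1000; // 1 second
  const timeoutMs = 5000; // 5s timeout per attempt

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // Fresh controller per attempt: an aborted signal cannot be reused.
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), timeoutMs);

    try {
      const response = await fetch(url, {
        ...options,
        signal: controller.signal
      });

      clearTimeout(timeout);

      // Retry only rate limiting (429) and server errors (5xx).
      if (response.status === 429 || response.status >= 500) {
        throw new Error(`Retryable: ${response.status}`);
      }

      // Success, or a non-retryable 4xx for the caller to handle.
      return response;
    } catch (error) {
      // Network errors and timeouts (aborts) also land here and are retried.
      clearTimeout(timeout);
      if (attempt === maxRetries) throw error;

      // Exponential backoff with jitter: 1s, 2s, 4s plus up to 1s of randomness
      const delay = baseDelay * Math.pow(2, attempt);
      const jitter = Math.random() * 1000;
      await sleep(delay + jitter);
    }
  }
}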

6. The Retry Storm Problem

1. Server gets slow (maybe overloaded)
2. Requests time out
3. All clients retry immediately
4. Server now has 2x the load
5. Server gets slower, more timeouts
6. Death spiral: 4x, 8x, 16x load

Solution: Exponential backoff + jitter spreads retries over time. Also consider circuit breakers to stop retrying entirely when downstream is unhealthy.
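
A circuit breaker, as mentioned above, can be sketched in a few lines. This is a simplified illustration, not a production implementation: the class name, option names, and default values are all assumptions, and real libraries typically add a stricter half-open state that lets only a single probe request through.

```javascript
// Hypothetical minimal circuit breaker. After `failureThreshold` consecutive
// failures the circuit "opens" and requests are refused for `cooldownMs`.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, useful for testing
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed (healthy)
  }

  // Should the caller attempt a request right now?
  allowRequest() {
    if (this.openedAt === null) return true;
    // Once the cooldown has elapsed, requests are let through again.
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    // At or past the threshold, every failure (re)opens the circuit,
    // restarting the cooldown.
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}

const breaker = new CircuitBreaker({ failureThreshold: 2 });
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.allowRequest()); // false: stop calling the unhealthy service
```

A caller would check `allowRequest()` before each attempt and report the outcome with `recordSuccess()` or `recordFailure()`, so retries stop entirely while the downstream service is unhealthy.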

7. Key Takeaways

1. Retry 5xx errors (server) and 429. Don't retry other 4xx errors (client).
2. Use exponential backoff + jitter to prevent retry storms.
3. Always set timeouts. No timeout = potential infinite hang.
4. Set max retries (usually 3-5). Don't retry forever.
5. Consider circuit breakers for persistent failures.

Quiz

1. GET /users returns 404. What should you do?

2. Best retry strategy to prevent thundering herd?