Retry Strategies
Handle transient failures gracefully with smart retry logic that balances persistence with system health.
1. The Persistent Caller Analogy
You call a friend and they don't pick up. What do you do next?
A) Give up immediately
B) Call 1000 times per second until they answer (annoying!)
C) Wait a bit, try again, wait longer, try again...
Option C is exponential backoff, the smart approach!
A retry strategy defines how a system handles failed operations: when to retry, how long to wait, and when to give up. A good strategy maximizes the chance of eventual success without creating a thundering herd, where many clients retry at once and overwhelm a service that is trying to recover.
2. Types of Failures
Transient (Retry-able)
- Network timeout
- Service temporarily unavailable (503)
- Database connection dropped
- Rate limited (429)
Permanent (Don't Retry)
- Invalid input (400)
- Not found (404)
- Unauthorized (401, 403)
- Business logic error
Retry 5xx errors and timeouts. Don't retry 4xx errors (except 429): the request itself is at fault, so it will fail again no matter how many times you send it.
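The rule above can be sketched as a small classifier. This is a minimal illustration, not a definitive implementation: the error shape (`error.status` for HTTP responses, `error.code` for network failures) is an assumption, and the list of network codes is an illustrative subset of the codes Node.js reports.

```javascript
// Network-level error codes treated as transient in this sketch (assumed subset).
const RETRYABLE_NETWORK_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED']);

// Decide whether a failed operation is worth retrying.
// `error.status` and `error.code` are assumed shapes, not a specific library's API.
function isRetryable(error) {
  if (typeof error.status === 'number') {
    // 429 (rate limited) is the one 4xx worth retrying; all 5xx are treated as transient.
    return error.status === 429 || (error.status >= 500 && error.status <= 599);
  }
  // No HTTP status: the request never got a response, so check the network code.
  return RETRYABLE_NETWORK_CODES.has(error.code);
}
```

A real client may want a narrower rule, for example only retrying operations that are safe to repeat.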
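3. Retry Strategies
1. Immediate Retry
Avoid this. Retrying with no delay piles more load onto a struggling service and creates a thundering herd when it recovers.
2. Fixed Delay
Wait the same interval before every retry. Simple but not adaptive: the interval may be too aggressive for an overloaded service or too slow for a brief blip.
3. Exponential Backoff
Double the wait after each failure (1s, 2s, 4s, ...). This gives overwhelmed services time to recover and is the industry standard.
4. Exponential Backoff + Jitter
The best approach. Adding a random offset to each delay keeps many clients from retrying in lockstep.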
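To compare the four strategies side by side, here is a sketch of each one as a function from the attempt number (0-based) to a delay in milliseconds. The constants `BASE_DELAY_MS` and `MAX_DELAY_MS` and the `delayStrategies` object are illustrative names and values, not part of any library.

```javascript
const BASE_DELAY_MS = 1000; // assumed base delay
const MAX_DELAY_MS = 30000; // assumed upper bound on the exponential part

const delayStrategies = {
  // 1. Immediate retry: no wait at all.
  immediate: () => 0,

  // 2. Fixed delay: the same wait every time.
  fixed: () => BASE_DELAY_MS,

  // 3. Exponential backoff: attempts 0..4 give 1000, 2000, 4000, 8000, 16000 ms.
  exponential: (attempt) => Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS),

  // 4. Exponential backoff + jitter: add a random 0-100% of the capped delay.
  // `random` is injectable so the behavior can be tested deterministically.
  exponentialWithJitter: (attempt, random = Math.random) => {
    const capped = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
    return capped + random() * capped;
  },
};
```

Additive jitter is only one variant; another common choice picks a random delay between zero and the capped value.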
4. The Jitter Problem
Without jitter, every client that failed at the same moment computes the same backoff delays and retries at the same instants, creating traffic spikes:
Without Jitter
10:00:01 - 1000 clients retry (spike!)
10:00:03 - 1000 clients retry (spike!)
10:00:07 - 1000 clients retry (spike!)
With Jitter
10:00:01~02 - retries spread across the window
10:00:03~05 - retries spread across the window
10:00:07~12 - retries spread across the window
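    // Compute how long to wait before the next retry (attempt is 0-based).
    function getRetryDelay(attempt, baseDelay = 1000) {
      // Exponential: 1s, 2s, 4s, 8s, 16s...
      const exponentialDelay = baseDelay * Math.pow(2, attempt);
      // Cap the exponential part at 30 seconds
      const cappedDelay = Math.min(exponentialDelay, 30000);
      // Add jitter: random 0-100% of the capped delay, so the result
      // falls between cappedDelay and 2 * cappedDelay
      const jitter = Math.random() * cappedDelay;
      return cappedDelay + jitter;
    }

    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    // Usage: makeRequest and isRetryable are supplied by the caller
    async function requestWithRetry(makeRequest, isRetryable, maxRetries = 3) {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
          return await makeRequest();
        } catch (error) {
          if (!isRetryable(error)) throw error;
          // Don't wait after the final attempt; fall through and give up
          if (attempt < maxRetries - 1) await sleep(getRetryDelay(attempt));
        }
      }
      throw new Error('Max retries exceeded');
    }

5. Circuit Breaker Pattern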
After too many failures, stop calling the service for a while. This avoids wasting resources on a service that is down.
A circuit breaker fails fast instead of waiting for timeouts, gives downstream services time to recover, and prevents cascading failures across your system.
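A circuit breaker is commonly modeled with three states: closed (requests flow normally), open (requests are rejected immediately), and half-open (trial requests probe whether the service has recovered). The sketch below illustrates that state machine. The class name, the default threshold of 5 failures, the 30-second reset timeout, and the injectable `now` clock are all assumptions chosen for illustration.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // failures before the circuit opens
    this.resetTimeoutMs = resetTimeoutMs;     // how long to stay open before probing
    this.now = now;                           // injectable clock, useful in tests
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.openedAt = 0;
  }

  // Returns true if a request may be sent right now.
  allowRequest() {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // timeout elapsed: let trial requests probe the service
    }
    return this.state !== 'OPEN';
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess() {
    this.state = 'CLOSED';
    this.failureCount = 0;
  }

  // A failure during a probe, or reaching the threshold, opens the circuit.
  recordFailure() {
    this.failureCount += 1;
    if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }

  // Wrap an async operation: fail fast while open, otherwise track the outcome.
  async call(operation) {
    if (!this.allowRequest()) {
      throw new Error('Circuit open: failing fast');
    }
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }
}
```

In use, each downstream dependency would typically get its own breaker instance, with calls routed through `breaker.call(() => makeRequest())`.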
6. Dead Letter Queues
After max retries, don't lose the message! Send it to a Dead Letter Queue (DLQ) for investigation.
After 3 failures → Dead Letter Queue for manual review
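The flow above can be sketched with an in-memory array standing in for the queue. This is a simplified illustration: `processMessage`, `handler`, and the shape of the dead-letter record are hypothetical, the handler is synchronous, and backoff between attempts is omitted for brevity. In production the dead letter queue is usually provided by the message broker.

```javascript
const MAX_ATTEMPTS = 3; // matches the "after 3 failures" rule above

// Try to handle a message; after MAX_ATTEMPTS failures, park it in the DLQ.
// `handler` and `deadLetterQueue` (a plain array here) are hypothetical stand-ins.
function processMessage(message, handler, deadLetterQueue) {
  let lastError;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return handler(message);
    } catch (error) {
      lastError = error; // a real consumer would back off before the next attempt
    }
  }
  // Retries exhausted: keep the message along with context for manual review.
  deadLetterQueue.push({
    message,
    error: String(lastError),
    attempts: MAX_ATTEMPTS,
    failedAt: new Date().toISOString(),
  });
  return undefined;
}
```

Storing the original message together with the last error and a timestamp makes it possible to diagnose the failure and replay the message once the cause is fixed.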