Module 5 — Architecture
Circuit Breaker Pattern
Prevent cascading failures by failing fast when a service is struggling.
1. The Electrical Analogy
Simple Analogy
Like a circuit breaker in your house:
Normal
Power flows freely. Everything works.
Overload
Too much current detected.
Tripped
Breaker cuts power to prevent fire. You flip it back later.
Without it: overload → house fire. With it: breaker trips → minor inconvenience.
2. Why Do We Need It?
When a downstream service fails and there is no circuit breaker, your service keeps trying → threads pile up → memory is exhausted → your service crashes → the services calling you crash → cascading failure.
Cascading Failure Without Circuit Breaker
API Gateway → Service A → Service B → Service C (DOWN)
1. Service C goes down.
2. Service B keeps retrying C, threads blocked, memory fills.
3. Service B slows down, then crashes.
4. Service A crashes, then Gateway crashes. Total outage.
3. The Three States
A circuit breaker moves between three states based on success/failure rates:
CLOSED
Normal operation. Requests flow through.
- Counts failures
- All requests pass through
- If failures exceed threshold → OPEN
OPEN
Fail fast. No requests sent downstream.
- Return error immediately
- Don't even try downstream
- After timeout → HALF-OPEN
HALF-OPEN
Testing. Allow limited requests.
- Let a few requests through
- If successful → CLOSED
- If fails → OPEN again
State Machine
- CLOSED → OPEN: failures > threshold
- OPEN → HALF-OPEN: timeout expires
- HALF-OPEN → CLOSED: test succeeds
- HALF-OPEN → OPEN: test fails
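The three states and their transitions can be sketched as a small state machine. This is a minimal, single-threaded illustration, not how any particular library implements it. The class name `SimpleCircuitBreaker`, its constructor parameters, and the injectable clock are all hypothetical. It also assumes that the breaker opens on the Nth consecutive failure and that a success in CLOSED resets the failure count.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal, single-threaded circuit breaker sketch (hypothetical class, not a library API). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening (assumption)
    private final int successThreshold;   // consecutive half-open successes before closing
    private final long openTimeoutMillis; // how long to stay OPEN before probing
    private final LongSupplier clock;     // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int failures = 0;
    private int successes = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, int successThreshold,
                         long openTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.clock = clock;
    }

    /** Returns the current state, moving OPEN -> HALF_OPEN once the timeout has elapsed. */
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openTimeoutMillis) {
            state = State.HALF_OPEN;
            successes = 0;
        }
        return state;
    }

    /** Runs the downstream call, or fails fast without calling it when the circuit is OPEN. */
    <T> T call(Supplier<T> downstream) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            T result = downstream.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private void onSuccess() {
        if (state == State.HALF_OPEN) {
            successes++;
            if (successes >= successThreshold) { // enough probes succeeded: HALF_OPEN -> CLOSED
                state = State.CLOSED;
                failures = 0;
            }
        } else {
            failures = 0; // assumption: a success in CLOSED resets the count
        }
    }

    private void onFailure() {
        boolean probeFailed = (state == State.HALF_OPEN);
        if (!probeFailed) {
            failures++;
        }
        if (probeFailed || failures >= failureThreshold) { // trip (or re-trip) the breaker
            state = State.OPEN;
            openedAt = clock.getAsLong();
            failures = 0;
        }
    }
}
```

Passing the clock in as a parameter is a design choice made here so the OPEN timeout can be exercised without real waiting; a production breaker would also need thread safety, which this sketch omits.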
5. Configuration Parameters
| Parameter | Description | Typical | Tip |
|---|---|---|---|
| failureThreshold | Number of failures before opening the circuit | 5 | Too low = opens too often. Too high = doesn't protect. |
| successThreshold | Consecutive successes in half-open to close the circuit | 3 | Higher = more confidence before closing. |
| timeout | How long to stay open before trying half-open | 30s | Give downstream time to recover. |
| failureRateThreshold | Percentage of failures that triggers open | 50% | Better than count alone for variable traffic. |
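To see why a failure rate copes with variable traffic better than a raw count, here is a rough sketch of a count-based sliding window. The class `FailureRateWindow` and its method names are hypothetical, and treating "rate at or above the threshold" as the trip condition is an assumption made for illustration. With a raw count of 5, five failures in 1,000 calls would trip the breaker; with a 50% rate over the window, they would not.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Count-based sliding window that tracks a failure rate (hypothetical sketch). */
class FailureRateWindow {
    private final int windowSize;   // how many recent calls to remember
    private final int minimumCalls; // calls required before the rate is considered meaningful
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failure
    private int failureCount = 0;

    FailureRateWindow(int windowSize, int minimumCalls) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
    }

    /** Records one call outcome and evicts the oldest one if the window is full. */
    void record(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failureCount++;
        }
        if (outcomes.size() > windowSize && outcomes.removeFirst()) {
            failureCount--; // the evicted call was a failure
        }
    }

    /** Failure percentage over the window, or -1 if too few calls have been recorded. */
    double failureRatePercent() {
        if (outcomes.size() < minimumCalls) {
            return -1;
        }
        return 100.0 * failureCount / outcomes.size();
    }

    /** True when enough calls were seen and the rate is at or above the threshold (assumption). */
    boolean shouldOpen(double thresholdPercent) {
        double rate = failureRatePercent();
        return rate >= 0 && rate >= thresholdPercent;
    }
}
```

The minimum-calls guard keeps a single early failure (1 of 1 = 100%) from opening the circuit before there is enough data.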
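    // Example: Resilience4j configuration
    import io.github.resilience4j.circuitbreaker.CircuitBreaker;
    import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
    import io.vavr.control.Try;
    import java.time.Duration;
    import java.util.function.Supplier;

    CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .failureRateThreshold(50)                        // Open if 50% fail
        .waitDurationInOpenState(Duration.ofSeconds(30)) // Stay open 30s before half-open
        .permittedNumberOfCallsInHalfOpenState(3)        // Trial calls allowed in half-open
        .minimumNumberOfCalls(10)                        // Need 10 calls before calculating
        .slidingWindowSize(100)                          // Rate is computed over the last 100 calls
        .build();

    CircuitBreaker cb = CircuitBreaker.of("paymentService", config);

    // Usage: paymentService is your own client object (not defined here)
    Supplier<String> decoratedSupplier = CircuitBreaker
        .decorateSupplier(cb, () -> paymentService.charge());

    // Try comes from Vavr, which is a separate dependency
    Try<String> result = Try.ofSupplier(decoratedSupplier);

6. What to Do When Circuit Opens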
When the circuit is open, you can return a fallback instead of an error:
Return Cached Data
Example: Product catalog from Redis
When: Read operations, slightly stale OK
Return Default Value
Example: Empty recommendations list
When: Non-critical features
Queue for Later
Example: Email notifications
When: Async operations
Graceful Degradation
Example: Show basic UI without personalization
When: Feature can be disabled
Don't Do This
Return misleading data. If the payment service is down, don't report "payment successful" based on cached data. Be honest about degraded functionality.
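The contrast between a safe fallback and a misleading one can be sketched as follows. The classes `CatalogClient` and `PaymentClient` are hypothetical, and the sketch assumes the protected call signals an open circuit (or any downstream failure) by throwing a `RuntimeException`. The read path serves the last cached value; the payment path reports that it is unavailable rather than pretending the charge went through.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Read path: slightly stale data is acceptable, so fall back to the cache (hypothetical sketch). */
class CatalogClient {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Returns fresh data when the call succeeds, else the last cached value if there is one. */
    Optional<String> getProduct(String id, Supplier<String> remoteCall) {
        try {
            String fresh = remoteCall.get();
            cache.put(id, fresh); // remember the last good answer
            return Optional.of(fresh);
        } catch (RuntimeException e) {
            // Assumption: an open circuit or failed call surfaces as a RuntimeException.
            return Optional.ofNullable(cache.get(id));
        }
    }
}

/** Write path: never fake success; report the degraded state honestly (hypothetical sketch). */
class PaymentClient {
    enum Status { CHARGED, UNAVAILABLE }

    Status charge(Runnable remoteCharge) {
        try {
            remoteCharge.run();
            return Status.CHARGED;
        } catch (RuntimeException e) {
            return Status.UNAVAILABLE; // the caller can show "try again later"
        }
    }
}
```

Returning an explicit `UNAVAILABLE` status is one way to keep the failure visible to callers; queuing the charge for later would be another, depending on the business rules.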
7. Real-World Implementation
| Library/Tool | Language | Features |
|---|---|---|
| Resilience4j | Java | CB, rate limiting, retry, bulkhead |
| Hystrix | Java (maintenance mode) | Netflix's original; no longer actively developed, use Resilience4j now |
| Polly | .NET | CB, retry, timeout, bulkhead |
| opossum | Node.js | Simple CB implementation |
| gobreaker | Go | Sony's implementation |
| Istio | Service Mesh | CB at infrastructure level |
8. Key Takeaways
1. A circuit breaker prevents cascading failures by failing fast.
2. Three states: CLOSED (normal), OPEN (failing fast), HALF-OPEN (testing).
3. Configure carefully: failure threshold, timeout, success threshold.
4. Always provide fallback behavior: cached data, defaults, graceful degradation.
5. Use established libraries: Resilience4j (Java), Polly (.NET), Istio (mesh).
6. In interviews: mention alongside retry, timeout, and bulkhead patterns.