Bottleneck Analysis
Finding and fixing the weakest link in your system.
1. The Highway Traffic Analogy
A bottleneck is the component that limits overall system throughput. Like a highway that narrows from four lanes to one, the system is only as fast as its slowest part. Identifying and addressing bottlenecks is the key to scaling.
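To make "only as fast as its slowest part" concrete, here is a minimal sketch. It assumes every request passes through every stage in series, and the stage names and capacities are made-up illustrative numbers, not measurements.

```python
# Hypothetical per-stage capacities in requests per second (illustrative only).
stage_capacity_rps = {
    "load_balancer": 50_000,
    "app_servers": 4_000,
    "database": 1_200,
    "payment_api": 100,
}


def find_bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the lowest-capacity stage and its capacity.

    For stages in series, that capacity is also the ceiling on
    end-to-end throughput.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


bottleneck, system_rps = find_bottleneck(stage_capacity_rps)
print(f"bottleneck={bottleneck}, max system throughput={system_rps} rps")
```

Raising the capacity of any other stage leaves the result unchanged, which is why tuning a non-bottleneck component rarely helps.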
2. Common Bottleneck Types
Database Bottleneck
Symptoms
• Slow queries
• Connection pool exhausted
• High CPU on DB server
• Lock contention
Solutions
• Add read replicas
• Implement caching
• Optimize queries/indexes
• Shard the database
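As one illustration of "Implement caching", the following is a minimal cache-aside sketch with a time-to-live. `TTLCache` and `load_product_from_db` are hypothetical names invented for this example; a real system would more likely use a shared cache service than an in-process dictionary.

```python
import time


class TTLCache:
    """Tiny in-process cache-aside helper (illustrative, not thread-safe)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: the database is not touched
        value = loader(key)          # cache miss: fall through to the database
        self._store[key] = (now + self.ttl, value)
        return value


db_calls = {"count": 0}


def load_product_from_db(product_id):
    """Hypothetical stand-in for a slow database query."""
    db_calls["count"] += 1
    return {"id": product_id, "name": f"product-{product_id}"}


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):
    product = cache.get_or_load(42, load_product_from_db)

print(db_calls["count"])  # number of reads that actually reached the "database"
```

The trade-off is staleness: for up to `ttl_seconds`, readers may see an older value than the database holds.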
Network Bottleneck
Symptoms
• High latency between services
• Bandwidth saturation
• Cross-region calls
Solutions
• CDN for static content
• Compress responses
• Move services closer
• Batch requests
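A rough way to see why "Batch requests" helps is to count round trips. The sketch below assumes calls are made one after another and that round-trip time dominates; the 50 ms round-trip time and the item counts are illustrative assumptions.

```python
import math


def network_wait_ms(num_items: int, batch_size: int, rtt_ms: float) -> float:
    """Total time spent waiting on round trips when items are sent in batches.

    Ignores payload transfer time and server-side processing.
    """
    round_trips = math.ceil(num_items / batch_size)
    return round_trips * rtt_ms


unbatched_ms = network_wait_ms(num_items=200, batch_size=1, rtt_ms=50)
batched_ms = network_wait_ms(num_items=200, batch_size=50, rtt_ms=50)
print(f"one call per item: {unbatched_ms} ms, batches of 50: {batched_ms} ms")
```

The same arithmetic explains why cross-region calls hurt: the per-call round-trip time is multiplied by the number of calls on the request path.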
CPU Bottleneck
Symptoms
• 100% CPU utilization
• Slow response under load
• Request queuing
Solutions
• Horizontal scaling
• Optimize algorithms
• Async processing
• Caching computed results
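"Caching computed results" can be sketched with the standard library's `functools.lru_cache`. The `shipping_quote` function and its pricing formula are hypothetical stand-ins for any expensive, deterministic computation.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def shipping_quote(weight_kg: int, zone: str) -> float:
    """Hypothetical expensive pure function; the loop simulates CPU work."""
    wasted = sum(i * i for i in range(100_000))  # stand-in for heavy computation
    surcharge = 2.0 if zone == "US" else 0.0
    return 5.0 + 1.25 * weight_kg + surcharge + (wasted * 0)


for args in [(2, "EU"), (2, "EU"), (5, "US"), (2, "EU")]:
    shipping_quote(*args)

info = shipping_quote.cache_info()
print(f"hits={info.hits}, misses={info.misses}")
```

This only works when the result depends solely on the arguments; anything that reads changing state needs explicit invalidation instead.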
Memory Bottleneck
Symptoms
• OOM errors
• Excessive GC pauses
• Swapping to disk
Solutions
• Increase instance size
• Reduce in-memory data
• Streaming processing
• Pagination
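The "Streaming processing" idea can be illustrated with a generator: records are handled one at a time, so memory use stays roughly constant regardless of how many there are. `read_orders` is a hypothetical stand-in for a database cursor or file reader.

```python
from typing import Iterable, Iterator


def read_orders(n: int) -> Iterator[dict]:
    """Yield orders one by one instead of building a full list in memory."""
    for order_id in range(n):
        yield {"id": order_id, "total_cents": 100 + order_id % 50}


def total_revenue_cents(orders: Iterable[dict]) -> int:
    """Aggregate over a stream; only one order is held in memory at a time."""
    total = 0
    for order in orders:
        total += order["total_cents"]
    return total


print(total_revenue_cents(read_orders(1_000_000)))
```

Pagination applies the same principle across an API boundary: the client asks for a bounded page at a time rather than the whole result set.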
3. Identifying Bottlenecks
Rule of thumb: start at the database, which is where the bottleneck sits roughly 80% of the time. Then check the network, then compute.
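The rule of thumb is a starting point, and a quick measurement can confirm it. The sketch below times each part of a request separately; the `timed` helper is hypothetical and the `time.sleep` calls are stand-ins for real work.

```python
import time
from contextlib import contextmanager

timings_ms: dict[str, float] = {}


@contextmanager
def timed(component: str):
    """Accumulate wall-clock time spent inside the block under `component`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings_ms[component] = timings_ms.get(component, 0.0) + elapsed_ms


def handle_request() -> None:
    with timed("database"):
        time.sleep(0.030)  # stand-in for a slow query
    with timed("network"):
        time.sleep(0.005)  # stand-in for a downstream call
    with timed("compute"):
        time.sleep(0.001)  # stand-in for in-process work


handle_request()
slowest = max(timings_ms, key=timings_ms.get)
print(f"slowest component: {slowest}")
```

Wall-clock time per component shows where latency goes; it should be read alongside saturation metrics such as CPU and connection pool usage.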
4. Worked Example: E-commerce Checkout
Problem: Checkout takes 5 seconds under load
Analysis
Bottleneck: Payment service at 100 QPS, taking 800ms per request.
Solutions: (1) Add more payment service instances. (2) Make payment asynchronous: confirm the order first, then process the payment in the background. (3) Use a payment gateway that batches requests.
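The numbers in this example can be turned into a sizing estimate with Little's law (requests in flight = arrival rate × time per request). The sketch reads the example as 100 requests per second arriving, each taking 800 ms; `workers_per_instance` is an assumed value that does not come from the example.

```python
import math

arrival_rate_qps = 100   # from the example
service_time_ms = 800    # from the example

# Little's law: average number of requests in flight at the payment service.
in_flight = arrival_rate_qps * service_time_ms / 1000

# One worker handling requests strictly one at a time completes this many per second.
per_worker_qps = 1000 / service_time_ms

workers_per_instance = 16  # assumption for illustration
instances_needed = math.ceil(in_flight / workers_per_instance)

print(f"in flight: {in_flight}, per-worker QPS: {per_worker_qps}, "
      f"instances needed: {instances_needed}")
```

If the service has fewer concurrent workers than the in-flight figure, requests queue, which is one plausible way an 800 ms call could turn into a multi-second checkout under load. Option (2) sidesteps the problem by taking the payment call off the user's critical path.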
5. Interviewer Questions
"What's the bottleneck in your design?"
Look at your high-level design (HLD). Which component handles the most load? Which scales least well?
"How would you scale 10x?"
Identify current bottleneck, solve it, then find the next one. It's iterative.
"What breaks first under load?"
Usually: database → external APIs → app servers → load balancer
"How would you find bottlenecks in production?"
Monitoring: latency per component, saturation metrics, distributed tracing.
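As a sketch of "latency per component", the snippet below computes tail latency for each component from recorded samples and flags the worst one. The component names and sample values are invented for illustration; in production these would come from a metrics or tracing system.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds, one list per component.
latency_ms = {
    "api_gateway": [4, 5, 5, 6, 7, 5, 4, 6, 5, 9],
    "order_service": [20, 22, 25, 21, 19, 24, 23, 22, 20, 40],
    "payment_service": [780, 800, 810, 790, 805, 795, 820, 800, 815, 1500],
}

p99 = {name: percentile(samples, 99) for name, samples in latency_ms.items()}
worst = max(p99, key=p99.get)
print(f"p99 by component: {p99}; worst: {worst}")
```

Tail percentiles are usually more revealing than averages here, because a bottleneck often shows up first as a growing tail while the median still looks healthy.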
6. Resolution Strategies
Scale Up
Bigger machines. Quick fix but limited ceiling.
When: Small systems, vertical limits not reached
Scale Out
More machines. Requires stateless design.
When: Compute-bound, embarrassingly parallel
Cache
Reduce load on slow components.
When: Read-heavy, data doesn't change often
Async
Move work off critical path.
When: Work can be done later, user doesn't need immediate result
Shard
Partition data/load across nodes.
When: Data is too large for single node
Optimize
Better algorithms, queries, code.
When: Before scaling, always check for inefficiencies
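Of the strategies above, Shard is the least self-explanatory, so here is a minimal sketch of hash-based shard routing. `shard_for` and the key format are hypothetical; the scheme shown is plain modulo hashing, chosen for brevity.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so is not stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Distribute some hypothetical user IDs across 4 shards.
counts = Counter(shard_for(f"user-{i}", 4) for i in range(1000))
print(sorted(counts.items()))
```

One limitation of modulo hashing is that changing `num_shards` remaps most keys; consistent hashing is a common way to reduce that movement.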
7. Key Takeaways
Quiz
1. App servers at 30% CPU, database at 95% CPU. Where's the bottleneck?
2. What is the best way to reduce database load for a read-heavy workload?