Module 11 - Interview Prep

Bottleneck Analysis

Finding and fixing the weakest link in your system.

1. The Highway Traffic Analogy

Simple Analogy

A 6-lane highway narrowing to 2 lanes creates a traffic jam, no matter how fast you drive before or after that point. Systems work the same way. If your database handles 1K QPS but your app servers push 10K QPS, the database is your bottleneck, and optimizing the app servers won't help.

A bottleneck is the component that limits overall system throughput. The system is only as fast as its slowest part, so identifying and addressing bottlenecks is the key to scaling.
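A minimal Python sketch of this definition, assuming every request passes through each stage in series. The function name `find_bottleneck` is illustrative, and the capacity figures are the ones from the analogy above.

```python
def find_bottleneck(capacities_qps):
    """Return (name, capacity) of the stage with the lowest capacity.

    Assumes every request passes through every stage in series, so the
    lowest-capacity stage caps end-to-end throughput.
    """
    name = min(capacities_qps, key=capacities_qps.get)
    return name, capacities_qps[name]


# Figures from the analogy: app servers push 10K QPS, the database handles 1K.
stages = {"app_servers": 10_000, "database": 1_000}
bottleneck, system_qps = find_bottleneck(stages)
print(bottleneck, system_qps)  # database 1000

# Doubling app-server capacity leaves system throughput unchanged.
stages["app_servers"] *= 2
assert find_bottleneck(stages) == ("database", 1_000)
```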

2. Common Bottleneck Types

Database Bottleneck

Symptoms

• Slow queries

• Connection pool exhausted

• High CPU on DB server

• Lock contention

Solutions

• Add read replicas

• Implement caching

• Optimize queries/indexes

• Shard the database

Network Bottleneck

Symptoms

• High latency between services

• Bandwidth saturation

• Cross-region calls

Solutions

• CDN for static content

• Compress responses

• Move services closer

• Batch requests
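To see why batching helps, here is a rough sketch that counts only round-trip time. The item count, round-trip latency, and batch size are hypothetical, and the model ignores transfer and server processing time.

```python
import math


def total_round_trip_ms(num_items, round_trip_ms, batch_size=1):
    """Time spent on network round trips when items are fetched sequentially.

    Simplifying assumption: only round-trip latency is counted; per-item
    transfer and server processing time are ignored.
    """
    round_trips = math.ceil(num_items / batch_size)
    return round_trips * round_trip_ms


# Hypothetical numbers: 100 items, 50 ms per cross-region round trip.
unbatched = total_round_trip_ms(100, round_trip_ms=50)
batched = total_round_trip_ms(100, round_trip_ms=50, batch_size=25)
print(unbatched, batched)  # 5000 200
```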

CPU Bottleneck

Symptoms

• 100% CPU utilization

• Slow response under load

• Request queuing

Solutions

• Horizontal scaling

• Optimize algorithms

• Async processing

• Caching computed results
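As one illustration of caching computed results, Python's standard `functools.lru_cache` can memoize a pure function. The function `shipping_quote` and its arithmetic are hypothetical stand-ins for real CPU-heavy work.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=1024)
def shipping_quote(weight_kg, zone):
    """Hypothetical CPU-heavy pure function; the loop stands in for real work."""
    global call_count
    call_count += 1
    overhead = sum(i * i for i in range(10_000)) % 97
    return overhead + weight_kg * zone


first = shipping_quote(2, 3)
second = shipping_quote(2, 3)  # served from the cache, body not re-run

print(call_count)                        # 1
print(shipping_quote.cache_info().hits)  # 1
```

This only works when the function is deterministic for its arguments; results that depend on changing data need an expiry or invalidation strategy.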

Memory Bottleneck

Symptoms

• OOM errors

• Excessive GC pauses

• Swapping to disk

Solutions

• Increase instance size

• Reduce in-memory data

• Streaming processing

• Pagination
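A small sketch of streaming processing: a generator yields one parsed value at a time, so memory use stays flat regardless of input size. The names `read_amounts` and `running_total` are illustrative, and `io.StringIO` stands in for a large file.

```python
import io


def read_amounts(lines):
    """Yield one parsed value at a time instead of building a full list."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield float(line)


def running_total(amounts):
    """Consume an iterable of numbers while holding only the running sum."""
    total = 0
    for amount in amounts:
        total += amount
    return total


# StringIO stands in for a large file opened with open(path).
data = io.StringIO("10.5\n20\n\n4.5\n")
total = running_total(read_amounts(data))
print(total)  # 35.0
```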

3. Identifying Bottlenecks

Look at the Data Flow

Trace a request from client to response. Where does time go?

Find the Synchronous Path

What's on the critical path? What must complete before the response is sent?

Check Utilization

Which component is at 100%? That's likely your bottleneck.

Calculate Capacity

What's the theoretical max throughput of each component?

Rule of thumb: Start at the database. 80% of the time, that's where the bottleneck is. Then check network, then compute.
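The utilization and capacity checks can be estimated with simple arithmetic. The sketch below assumes each worker (thread, connection, or process) handles one request at a time, so capacity is roughly workers divided by latency, a rearrangement of Little's Law. The pool size, query latency, and offered load are hypothetical.

```python
def max_throughput_qps(workers, latency_s):
    """Upper bound on throughput when each worker serves one request at a time."""
    return workers / latency_s


def utilization(offered_qps, capacity_qps):
    """Fraction of capacity in use; values near 1.0 point to a bottleneck."""
    return offered_qps / capacity_qps


# Hypothetical database: 20 pooled connections, 10 ms average query time.
db_capacity = max_throughput_qps(workers=20, latency_s=0.010)
db_utilization = utilization(offered_qps=1_500, capacity_qps=db_capacity)

print(db_capacity)     # 2000.0
print(db_utilization)  # 0.75
```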

4. Worked Example: E-commerce Checkout

Problem: Checkout takes 5 seconds under load

Component | Latency | Capacity | Status
Load Balancer | 10 ms | 100K QPS | OK
App Server | 200 ms | 1K QPS per server (x10) | OK
Inventory Check | 500 ms | 500 QPS | WARNING
Payment Service | 800 ms | 100 QPS | CRITICAL
Database Write | 100 ms | 1K QPS | OK

Analysis

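Bottleneck: the payment service. At 100 QPS it has the lowest capacity of any step, and at 800 ms per request it is also the slowest. If the steps run in sequence, the listed latencies sum to about 1.6 seconds, so the rest of the 5-second checkout is most likely time spent queuing behind it.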

Solutions: (1) Add more payment service instances. (2) Make payment asynchronous: confirm the order first, then process payment in the background. (3) Use a payment gateway that batches requests.
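The analysis can be reproduced from the figures in the example. This sketch assumes the five steps run one after another on the critical path and that the app tier's capacity is 10 servers at 1K QPS each; the variable names are illustrative.

```python
# (name, latency_ms, capacity_qps) taken from the worked example.
steps = [
    ("Load Balancer", 10, 100_000),
    ("App Server", 200, 10 * 1_000),  # 10 servers x 1K QPS each
    ("Inventory Check", 500, 500),
    ("Payment Service", 800, 100),
    ("Database Write", 100, 1_000),
]

# The lowest-capacity step limits checkout throughput.
bottleneck = min(steps, key=lambda step: step[2])

# Latency with no queuing, assuming the steps run sequentially.
unloaded_latency_ms = sum(latency for _, latency, _ in steps)

# Solution (2): payment moves off the critical path.
async_latency_ms = sum(
    latency for name, latency, _ in steps if name != "Payment Service"
)

# Once payment is fixed, the next-lowest capacity becomes the new bottleneck.
next_bottleneck = min(
    (step for step in steps if step is not bottleneck), key=lambda step: step[2]
)

print(bottleneck[0], bottleneck[2])  # Payment Service 100
print(unloaded_latency_ms)           # 1610
print(async_latency_ms)              # 810
print(next_bottleneck[0])            # Inventory Check
```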

5. Interviewer Questions

"What's the bottleneck in your design?"

Look at your high-level design (HLD). Which component handles the most load? Which scales least well?

"How would you scale 10x?"

Identify the current bottleneck, solve it, then find the next one. It's iterative.

"What breaks first under load?"

Usually: database → external APIs → app servers → load balancer

"How would you find bottlenecks in production?"

Monitoring: latency per component, saturation metrics, distributed tracing.
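For the production question, per-component latency is the core signal. The sketch below is a hand-rolled stand-in for a real tracing or metrics library: a context manager records how long each named stage takes. The stage names and sleep durations are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings_ms = defaultdict(list)  # component name -> recorded latencies


@contextmanager
def timed(component):
    """Record the wall-clock duration of the enclosed block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[component].append((time.perf_counter() - start) * 1000)


def handle_checkout():
    # Sleeps stand in for real downstream calls (hypothetical durations).
    with timed("inventory"):
        time.sleep(0.005)
    with timed("payment"):
        time.sleep(0.020)


for _ in range(3):
    handle_checkout()

# The component with the highest average latency is the first place to look.
slowest = max(timings_ms, key=lambda c: sum(timings_ms[c]) / len(timings_ms[c]))
print(slowest)  # payment
```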

6. Resolution Strategies

Scale Up

Bigger machines. Quick fix but limited ceiling.

When: Small systems, vertical limits not reached

Scale Out

More machines. Requires stateless design.

When: Compute-bound, embarrassingly parallel

Cache

Reduce load on slow components.

When: Read-heavy, data doesn't change often
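One common way to apply this is a cache-aside lookup with a time-to-live. The sketch below is a minimal in-process version; `TTLCache` and `load_product` are hypothetical names, and a production setup would more likely use a shared cache service.

```python
import time


class TTLCache:
    """Minimal cache-aside helper: serve from memory until an entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = loader(key)  # cache miss: go to the slow component
        self._entries[key] = (now + self._ttl, value)
        return value


db_reads = 0


def load_product(product_id):
    """Hypothetical stand-in for a database query."""
    global db_reads
    db_reads += 1
    return {"id": product_id, "name": f"product-{product_id}"}


cache = TTLCache(ttl_seconds=60)
for _ in range(5):
    product = cache.get_or_load(42, load_product)

print(db_reads)  # 1
```

The TTL bounds how stale a cached value can get, which is why this fits data that doesn't change often.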

Async

Move work off critical path.

When: Work can be done later, user doesn't need immediate result
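A minimal sketch of moving work off the critical path with a queue and a background worker. An in-process `queue.Queue` and thread stand in for a real message broker and worker fleet; `confirm_order` and the order ID are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def worker():
    """Drain the queue in the background; None is the shutdown signal."""
    while True:
        order_id = jobs.get()
        if order_id is None:
            jobs.task_done()
            break
        processed.append(order_id)  # stand-in for the slow payment call
        jobs.task_done()


def confirm_order(order_id):
    """Respond immediately; the slow work is only enqueued here."""
    jobs.put(order_id)
    return {"order_id": order_id, "status": "confirmed"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = confirm_order(1001)
print(response["status"])  # confirmed

jobs.put(None)  # ask the worker to stop
jobs.join()     # wait until all queued work has been handled
print(processed)  # [1001]
```

The trade-off is that the user gets a response before the work finishes, so failures have to be reported later through another channel.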

Shard

Partition data/load across nodes.

When: Data is too large for single node
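A simple way to partition is hash-based routing: hash the key and take it modulo the shard count. This is a sketch of one scheme, not the only one; `shard_for` and the key format are illustrative.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index in [0, num_shards).

    Uses a stable hash (SHA-256) so every process routes a key the same way;
    Python's built-in hash() is randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 1,000 hypothetical user keys across 4 shards.
counts = [0] * 4
for i in range(1000):
    counts[shard_for(f"user-{i}", 4)] += 1

print(counts)  # roughly even spread across the 4 shards
```

With plain modulo routing, changing the shard count remaps most keys; consistent hashing is the usual alternative when shards are added or removed often.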

Optimize

Better algorithms, queries, code.

When: Before scaling, always check for inefficiencies

7. Key Takeaways

1. Bottleneck = the component limiting overall throughput. System speed = speed of the slowest part.

2. The database is usually the first bottleneck. Then external APIs, then compute.

3. Trace the request path. Find where time is spent.

4. Solve iteratively. Fix one bottleneck, find the next.

5. In interviews, proactively identify bottlenecks in your design.

Quiz

1. App servers at 30% CPU, database at 95% CPU. Where's the bottleneck?

2. What is the best way to reduce database load for a read-heavy workload?