Module 4 — Scaling
Auto-Scaling
Automatically add or remove servers based on demand. Pay for what you use.
1. The Uber Driver Analogy
💡 Simple Analogy
Uber doesn't have a fixed fleet of drivers. During rush hour or big events, more drivers come online; at 3 AM, far fewer are needed.
Auto-scaling does the same for servers: spin up when busy, shut down when quiet. You pay only for servers you're using.
2. Scaling Triggers
Auto-scaling watches metrics and reacts:
| Trigger | Example Rule | Typical Workloads |
|---|---|---|
| CPU Utilization | Scale up when avg > 70% for 5 min | Compute-heavy workloads |
| Memory Usage | Scale up when > 80% | In-memory processing, caching |
| Request Count | Scale up when > 1000 RPS | API gateways, web servers |
| Queue Depth | Scale up when queue > 1000 messages | Worker processes |
| Custom Metrics | Scale based on business logic | Active users, processing time |
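The triggers above can be sketched as a single predicate over a metrics snapshot. This is a minimal illustration with hypothetical names (`Metrics`, `should_scale_up`) and the example thresholds from this section; real systems evaluate these over a time window via alarms (e.g., CloudWatch or Prometheus rules), not a single sample.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """One snapshot of the metrics a scaler might watch (hypothetical)."""
    avg_cpu_pct: float
    memory_pct: float
    requests_per_sec: float
    queue_depth: int


def should_scale_up(m: Metrics) -> bool:
    """True if any trigger threshold from the table is breached."""
    return (
        m.avg_cpu_pct > 70
        or m.memory_pct > 80
        or m.requests_per_sec > 1000
        or m.queue_depth > 1000
    )
```

For example, `should_scale_up(Metrics(45, 60, 1200, 10))` fires on the request-count trigger alone, even though CPU and memory are healthy.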
3. Scaling Policies
Target Tracking
Maintain a target metric value; the provider adds or removes capacity to hold the metric at the target.
Example: Keep average CPU at 50%
Best for: Simple, hands-off scaling
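Conceptually, target tracking scales capacity proportionally to how far the metric is from its target. This is a sketch of that idea, not any provider's actual algorithm; the function name is made up:

```python
import math


def desired_capacity(current: int, metric_value: float, target: float) -> int:
    """Proportional scaling: if the metric is 1.5x the target,
    we need roughly 1.5x the instances (rounded up, never below 1)."""
    return max(1, math.ceil(current * metric_value / target))


# 4 instances running at 75% CPU with a 50% target -> 6 instances
print(desired_capacity(4, 75, 50))  # -> 6
```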
Step Scaling
Add/remove specific number of instances based on alarm thresholds.
Example: Add 2 when CPU > 70%, add 4 when > 90%
Best for: Predictable scaling responses
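The step-scaling example above maps directly to a threshold ladder; a minimal sketch (thresholds from the example, function name hypothetical):

```python
def step_scale(avg_cpu: float) -> int:
    """Instances to add for each alarm step: 4 above 90% CPU,
    2 above 70%, otherwise none. Check the highest step first."""
    if avg_cpu > 90:
        return 4
    if avg_cpu > 70:
        return 2
    return 0
```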
Scheduled Scaling
Pre-configured scaling at specific times.
Example: Scale to 10 instances at 9 AM, down to 2 at 10 PM
Best for: Predictable traffic patterns
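Scheduled scaling reduces to a lookup from clock time to desired capacity. A toy version of the example schedule (in practice this is a cron-like rule on the provider side, not application code):

```python
from datetime import time


def scheduled_capacity(now: time) -> int:
    """10 instances during business hours (9 AM to 10 PM), 2 overnight."""
    return 10 if time(9) <= now < time(22) else 2
```

So `scheduled_capacity(time(14))` returns 10 midday, while `scheduled_capacity(time(3))` returns the overnight floor of 2.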
Predictive Scaling
ML predicts future load based on historical patterns.
Example: Learn from last week, pre-warm for expected spike
Best for: Avoiding cold start delays
4. Scale-In Considerations
Removing servers needs care to avoid disrupting users:
- **Connection Draining**: stop routing new requests; let existing ones finish before terminating.
- **Cooldown Period**: wait (e.g., 5 min) after a scaling action before scaling again.
- **Minimum Instances**: never go below 2, preserving redundancy.
- **Graceful Shutdown**: handle SIGTERM, complete in-flight work, then clean up.
5. Cloud Provider Services
| Provider | Service | Works With |
|---|---|---|
| AWS | Auto Scaling Groups | EC2, ECS, EKS |
| GCP | Instance Groups + Autoscaler | Compute Engine, GKE |
| Azure | VM Scale Sets | VMs, AKS |
| Kubernetes | HPA / VPA / Cluster Autoscaler | Pods, Nodes |
6. Key Takeaways
1. Auto-scaling adjusts capacity based on demand. Pay for what you use.
2. Common triggers: CPU, memory, request count, queue depth.
3. Target tracking is the simplest policy: set a target CPU, let the cloud handle the rest.
4. Use scheduled scaling for predictable traffic patterns (business hours).
5. Scale-in needs connection draining and cooldown periods.
6. In interviews, discuss which metrics to watch and which scaling policy fits.