Module 4 - Scaling
Auto-scaling
Automatically adjust capacity based on demand.
1. The Taxi Fleet Analogy
Simple Analogy
A taxi company adds cars during rush hour and reduces them at night. They monitor demand (passengers waiting) and adjust supply (taxis) automatically. That's auto-scaling.
Auto-scaling automatically adjusts compute resources based on load metrics. Scale out when busy, scale in when idle.
2. Scaling Types
Reactive (Target Tracking)
Scale when a metric crosses a threshold or drifts from its target. E.g., CPU > 70% → add instances (see the sketch after this list).
Scheduled
Scale at known times. E.g., add capacity before the daily peak.
Predictive
ML-based forecasting. Scale before demand hits.
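To make the reactive type concrete, here is a minimal Python sketch of target-tracking style sizing: resize the fleet proportionally so the average metric stays near its target. The function name and the 70% target are illustrative assumptions, not any particular cloud provider's API.

import math

def desired_capacity(current_instances: int, avg_cpu: float, target_cpu: float = 70.0) -> int:
    """Target-tracking sizing: keep average CPU near the target by resizing proportionally."""
    if avg_cpu <= 0:
        return current_instances
    return math.ceil(current_instances * avg_cpu / target_cpu)

# Example: 10 instances averaging 90% CPU against a 70% target -> scale out to 13
print(desired_capacity(10, 90.0))  # 13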
3. Common Metrics
CPU Utilization
The most common metric. Typical target is 60-80% utilization.
Memory Usage
For memory-intensive apps.
Request Count
Scale by traffic volume.
Queue Depth
For async workers. Scale when the backlog grows (see the worker-sizing sketch after this list).
Response Time
Scale when latency increases.
Custom Metrics
Business metrics like active users.
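For the queue-depth case, a hedged sketch of one way to size async workers: pick a target time to drain the backlog and divide. The throughput and drain-time figures below are assumed for illustration.

import math

def workers_needed(queue_depth: int, jobs_per_worker_per_min: float, drain_target_min: float = 5.0) -> int:
    """Size the worker pool so the current backlog drains within the target window."""
    capacity_per_worker = jobs_per_worker_per_min * drain_target_min
    return max(1, math.ceil(queue_depth / capacity_per_worker))

# Example: 1,200 queued jobs, 60 jobs/worker/min, 5-minute drain target -> 4 workers
print(workers_needed(1200, 60))  # 4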
4. Key Concepts
Cooldown Period: Wait time between scaling actions. Prevents thrashing.
Min/Max Instances: Set bounds to control costs and ensure availability.
Warm-up Time: Time for a new instance to be ready. Account for boot time.
Scale-in Protection: Prevent scaling in during critical operations.
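These concepts combine naturally into a single control loop. The sketch below is illustrative only (the metric source, thresholds, and timings are assumptions), but it shows where cooldown, min/max bounds, and warm-up fit.

import math
import random
import time

MIN_INSTANCES, MAX_INSTANCES = 2, 20   # bounds: availability floor, cost ceiling
TARGET_CPU = 70.0                      # target-tracking set point (percent)
COOLDOWN_SECONDS = 300                 # wait between scaling actions to prevent thrashing
WARMUP_SECONDS = 120                   # assumed boot time before a new instance takes load

def read_average_cpu() -> float:
    # Stand-in for a real metrics query against your monitoring system.
    return random.uniform(30.0, 95.0)

def autoscale_once(current: int, last_action: float) -> tuple[int, float]:
    cpu = read_average_cpu()
    desired = math.ceil(current * cpu / TARGET_CPU)
    desired = max(MIN_INSTANCES, min(MAX_INSTANCES, desired))   # clamp to bounds

    in_cooldown = time.time() - last_action < COOLDOWN_SECONDS
    if desired != current and not in_cooldown:
        print(f"scaling {current} -> {desired} (avg CPU {cpu:.0f}%)")
        # New instances need roughly WARMUP_SECONDS before they absorb load; the
        # cooldown also gives them time to register before the next decision.
        return desired, time.time()
    return current, last_action

# Evaluate once per minute, e.g.:
#   current, last = MIN_INSTANCES, 0.0
#   while True:
#       current, last = autoscale_once(current, last)
#       time.sleep(60)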
5. Best Practices
✓ Start with CPU, add custom metrics as needed
✓ Set cooldown to prevent rapid scale up/down
✓ Use multiple metrics for smarter decisions
✓ Test scaling under load before production
✓ Monitor scaling events and costs
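One way to act on the "multiple metrics" practice, sketched with assumed numbers: compute a desired count per metric and take the largest, so neither the CPU signal nor the backlog signal is starved.

import math

def desired_from_cpu(current: int, avg_cpu: float, target_cpu: float = 70.0) -> int:
    return math.ceil(current * avg_cpu / target_cpu)

def desired_from_queue(queue_depth: int, jobs_per_worker_per_min: float = 60, drain_min: float = 5) -> int:
    return math.ceil(queue_depth / (jobs_per_worker_per_min * drain_min))

def combined_desired(current: int, avg_cpu: float, queue_depth: int) -> int:
    # Most conservative wins: scale to whichever metric demands more capacity.
    return max(desired_from_cpu(current, avg_cpu), desired_from_queue(queue_depth))

# CPU alone suggests 5 instances, the backlog suggests 8 -> scale to 8
print(combined_desired(current=6, avg_cpu=55.0, queue_depth=2400))  # 8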
6. Key Takeaways
1. Auto-scaling adjusts capacity based on demand
2. Reactive (threshold), scheduled, and predictive types
3. Common metrics: CPU, memory, request count, queue depth
4. Set cooldown periods to prevent thrashing
5. Always set min/max bounds for cost control
Quiz
1. Your app scales up/down every minute. What's the fix?
2. Best metric for background job workers?