Module 4 - Scaling

Auto-scaling

Automatically adjust capacity based on demand.

1. The Taxi Fleet Analogy

Simple Analogy
A taxi company adds cars during rush hour and reduces them at night. They monitor demand (passengers waiting) and adjust supply (taxis) automatically. That's auto-scaling.

Auto-scaling automatically adjusts compute resources based on load metrics: scale out (add instances) when load is high, scale in (remove instances) when load drops.

2. Scaling Types

Reactive (Target Tracking)

Scale when a metric crosses a threshold. E.g., CPU > 70% → add instances.

Scheduled

Scale at known times. E.g., add capacity before the daily peak.

Predictive

ML-based forecasting. Scale before demand hits.
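
A reactive policy can be sketched as a proportional calculation: the desired capacity is the current capacity scaled by the ratio of the observed metric to the target. A minimal sketch in Python, where the 70% CPU target and the sample instance counts are assumed values, not tied to any specific cloud provider:

    import math

    # Minimal sketch of a reactive, target-style policy: the 70% CPU target
    # and the sample values below are illustrative assumptions.
    def desired_capacity(current_capacity, current_cpu, target_cpu=70.0):
        """Capacity that would bring average CPU back toward the target."""
        if current_capacity == 0:
            return 1
        # Proportional rule: twice the target load needs roughly twice the instances.
        return math.ceil(current_capacity * current_cpu / target_cpu)

    print(desired_capacity(current_capacity=4, current_cpu=90.0))  # 6 -> scale out
    print(desired_capacity(current_capacity=4, current_cpu=35.0))  # 2 -> scale in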

3. Common Metrics

CPU Utilization

The most common metric. A typical target is 60-80% utilization.

Memory Usage

For memory-intensive apps.

Request Count

Scale by traffic volume.

Queue Depth

For async workers. Scale when backlog grows.

Response Time

Scale when latency increases.

Custom Metrics

Business metrics like active users.
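
For queue-based workloads, a common approach is to derive the worker count from the backlog and each worker's throughput. A minimal sketch; the queue depth, per-worker throughput, and drain window below are assumed figures for illustration:

    import math

    # Sketch of queue-depth scaling: size the worker pool so the current
    # backlog drains within a target window. All numbers are assumptions.
    def desired_workers(queue_depth, msgs_per_worker_per_min, target_drain_minutes=5):
        capacity_per_worker = msgs_per_worker_per_min * target_drain_minutes
        return max(1, math.ceil(queue_depth / capacity_per_worker))

    print(desired_workers(queue_depth=12000, msgs_per_worker_per_min=200))  # 12 workers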

4. Key Concepts

Cooldown Period: Wait time between scaling actions. Prevents thrashing.
Min/Max Instances: Set bounds to control costs and ensure availability.
Warm-up Time: Time for a new instance to become ready. Account for boot time.
Scale-in Protection: Prevents scaling in during critical operations.
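
These concepts combine naturally in the control loop: clamp the desired capacity to the min/max bounds, then skip the action if a cooldown is still in effect. A minimal sketch, where the bounds and the 300-second cooldown are illustrative assumptions:

    import time

    # Sketch of scaling guardrails: min/max bounds plus a cooldown between
    # actions. The bounds and cooldown length are illustrative assumptions.
    MIN_INSTANCES = 2
    MAX_INSTANCES = 20
    COOLDOWN_SECONDS = 300

    last_action_time = 0.0

    def apply_scaling(current, desired):
        """Return the capacity to act on after bounds and cooldown checks."""
        global last_action_time
        desired = max(MIN_INSTANCES, min(MAX_INSTANCES, desired))
        if desired == current:
            return current
        if time.time() - last_action_time < COOLDOWN_SECONDS:
            return current  # still cooling down; skip to avoid thrashing
        last_action_time = time.time()
        return desired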

5. Best Practices

Start with CPU, add custom metrics as needed
Set a cooldown to prevent rapid scale-up/scale-down cycles
Use multiple metrics for smarter decisions (see the sketch after this list)
Test scaling under load before production
Monitor scaling events and costs
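
One way to combine metrics is to compute a desired capacity per metric and act on the largest, so the busiest dimension wins. A minimal sketch; the 70% CPU target and the 1,000 requests-per-instance figure are assumptions:

    import math

    # Sketch: combine metrics by taking the most conservative (largest)
    # desired capacity. The targets below are illustrative assumptions.
    def desired_from_metrics(current, avg_cpu, requests_per_instance):
        by_cpu = math.ceil(current * avg_cpu / 70.0)                        # keep CPU near 70%
        by_requests = math.ceil(current * requests_per_instance / 1000.0)   # ~1,000 req/s per instance
        return max(by_cpu, by_requests)

    print(desired_from_metrics(current=4, avg_cpu=50.0, requests_per_instance=1800.0))  # 8, driven by traffic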

6. Key Takeaways

1. Auto-scaling adjusts capacity based on demand
2. Reactive (threshold), scheduled, and predictive types
3. Common metrics: CPU, memory, request count, queue depth
4. Set cooldown periods to prevent thrashing
5. Always set min/max bounds for cost control

Quiz

1. Your app scales up/down every minute. What's the fix?

2. Best metric for background job workers?