Module 4 — Scaling

Auto-Scaling

Automatically add or remove servers based on demand. Pay for what you use.

1. The Uber Driver Analogy

💡 Simple Analogy
Uber doesn't keep a fixed fleet of drivers. During rush hour or big events, more drivers come online; at 3 AM, far fewer are needed.

Auto-scaling does the same for servers: spin up when busy, shut down when quiet. You pay only for the servers you're actually using.

2. Scaling Triggers

Auto-scaling watches metrics and reacts:

Metric          | Example Trigger                      | Typical Workloads
----------------|--------------------------------------|-------------------------------
CPU Utilization | Scale up when avg > 70% for 5 min    | Compute-heavy workloads
Memory Usage    | Scale up when > 80%                  | In-memory processing, caching
Request Count   | Scale up when > 1000 RPS             | API gateways, web servers
Queue Depth     | Scale up when queue > 1000 messages  | Worker processes
Custom Metrics  | Scale based on business logic        | Active users, processing time
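
To make the queue-depth trigger concrete, here is a minimal boto3 sketch that creates a CloudWatch alarm on an SQS queue's backlog and wires it to a scale-out policy. The queue name, alarm name, and policy ARN are placeholders, not values from this module.

```python
# Hypothetical queue-depth trigger: alarm fires when the SQS backlog
# exceeds 1000 messages for 5 consecutive minutes.
import boto3

cloudwatch = boto3.client("cloudwatch")

scale_out_policy_arn = "arn:aws:autoscaling:..."  # placeholder scale-out policy ARN

cloudwatch.put_metric_alarm(
    AlarmName="worker-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "worker-queue"}],  # assumed queue name
    Statistic="Average",
    Period=60,                      # evaluate the metric once per minute
    EvaluationPeriods=5,            # must breach for 5 consecutive periods
    Threshold=1000,                 # the "queue > 1000 messages" trigger
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[scale_out_policy_arn],
)
```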

3. Scaling Policies

Target Tracking

Maintain a target metric value: the provider continuously adds or removes capacity to hold the metric at the target.

Example: Keep average CPU at 50%
Best for: Simple, hands-off scaling
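
A minimal sketch of what this looks like with the AWS EC2 Auto Scaling API via boto3; the group name web-asg and the policy name are assumptions.

```python
# Target tracking: AWS adds/removes instances to hold average CPU near 50%.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",   # assumed group name
    PolicyName="keep-cpu-at-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,          # the target the provider maintains
    },
)
```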

Step Scaling

Add/remove specific number of instances based on alarm thresholds.

Example: Add 2 when CPU > 70%, add 4 when > 90%
Best for: Predictable scaling responses
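
Here is a boto3 sketch of the "add 2 at 70%, add 4 at 90%" example. Note that step bounds are offsets above the CloudWatch alarm's threshold (assumed here to be 70% CPU, configured separately); the group name is again a placeholder.

```python
# Step scaling: different step sizes depending on how far past the alarm
# threshold the metric goes.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",   # assumed group name
    PolicyName="step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[
        # 70% <= CPU < 90% (0-20 above the 70% alarm threshold): add 2 instances
        {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0,
         "ScalingAdjustment": 2},
        # CPU >= 90%: add 4 instances
        {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 4},
    ],
)
```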

Scheduled Scaling

Pre-configured scaling at specific times.

Example: Scale to 10 instances at 9 AM, down to 2 at 10 PM
Best for: Predictable traffic patterns
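
The 9 AM / 10 PM example might look like this as boto3 scheduled actions; group and action names are placeholders, and the cron expressions are evaluated in UTC unless a time zone is configured.

```python
# Scheduled scaling: fixed capacity changes at fixed times.
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out to 10 instances at 9 AM on weekdays.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",          # assumed group name
    ScheduledActionName="business-hours-up",
    Recurrence="0 9 * * 1-5",
    DesiredCapacity=10,
)

# Scale back down to 2 instances at 10 PM.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="overnight-down",
    Recurrence="0 22 * * 1-5",
    DesiredCapacity=2,
)
```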

Predictive Scaling

A machine-learning model forecasts future load from historical patterns and provisions capacity before the demand arrives.

Example: Learn from last week, pre-warm for expected spike
Best for: Avoiding cold start delays
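
A sketch of an AWS predictive scaling policy via boto3, assuming the same hypothetical group. ForecastAndScale mode acts on the forecast; ForecastOnly would let you validate predictions before trusting them.

```python
# Predictive scaling: forecast CPU demand from history and pre-warm capacity.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",   # assumed group name
    PolicyName="predictive-cpu",
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [
            {
                "TargetValue": 50.0,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization"
                },
            }
        ],
        "Mode": "ForecastAndScale",
        "SchedulingBufferTime": 300,  # launch 5 min ahead of the forecast spike
    },
)
```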

4. Scale-In Considerations

Removing servers needs care to avoid disrupting users:

Connection Draining: stop routing new requests to the instance and let in-flight requests finish before terminating it.
Cooldown Period: wait (typically ~5 minutes) after a scaling action before scaling again, so metrics can stabilize.
Minimum Instances: never drop below a floor (e.g., 2 instances) so redundancy survives scale-in.
Graceful Shutdown: handle SIGTERM, complete in-flight work, then clean up; see the sketch below.
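
A minimal sketch of a graceful-shutdown worker loop, assuming the platform (an ASG lifecycle hook, Kubernetes, etc.) sends SIGTERM before force-killing the process. fetch_job and process are placeholder stand-ins for real queue operations.

```python
# Graceful shutdown: on SIGTERM, stop taking new work, finish the current
# job, then clean up before exiting.
import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    global shutting_down
    shutting_down = True   # stop accepting new work; finish what's in flight

signal.signal(signal.SIGTERM, handle_sigterm)

def fetch_job():
    # Placeholder for pulling a message from a queue (e.g., SQS).
    time.sleep(1)
    return "job"

def process(job):
    # Placeholder for the actual work.
    time.sleep(1)

while not shutting_down:
    job = fetch_job()
    process(job)           # an in-flight job always runs to completion

# Cleanup: flush buffers, close connections, return unfinished messages.
print("shutdown complete")
```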

5. Cloud Provider Services

Provider   | Service                        | Works With
-----------|--------------------------------|---------------------
AWS        | Auto Scaling Groups            | EC2, ECS, EKS
GCP        | Instance Groups + Autoscaler   | Compute Engine, GKE
Azure      | VM Scale Sets                  | VMs, AKS
Kubernetes | HPA / VPA / Cluster Autoscaler | Pods, Nodes
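
For the Kubernetes row, here is a sketch using the official Python client to create an autoscaling/v1 HPA targeting 70% CPU; the deployment name "web" and the "default" namespace are assumptions.

```python
# Kubernetes HPA: scale a Deployment between 2 and 10 pods based on CPU.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in a pod

hpa = client.V1HorizontalPodAutoscaler(
    api_version="autoscaling/v1",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"  # assumed target
        ),
        min_replicas=2,                      # the redundancy floor from section 4
        max_replicas=10,
        target_cpu_utilization_percentage=70,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```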

6. Key Takeaways

1. Auto-scaling adjusts capacity to match demand, so you pay only for what you use.
2. Common triggers: CPU, memory, request count, and queue depth.
3. Target tracking is the simplest policy: set a target (e.g., 50% CPU) and let the cloud handle the rest.
4. Use scheduled scaling for predictable patterns such as business hours.
5. Scale-in needs connection draining and cooldown periods to avoid disrupting users.
6. In interviews, discuss which metrics you'd watch and which scaling policies fit the workload.