Load Balancers
A critical infrastructure component that distributes incoming network traffic across multiple servers, ensuring no single server becomes a bottleneck while maintaining high availability and performance.
1. What is a Load Balancer?
Imagine a busy coffee shop with 5 cashiers. Without a manager, customers randomly pick a line - some cashiers get overwhelmed while others have no one. Now add a manager at the door who directs each customer to the shortest line. That manager is your Load Balancer - it sends each request (customer) to the right server (cashier) so everyone gets served quickly.
A load balancer (LB) is typically the first point of contact within a data center after the firewall. It receives all incoming requests and intelligently distributes them across a pool of backend servers (worker nodes). This prevents any single server from being overwhelmed, enabling applications to handle millions of requests per second.
The Problem Without Load Balancing
- Single server handles ALL traffic
- Server crash = entire app down
- Can't scale beyond one machine
- No maintenance without downtime
- Single point of failure
With Load Balancing
- Traffic distributed across servers
- Server crash = others handle load
- Add servers as traffic grows
- Zero-downtime maintenance
- High availability ensured
Why Do We Need Load Balancers?
When Amazon runs Prime Day sales, traffic spikes 10-100x normal levels. Load balancers automatically distribute this surge across thousands of servers, preventing crashes and ensuring a seamless shopping experience.
2. How Load Balancing Works
Understanding the step-by-step process a load balancer follows is crucial for system design interviews. At a high level: the client resolves the application's domain to the load balancer's virtual IP, the load balancer accepts the connection, selects a healthy backend server using its configured algorithm, forwards the request, and relays the server's response back to the client. Servers that fail health checks are temporarily removed from rotation.
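The sketch below is a minimal, illustrative model of that flow in Python, assuming a static in-memory backend pool and a stubbed-out `forward()` step; it is not a production proxy.

```python
import itertools

# Hypothetical backend pool; in practice this comes from configuration
# or a service registry.
BACKENDS = ["10.0.1.5:8080", "10.0.1.6:8080", "10.0.1.7:8080"]
HEALTHY = set(BACKENDS)          # maintained by a separate health checker
_rr = itertools.cycle(BACKENDS)  # round-robin iterator over the pool

def pick_backend() -> str:
    """Return the next healthy backend in round-robin order."""
    for _ in range(len(BACKENDS)):
        candidate = next(_rr)
        if candidate in HEALTHY:
            return candidate
    raise RuntimeError("no healthy backends available")

def forward(backend: str, request: bytes) -> bytes:
    """Stub for the real proxying step (e.g. an HTTP call to the backend)."""
    return f"handled by {backend}".encode()

def handle_request(request: bytes) -> bytes:
    """Accept a client request, forward it to one backend, relay the response."""
    backend = pick_backend()
    return forward(backend, request)

# Successive requests land on different servers, so no single machine is overloaded.
for _ in range(5):
    print(handle_request(b"GET / HTTP/1.1"))
```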
3. Types of Load Balancers
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different OSI layers, each with distinct capabilities and trade-offs.
Layer 4 (Transport Layer)
- Routes based on IP address + TCP/UDP port
- Does NOT inspect packet content
- Ultra-fast: millions of requests/sec
- Protocol agnostic (HTTP, WebSocket, DB, etc.)
- Lower latency, higher throughput
- Cannot do content-based routing
- Best for: Database connections, gaming, high-throughput
Layer 7 (Application Layer)
- Inspects HTTP headers, cookies, URLs, body
- Smart content-based routing decisions
- Performs SSL/TLS termination
- Can modify requests and responses
- Supports A/B testing, canary deployments
- WAF capabilities, rate limiting
- Best for: Web apps, APIs, microservices
// Layer 4: Only sees network data
Source: 192.168.1.100:54321 → Destination: lb.example.com:443
Decision: Route to Server A at 10.0.1.5:443 (based on IP/port only)
// Layer 7: Understands application data
GET /api/users HTTP/1.1
Host: example.com
Cookie: session_id=abc123
Decision: /api/* → API servers, /static/* → CDN, Cookie → Same server
Hardware vs Software vs Cloud
| Type | Examples | Pros | Cons |
|---|---|---|---|
| Hardware | F5 BIG-IP, Citrix ADC | Ultra-high performance, dedicated | Expensive, inflexible, vendor lock-in |
| Software | Nginx, HAProxy, Envoy | Flexible, programmable, open-source | Requires server management |
| Cloud (LBaaS) | AWS ELB, GCP LB, Azure LB | Fully managed, auto-scaling, global | Vendor lock-in, ongoing costs |
4. Where to Place Load Balancers
In a typical 3-tier architecture, load balancers can be placed at multiple points for maximum scalability: between clients and the web server tier, between web servers and application servers, and between application servers and the database or cache tier.
Load balancers can potentially be used between ANY two services with multiple instances. In microservices architectures, each service may have its own LB (service mesh pattern).
5. Global vs Local Load Balancing
Global Server Load Balancing (GSLB)
- Distributes traffic across geographic regions
- Routes users to nearest data center
- Handles region/zone failover
- Uses DNS-based routing or Anycast
- Considers user location, DC health, latency
- Example: Route US users to US-East, EU users to EU-West
Local Load Balancing
- Operates within a single data center
- Acts as a reverse proxy
- Uses Virtual IP (VIP) for clients
- Focuses on server health and capacity
- Handles actual request distribution
- Example: Distribute across 50 servers in US-East DC
DNS-based global load balancing has two important caveats. Caching issues: ISPs and clients cache DNS responses, so the record's TTL determines how quickly traffic shifts. No health awareness: plain DNS keeps returning a dead server's IP until the cached record expires. For production GSLB, use dedicated solutions such as AWS Route 53, Cloudflare, or dedicated ADCs.
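To make the idea concrete, here is a small illustrative sketch of geo-aware resolution with failover, assuming a made-up region table and a fake geo-IP lookup; real offerings (Route 53, Cloudflare, Anycast-based LBs) add health probes and latency measurements on top of this.

```python
# Toy geo-aware resolver (illustrative only). REGION_VIPS, REGION_HEALTH,
# and client_region() are hypothetical stand-ins for real geo-IP data and
# per-region load balancer virtual IPs.
REGION_VIPS = {
    "us-east": "203.0.113.10",
    "eu-west": "198.51.100.20",
}
REGION_HEALTH = {"us-east": True, "eu-west": True}

def client_region(client_ip: str) -> str:
    """Pretend geo-IP lookup: map a client IP to its nearest region."""
    return "eu-west" if client_ip.startswith("81.") else "us-east"

def resolve(client_ip: str, ttl_seconds: int = 30) -> tuple[str, int]:
    """Return (VIP, TTL) for the closest healthy region, failing over if needed."""
    preferred = client_region(client_ip)
    if REGION_HEALTH.get(preferred):
        return REGION_VIPS[preferred], ttl_seconds
    for region, healthy in REGION_HEALTH.items():  # failover to another region
        if healthy and region != preferred:
            return REGION_VIPS[region], ttl_seconds
    raise RuntimeError("no healthy regions")

print(resolve("81.2.3.4"))   # EU client -> EU-West VIP, short TTL
print(resolve("52.1.2.3"))   # US client -> US-East VIP
```

Keeping the TTL low (here 30 seconds) limits how long clients keep hitting a failed region, which is exactly the caching trade-off described above.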
6. Key Features & Services
Modern load balancers provide much more than just traffic distribution:
- Health checks that detect failing servers and remove them from rotation
- SSL/TLS termination
- Session persistence (sticky sessions)
- Content-based routing and request/response modification
- Rate limiting and WAF protection
- Support for canary deployments and A/B testing
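As an illustration of the health-check feature, here is a minimal sketch assuming each backend exposes a plain HTTP `GET /health` endpoint (a hypothetical convention); interval, timeout, and failure thresholds are simplified compared to real products.

```python
import urllib.request

# Hypothetical backend pool; each is assumed to expose GET /health.
BACKENDS = ["10.0.1.5:8080", "10.0.1.6:8080", "10.0.1.7:8080"]
healthy: set[str] = set(BACKENDS)

def check_backend(addr: str, timeout: float = 2.0) -> bool:
    """Return True if the backend answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"http://{addr}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def run_health_checks() -> None:
    """Refresh the healthy set; the request-routing path only picks from it."""
    for addr in BACKENDS:
        if check_backend(addr):
            healthy.add(addr)
        else:
            healthy.discard(addr)

# In a real load balancer this runs on a timer (e.g. every 5-300 seconds,
# as with AWS ALB target groups) rather than being called once.
run_health_checks()
print("healthy backends:", sorted(healthy))
```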
7. Load Balancing Algorithms (Overview)
The algorithm determines HOW requests are distributed. Here's a quick overview; see the detailed algorithms page for a deep dive.
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | A → B → C → A → B → C... | Equal servers, uniform requests |
| Weighted Round Robin | A(3)→A→A→B(2)→B→C(1) | Servers with different capacities |
| Least Connections | Route to server with fewest active connections | Long-lived connections, WebSockets |
| IP Hash | hash(client_ip) % servers | Session persistence |
For most applications, Round Robin or Least Connections is sufficient. Start simple and optimize only when you have data showing a problem; don't over-engineer.
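The following sketch is an illustrative, self-contained implementation of the four strategies above; the server names, weights, and connection counts are made-up examples, not part of any specific load balancer's API.

```python
import hashlib
import itertools

SERVERS = ["A", "B", "C"]

# Round Robin: cycle through servers in order.
_rr = itertools.cycle(SERVERS)
def round_robin() -> str:
    return next(_rr)

# Weighted Round Robin: servers appear in proportion to their weight.
WEIGHTS = {"A": 3, "B": 2, "C": 1}  # e.g. A has 3x the capacity of C
_wrr = itertools.cycle([s for s, w in WEIGHTS.items() for _ in range(w)])
def weighted_round_robin() -> str:
    return next(_wrr)

# Least Connections: pick the server with the fewest active connections.
active_connections = {"A": 12, "B": 3, "C": 7}  # illustrative counts
def least_connections() -> str:
    return min(active_connections, key=active_connections.get)

# IP Hash: the same client IP always maps to the same server (session persistence).
def ip_hash(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print([round_robin() for _ in range(6)])            # A B C A B C
print([weighted_round_robin() for _ in range(6)])   # A A A B B C
print(least_connections())                          # B (fewest connections)
print(ip_hash("192.168.1.100"), ip_hash("192.168.1.100"))  # same server twice
```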
8. Real-World Examples
Amazon Web Services (AWS)
- Application Load Balancer (ALB): Layer 7, HTTP/HTTPS routing by URL path, headers, cookies
- Network Load Balancer (NLB): Layer 4, ultra-high performance (millions req/sec)
- Classic Load Balancer: Legacy, both L4 and L7 features
- Auto-scaling integration, health checks every 5-300 seconds
Netflix - Zuul Gateway
- Custom Layer 7 load balancer + API gateway
- Routes to 1000+ microservices
- Handles 100,000+ requests/second per instance
- Supports canary deployments (1% traffic to new version)
- Dynamic routing rules, authentication, rate limiting
Google Cloud Load Balancing
- Global HTTP(S) LB: Single anycast IP serves users worldwide
- Automatic cross-region failover
- Content-based routing by URL, headers, cookies
- Integrated with Cloud CDN for edge caching
Nginx and HAProxy (Self-Hosted)
- Nginx: One of the most widely deployed web servers and load balancers
- HAProxy: High-performance TCP/HTTP LB, used by GitHub, Stack Overflow, Reddit
- Both support: Round Robin, Least Connections, IP Hash, Health Checks
- Can handle millions of concurrent connections
9. Knowledge Check
Your e-commerce application has 3 servers with different specs: Server A (32 cores), Server B (16 cores), Server C (8 cores). Which algorithm should you use to ensure traffic is distributed proportionally to their capacity?