System Design Fundamentals - Module 2

Load Balancers

A critical infrastructure component that distributes incoming network traffic across multiple servers, ensuring no single server becomes a bottleneck while maintaining high availability and performance.


1. What is a Load Balancer?

Simple Analogy

Imagine a busy coffee shop with 5 cashiers. Without a manager, customers randomly pick a line - some cashiers get overwhelmed while others have no one. Now add a manager at the door who directs each customer to the shortest line. That manager is your Load Balancer - it sends each request (customer) to the right server (cashier) so everyone gets served quickly.

A load balancer (LB) is the first point of contact within a data center after the firewall. It receives all incoming requests and intelligently distributes them across a pool of backend servers (worker nodes). This prevents any single server from being overwhelmed and enables applications to handle millions of requests per second.

The Problem Without Load Balancing

Without Load Balancer
  • Single server handles ALL traffic
  • Server crash = entire app down
  • Can't scale beyond one machine
  • No maintenance without downtime
  • Single point of failure
With Load Balancer
  • Traffic distributed across servers
  • Server crash = others handle load
  • Add servers as traffic grows
  • Zero-downtime maintenance
  • High availability ensured

Why Do We Need Load Balancers?

1. Scalability: Add more servers behind the LB as traffic grows. Scaling up or down is transparent to end users, with no architecture redesign needed.
2. High Availability: If a server crashes, the LB hides the failure and routes traffic to healthy servers. The system remains available even during partial outages.
3. Performance: Routes requests to servers with lower load for faster response times and better resource utilization across all servers.
4. Maintainability: Take individual servers offline for updates, patches, or maintenance without affecting the entire application.
Real-World Impact

When Amazon runs Prime Day sales, traffic spikes 10-100x normal levels. Load balancers automatically distribute this surge across thousands of servers, preventing crashes and ensuring a seamless shopping experience.

2. How Load Balancing Works

Understanding the step-by-step process of how a load balancer handles requests is crucial for system design interviews.

1. Traffic Reception: All incoming requests arrive at the load balancer's public IP or domain (e.g., www.myapp.com resolves to the LB's IP). The LB acts as a reverse proxy, so clients never know about the backend servers.
2. Decision Logic (Routing Algorithm): The LB decides which server gets the request using a configured algorithm: Round Robin (sequential), Least Connections (fewest active), IP Hash (sticky sessions), Weighted (based on capacity), or Least Response Time.
3. Health Checks: The LB continuously monitors server health via heartbeat requests (e.g., GET /health). If a server doesn't respond within the threshold, it's marked unhealthy and removed from the pool. When it recovers, it's automatically reintroduced.
4. Request Forwarding: The LB forwards the request to the selected healthy server. Some LBs modify headers, terminate SSL, or add tracing information before forwarding.
5. Response Handling: The backend server processes the request and returns the response to the LB, which forwards it to the client. Some responses may be cached or compressed by the LB.
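The five steps above can be sketched as a minimal selection loop. This is an illustrative toy, not a real proxy: the server pool, names, and connection counts are invented, and real LBs track health and connections from live traffic rather than static flags.

```python
import itertools

# Hypothetical in-memory server pool; all fields are illustrative.
SERVERS = [
    {"name": "A", "healthy": True,  "active_conns": 12},
    {"name": "B", "healthy": False, "active_conns": 3},   # failed its health check
    {"name": "C", "healthy": True,  "active_conns": 5},
]

def healthy_pool():
    # Step 3: only servers passing health checks are eligible for traffic.
    return [s for s in SERVERS if s["healthy"]]

_rr = itertools.count()

def pick_round_robin():
    # Step 2, Round Robin: cycle sequentially over the healthy pool.
    pool = healthy_pool()
    return pool[next(_rr) % len(pool)]

def pick_least_connections():
    # Step 2, Least Connections: the server with the fewest active connections wins.
    return min(healthy_pool(), key=lambda s: s["active_conns"])
```

With server B marked unhealthy, Round Robin alternates between A and C, and Least Connections picks C (5 active connections vs. A's 12) — exactly the "hide failures, route to healthy servers" behavior described above.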

3. See It In Action

[Interactive load balancer simulator: click users to send requests and watch the LB distribute them via Round Robin across three servers (A, B, C), tracking per-server CPU load and request counts, total requests, average latency, and the number of healthy servers.]

4. Types of Load Balancers

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different OSI layers, each with distinct capabilities and trade-offs.

Layer 4 (Transport Layer)
  • Routes based on IP address + TCP/UDP port
  • Does NOT inspect packet content
  • Ultra-fast: millions of requests/sec
  • Protocol agnostic (HTTP, WebSocket, DB, etc.)
  • Lower latency, higher throughput
  • Cannot do content-based routing
  • Best for: Database connections, gaming, high-throughput
Layer 7 (Application Layer)
  • Inspects HTTP headers, cookies, URLs, body
  • Smart content-based routing decisions
  • Performs SSL/TLS termination
  • Can modify requests and responses
  • Supports A/B testing, canary deployments
  • WAF capabilities, rate limiting
  • Best for: Web apps, APIs, microservices
Layer 4 vs Layer 7 - Decision Example
// Layer 4: Only sees network data
Source: 192.168.1.100:54321 → Destination: lb.example.com:443
Decision: Route to Server A at 10.0.1.5:443 (based on IP/port only)

// Layer 7: Understands application data
GET /api/users HTTP/1.1
Host: example.com
Cookie: session_id=abc123

Decision: /api/* → API servers, /static/* → CDN, Cookie → Same server
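The L7 decision in that example can be sketched as a prefix-match routing table. The route prefixes and pool names below are hypothetical, mirroring the decision line above; real L7 LBs (Nginx, ALB, Envoy) express the same idea in their config syntax.

```python
# Hypothetical routing table mirroring the decision example above.
ROUTES = [
    ("/api/",    "api-pool"),
    ("/static/", "cdn-pool"),
]
DEFAULT_POOL = "web-pool"

def route_l7(path: str) -> str:
    # An L7 LB can read the request path; an L4 LB only ever sees IP:port,
    # so it could never make this decision.
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

So `route_l7("/api/users")` lands on the API pool while `route_l7("/static/logo.png")` goes to the CDN pool, and anything else falls through to the default web pool.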

Hardware vs Software vs Cloud

Type          | Examples                  | Pros                                | Cons
Hardware      | F5 BIG-IP, Citrix ADC     | Ultra-high performance, dedicated   | Expensive, inflexible, vendor lock-in
Software      | Nginx, HAProxy, Envoy     | Flexible, programmable, open-source | Requires server management
Cloud (LBaaS) | AWS ELB, GCP LB, Azure LB | Fully managed, auto-scaling, global | Vendor lock-in, ongoing costs

5. Where to Place Load Balancers

In a typical 3-tier architecture, load balancers can be placed at multiple points for maximum scalability:

1. Between Users and Web Servers: The primary LB that faces the internet. Handles SSL termination and routes HTTP/HTTPS traffic to web servers or API gateways. This is the most common placement.
2. Between Web and Application Servers: Distributes internal requests from web servers to application servers that run business logic. Prevents any single app server from being overwhelmed.
3. Between Application and Database Servers: Routes database queries across read replicas and ensures database connections are evenly distributed. Often uses a Layer 4 LB for performance.
Key Insight

Load balancers can potentially be used between ANY two services with multiple instances. In microservices architectures, each service may have its own LB (service mesh pattern).
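Placement #3 (app-to-database) often takes the form of a read/write split. A minimal sketch, assuming a primary plus read replicas; the hostnames are invented, and in production the replica list would come from service discovery rather than a hard-coded list.

```python
import random

# Illustrative hostnames; in production these come from service discovery.
PRIMARY = "db-primary.internal"
READ_REPLICAS = ["db-replica-1.internal", "db-replica-2.internal"]

def pick_db_host(sql: str) -> str:
    # Reads fan out across replicas; writes must go to the primary.
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(READ_REPLICAS)
    return PRIMARY
```

A naive `SELECT` check like this ignores replication lag and read-your-writes concerns, which is why dedicated database proxies exist, but it captures the routing decision itself.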

6. Global vs Local Load Balancing

Global Server Load Balancing (GSLB)
  • Distributes traffic across geographic regions
  • Routes users to nearest data center
  • Handles region/zone failover
  • Uses DNS-based routing or Anycast
  • Considers user location, DC health, latency
  • Example: Route US users to US-East, EU users to EU-West
Local Load Balancing
  • Operates within a single data center
  • Acts as a reverse proxy
  • Uses Virtual IP (VIP) for clients
  • Focuses on server health and capacity
  • Handles actual request distribution
  • Example: Distribute across 50 servers in US-East DC
DNS Limitations

Caching Issues: ISPs cache DNS responses, so the TTL determines how quickly traffic shifts. No Health Awareness: DNS keeps returning dead server IPs until the TTL expires. For production GSLB, use managed solutions like AWS Route 53, Cloudflare, or dedicated ADCs.

7. Key Features & Services

Modern load balancers provide much more than just traffic distribution:

1. Health Checking: Active checks: the LB sends periodic requests (e.g., GET /health every 10s). Passive checks: the LB monitors real traffic for errors. Unhealthy servers are removed automatically and re-added when healthy.
2. SSL/TLS Termination: The LB handles encryption/decryption, sending plain HTTP to backend servers. Reduces server CPU load by 20-40% and centralizes certificate management. Also called "SSL offloading".
3. Sticky Sessions (Session Persistence): Ensures a client's subsequent requests always route to the same server (via cookie or IP hash). Useful for stateful apps, but the better practice is an external session store (e.g., Redis) for stateless design.
4. Auto Scaling Integration: Cloud LBs integrate with auto-scaling groups. As traffic increases, new servers spin up and automatically register; when traffic drops, servers deregister and terminate.
5. Service Discovery: The LB queries a service registry to find healthy instances. Essential in dynamic container environments (Kubernetes, Docker Swarm) where IPs change frequently.
6. Security Features: Web Application Firewall (WAF) capabilities, DDoS protection at L3/L4/L7, rate limiting per client/IP, and IP whitelisting/blacklisting. The LB is often the first line of defense.
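Sticky sessions via IP hash (feature #3 above) can be sketched in a few lines. The server names are hypothetical; the key property is that the mapping is deterministic, so the same client IP always lands on the same server while the pool is stable.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical pool

def sticky_server(client_ip: str) -> str:
    # Same IP -> same digest -> same server index, as long as the
    # pool membership doesn't change.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Note the weakness this exposes: adding or removing a server changes `len(SERVERS)` and remaps most clients, which is one reason consistent hashing, or the external session store mentioned above, is often preferred.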

8. Load Balancing Algorithms (Overview)

The algorithm determines HOW requests are distributed. Here's a quick overview—see the detailed algorithms page for deep dive.

Algorithm            | How It Works                                       | Best For
Round Robin          | A → B → C → A → B → C...                           | Equal servers, uniform requests
Weighted Round Robin | A(3) → A → A → B(2) → B → C(1)                     | Servers with different capacities
Least Connections    | Route to the server with fewest active connections | Long-lived connections, WebSockets
IP Hash              | hash(client_ip) % num_servers                      | Session persistence
90% of Cases

Round Robin or Least Connections is sufficient. Start simple, optimize when you have data showing problems. Don't over-engineer.
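The Weighted Round Robin row in the table above can be made concrete with a simple expansion: give each server as many slots in the rotation as its weight. This is the naive form; real LBs like Nginx use a "smooth" variant that interleaves servers, but the proportions are the same.

```python
def weighted_rotation(weights):
    # Expand each server into as many consecutive slots as its weight;
    # cycling over the result yields the A(3), B(2), C(1) pattern above.
    return [name for name, w in weights for _ in range(w)]
```

For weights A:3, B:2, C:1, the rotation is `["A", "A", "A", "B", "B", "C"]`, so A receives half of all requests — proportional to its capacity.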

9. Real-World Examples

Amazon Web Services (AWS)

  • Application Load Balancer (ALB): Layer 7, HTTP/HTTPS routing by URL path, headers, cookies
  • Network Load Balancer (NLB): Layer 4, ultra-high performance (millions req/sec)
  • Classic Load Balancer: Legacy, both L4 and L7 features
  • Auto-scaling integration, health checks every 5-300 seconds

Netflix - Zuul Gateway

  • Custom Layer 7 load balancer + API gateway
  • Routes to 1000+ microservices
  • Handles 100,000+ requests/second per instance
  • Supports canary deployments (1% traffic to new version)
  • Dynamic routing rules, authentication, rate limiting

Google Cloud Load Balancing

  • Global HTTP(S) LB: Single anycast IP serves users worldwide
  • Automatic cross-region failover
  • Content-based routing by URL, headers, cookies
  • Integrated with Cloud CDN for edge caching

Nginx and HAProxy (Self-Hosted)

  • Nginx: Most popular web server + LB, 60%+ market share
  • HAProxy: High-performance TCP/HTTP LB, used by GitHub, Stack Overflow, Reddit
  • Both support: Round Robin, Least Connections, IP Hash, Health Checks
  • Can handle millions of concurrent connections

10. Knowledge Check

Your e-commerce application has 3 servers with different specs: Server A (32 cores), Server B (16 cores), Server C (8 cores). Which algorithm should you use to ensure traffic is distributed proportionally to their capacity?

11. Key Takeaways

1. Load Balancer = traffic cop distributing requests across servers to prevent overload and ensure high availability.
2. Key benefits: Scalability (add servers), Availability (handle failures), Performance (optimize distribution), Maintainability (zero-downtime updates).
3. Layer 4 = fast, routes by IP/port. Layer 7 = smart, routes by HTTP content. Most web apps use L7.
4. Place LBs between: Users↔Web servers, Web↔App servers, App↔Database servers.
5. GSLB = global routing across data centers. Local LB = distribution within a data center.
6. Health checks are critical—remove failed servers automatically.
7. Modern LBs provide: SSL termination, sticky sessions, auto-scaling, service discovery, security.
8. Start simple with Round Robin or Least Connections.