System Design Fundamentals - Module 2

Load Balancers

A critical infrastructure component that distributes incoming network traffic across multiple servers, ensuring no single server becomes a bottleneck while maintaining high availability and performance.


1. What is a Load Balancer?

Simple Analogy

Imagine a busy coffee shop with 5 cashiers. Without a manager, customers randomly pick a line - some cashiers get overwhelmed while others have no one. Now add a manager at the door who directs each customer to the shortest line. That manager is your Load Balancer - it sends each request (customer) to the right server (cashier) so everyone gets served quickly.

A load balancer (LB) is the first point of contact within a data center after the firewall. It receives all incoming requests and intelligently distributes them across a pool of backend servers (worker nodes). This prevents any single server from being overwhelmed and enables applications to handle millions of requests per second.

The Problem Without Load Balancing

Without Load Balancer
  • Single server handles ALL traffic
  • Server crash = entire app down
  • Can't scale beyond one machine
  • No maintenance without downtime
  • Single point of failure
With Load Balancer
  • Traffic distributed across servers
  • Server crash = others handle load
  • Add servers as traffic grows
  • Zero-downtime maintenance
  • High availability ensured

Why Do We Need Load Balancers?

1. Scalability: Add more servers behind the LB as traffic grows. Scaling up or down is transparent to end users, with no architecture redesign needed.
2. High Availability: If a server crashes, the LB hides the failure and routes traffic to healthy servers. The system remains available even during partial outages.
3. Performance: Routes requests to servers with lower load for faster response times and better resource utilization across all servers.
4. Maintainability: Take individual servers offline for updates, patches, or maintenance without affecting the entire application.
Real-World Impact

When Amazon runs Prime Day sales, traffic spikes 10-100x normal levels. Load balancers automatically distribute this surge across thousands of servers, preventing crashes and ensuring a seamless shopping experience.

2. How Load Balancing Works

Understanding the step-by-step process of how a load balancer handles requests is crucial for system design interviews.

1. Traffic Reception: All incoming requests arrive at the load balancer's public IP or domain (e.g., www.myapp.com resolves to the LB's IP). The LB acts as a reverse proxy, so clients never know about the backend servers.
2. Decision Logic (Routing Algorithm): The LB decides which server gets the request using a configured algorithm: Round Robin (sequential), Least Connections (fewest active), IP Hash (sticky sessions), Weighted (based on capacity), or Least Response Time.
3. Health Checks: The LB continuously monitors server health via heartbeat requests (e.g., GET /health). If a server doesn't respond within the threshold, it's marked unhealthy and removed from the pool. When it recovers, it's automatically reintroduced.
4. Request Forwarding: The LB forwards the request to the selected healthy server. Some LBs modify headers, terminate SSL, or add tracing information before forwarding.
5. Response Handling: The backend server processes the request and returns the response to the LB, which forwards it to the client. Some responses may be cached or compressed by the LB.
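The five steps above can be sketched as a minimal selection loop. This is an illustrative toy, not a real proxy: the server pool, names, and connection counts are invented, and real LBs track health and connections from live traffic rather than static flags.

```python
import itertools

# Hypothetical in-memory server pool; all fields are illustrative.
SERVERS = [
    {"name": "A", "healthy": True,  "active_conns": 12},
    {"name": "B", "healthy": False, "active_conns": 3},   # failed its health check
    {"name": "C", "healthy": True,  "active_conns": 5},
]

def healthy_pool():
    # Step 3: only servers passing health checks are eligible for traffic.
    return [s for s in SERVERS if s["healthy"]]

_rr = itertools.count()

def pick_round_robin():
    # Step 2, Round Robin: cycle sequentially over the healthy pool.
    pool = healthy_pool()
    return pool[next(_rr) % len(pool)]

def pick_least_connections():
    # Step 2, Least Connections: the server with the fewest active connections wins.
    return min(healthy_pool(), key=lambda s: s["active_conns"])
```

With server B marked unhealthy, Round Robin alternates between A and C, and Least Connections picks C (5 active connections vs. A's 12) — exactly the "hide failures, route to healthy servers" behavior described above.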

3. See It In Action

[Interactive load balancer simulator: click users to send requests and watch the LB distribute them via Round Robin across three servers (A, B, C), tracking per-server CPU load and request counts, total requests, average latency, and the number of healthy servers.]

4. Types of Load Balancers

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different OSI layers, each with distinct capabilities and trade-offs.

Layer 4 (Transport Layer)
  • Routes based on IP address + TCP/UDP port
  • Does NOT inspect packet content
  • Ultra-fast: millions of requests/sec
  • Protocol agnostic (HTTP, WebSocket, DB, etc.)
  • Lower latency, higher throughput
  • Cannot do content-based routing
  • Best for: Database connections, gaming, high-throughput
Layer 7 (Application Layer)
  • Inspects HTTP headers, cookies, URLs, body
  • Smart content-based routing decisions
  • Performs SSL/TLS termination
  • Can modify requests and responses
  • Supports A/B testing, canary deployments
  • WAF capabilities, rate limiting
  • Best for: Web apps, APIs, microservices
Layer 4 vs Layer 7 - Decision Example
// Layer 4: Only sees network data
Source: 192.168.1.100:54321 → Destination: lb.example.com:443
Decision: Route to Server A at 10.0.1.5:443 (based on IP/port only)

// Layer 7: Understands application data
GET /api/users HTTP/1.1
Host: example.com
Cookie: session_id=abc123

Decision: /api/* → API servers, /static/* → CDN, Cookie → Same server
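The L7 decision in that example can be sketched as a prefix-match routing table. The route prefixes and pool names below are hypothetical, mirroring the decision line above; real L7 LBs (Nginx, ALB, Envoy) express the same idea in their config syntax.

```python
# Hypothetical routing table mirroring the decision example above.
ROUTES = [
    ("/api/",    "api-pool"),
    ("/static/", "cdn-pool"),
]
DEFAULT_POOL = "web-pool"

def route_l7(path: str) -> str:
    # An L7 LB can read the request path; an L4 LB only ever sees IP:port,
    # so it could never make this decision.
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

So `route_l7("/api/users")` lands on the API pool while `route_l7("/static/logo.png")` goes to the CDN pool, and anything else falls through to the default web pool.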

Hardware vs Software vs Cloud

Type          | Examples                  | Pros                                | Cons
Hardware      | F5 BIG-IP, Citrix ADC     | Ultra-high performance, dedicated   | Expensive, inflexible, vendor lock-in
Software      | Nginx, HAProxy, Envoy     | Flexible, programmable, open-source | Requires server management
Cloud (LBaaS) | AWS ELB, GCP LB, Azure LB | Fully managed, auto-scaling, global | Vendor lock-in, ongoing costs

5. Where to Place Load Balancers

In a typical 3-tier architecture, load balancers can be placed at multiple points for maximum scalability:

1. Between Users and Web Servers: The primary LB that faces the internet. Handles SSL termination and routes HTTP/HTTPS traffic to web servers or API gateways. This is the most common placement.
2. Between Web and Application Servers: Distributes internal requests from web servers to application servers that run business logic. Prevents any single app server from being overwhelmed.
3. Between Application and Database Servers: Routes database queries across read replicas and ensures database connections are evenly distributed. Often uses a Layer 4 LB for performance.
Key Insight

Load balancers can potentially be used between ANY two services with multiple instances. In microservices architectures, each service may have its own LB (service mesh pattern).
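Placement #3 (app-to-database) often takes the form of a read/write split. A minimal sketch, assuming a primary plus read replicas; the hostnames are invented, and in production the replica list would come from service discovery rather than a hard-coded list.

```python
import random

# Illustrative hostnames; in production these come from service discovery.
PRIMARY = "db-primary.internal"
READ_REPLICAS = ["db-replica-1.internal", "db-replica-2.internal"]

def pick_db_host(sql: str) -> str:
    # Reads fan out across replicas; writes must go to the primary.
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(READ_REPLICAS)
    return PRIMARY
```

A naive `SELECT` check like this ignores replication lag and read-your-writes concerns, which is why dedicated database proxies exist, but it captures the routing decision itself.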

6. Global vs Local Load Balancing

Global Server Load Balancing (GSLB)
  • Distributes traffic across geographic regions
  • Routes users to nearest data center
  • Handles region/zone failover
  • Uses DNS-based routing or Anycast
  • Considers user location, DC health, latency
  • Example: Route US users to US-East, EU users to EU-West
Local Load Balancing
  • Operates within a single data center
  • Acts as a reverse proxy
  • Uses Virtual IP (VIP) for clients
  • Focuses on server health and capacity
  • Handles actual request distribution
  • Example: Distribute across 50 servers in US-East DC
DNS Limitations

Caching Issues: ISPs cache DNS responses, so the TTL determines how quickly traffic shifts. No Health Awareness: DNS keeps returning dead server IPs until the TTL expires. For production GSLB, use managed solutions like AWS Route 53, Cloudflare, or dedicated ADCs.

7. Key Features & Services

Modern load balancers provide much more than just traffic distribution:

1. Health Checking: Active checks: the LB sends periodic requests (e.g., GET /health every 10s). Passive checks: the LB monitors real traffic for errors. Unhealthy servers are removed automatically and re-added when healthy.
2. SSL/TLS Termination: The LB handles encryption/decryption, sending plain HTTP to backend servers. Reduces server CPU load by 20-40% and centralizes certificate management. Also called "SSL offloading".
3. Sticky Sessions (Session Persistence): Ensures a client's subsequent requests always route to the same server (via cookie or IP hash). Useful for stateful apps, but the better practice is an external session store (e.g., Redis) for stateless design.
4. Auto Scaling Integration: Cloud LBs integrate with auto-scaling groups. As traffic increases, new servers spin up and automatically register; when traffic drops, servers deregister and terminate.
5. Service Discovery: The LB queries a service registry to find healthy instances. Essential in dynamic container environments (Kubernetes, Docker Swarm) where IPs change frequently.
6. Security Features: Web Application Firewall (WAF) capabilities, DDoS protection at L3/L4/L7, rate limiting per client/IP, and IP whitelisting/blacklisting. The LB is often the first line of defense.
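Sticky sessions via IP hash (feature #3 above) can be sketched in a few lines. The server names are hypothetical; the key property is that the mapping is deterministic, so the same client IP always lands on the same server while the pool is stable.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical pool

def sticky_server(client_ip: str) -> str:
    # Same IP -> same digest -> same server index, as long as the
    # pool membership doesn't change.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Note the weakness this exposes: adding or removing a server changes `len(SERVERS)` and remaps most clients, which is one reason consistent hashing, or the external session store mentioned above, is often preferred.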

8. Load Balancing Algorithms (Overview)

The algorithm determines HOW requests are distributed. Here's a quick overview—see the detailed algorithms page for deep dive.

Algorithm            | How It Works                                       | Best For
Round Robin          | A → B → C → A → B → C...                           | Equal servers, uniform requests
Weighted Round Robin | A(3) → A → A → B(2) → B → C(1)                     | Servers with different capacities
Least Connections    | Route to the server with fewest active connections | Long-lived connections, WebSockets
IP Hash              | hash(client_ip) % num_servers                      | Session persistence
90% of Cases

Round Robin or Least Connections is sufficient. Start simple, optimize when you have data showing problems. Don't over-engineer.
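The Weighted Round Robin row in the table above can be made concrete with a simple expansion: give each server as many slots in the rotation as its weight. This is the naive form; real LBs like Nginx use a "smooth" variant that interleaves servers, but the proportions are the same.

```python
def weighted_rotation(weights):
    # Expand each server into as many consecutive slots as its weight;
    # cycling over the result yields the A(3), B(2), C(1) pattern above.
    return [name for name, w in weights for _ in range(w)]
```

For weights A:3, B:2, C:1, the rotation is `["A", "A", "A", "B", "B", "C"]`, so A receives half of all requests — proportional to its capacity.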

9. Real-World Examples

Amazon Web Services (AWS)

  • Application Load Balancer (ALB): Layer 7, HTTP/HTTPS routing by URL path, headers, cookies
  • Network Load Balancer (NLB): Layer 4, ultra-high performance (millions req/sec)
  • Classic Load Balancer: Legacy, both L4 and L7 features
  • Auto-scaling integration, health checks every 5-300 seconds

Netflix - Zuul Gateway

  • Custom Layer 7 load balancer + API gateway
  • Routes to 1000+ microservices
  • Handles 100,000+ requests/second per instance
  • Supports canary deployments (1% traffic to new version)
  • Dynamic routing rules, authentication, rate limiting

Google Cloud Load Balancing

  • Global HTTP(S) LB: Single anycast IP serves users worldwide
  • Automatic cross-region failover
  • Content-based routing by URL, headers, cookies
  • Integrated with Cloud CDN for edge caching

Nginx and HAProxy (Self-Hosted)

  • Nginx: Most popular web server + LB, 60%+ market share
  • HAProxy: High-performance TCP/HTTP LB, used by GitHub, Stack Overflow, Reddit
  • Both support: Round Robin, Least Connections, IP Hash, Health Checks
  • Can handle millions of concurrent connections

10. Knowledge Check

Your e-commerce application has 3 servers with different specs: Server A (32 cores), Server B (16 cores), Server C (8 cores). Which algorithm should you use to ensure traffic is distributed proportionally to their capacity?

11. Key Takeaways

1. Load Balancer = traffic cop distributing requests across servers to prevent overload and ensure high availability.
2. Key benefits: Scalability (add servers), Availability (handle failures), Performance (optimize distribution), Maintainability (zero-downtime updates).
3. Layer 4 = fast, routes by IP/port. Layer 7 = smart, routes by HTTP content. Most web apps use L7.
4. Place LBs between: Users↔Web servers, Web↔App servers, App↔Database servers.
5. GSLB = global routing across data centers. Local LB = distribution within a data center.
6. Health checks are critical—remove failed servers automatically.
7. Modern LBs provide: SSL termination, sticky sessions, auto-scaling, service discovery, security.
8. Start simple with Round Robin or Least Connections.