Module 2 — Traffic & Load Management

Load Balancer Algorithms

The algorithm determines HOW incoming requests are distributed across servers. Choosing the right one can significantly impact performance and reliability.


1. The Restaurant Host Analogy

Simple Analogy
Think of a restaurant with 5 tables and a host at the door. How does the host decide which table to seat the next customer?
  • Round Robin: Table 1, then 2, then 3... repeat
  • Least Connections: Seat at the table with fewest people
  • Weighted: Big tables get more people than small ones
  • IP Hash: Regulars always get their favorite table

A load balancing algorithm is the decision-making logic that determines which backend server receives each incoming request. The choice affects response times, resource utilization, and system reliability.

2. Round Robin

The simplest algorithm. Requests are distributed sequentially to each server in a loop.

How Round Robin Works
Req 1 → A
Req 2 → B
Req 3 → C
Req 4 → A
Req 5 → B
Req 6 → C
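Code (Python)
servers = ["A", "B", "C"]
server_index = 0  # position of the next server to use

def get_next_server():
    """Return the next server in rotation, wrapping around at the end."""
    global server_index
    server = servers[server_index]
    server_index = (server_index + 1) % len(servers)
    return server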
✓ Pros
  • Dead simple to implement
  • No per-server state needed (just one shared index)
  • Fair distribution with equal servers
  • Predictable behavior
✗ Cons
  • Ignores server capacity differences
  • Ignores current server load
  • Can overload slow servers
  • No session persistence
Best Use Case

Stateless services with identical servers and uniform request processing times. Perfect for read-heavy APIs.

3. Weighted Round Robin

Like Round Robin, but servers with higher weights receive proportionally more requests.

Servers with Different Capacities
  • Server A: 32 cores, weight ×4
  • Server B: 16 cores, weight ×2
  • Server C: 8 cores, weight ×1

Pattern: A → A → A → A → B → B → C → (repeat)

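Code (Python)
from itertools import cycle

servers = [
    {"name": "A", "weight": 4},  # 4x traffic
    {"name": "B", "weight": 2},  # 2x traffic
    {"name": "C", "weight": 1},  # 1x traffic
]

# Expand into [A, A, A, A, B, B, C], then round-robin through the expanded list
expanded = [s["name"] for s in servers for _ in range(s["weight"])]
rotation = cycle(expanded)

def get_next_server():
    return next(rotation)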
When to Use

When you have servers with different hardware specs (CPU, RAM) or when some servers are dedicated to specific tasks.

4. Least Connections

Routes each request to the server with the fewest active connections. Smart for varying request durations.

Dynamic Load Awareness
  • Server A: 10 active connections
  • Server B: 3 active connections ← NEW
  • Server C: 12 active connections

New request → Server B (fewest connections)
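
The selection step above can be sketched in a few lines of Python. This is a minimal illustration, not a production balancer: the class name LeastConnectionsBalancer and its acquire/release methods are hypothetical names chosen for this example, and the starting counts are copied from the diagram above.

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least-loaded one."""

    def __init__(self, servers):
        # Assumption for this sketch: every server starts with 0 connections.
        self.active = {name: 0 for name in servers}

    def acquire(self):
        # O(n) scan over all servers; ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count stays accurate.
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["A", "B", "C"])
balancer.active.update({"A": 10, "B": 3, "C": 12})  # counts from the diagram
chosen = balancer.acquire()  # picks B and raises its count to 4
```

The release call matters as much as the selection: if finished requests are never released, the counts drift upward and the balancer stops reflecting real load.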

✓ Pros
  • Adapts to actual load
  • Great for varying request times
  • Handles slow requests well
  • More even utilization
✗ Cons
  • Requires tracking connections
  • More complex than Round Robin
  • May thrash with many short requests
  • Slight overhead
Best Use Case

Long-lived connections (WebSockets), file uploads, database connections, or any workload with varying request durations.

5. IP Hash (Sticky Sessions)

Uses a hash of the client's IP address to consistently route them to the same server. Enables session persistence.

Consistent Routing
192.168.1.100 → hash(...) % 3 = 0 → Server A
10.0.0.55 → hash(...) % 3 = 2 → Server C
192.168.1.100 → hash(...) % 3 = 0 → Server A (same!)
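
A rough Python sketch of the same idea follows. The function name pick_by_ip_hash is hypothetical, and SHA-256 is only one possible choice of hash (an assumption for this example), so the server each IP lands on will not necessarily match the indices shown above. A stable digest is used because Python's built-in hash() is salted per process for strings and would route the same IP differently after a restart.

```python
import hashlib


def pick_by_ip_hash(client_ip, servers):
    """Map a client IP to a server using a hash that is stable across runs."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce modulo the server count.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["A", "B", "C"]
first = pick_by_ip_hash("192.168.1.100", servers)
again = pick_by_ip_hash("192.168.1.100", servers)  # same IP, same server
```

Because the index depends on len(servers), adding or removing a server can change the result for many clients at once, which is the "server changes break sessions" drawback listed below.
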
✓ Pros
  • Session persistence without cookies
  • Stateful app support
  • Cache-friendly routing
  • Simple to implement
✗ Cons
  • Uneven distribution possible
  • Server changes break sessions
  • IP can change (mobile)
  • NAT issues (many users → 1 IP)
Better Alternative

For session persistence, consider using external session storage (Redis) instead. This keeps your application stateless and more scalable.

6. Least Response Time

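Routes to the server with the best combination of fast response time and few active connections. The two factors are combined into a single score, so the winner is not necessarily the fastest server or the one with the fewest connections.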

Decision Factors
Server   Connections   Avg Response   Score
A        10            50ms           500
B ←      8             30ms           240
C        5             80ms           400

Score = Connections × Response Time (lower is better)
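
The scoring rule can be checked with a short Python sketch using the numbers from the table. The names ServerStats, score, and pick_lowest_score are hypothetical, introduced only for this example; how a real balancer measures and averages response time is outside what this sketch covers.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    connections: int
    avg_response_ms: float


def score(stats):
    # Score = Connections × Response Time (lower is better)
    return stats.connections * stats.avg_response_ms


def pick_lowest_score(all_stats):
    # O(n) scan; ties go to the server listed first.
    return min(all_stats, key=score)


stats = [
    ServerStats("A", 10, 50),
    ServerStats("B", 8, 30),
    ServerStats("C", 5, 80),
]
best = pick_lowest_score(stats)  # B, with a score of 240
```

One caveat of this exact formula: a server with zero active connections always scores 0, no matter how slow it is, so an implementation may want to handle that case separately.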

Best Use Case

Performance-critical applications where response time matters more than even distribution.

7. Algorithm Comparison

Algorithm           Complexity   State               Best For
Round Robin         O(1)         Index only          Equal servers, uniform requests
Weighted RR         O(1)         Weights + Index     Different server capacities
Least Connections   O(n)         Connection counts   Varying request durations
IP Hash             O(1)         None                Session persistence
Least Response      O(n)         Times + Counts      Performance-critical apps

8. Key Takeaways

1. Round Robin is the default choice: simple and effective for most cases.
2. Use Weighted Round Robin when servers have different capacities.
3. Use Least Connections for workloads with varying request times (uploads, WebSockets).
4. Use IP Hash for session stickiness, but prefer external session stores.
5. Use Least Response Time for latency-sensitive applications.
6. Start simple, measure, then optimize. Don't over-engineer.