Module 0 — Core Concepts

Core Performance Metrics

The key numbers that define system health: Latency, Throughput, Availability, and Error Rate.

1. Latency

Latency is the time between when a request is sent and when a response is received. Lower is better. Users start to notice delays above roughly 100ms.
Simple Analogy
Like ordering food: latency is the time from placing your order until food arrives at your table. It includes: waiter walking to kitchen + cooking time + waiter bringing food back.

Types of Latency

🌐 Network Latency

Time for data to travel across the network.

• Physical distance
• Number of hops
• Network congestion
• Bandwidth

Example: NYC → London: ~70ms minimum round trip (speed of light in fiber sets the floor).

⚙️ Application Latency

Time spent processing in your code.

• Algorithm complexity
• CPU speed
• Memory access
• Code efficiency

Example: JSON parsing, business logic, serialization.

🗄️ Database Latency

Time to execute queries and return data.

• Query complexity
• Index usage
• Data volume
• Connection pool

Example: simple lookup: 1-5ms; complex join: 50-500ms.

📥 Queue/IO Latency

Time waiting in queues or for I/O operations.

• Queue depth
• Disk speed
• Thread pool size
• Contention

Example: disk read: 1-10ms (SSD), 10-100ms (HDD).
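
These four components are easiest to reason about once you can measure them. A minimal sketch of timing one request end to end in Python; the helper name and URL are illustrative, and a single sample is noisy, so collect many and look at percentiles (covered below):

```python
import time
import urllib.request

def measure_latency_ms(url: str) -> float:
    """Wall-clock latency of one request: network + server + transfer."""
    start = time.perf_counter()              # monotonic, high-resolution clock
    with urllib.request.urlopen(url) as resp:
        resp.read()                          # include response transfer time
    return (time.perf_counter() - start) * 1000

# Hypothetical usage; any reachable URL works:
# print(f"{measure_latency_ms('https://example.com'):.1f} ms")
```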

Latency Breakdown: Typical Web Request

Stage | What Happens | Typical Time
DNS Lookup | Domain → IP address | 1-50ms
TCP Handshake | 3-way handshake | 10-100ms
TLS Handshake | SSL certificate exchange | 30-100ms
Request Transfer | Send HTTP request | 5-50ms
Server Processing | Your application code | 10-500ms
Response Transfer | Return response | 10-100ms

Total: 66ms - 900ms typical (varies widely based on distance and complexity)
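
The 66-900ms total is just the per-stage minima and maxima summed; a quick check of that arithmetic:

```python
# (min_ms, max_ms) for each stage in the breakdown above
stages = {
    "dns": (1, 50), "tcp": (10, 100), "tls": (30, 100),
    "request": (5, 50), "server": (10, 500), "response": (10, 100),
}
low = sum(lo for lo, _ in stages.values())    # 66
high = sum(hi for _, hi in stages.values())   # 900
print(f"Total: {low}ms - {high}ms")
```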

Measuring Latency: Percentiles Matter

Average latency hides problems. Use percentiles:

Percentile | Meaning | Example
p50 | Median: half of requests are faster | 45ms
p90 | 90% of requests are faster | 120ms
p99 | 99% of requests are faster | 450ms
p99.9 | 99.9% of requests are faster | 1200ms
Why p99 Matters

At 1M requests/day, a p99 of 450ms means the slowest 1% (10,000 requests) take 450ms or longer every day. Famously, every 100ms of added latency cost Amazon about 1% in sales. High percentiles = real user pain.
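
A sketch of computing these percentiles from raw samples using the simple nearest-rank method; the sample latencies are made up for illustration:

```python
def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the value p% of samples fall at or below."""
    ranked = sorted(samples_ms)
    idx = max(0, round(p / 100 * len(ranked)) - 1)
    return ranked[idx]

latencies = [45, 48, 51, 47, 120, 44, 46, 450, 49, 52]  # fabricated samples
for p in (50, 90, 99):
    print(f"p{p}: {percentile(latencies, p)}ms")

# The scale math from above: 1% of 1M daily requests is still a crowd.
print(1_000_000 * 0.01)  # 10,000 requests slower than the p99 threshold
```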

How to Reduce Latency

Technique | How It Works | Typical Impact
Caching | Store computed results | 10x-100x improvement (Redis: ~0.5ms vs DB: ~50ms)
CDN | Serve content closer to users | 50-200ms saved (edge servers worldwide)
Connection Pooling | Reuse database connections | 20-50ms saved per request (avoids TCP handshake)
Async Processing | Don't wait for slow operations | Removed from critical path (queue emails, notifications)
Database Indexing | Speed up queries | 10x-1000x faster queries (B-tree index on search columns)
Compression | Reduce data transfer size | 2x-10x less data (gzip responses)
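
As a sketch of the caching row, here is a read-through cache in front of a slow lookup. Python's in-process functools.lru_cache stands in for Redis, and the ~50ms database query is simulated with a sleep:

```python
import time
from functools import lru_cache

def slow_db_lookup(user_id: int) -> dict:
    """Stand-in for a ~50ms database query."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

@lru_cache(maxsize=10_000)        # treat results as read-only: they're shared
def get_user(user_id: int) -> dict:
    return slow_db_lookup(user_id)

t0 = time.perf_counter(); get_user(42); miss = time.perf_counter() - t0
t0 = time.perf_counter(); get_user(42); hit = time.perf_counter() - t0
print(f"miss: {miss * 1000:.1f}ms  hit: {hit * 1000:.3f}ms")  # ~50ms vs ~0.005ms
```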

2. Throughput

Throughput is the number of requests a system can handle per unit of time. Higher is better. Measured in requests per second (RPS) or transactions per second (TPS).
Simple Analogy
Like a highway: throughput is how many cars pass through per hour. More lanes = higher throughput. Accidents (errors) = lower throughput.

Throughput vs Latency

⏱️ Latency: how fast is ONE request? (e.g., 50ms per request)
📊 Throughput: how MANY requests per second? (e.g., 1,000 RPS)
Key Insight: They're Related but Different
• Low latency doesn't guarantee high throughput (single-threaded server)
• High throughput can increase latency (queuing under load)
• Optimize for what matters most for your use case
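
For intuition, a back-of-envelope model of the relationship, assuming each worker serves exactly one request at a time (a simplification; real servers overlap I/O):

```python
def max_throughput_rps(workers: int, latency_ms: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time."""
    return workers * 1000 / latency_ms

print(max_throughput_rps(1, 50))    # 20.0 RPS: low latency alone isn't enough
print(max_throughput_rps(100, 50))  # 2000.0 RPS: same latency, more concurrency
```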

Factors Affecting Throughput

1. CPU Cores: more cores = more parallel processing. 4 cores ≈ 4x throughput (ideal).
2. Thread/Connection Pool Size: more concurrent handlers = more simultaneous requests. A pool of 100 can handle 100 concurrent requests.
3. Database Connections: the DB is often the bottleneck. 100 DB connections = at most 100 concurrent queries.
4. Memory: caching and object allocation. More memory = more cache hits = higher throughput.
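
A sketch of factor 2 in action: the pool size caps concurrency, so the same 100 simulated requests take about ten times longer on a pool one-tenth the size. The 50ms sleep stands in for I/O-bound work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> int:
    time.sleep(0.05)  # simulate 50ms of I/O-bound work
    return i

for pool_size in (100, 10):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(handle_request, range(100)))
    print(f"pool={pool_size}: {time.perf_counter() - start:.2f}s")
# pool=100 finishes in ~0.05s; pool=10 needs ~0.5s for the same load
```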

Real-World Throughput Numbers

System | Throughput | Notes
Single Node.js process | 1K-10K RPS | Event loop, single thread
Single Go server | 10K-100K RPS | Goroutines, multi-core
Redis | 100K+ RPS | In-memory, simple ops
PostgreSQL | 1K-50K QPS | Depends on query complexity
Kafka | 1M+ messages/sec | Per broker, sequential I/O

3. Availability

Availability is the percentage of time a system is operational and accessible. Expressed as "nines": 99.9% = "three nines".
Simple Analogy
Like a store's opening hours: if it's supposed to be open 24/7 but closes for 1 hour/week for cleaning, that's 99.4% availability. Customers arriving during that hour = downtime.

The Nines Table

Availability | Downtime/Year | Downtime/Month | Typical Use
99% (two 9s) | 3.65 days | 7.3 hours | Internal tools, batch jobs
99.9% (three 9s) | 8.76 hours | 43.8 minutes | Standard SaaS products
99.99% (four 9s) | 52.6 minutes | 4.4 minutes | E-commerce, financial systems
99.999% (five 9s) | 5.26 minutes | 26 seconds | Critical infrastructure
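
These figures fall straight out of the definition; a sketch that reproduces them (using a 730-hour month, as the table does):

```python
def downtime(availability_pct: float) -> str:
    """Yearly and monthly downtime implied by an availability target."""
    down = 1 - availability_pct / 100
    per_year_h = down * 365 * 24       # hours per year
    per_month_min = down * 730 * 60    # minutes per 730-hour month
    return f"{per_year_h:.2f} h/year, {per_month_min:.1f} min/month"

for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}%: {downtime(nines)}")
# 99.9% -> 8.76 h/year, 43.8 min/month, matching the row above
```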

Calculating System Availability

Components in Series (all must work):
LB: 99.99% × App: 99.9% × DB: 99.95% = 99.84%
System availability = the product of all component availabilities.

Components in Parallel (any one can work):
Server 1: 99%, Server 2: 99% → 1 - (0.01 × 0.01) = 99.99%
Redundancy dramatically increases availability.
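
Both formulas in a few lines of Python:

```python
from math import prod

def series(*avail: float) -> float:
    """All components must work: multiply their availabilities."""
    return prod(avail)

def parallel(*avail: float) -> float:
    """Any component can serve: 1 minus the chance that all fail at once."""
    return 1 - prod(1 - a for a in avail)

print(f"{series(0.9999, 0.999, 0.9995):.2%}")  # 99.84%
print(f"{parallel(0.99, 0.99):.2%}")           # 99.99%
```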

How to Achieve High Availability

Technique | What It Does | Example
Redundancy | Multiple instances of every component | 3 app servers, 2 DB replicas
Load Balancing | Distribute traffic, detect failures | Route around failed servers
Health Checks | Continuously monitor component health | Poll a /health endpoint every 10s
Auto-failover | Automatically switch to a backup | DB primary fails → replica promoted
Multi-region | Survive entire datacenter failures | US-East + US-West + EU
Graceful Degradation | Partial functionality beats total failure | Cache fails → still serve from DB
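
A toy version of the health-check and failover ideas, assuming hypothetical internal app servers that expose a /health endpoint returning HTTP 200:

```python
import urllib.request

SERVERS = ["http://app1.internal:8080", "http://app2.internal:8080"]  # hypothetical

def is_healthy(base_url: str) -> bool:
    """Health check: expect HTTP 200 from /health within a short timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def pick_server() -> str:
    """Crude failover: route to the first healthy server; a real load
    balancer would run this check every ~10s and spread traffic."""
    for server in SERVERS:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy servers: total outage")
```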

4. Error Rate

Error Rate is the percentage of requests that result in errors. Lower is better. Includes both client errors (4xx) and server errors (5xx).

Types of Errors

4xx Client Errors

• 400 Bad Request: malformed input
• 401 Unauthorized: not logged in
• 403 Forbidden: no permission
• 404 Not Found: resource doesn't exist
• 429 Too Many Requests: rate limited

Usually the client's fault. Monitor, but don't alert heavily.

5xx Server Errors

• 500 Internal Server Error: bug or crash
• 502 Bad Gateway: upstream failed
• 503 Service Unavailable: overloaded
• 504 Gateway Timeout: upstream too slow

Your fault. Alert immediately and debug urgently.
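
A sketch of tracking both rates from a stream of response status codes; the traffic sample is fabricated for illustration:

```python
from collections import Counter

def error_rates(status_codes: list[int]) -> dict[str, float]:
    """Split responses into 4xx and 5xx rates; the 5xx rate is what pages you."""
    buckets = Counter(code // 100 for code in status_codes)
    total = len(status_codes)
    return {"4xx_rate": buckets[4] / total, "5xx_rate": buckets[5] / total}

codes = [200] * 9950 + [404] * 40 + [500] * 10
print(error_rates(codes))  # {'4xx_rate': 0.004, '5xx_rate': 0.001}
```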

Error Budget Concept

If your SLO is 99.9% availability, you have an "error budget" of 0.1%:

99.9% uptime target → 0.1% error budget

• Under budget: deploy freely, experiment
• Near budget: be careful, slow down
• Budget exhausted: freeze deploys, fix reliability
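
The arithmetic behind the budget, sketched over a hypothetical 30-day window:

```python
def error_budget_minutes(slo_pct: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an SLO over a rolling window."""
    return (1 - slo_pct / 100) * window_days * 24 * 60

budget = error_budget_minutes(99.9)  # ~43.2 minutes per 30 days
spent = 30.0                         # hypothetical downtime already incurred
print(f"budget: {budget:.1f} min, remaining: {budget - spent:.1f} min")
# Little remaining budget means: freeze deploys, invest in reliability.
```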

Healthy Error Rate Targets

Metric | Good | Acceptable | Critical
5xx Rate | < 0.01% | < 0.1% | > 1%
Timeout Rate | < 0.1% | < 0.5% | > 2%
4xx Rate | < 1% | < 5% | > 10%

5. Putting It All Together

The Trade-offs

⚖️ Latency vs Throughput
Batching improves throughput but increases latency for individual requests.
Example: send 1 email immediately (low latency) vs batch 100 emails (high throughput).

⚖️ Availability vs Consistency
Replication for availability can lead to stale reads.
Example: read from a replica (faster, always available) vs read from the primary (consistent).

⚖️ Latency vs Cost
Lower latency often requires more resources.
Example: a global CDN (low latency, high cost) vs a single region (higher latency, lower cost).
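
A toy illustration of the first trade-off: a batcher that buys throughput by making individual emails wait. The class and its bulk-send step are hypothetical:

```python
class EmailBatcher:
    """Trade latency for throughput: hold emails until a batch fills up."""

    def __init__(self, batch_size: int = 100):
        self.batch_size = batch_size
        self.pending: list[str] = []

    def send(self, email: str) -> None:
        self.pending.append(email)  # each email waits here (added latency)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One bulk call amortizes per-request overhead across the batch,
        # raising throughput at the cost of per-email latency.
        print(f"sending {len(self.pending)} emails in one bulk call")
        self.pending.clear()
```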

Typical SLOs by Service Type

Service Type | Latency (p99) | Availability | Error Rate
User-facing API | < 200ms | 99.9% | < 0.1%
Payment Service | < 500ms | 99.99% | < 0.01%
Search Service | < 100ms | 99.9% | < 0.5%
Batch Processing | N/A (throughput matters) | 99% | < 1%

6. Key Takeaways

1. Latency = time for one request. Measure in percentiles (p50, p90, p99), not averages.
2. Throughput = requests per second. Depends on CPU, connections, memory.
3. Availability = uptime percentage. Redundancy + failover = more 9s.
4. Error Rate = failures / total requests. 5xx = your problem; 4xx = usually the client's.
5. These metrics trade off against each other. Optimize for what matters.
6. Define SLOs (Service Level Objectives) for each metric.