Module 0 — Core Concepts
Core Performance Metrics
The key numbers that define system health: Latency, Throughput, Availability, and Error Rate.
1. Latency
Latency is the time between when a request is sent and when a response is received. Lower is better. Users notice anything above 100ms.
Simple Analogy
Like ordering food: latency is the time from placing your order until food arrives at your table. It includes: waiter walking to kitchen + cooking time + waiter bringing food back.
Types of Latency
🌐 Network Latency
Time for data to travel across the network
• Physical distance
• Number of hops
• Network congestion
• Bandwidth
Example: NYC → London: ~70ms minimum (speed of light)
⚙️ Application Latency
Time spent processing in your code
• Algorithm complexity
• CPU speed
• Memory access
• Code efficiency
Example: JSON parsing, business logic, serialization
🗄️ Database Latency
Time to execute queries and return data
• Query complexity
• Index usage
• Data volume
• Connection pool
Example: Simple lookup: 1-5ms, Complex join: 50-500ms
📥 Queue/IO Latency
Time waiting in queues or for I/O operations
• Queue depth
• Disk speed
• Thread pool size
• Contention
Example: Disk read: 1-10ms (SSD), 10-100ms (HDD)
Latency Breakdown: Typical Web Request
| Step | What Happens | Typical Time |
|---|---|---|
| DNS Lookup | Domain → IP address | 1-50ms |
| TCP Handshake | 3-way handshake | 10-100ms |
| TLS Handshake | SSL certificate exchange | 30-100ms |
| Request Transfer | Send HTTP request | 5-50ms |
| Server Processing | Your application code | 10-500ms |
| Response Transfer | Return response | 10-100ms |
Total: 66ms - 900ms typical (varies widely based on distance and complexity)
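One way to see the total in practice is to time a single request end to end. This is a minimal sketch using Python's standard library; the URL is a placeholder, and the measurement lumps all the stages above into one client-side number rather than breaking them out.

```python
import time
import urllib.request

# Measure end-to-end latency of one HTTP request.
# The total includes DNS, TCP/TLS handshakes, transfer, and server processing
# as seen from the client; the individual stages are not separated here.
url = "https://example.com/"  # placeholder endpoint

start = time.perf_counter()
with urllib.request.urlopen(url, timeout=5) as response:
    response.read()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"End-to-end latency: {elapsed_ms:.1f} ms")
```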
Measuring Latency: Percentiles Matter
Average latency hides problems. Use percentiles:
| Percentile | Meaning | Example |
|---|---|---|
| p50 | Median - half are faster | 45ms |
| p90 | 90% are faster | 120ms |
| p99 | 99% are faster | 450ms |
| p99.9 | 99.9% are faster | 1200ms |
Why p99 Matters
At 1M requests/day, p99 = 450ms means 10,000 users experience 450ms+ latency daily. For Amazon, every 100ms latency costs 1% in sales. High percentiles = real user pain.
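A minimal sketch of how percentiles are computed, using simulated latencies with a deliberately slow 2% tail. The nearest-rank method shown here is one of several percentile conventions, and the sample values are made up; the point is that the average stays low while p99 and p99.9 expose the tail.

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Simulated request latencies in ms: mostly fast, with a slow 2% tail.
random.seed(1)
latencies = [random.gauss(45, 10) for _ in range(9800)] + \
            [random.gauss(450, 100) for _ in range(200)]

for p in (50, 90, 99, 99.9):
    print(f"p{p}: {percentile(latencies, p):.0f} ms")

print(f"average: {sum(latencies) / len(latencies):.0f} ms  # the slow tail is invisible here")
```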
How to Reduce Latency
| Technique | How It Helps | Typical Impact |
|---|---|---|
| Caching | Store computed results | 10x-100x improvement (Redis: 0.5ms vs DB: 50ms); sketched below |
| CDN | Serve content closer to users | 50-200ms saved (edge servers worldwide) |
| Connection Pooling | Reuse database connections | 20-50ms saved per request (avoids TCP handshake) |
| Async Processing | Don't wait for slow operations | Removes work from the critical path (queue emails, notifications) |
| Database Indexing | Speed up queries | 10x-1000x faster queries (B-tree index on search columns) |
| Compression | Reduce data transfer size | 2x-10x less data (gzip responses) |
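As a sketch of the caching row above: an in-process dict stands in for Redis or memcached, and fetch_user_from_db is a hypothetical stand-in for a real ~50ms query. A production cache would also need eviction and TTLs, which this omits.

```python
import time

# Hypothetical stand-in for a real database call (~50ms).
def fetch_user_from_db(user_id):
    time.sleep(0.05)  # simulate query latency
    return {"id": user_id, "name": f"user-{user_id}"}

_cache = {}  # in-process cache; Redis/memcached would play this role in production

def get_user(user_id):
    # Serve from cache when possible; fall back to the database on a miss.
    if user_id not in _cache:
        _cache[user_id] = fetch_user_from_db(user_id)
    return _cache[user_id]

get_user(1)   # cache miss: pays the ~50ms database latency
get_user(1)   # cache hit: sub-millisecond
```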
2. Throughput
Throughput is the number of requests a system can handle per unit of time. Higher is better. Measured in requests per second (RPS) or transactions per second (TPS).
Simple Analogy
Like a highway: throughput is how many cars pass through per hour. More lanes = higher throughput. Accidents (errors) = lower throughput.
Throughput vs Latency
⏱️ Latency: How fast is ONE request? Example: 50ms per request.
📊 Throughput: How MANY requests per second? Example: 1,000 RPS.
Key Insight: They're Related but Different
• Low latency doesn't guarantee high throughput (single-threaded server)
• High throughput can increase latency (queuing under load)
• Optimize for what matters most for your use case (see the sketch below for how pool size and latency bound throughput)
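One way to make the relationship concrete is Little's Law (concurrency = throughput × latency). This sketch assumes a pool of 100 handlers and 50ms per request; both numbers are illustrative.

```python
# Little's Law: concurrency = throughput x latency.
# With a fixed number of concurrent handlers, throughput is capped by
# how long each request holds a handler.

handlers = 100          # assumed worker/connection pool size
latency_s = 0.050       # assumed 50ms per request

max_throughput = handlers / latency_s
print(f"Upper bound: {max_throughput:.0f} requests/second")  # 2000 RPS

# If latency doubles under load (queuing), the same pool handles half as much:
print(f"At 100ms per request: {handlers / 0.100:.0f} RPS")   # 1000 RPS
```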
Factors Affecting Throughput
1. CPU Cores: more cores = more parallel processing (4 cores ≈ 4x throughput, ideally)
2. Thread/Connection Pool Size: more concurrent handlers = more simultaneous requests (a pool of 100 handles 100 concurrent requests)
3. Database Connections: the DB is often the bottleneck (100 DB connections = max 100 concurrent queries)
4. Memory: caching and object allocation (more memory = more cache hits = higher throughput)
Real-World Throughput Numbers
| System | Throughput | Notes |
|---|---|---|
| Single Node.js | 1K-10K RPS | Event loop, single thread |
| Single Go Server | 10K-100K RPS | Goroutines, multi-core |
| Redis | 100K+ RPS | In-memory, simple ops |
| PostgreSQL | 1K-50K QPS | Depends on query complexity |
| Kafka | 1M+ messages/sec | Per broker, sequential I/O |
3. Availability
Availability is the percentage of time a system is operational and accessible. Expressed as "nines": 99.9% = "three nines".
Simple Analogy
Like a store's opening hours: if it's supposed to be open 24/7 but closes for 1 hour/week for cleaning, that's 99.4% availability. Customers arriving during that hour = downtime.
The Nines Table
| Availability | Downtime/Year | Downtime/Month | Typical Use |
|---|---|---|---|
| 99% (two 9s) | 3.65 days | 7.3 hours | Internal tools, batch jobs |
| 99.9% (three 9s) | 8.76 hours | 43.8 minutes | Standard SaaS products |
| 99.99% (four 9s) | 52.6 minutes | 4.4 minutes | E-commerce, financial |
| 99.999% (five 9s) | 5.26 minutes | 26 seconds | Critical infrastructure |
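The downtime columns follow directly from the availability percentage. A short sketch that reproduces them, assuming a 365-day year and an average-length month:

```python
# Convert an availability target into allowed downtime per year and per month.
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12

for availability in (0.99, 0.999, 0.9999, 0.99999):
    down_fraction = 1 - availability
    yearly_hours = down_fraction * HOURS_PER_YEAR
    monthly_minutes = down_fraction * HOURS_PER_MONTH * 60
    print(f"{availability:.3%}: {yearly_hours:.2f} h/year, {monthly_minutes:.1f} min/month")
```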
Calculating System Availability
Components in Series (all must work):
LB: 99.99% × App: 99.9% × DB: 99.95% = 99.84%
System availability = product of all component availabilities
Components in Parallel (any can work):
Server 1: 99% and Server 2: 99% in parallel → 99.99%, since combined availability = 1 − (1 − 0.99) × (1 − 0.99)
Redundancy dramatically increases availability
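A minimal sketch of both rules, using the same numbers as above: multiply availabilities for components in series, and multiply failure probabilities for replicas in parallel.

```python
from functools import reduce

def series(*components):
    """All components must work: multiply availabilities."""
    return reduce(lambda a, b: a * b, components)

def parallel(*replicas):
    """At least one replica must work: 1 - product of failure probabilities."""
    failure = reduce(lambda a, b: a * b, ((1 - r) for r in replicas))
    return 1 - failure

# Series: LB -> App -> DB, matching the numbers above.
print(f"{series(0.9999, 0.999, 0.9995):.2%}")   # ~99.84%

# Parallel: two 99% servers behind the load balancer.
print(f"{parallel(0.99, 0.99):.2%}")            # 99.99%
```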
How to Achieve High Availability
| Technique | What It Does | Example |
|---|---|---|
| Redundancy | Multiple instances of every component | 3 app servers, 2 DB replicas |
| Load Balancing | Distribute traffic, detect failures | Route around failed servers |
| Health Checks | Continuously monitor component health | Every 10s: /health endpoint |
| Auto-failover | Automatically switch to backup | DB primary fails → replica promoted |
| Multi-region | Survive entire datacenter failures | US-East + US-West + EU |
| Graceful Degradation | Partial functionality > total failure | Cache fails → still serve from DB (sketched below) |
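A small sketch of the graceful-degradation row: cache_get and db_get are hypothetical helpers, and the cache failure is simulated, but the shape is the point: try the fast path, catch the failure, fall back to the slower path instead of returning an error.

```python
import logging

# Hypothetical helpers standing in for a real cache client and database query.
def cache_get(key):
    raise ConnectionError("cache unreachable")  # simulate a cache outage

def db_get(key):
    return {"key": key, "source": "database"}

def get_product(product_id):
    """Graceful degradation: if the cache fails, serve (more slowly) from the DB."""
    try:
        return cache_get(product_id)
    except ConnectionError:
        logging.warning("cache unavailable, falling back to database")
        return db_get(product_id)

print(get_product("p-123"))  # still returns a result despite the cache outage
```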
4. Error Rate
Error Rate is the percentage of requests that result in errors. Lower is better. Includes both client errors (4xx) and server errors (5xx).
Types of Errors
4xx Client Errors
• 400 Bad Request - malformed input
• 401 Unauthorized - not logged in
• 403 Forbidden - no permission
• 404 Not Found - resource doesn't exist
• 429 Too Many Requests - rate limited
Usually the client's fault. Monitor but don't alert heavily.
5xx Server Errors
• 500 Internal Server Error - bug/crash
• 502 Bad Gateway - upstream failed
• 503 Service Unavailable - overloaded
• 504 Gateway Timeout - upstream slow
Your fault. Alert immediately. Debug urgently.
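A minimal sketch of computing 4xx and 5xx rates from a sample of response status codes (the counts are illustrative): group codes by class and divide by total requests.

```python
from collections import Counter

# Sample of response status codes from an access log (illustrative values).
statuses = [200] * 9860 + [404] * 100 + [429] * 20 + [500] * 15 + [503] * 5

total = len(statuses)
counts = Counter(status // 100 for status in statuses)  # group by class: 2xx, 4xx, 5xx

client_error_rate = counts[4] / total
server_error_rate = counts[5] / total

print(f"4xx rate: {client_error_rate:.2%}")  # usually the client's fault: monitor
print(f"5xx rate: {server_error_rate:.2%}")  # your fault: alert
```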
Error Budget Concept
If your SLO is 99.9% availability, you have an "error budget" of 0.1%:
• 99.9% uptime target → 0.1% error budget
• Under budget: deploy freely, experiment
• Near budget: be careful, slow down
• Exhausted budget: freeze deploys, fix reliability
Healthy Error Rate Targets
| Metric | Good | Acceptable | Critical |
|---|---|---|---|
| 5xx Rate | < 0.01% | < 0.1% | > 1% |
| Timeout Rate | < 0.1% | < 0.5% | > 2% |
| 4xx Rate | < 1% | < 5% | > 10% |
5. Putting It All Together
The Trade-offs
⚖️ Latency vs Throughput: Batching improves throughput but increases latency for individual requests. Example: send 1 email immediately (low latency) vs batch 100 emails (high throughput).
⚖️ Availability vs Consistency: Replication for availability can lead to stale reads. Example: read from a replica (faster, always available) vs read from the primary (consistent).
⚖️ Latency vs Cost: Lower latency often requires more resources. Example: global CDN (low latency, high cost) vs single region (higher latency, lower cost).
Typical SLOs by Service Type
| Service Type | Latency (p99) | Availability | Error Rate |
|---|---|---|---|
| User-facing API | < 200ms | 99.9% | < 0.1% |
| Payment Service | < 500ms | 99.99% | < 0.01% |
| Search Service | < 100ms | 99.9% | < 0.5% |
| Batch Processing | N/A (throughput matters) | 99% | < 1% |
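A minimal sketch of checking measured values against one row of this table (the user-facing API targets); the measured numbers are placeholders.

```python
# Compare measured metrics against SLO targets for a user-facing API.
slo = {"p99_ms": 200, "availability": 0.999, "error_rate": 0.001}
measured = {"p99_ms": 180, "availability": 0.9993, "error_rate": 0.0007}  # placeholders

checks = {
    "latency (p99)": measured["p99_ms"] <= slo["p99_ms"],
    "availability":  measured["availability"] >= slo["availability"],
    "error rate":    measured["error_rate"] <= slo["error_rate"],
}

for metric, ok in checks.items():
    print(f"{metric}: {'OK' if ok else 'SLO violated'}")
```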
6. Key Takeaways
1. Latency = time for one request. Measure in percentiles (p50, p90, p99), not averages.
2. Throughput = requests per second. Depends on CPU, connections, memory.
3. Availability = uptime percentage. Redundancy + failover = more 9s.
4. Error Rate = failures / total. 5xx = your problem, 4xx = usually the client's.
5. These metrics trade off against each other. Optimize for what matters.
6. Define SLOs (Service Level Objectives) for each metric.