Module 0 — Core Concepts
Core Performance Metrics
The key numbers that define system health: Latency, Throughput, Availability, and Error Rate.
1. Latency
Latency is the time between when a request is sent and when a response is received. Lower is better. Users notice anything above 100ms.
Simple Analogy
Like ordering food: latency is the time from placing your order until food arrives at your table. It includes: waiter walking to kitchen + cooking time + waiter bringing food back.
Types of Latency
🌐 Network Latency
Time for data to travel across the network
• Physical distance
• Number of hops
• Network congestion
• Bandwidth
Example: NYC → London: ~70ms minimum (speed of light)
⚙️ Application Latency
Time spent processing in your code
• Algorithm complexity
• CPU speed
• Memory access
• Code efficiency
Example: JSON parsing, business logic, serialization
🗄️ Database Latency
Time to execute queries and return data
• Query complexity
• Index usage
• Data volume
• Connection pool
Example: Simple lookup: 1-5ms, Complex join: 50-500ms
📥 Queue/IO Latency
Time waiting in queues or for I/O operations
• Queue depth
• Disk speed
• Thread pool size
• Contention
Example: Disk read: 1-10ms (SSD), 10-100ms (HDD)
Latency Breakdown: Typical Web Request
| Step | What Happens | Typical Time |
|---|---|---|
| DNS Lookup | Domain → IP address | 1-50ms |
| TCP Handshake | 3-way handshake | 10-100ms |
| TLS Handshake | SSL certificate exchange | 30-100ms |
| Request Transfer | Send HTTP request | 5-50ms |
| Server Processing | Your application code | 10-500ms |
| Response Transfer | Return response | 10-100ms |
Total: 66ms - 900ms typical (varies widely based on distance and complexity)
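One way to see the total in practice is to time a single request end to end. This is a minimal sketch using Python's standard library; the URL is a placeholder, and the measurement lumps all the stages above into one client-side number rather than breaking them out.

```python
import time
import urllib.request

# Measure end-to-end latency of one HTTP request.
# The total includes DNS, TCP/TLS handshakes, transfer, and server processing
# as seen from the client; the individual stages are not separated here.
url = "https://example.com/"  # placeholder endpoint

start = time.perf_counter()
with urllib.request.urlopen(url, timeout=5) as response:
    response.read()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"End-to-end latency: {elapsed_ms:.1f} ms")
```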
Measuring Latency: Percentiles Matter
Average latency hides problems. Use percentiles:
| Percentile | Meaning | Example |
|---|---|---|
| p50 | Median - half are faster | 45ms |
| p90 | 90% are faster | 120ms |
| p99 | 99% are faster | 450ms |
| p99.9 | 99.9% are faster | 1200ms |
Why p99 Matters
At 1M requests/day, p99 = 450ms means 10,000 users experience 450ms+ latency daily. For Amazon, every 100ms latency costs 1% in sales. High percentiles = real user pain.
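A minimal sketch of how percentiles are computed, using simulated latencies with a deliberately slow 2% tail. The nearest-rank method shown here is one of several percentile conventions, and the sample values are made up; the point is that the average stays low while p99 and p99.9 expose the tail.

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Simulated request latencies in ms: mostly fast, with a slow 2% tail.
random.seed(1)
latencies = [random.gauss(45, 10) for _ in range(9800)] + \
            [random.gauss(450, 100) for _ in range(200)]

for p in (50, 90, 99, 99.9):
    print(f"p{p}: {percentile(latencies, p):.0f} ms")

print(f"average: {sum(latencies) / len(latencies):.0f} ms  # the slow tail is invisible here")
```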
How to Reduce Latency
| Technique | How It Helps | Typical Impact |
|---|---|---|
| Caching | Store computed results | 10x-100x improvement (Redis: 0.5ms vs DB: 50ms); sketched below |
| CDN | Serve content closer to users | 50-200ms saved (edge servers worldwide) |
| Connection Pooling | Reuse database connections | 20-50ms saved per request (avoids TCP handshake) |
| Async Processing | Don't wait for slow operations | Removes work from the critical path (queue emails, notifications) |
| Database Indexing | Speed up queries | 10x-1000x faster queries (B-tree index on search columns) |
| Compression | Reduce data transfer size | 2x-10x less data (gzip responses) |
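As a sketch of the caching row above: an in-process dict stands in for Redis or memcached, and fetch_user_from_db is a hypothetical stand-in for a real ~50ms query. A production cache would also need eviction and TTLs, which this omits.

```python
import time

# Hypothetical stand-in for a real database call (~50ms).
def fetch_user_from_db(user_id):
    time.sleep(0.05)  # simulate query latency
    return {"id": user_id, "name": f"user-{user_id}"}

_cache = {}  # in-process cache; Redis/memcached would play this role in production

def get_user(user_id):
    # Serve from cache when possible; fall back to the database on a miss.
    if user_id not in _cache:
        _cache[user_id] = fetch_user_from_db(user_id)
    return _cache[user_id]

get_user(1)   # cache miss: pays the ~50ms database latency
get_user(1)   # cache hit: sub-millisecond
```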
2. Throughput
Throughput is the number of requests a system can handle per unit of time. Higher is better. Measured in requests per second (RPS) or transactions per second (TPS).
Simple Analogy
Like a highway: throughput is how many cars pass through per hour. More lanes = higher throughput. Accidents (errors) = lower throughput.
Throughput vs Latency
⏱️ Latency: How fast is ONE request? Example: 50ms per request.
📊 Throughput: How MANY requests per second? Example: 1,000 RPS.
Key Insight: They're Related but Different
• Low latency doesn't guarantee high throughput (single-threaded server)
• High throughput can increase latency (queuing under load)
• Optimize for what matters most for your use case (see the sketch below for how pool size and latency bound throughput)
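One way to make the relationship concrete is Little's Law (concurrency = throughput × latency). This sketch assumes a pool of 100 handlers and 50ms per request; both numbers are illustrative.

```python
# Little's Law: concurrency = throughput x latency.
# With a fixed number of concurrent handlers, throughput is capped by
# how long each request holds a handler.

handlers = 100          # assumed worker/connection pool size
latency_s = 0.050       # assumed 50ms per request

max_throughput = handlers / latency_s
print(f"Upper bound: {max_throughput:.0f} requests/second")  # 2000 RPS

# If latency doubles under load (queuing), the same pool handles half as much:
print(f"At 100ms per request: {handlers / 0.100:.0f} RPS")   # 1000 RPS
```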
Factors Affecting Throughput
1. CPU Cores: more cores = more parallel processing (4 cores ≈ 4x throughput, ideally)
2. Thread/Connection Pool Size: more concurrent handlers = more simultaneous requests (a pool of 100 handles 100 concurrent requests)
3. Database Connections: the DB is often the bottleneck (100 DB connections = max 100 concurrent queries)
4. Memory: caching and object allocation (more memory = more cache hits = higher throughput)
Real-World Throughput Numbers
| System | Throughput | Notes |
|---|---|---|
| Single Node.js | 1K-10K RPS | Event loop, single thread |
| Single Go Server | 10K-100K RPS | Goroutines, multi-core |
| Redis | 100K+ RPS | In-memory, simple ops |
| PostgreSQL | 1K-50K QPS | Depends on query complexity |
| Kafka | 1M+ messages/sec | Per broker, sequential I/O |
3. Availability
Availability is the percentage of time a system is operational and accessible. Expressed as "nines": 99.9% = "three nines".
Simple Analogy
Like a store's opening hours: if it's supposed to be open 24/7 but closes for 1 hour/week for cleaning, that's 99.4% availability. Customers arriving during that hour = downtime.
The Nines Table
| Availability | Downtime/Year | Downtime/Month | Typical Use |
|---|---|---|---|
| 99% (two 9s) | 3.65 days | 7.3 hours | Internal tools, batch jobs |
| 99.9% (three 9s) | 8.76 hours | 43.8 minutes | Standard SaaS products |
| 99.99% (four 9s) | 52.6 minutes | 4.4 minutes | E-commerce, financial |
| 99.999% (five 9s) | 5.26 minutes | 26 seconds | Critical infrastructure |
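The downtime columns follow directly from the availability percentage. A short sketch that reproduces them, assuming a 365-day year and an average-length month:

```python
# Convert an availability target into allowed downtime per year and per month.
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12

for availability in (0.99, 0.999, 0.9999, 0.99999):
    down_fraction = 1 - availability
    yearly_hours = down_fraction * HOURS_PER_YEAR
    monthly_minutes = down_fraction * HOURS_PER_MONTH * 60
    print(f"{availability:.3%}: {yearly_hours:.2f} h/year, {monthly_minutes:.1f} min/month")
```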
Calculating System Availability
Components in Series (all must work):
LB: 99.99% × App: 99.9% × DB: 99.95% = 99.84%
System availability = product of all component availabilities
Components in Parallel (any can work):
Server 1: 99% and Server 2: 99% in parallel → 99.99%, since combined availability = 1 − (1 − 0.99) × (1 − 0.99)
Redundancy dramatically increases availability
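A minimal sketch of both rules, using the same numbers as above: multiply availabilities for components in series, and multiply failure probabilities for replicas in parallel.

```python
from functools import reduce

def series(*components):
    """All components must work: multiply availabilities."""
    return reduce(lambda a, b: a * b, components)

def parallel(*replicas):
    """At least one replica must work: 1 - product of failure probabilities."""
    failure = reduce(lambda a, b: a * b, ((1 - r) for r in replicas))
    return 1 - failure

# Series: LB -> App -> DB, matching the numbers above.
print(f"{series(0.9999, 0.999, 0.9995):.2%}")   # ~99.84%

# Parallel: two 99% servers behind the load balancer.
print(f"{parallel(0.99, 0.99):.2%}")            # 99.99%
```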
How to Achieve High Availability
| Technique | What It Does | Example |
|---|---|---|
| Redundancy | Multiple instances of every component | 3 app servers, 2 DB replicas |
| Load Balancing | Distribute traffic, detect failures | Route around failed servers |
| Health Checks | Continuously monitor component health | Every 10s: /health endpoint |
| Auto-failover | Automatically switch to backup | DB primary fails → replica promoted |
| Multi-region | Survive entire datacenter failures | US-East + US-West + EU |
| Graceful Degradation | Partial functionality > total failure | Cache fails → still serve from DB (sketched below) |
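A small sketch of the graceful-degradation row: cache_get and db_get are hypothetical helpers, and the cache failure is simulated, but the shape is the point: try the fast path, catch the failure, fall back to the slower path instead of returning an error.

```python
import logging

# Hypothetical helpers standing in for a real cache client and database query.
def cache_get(key):
    raise ConnectionError("cache unreachable")  # simulate a cache outage

def db_get(key):
    return {"key": key, "source": "database"}

def get_product(product_id):
    """Graceful degradation: if the cache fails, serve (more slowly) from the DB."""
    try:
        return cache_get(product_id)
    except ConnectionError:
        logging.warning("cache unavailable, falling back to database")
        return db_get(product_id)

print(get_product("p-123"))  # still returns a result despite the cache outage
```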
4. Error Rate
Error Rate is the percentage of requests that result in errors. Lower is better. Includes both client errors (4xx) and server errors (5xx).
Types of Errors
4xx Client Errors
• 400 Bad Request - malformed input
• 401 Unauthorized - not logged in
• 403 Forbidden - no permission
• 404 Not Found - resource doesn't exist
• 429 Too Many Requests - rate limited
Usually the client's fault. Monitor but don't alert heavily.
5xx Server Errors
• 500 Internal Server Error - bug/crash
• 502 Bad Gateway - upstream failed
• 503 Service Unavailable - overloaded
• 504 Gateway Timeout - upstream slow
Your fault. Alert immediately. Debug urgently.
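A minimal sketch of computing 4xx and 5xx rates from a sample of response status codes (the counts are illustrative): group codes by class and divide by total requests.

```python
from collections import Counter

# Sample of response status codes from an access log (illustrative values).
statuses = [200] * 9860 + [404] * 100 + [429] * 20 + [500] * 15 + [503] * 5

total = len(statuses)
counts = Counter(status // 100 for status in statuses)  # group by class: 2xx, 4xx, 5xx

client_error_rate = counts[4] / total
server_error_rate = counts[5] / total

print(f"4xx rate: {client_error_rate:.2%}")  # usually the client's fault: monitor
print(f"5xx rate: {server_error_rate:.2%}")  # your fault: alert
```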
Error Budget Concept
If your SLO is 99.9% availability, you have an "error budget" of 0.1%:
• 99.9% uptime target → 0.1% error budget
• Under budget: deploy freely, experiment
• Near budget: be careful, slow down
• Exhausted budget: freeze deploys, fix reliability
Healthy Error Rate Targets
| Metric | Good | Acceptable | Critical |
|---|---|---|---|
| 5xx Rate | < 0.01% | < 0.1% | > 1% |
| Timeout Rate | < 0.1% | < 0.5% | > 2% |
| 4xx Rate | < 1% | < 5% | > 10% |
5. Putting It All Together
The Trade-offs
⚖️ Latency vs Throughput: Batching improves throughput but increases latency for individual requests. Example: send 1 email immediately (low latency) vs batch 100 emails (high throughput).
⚖️ Availability vs Consistency: Replication for availability can lead to stale reads. Example: read from a replica (faster, always available) vs read from the primary (consistent).
⚖️ Latency vs Cost: Lower latency often requires more resources. Example: global CDN (low latency, high cost) vs single region (higher latency, lower cost).
Typical SLOs by Service Type
| Service Type | Latency (p99) | Availability | Error Rate |
|---|---|---|---|
| User-facing API | < 200ms | 99.9% | < 0.1% |
| Payment Service | < 500ms | 99.99% | < 0.01% |
| Search Service | < 100ms | 99.9% | < 0.5% |
| Batch Processing | N/A (throughput matters) | 99% | < 1% |
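A minimal sketch of checking measured values against one row of this table (the user-facing API targets); the measured numbers are placeholders.

```python
# Compare measured metrics against SLO targets for a user-facing API.
slo = {"p99_ms": 200, "availability": 0.999, "error_rate": 0.001}
measured = {"p99_ms": 180, "availability": 0.9993, "error_rate": 0.0007}  # placeholders

checks = {
    "latency (p99)": measured["p99_ms"] <= slo["p99_ms"],
    "availability":  measured["availability"] >= slo["availability"],
    "error rate":    measured["error_rate"] <= slo["error_rate"],
}

for metric, ok in checks.items():
    print(f"{metric}: {'OK' if ok else 'SLO violated'}")
```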
6. Key Takeaways
1. Latency = time for one request. Measure in percentiles (p50, p90, p99), not averages.
2. Throughput = requests per second. Depends on CPU, connections, memory.
3. Availability = uptime percentage. Redundancy + failover = more 9s.
4. Error Rate = failures / total. 5xx = your problem, 4xx = usually the client's.
5. These metrics trade off against each other. Optimize for what matters.
6. Define SLOs (Service Level Objectives) for each metric.