Module 7 - Observability

Logging Best Practices

Good logs save you hours of debugging. Bad logs waste everyone's time.

1. The Black Box Recorder Analogy

Simple Analogy

An airplane's black box records everything: altitude, speed, cockpit conversations. When something goes wrong, investigators replay the recording to understand what happened. Your logs are your application's black box: they should tell the complete story.

Logging is recording events, errors, and state changes in your application. Good logs enable debugging, auditing, and understanding system behavior.

2. Log Levels

TRACE: Extremely detailed, such as every function entry and exit. Usually disabled in production.

DEBUG: Detailed technical information for developers. Usually disabled in production.

INFO: Normal operations. User logged in, order placed, job completed.

WARN: Unexpected but handled. Retry succeeded, deprecated API used.

ERROR: Something failed. Request failed, exception caught.

FATAL: Application cannot continue. Database unreachable, out of memory.
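As a minimal sketch of how a level threshold works, here is Python's standard `logging` module set to INFO, a common production setting. Note that `logging` names its levels DEBUG, INFO, WARNING, ERROR, and CRITICAL (with `logging.FATAL` as an alias for CRITICAL) and has no built-in TRACE, so the snippet registers one with `logging.addLevelName`. The value 5 and the logger name `order-service` are our own choices for illustration.

```python
import logging
import sys

# Custom TRACE level below DEBUG (10); the number 5 is this sketch's choice.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("order-service")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

# Threshold: records below INFO are discarded before reaching any handler.
logger.setLevel(logging.INFO)

logger.log(TRACE, "entering process_order")   # below INFO: dropped
logger.debug("cart contents: %s", ["sku_1"])  # below INFO: dropped
logger.info("order ord_456 placed")           # printed
logger.warning("payment retry succeeded")     # printed (Python's name for WARN)
logger.error("request failed")                # printed
logger.critical("database unreachable")       # printed (FATAL is an alias)
```

Switching to DEBUG while investigating an issue is a one-line change to `setLevel`.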

3. What Makes a Good Log

Bad Log

Error occurred

No context. What error? Where? When?

Good Log

[2024-01-15T10:23:45Z] [ERROR] [order-service] [req:abc123] Failed to process order ord_456 for user usr_789: PaymentDeclined - Insufficient funds

Essential Fields

Timestamp

When did it happen? Use ISO 8601 with timezone.

Level

Severity: INFO, WARN, ERROR, etc.

Service Name

Which service generated this log?

Request ID

Correlate logs across services.

User/Account ID

Who was affected?

Resource IDs

Order ID, transaction ID, etc.
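One way to put these fields on every line without repeating them in each call is sketched below with Python's standard `logging` and `contextvars` modules. The service name, the logger name, and the `handle_request` function are hypothetical; a real service would set the request ID in its request-handling middleware. A `ContextVar` is used so that concurrent requests (threads or asyncio tasks) each see their own ID.

```python
import contextvars
import logging
import sys
from datetime import datetime, timezone

SERVICE_NAME = "order-service"  # hypothetical service name

# Correlation ID of the request currently being handled ("-" when none is set).
request_id_var = contextvars.ContextVar("request_id", default="-")


class ContextFilter(logging.Filter):
    """Copy the service name and current request ID onto every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.service = SERVICE_NAME
        record.request_id = request_id_var.get()
        return True


class UTCISOFormatter(logging.Formatter):
    """Render %(asctime)s as ISO 8601 in UTC, e.g. 2024-01-15T10:23:45Z."""

    def formatTime(self, record, datefmt=None):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


handler = logging.StreamHandler(sys.stdout)
handler.addFilter(ContextFilter())
handler.setFormatter(UTCISOFormatter(
    "[%(asctime)s] [%(levelname)s] [%(service)s] [req:%(request_id)s] %(message)s"
))

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def handle_request(req_id: str, order_id: str, user_id: str) -> None:
    """Hypothetical handler: bind the request ID, log, then restore the old value."""
    token = request_id_var.set(req_id)
    try:
        logger.error(
            "Failed to process order %s for user %s: PaymentDeclined - Insufficient funds",
            order_id, user_id,
        )
    finally:
        request_id_var.reset(token)


handle_request("abc123", "ord_456", "usr_789")
```

Each call site only passes what is specific to the event (order and user IDs); timestamp, level, service, and request ID are filled in automatically.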

4. Structured Logging

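Structured logging emits each entry as machine-readable key-value fields, most commonly JSON, instead of free-form text. This makes parsing, searching, and alerting much easier, because tools can query fields such as order_id directly.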

Unstructured

2024-01-15 10:23:45 ERROR Failed to process order ord_456

Structured (JSON)

{"ts":"2024-01-15T10:23:45Z",
 "level":"error",
 "msg":"Failed to process order",
 "order_id":"ord_456",
 "user_id":"usr_789"}
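A minimal JSON formatter for Python's standard `logging` module might look like the sketch below. It reproduces the `ts`, `level`, and `msg` keys from the JSON example and merges per-call fields; passing those under `extra={"fields": ...}` is this sketch's own convention, not something the standard library defines.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Format each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
        }
        # Per-call context passed as extra={"fields": {...}}; the "fields" key is
        # this sketch's own convention, not part of the logging API.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())

logger = logging.getLogger("orders-json")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.error(
    "Failed to process order",
    extra={"fields": {"order_id": "ord_456", "user_id": "usr_789"}},
)
```

Because every line is one JSON object, a log platform can index `order_id` and `user_id` as fields instead of grepping message text.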

5. Common Mistakes

Logging Sensitive Data

Never log passwords, tokens, credit card numbers, or PII. Mask or omit them.

Too Much Logging

Logging every variable floods storage and hides important logs.

Too Little Context

'Error occurred' tells you nothing. Include IDs, relevant state, and the reason.

Wrong Log Level

Don't use ERROR for expected conditions. Reserve for actual failures.

Missing Correlation ID

Without request IDs, you can't trace a request across services.
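For the sensitive-data mistake, omitting secrets at the call site is the real fix, but a redaction filter can act as a safety net. The Python sketch below masks `password=` and `token=` values and 13 to 16 digit runs that look like card numbers. These patterns are illustrative assumptions and deliberately simple; for example, they miss card numbers written with spaces or dashes. The filter sits on the handler so it also covers records propagated from child loggers.

```python
import logging
import re
import sys

# Illustrative patterns only; real rules depend on what your application handles.
REDACTIONS = [
    # 13-16 digit runs that look like card numbers (misses spaced/dashed formats).
    (re.compile(r"\b\d{13,16}\b"), "[REDACTED_CARD]"),
    # key=value secrets such as password=... or token=...
    (re.compile(r"\b(password|token)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
]


class RedactFilter(logging.Filter):
    """Mask sensitive substrings in the fully merged message."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args first
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg = message
        record.args = ()  # args are already merged into msg
        return True


handler = logging.StreamHandler(sys.stdout)
handler.addFilter(RedactFilter())
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("auth")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.warning("login failed for user usr_789 password=%s", "hunter2")
logger.info("charging card 4111111111111111 for order ord_456")
```

The two calls print `password=[REDACTED]` and `charging card [REDACTED_CARD]` in place of the raw values.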

6. Real-World Dry Run

Scenario: Debugging a failed order

User reports: Order didn't complete

Support gives you order_id: ord_789

Search logs: order_id=ord_789

Find all logs for this order across services

Follow the trail

API Gateway → Order Service → Payment Service → FAIL

Find the error

Payment Service: 'Card declined: insufficient_funds'

Root cause found

User's card was declined. Not a bug: expected behavior.

Good logs made this a 5-minute investigation. Bad logs would have turned it into hours of guessing.
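The search-and-follow steps of this dry run can be sketched in a few lines of Python, assuming every service writes JSON lines that carry an `order_id` field. The log lines below are made-up sample data shaped like the scenario, not real output, and `trail` is a hypothetical helper. In practice a log platform runs the query, but the logic is the same: filter on a shared ID, then order by timestamp.

```python
import json

# Made-up JSON log lines from three services (illustrative sample data).
LOG_LINES = [
    '{"ts":"2024-01-15T10:23:44Z","service":"api-gateway","level":"info","msg":"POST /orders","order_id":"ord_789"}',
    '{"ts":"2024-01-15T10:23:46Z","service":"payment-service","level":"error","msg":"Card declined: insufficient_funds","order_id":"ord_789"}',
    '{"ts":"2024-01-15T10:23:45Z","service":"order-service","level":"info","msg":"Charging payment","order_id":"ord_789"}',
    '{"ts":"2024-01-15T10:23:45Z","service":"order-service","level":"info","msg":"Order created","order_id":"ord_123"}',
]


def trail(lines, order_id):
    """Return all entries for one order, oldest first."""
    entries = [json.loads(line) for line in lines]
    matches = [e for e in entries if e.get("order_id") == order_id]
    # ISO 8601 UTC timestamps in one fixed format sort correctly as strings.
    return sorted(matches, key=lambda e: e["ts"])


# Step 1-2: search by order_id and follow the trail across services.
for e in trail(LOG_LINES, "ord_789"):
    print(e["ts"], e["service"], e["level"].upper(), e["msg"])

# Step 3: find the first error in the trail.
first_error = next(e for e in trail(LOG_LINES, "ord_789") if e["level"] == "error")
print("Root cause:", first_error["service"], "-", first_error["msg"])
```

Running it prints the gateway, order, and payment entries in time order, then reports the payment service's `Card declined: insufficient_funds` error.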

7. Key Takeaways

1. Use log levels correctly: DEBUG for dev, INFO/WARN/ERROR for prod

2. Include context: timestamp, request ID, user ID, resource IDs

3. Structured logging (JSON) enables search and alerting

4. Never log sensitive data: passwords, tokens, PII

5. Correlation IDs let you trace requests across services

Quiz

1. User login failed due to a wrong password. Which log level?

2. What enables tracing a request across multiple microservices?