Module 7 - Observability

Logging Best Practices

Good logs save you hours of debugging. Bad logs waste everyone's time.

1. The Black Box Recorder Analogy

Simple Analogy
An airplane's black box records everything: altitude, speed, conversations. When something goes wrong, investigators replay the recording to understand what happened. Your logs are your application's black box: they should tell the complete story.

Logging is recording events, errors, and state changes in your application. Good logs enable debugging, auditing, and understanding system behavior.

2. Log Levels

TRACE: Extremely detailed. Every function entry/exit. Usually disabled in production.
DEBUG: Detailed technical info for developers. Disabled in production.
INFO: Normal operations. User logged in, order placed, job completed.
WARN: Unexpected but handled. Retry succeeded, deprecated API used.
ERROR: Something failed. Request failed, exception caught.
FATAL: Application cannot continue. Database unreachable, out of memory.
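
To see how a level threshold filters output, here is a minimal sketch using Python's standard logging module. Note that the standard library has no TRACE level and treats FATAL as an alias for CRITICAL. The logger name and messages are illustrative assumptions.

```python
import io
import logging

def emit_at_level(threshold: int) -> list:
    """Log one message per level and return the lines that pass the threshold."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # Hypothetical logger name, one per threshold so demo runs stay isolated.
    logger = logging.getLogger(f"order-service.demo.{threshold}")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.propagate = False
    logger.setLevel(threshold)

    logger.debug("cart contents: 3 items")
    logger.info("order placed")
    logger.warning("payment retry succeeded on attempt 2")
    logger.error("failed to process order")
    logger.critical("database unreachable")
    return stream.getvalue().splitlines()

# Production-style threshold: DEBUG is dropped, INFO and above are kept.
for line in emit_at_level(logging.INFO):
    print(line)
```

Lowering the threshold to logging.DEBUG in a development environment brings the DEBUG line back without any code changes at the call sites.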

3. What Makes a Good Log

Bad Log
Error occurred

No context. What error? Where? When?

Good Log
[2024-01-15T10:23:45Z] [ERROR] [order-service] [req:abc123] Failed to process order ord_456 for user usr_789: PaymentDeclined - Insufficient funds

Essential Fields

Timestamp

When did it happen? Use ISO 8601 with timezone.

Level

Severity: INFO, WARN, ERROR, etc.

Service Name

Which service generated this log?

Request ID

Correlate logs across services.

User/Account ID

Who was affected?

Resource IDs

Order ID, transaction ID, etc.
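
The fields above can be attached to every line with a formatter. This is a minimal sketch using Python's standard logging module; the class name UtcIsoFormatter, the helper build_logger, and the field names service and request_id are assumptions made for illustration.

```python
import io
import logging
from datetime import datetime, timezone

class UtcIsoFormatter(logging.Formatter):
    """Render timestamps as ISO 8601 in UTC, e.g. 2024-01-15T10:23:45+00:00."""

    def formatTime(self, record, datefmt=None):
        moment = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return moment.isoformat(timespec="seconds")

# timestamp, level, service name, request ID, then the message with resource IDs
LINE_FORMAT = "[%(asctime)s] [%(levelname)s] [%(service)s] [req:%(request_id)s] %(message)s"

def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(UtcIsoFormatter(LINE_FORMAT))
    logger = logging.getLogger("essential-fields-demo")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.propagate = False
    logger.setLevel(logging.INFO)
    return logger

stream = io.StringIO()
log = build_logger(stream)
log.error(
    "Failed to process order %s for user %s: %s",
    "ord_456", "usr_789", "PaymentDeclined - Insufficient funds",
    # `extra` supplies the per-call fields the format string expects.
    extra={"service": "order-service", "request_id": "abc123"},
)
print(stream.getvalue(), end="")
```

Because the format string requires service and request_id, every call through this logger must pass them in extra; a filter or adapter can supply them automatically in a larger codebase.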

4. Structured Logging

Structured logging outputs logs in a machine-readable format, typically JSON, instead of free-form text. Because every field has a name, parsing, searching, and alerting become much easier.

Unstructured
2024-01-15 10:23:45 ERROR Failed to process order ord_456
Structured (JSON)
{"ts":"2024-01-15T10:23:45Z", "level":"error", "msg":"Failed to process order", "order_id":"ord_456", "user_id":"usr_789"}
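
One way to produce output like the structured line above is a custom formatter. This is a minimal sketch built on Python's standard logging and json modules; the JsonFormatter class is an assumption for illustration, not a standard-library class, and it writes UTC timestamps with a +00:00 offset rather than a Z suffix.

```python
import io
import json
import logging
from datetime import datetime, timezone

# Attributes every LogRecord carries; anything else was passed in via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}

class JsonFormatter(logging.Formatter):
    """Serialize each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
        }
        # Copy caller-supplied fields such as order_id and user_id.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("structured-demo")
logger.handlers.clear()
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.error("Failed to process order", extra={"order_id": "ord_456", "user_id": "usr_789"})
print(stream.getvalue(), end="")
```

Keeping the message text constant and putting the variable parts in named fields is what makes queries like order_id=ord_456 possible.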

5. Common Mistakes

Logging Sensitive Data

Never log passwords, tokens, credit card numbers, or other PII. Mask or omit them.
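
Masking can be enforced centrally instead of relying on every call site. This is a minimal sketch using a logging filter in Python; the redact function, the RedactingFilter class, and both regular expressions are illustrative assumptions and are far from an exhaustive list of sensitive patterns.

```python
import io
import logging
import re

# Illustrative patterns only; a real service needs its own reviewed list.
_SECRET_FIELD = re.compile(r"(?i)\b(password|token|secret)=(\S+)")
_CARD_NUMBER = re.compile(r"\b(?:\d[ -]?){12,15}(\d{4})\b")

def redact(text: str) -> str:
    """Mask secret fields entirely and keep only the last 4 digits of card numbers."""
    text = _SECRET_FIELD.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return _CARD_NUMBER.sub(lambda m: f"****{m.group(1)}", text)

class RedactingFilter(logging.Filter):
    """Rewrite each record's message before any handler formats it."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("redaction-demo")
logger.handlers.clear()
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.warning("login failed for %s password=%s", "usr_789", "hunter2")
print(stream.getvalue(), end="")
```

A filter like this is a safety net, not a substitute for omitting sensitive values at the call site; pattern matching will miss formats it was not written for.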

Too Much Logging

Logging every variable floods storage and hides important logs.

Too Little Context

'Error occurred' tells you nothing. Include IDs, state, reason.

Wrong Log Level

Don't use ERROR for expected conditions. Reserve it for actual failures.

Missing Correlation ID

Without request IDs, you can't trace a request across services.
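
A request ID is easiest to keep consistent when it is attached automatically. This is a minimal sketch using Python's contextvars and a logging filter; request_id_var, RequestIdFilter, and handle_request are hypothetical names, and a real service would typically read the incoming ID from a request header.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled (per thread or async task).
request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record so the formatter can print it."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

def handle_request(logger, incoming_request_id=None):
    """Hypothetical handler: reuse the caller's ID, or mint one at the edge."""
    request_id = incoming_request_id or uuid.uuid4().hex[:12]
    token = request_id_var.set(request_id)
    try:
        logger.info("order received")
        logger.info("payment authorized")
    finally:
        request_id_var.reset(token)  # do not leak the ID into the next request
    return request_id

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("[req:%(request_id)s] %(message)s"))
logger = logging.getLogger("correlation-demo")
logger.handlers.clear()
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

handle_request(logger, incoming_request_id="abc123")
print(stream.getvalue(), end="")
```

For the ID to work across services, each service has to forward it on outgoing calls and reuse it when one arrives, rather than generating a fresh ID at every hop.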

6. Real-World Dry Run

Scenario: Debugging a failed order

1. User reports: Order didn't complete
   Support gives you order_id: ord_789
2. Search logs: order_id=ord_789
   Find all logs for this order across services
3. Follow the trail
   API Gateway → Order Service → Payment Service → FAIL
4. Find the error
   Payment Service: 'Card declined: insufficient_funds'
5. Root cause found
   User's card was declined. Not a bug: this is expected behavior.

Good logs made this a 5-minute investigation. Bad logs would have made it hours of guessing.
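
The search-and-follow steps can be sketched in a few lines of Python, assuming the services emit one JSON object per line with a shared order_id field. The sample log lines, service names, and the helpers trail_for and first_error are made up for illustration; in practice a log search tool does this work.

```python
import json

# Hypothetical JSON log lines from several services, in arrival (not time) order.
RAW_LOGS = """\
{"ts":"2024-01-15T10:23:45.020Z","service":"payment-service","level":"error","msg":"Card declined: insufficient_funds","order_id":"ord_789"}
{"ts":"2024-01-15T10:23:44.100Z","service":"api-gateway","level":"info","msg":"POST /orders","order_id":"ord_789"}
{"ts":"2024-01-15T10:23:44.200Z","service":"order-service","level":"info","msg":"Order created","order_id":"ord_123"}
{"ts":"2024-01-15T10:23:44.350Z","service":"order-service","level":"info","msg":"Charging card","order_id":"ord_789"}
"""

def trail_for(raw_logs: str, order_id: str) -> list:
    """Return all entries for one order, oldest first."""
    entries = [json.loads(line) for line in raw_logs.splitlines() if line.strip()]
    matching = [e for e in entries if e.get("order_id") == order_id]
    # ISO 8601 timestamps in the same format and timezone sort correctly as strings.
    return sorted(matching, key=lambda e: e["ts"])

def first_error(trail):
    """Return the first error-level entry in the trail, or None."""
    return next((e for e in trail if e["level"] == "error"), None)

trail = trail_for(RAW_LOGS, "ord_789")
print(" -> ".join(e["service"] for e in trail))
failure = first_error(trail)
print(failure["service"], failure["msg"])
```

None of this works with unstructured lines that lack the order ID, which is why the investigation depends on the logging habits from the earlier sections.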

7. Key Takeaways

1. Use log levels correctly: DEBUG for dev, INFO/WARN/ERROR for prod
2. Include context: timestamp, request ID, user ID, resource IDs
3. Structured logging (JSON) enables search and alerting
4. Never log sensitive data: passwords, tokens, PII
5. Correlation IDs let you trace requests across services

Quiz

1. User login failed due to wrong password. Which log level?

2. What enables tracing a request across multiple microservices?