Module 7 — Observability
Logging Best Practices
Good logs are your first line of defense when debugging production issues.
1. The Flight Recorder Analogy
💡 Simple Analogy
A plane's black box records everything—not just crashes. When something goes wrong, investigators can replay events leading up to it.
Your logs are the black box for your system. When production breaks at 3 AM, good logs tell you what happened.
2. Log Levels
ERROR
Something failed that shouldn't have. Needs attention. Page someone.
Example: Database connection failed, payment processing error
WARN
Unexpected but handled. Might become an error if it continues.
Example: High memory usage, slow query, retry succeeded
INFO
Important business events. What happened, not how.
Example: User logged in, order placed, deployment started
DEBUG
Detailed diagnostic info. Usually disabled in production.
Example: Request payload, function entry/exit, variable values
TRACE
Very detailed. For deep debugging only.
Example: Every iteration in a loop, every cache check
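As a rough illustration of how a level threshold filters output, here is a minimal sketch using Python's standard `logging` module. The logger name and messages are made up for the example. Note that Python spells WARN as `WARNING` and has no built-in TRACE level, so TRACE is left out.

```python
import io
import logging


def build_logger(level: int, stream: io.StringIO) -> logging.Logger:
    """Return a logger that writes 'LEVEL message' lines to the given stream."""
    # "checkout" is a hypothetical service name used only for this sketch.
    logger = logging.getLogger(f"checkout.{logging.getLevelName(level)}")
    logger.setLevel(level)
    logger.propagate = False  # keep this example's output out of the root logger
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def emit_samples(logger: logging.Logger) -> None:
    """Emit one message at each of the four levels Python provides."""
    logger.error("payment processing failed")
    logger.warning("slow query: 2300 ms")
    logger.info("order placed")
    logger.debug("request payload: {'sku': 'A1'}")


prod_out = io.StringIO()
emit_samples(build_logger(logging.INFO, prod_out))   # production-style threshold

dev_out = io.StringIO()
emit_samples(build_logger(logging.DEBUG, dev_out))   # development-style threshold

print(prod_out.getvalue(), end="")
```

With the threshold at INFO, the DEBUG line is dropped. With the threshold at DEBUG, all four lines appear.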
3. What to Log
Do Log
- Requests: method, path, status, duration
- Business events: order placed, user registered
- Errors with stack traces and context
- External service calls and responses
- State changes and important decisions
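One way to capture the request fields and error context listed above is a small wrapper around each handler. The sketch below is framework-agnostic. `handle_with_logging`, the logger name, and the in-memory stream are assumptions made for illustration, not part of any particular web framework.

```python
import io
import logging
import time
from typing import Callable

request_log_stream = io.StringIO()  # stands in for stdout or a log file
request_logger = logging.getLogger("example.requests")  # hypothetical logger name
request_logger.setLevel(logging.INFO)
request_logger.propagate = False
request_logger.handlers.clear()
_handler = logging.StreamHandler(request_log_stream)
_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
request_logger.addHandler(_handler)


def handle_with_logging(method: str, path: str, handler: Callable[[], int]) -> int:
    """Run a handler and log method, path, status, and duration once per request."""
    start = time.perf_counter()
    try:
        status = handler()
    except Exception:
        # logger.exception logs at ERROR and appends the stack trace.
        request_logger.exception("request failed method=%s path=%s", method, path)
        raise
    duration_ms = (time.perf_counter() - start) * 1000
    request_logger.info(
        "request method=%s path=%s status=%d duration_ms=%.1f",
        method, path, status, duration_ms,
    )
    return status


handle_with_logging("GET", "/orders/42", lambda: 200)
print(request_log_stream.getvalue(), end="")
```

Logging once per request at the boundary keeps volume predictable. Handlers that raise still produce an ERROR entry with the stack trace before the exception propagates.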
Don't Log
- Passwords, tokens, API keys
- Personal data (PII): SSNs, credit card numbers
- Every loop iteration
- Successful health checks
- Sensitive business data
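A common way to keep secrets and PII out of logs is to mask known-sensitive fields before anything is written. The sketch below assumes log context is passed around as a dictionary with string keys. The `SENSITIVE_KEYS` set and the `redact` helper are illustrative names, and a real key list would need to match your own data.

```python
from typing import Any, Dict

# Illustrative, not exhaustive: extend this to match the fields your system handles.
SENSITIVE_KEYS = {"password", "token", "api_key", "ssn", "card_number"}


def redact(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of fields with sensitive values masked, recursing into nested dicts."""
    cleaned: Dict[str, Any] = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            cleaned[key] = redact(value)
        else:
            cleaned[key] = value
    return cleaned


payload = {
    "user": "john",
    "password": "hunter2",
    "card": {"card_number": "4111111111111111", "brand": "visa"},
}
safe = redact(payload)
print(safe)
```

Matching on key names only catches fields you anticipated. Secrets embedded in free-text messages need separate handling.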
4. Structured Logging
Structured logging emits each log entry in a machine-parseable format such as JSON instead of free-form text. This lets you filter and aggregate by field rather than pattern-matching on strings.
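A minimal sketch of JSON output with Python's standard `logging` module might look like the following. `JsonFormatter`, the logger name, and the convention of passing fields under an `extra={"fields": ...}` key are assumptions of this example, not a standard API. Dedicated libraries exist for this, but the idea is the same.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record; the
        # "fields" attribute name is a convention of this sketch only.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


stream = io.StringIO()  # stands in for stdout or a log file
logger = logging.getLogger("example.structured")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers.clear()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.info(
    "login",
    extra={"fields": {"user": "john", "ip": "192.168.1.1", "request_id": "req-123"}},
)

# Because the line is valid JSON, any consumer can parse it back into fields.
parsed = json.loads(stream.getvalue())
print(parsed["event"], parsed["user"])
```

Carrying a request ID as a field on every entry is what makes it possible to pull up all log lines for a single request later.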
Unstructured (Bad)
User john logged in from 192.168.1.1 at 2024-01-15 10:30:00
Structured (Good)
{"event":"login","user":"john","ip":"192.168.1.1","time":"..."}
5. Log Aggregation
With many servers, you need centralized logging:
| Tool | Type | Best For |
|---|---|---|
| ELK Stack | Self-hosted | Full control, powerful search |
| Datadog | SaaS | Full observability platform |
| Splunk | Enterprise | Large scale, compliance |
| CloudWatch | AWS | AWS native integration |
6. Key Takeaways
1. Use appropriate log levels: ERROR for failures, INFO for events, DEBUG for diagnostics.
2. Structured logging (JSON) enables powerful search and analysis.
3. Include context: request ID, user ID, timestamps.
4. Never log secrets: passwords, tokens, PII.
5. Use centralized logging (ELK, Datadog) for multi-server systems.
6. Good logs are essential for debugging production issues.