Module 1 — Data Storage
Choosing the Right Database
Decision framework for selecting databases based on your requirements.
1The Tool Analogy
💡 Simple Analogy
You wouldn't use a hammer to cut wood or a saw to drive nails. Different tools for different jobs.
Databases are the same: PostgreSQL for complex queries, Redis for caching,MongoDB for flexible documents, Cassandra for massive writes.
There's no "best" database—only the best for your specific use case.
Databases are the same: PostgreSQL for complex queries, Redis for caching,MongoDB for flexible documents, Cassandra for massive writes.
There's no "best" database—only the best for your specific use case.
2Decision Framework
Do you need ACID transactions?
Yes →SQL databases (PostgreSQL, MySQL)
No →Consider NoSQL for flexibility/scale
Is your schema well-defined and stable?
Yes →SQL databases
No →Document stores (MongoDB) for flexibility
Do you need complex queries with JOINs?
Yes →SQL databases
No →NoSQL can work with simple queries
Is write throughput critical (millions/sec)?
Yes →Wide-column (Cassandra) or Key-value (Redis)
No →Most databases will work
Do you need real-time, sub-millisecond reads?
Yes →In-memory (Redis, Memcached)
No →Disk-based databases are fine
Are you modeling relationships/graphs?
Yes →Graph database (Neo4j)
No →Relational or document stores
3Database by Use Case
E-commerce (orders, inventory)
PostgreSQL / MySQL
ACID for transactions, complex queries for reporting
Social media (posts, feeds)
PostgreSQL + Redis
SQL for relations, Redis for caching feeds
Real-time chat
Redis + PostgreSQL
Redis for presence/messages, SQL for history
IoT sensor data
TimescaleDB / InfluxDB
Time-series optimized, high write throughput
Content management
MongoDB
Flexible schema for varied content types
Session storage
Redis
Fast reads, built-in expiration
Search
Elasticsearch
Full-text search, relevance scoring
Recommendations
Neo4j + Redis
Graph for relationships, Redis for caching
4Quick Reference Table
| Database | Type | Best For | Avoid When |
|---|---|---|---|
| PostgreSQL | SQL | Complex queries, JSON, GIS | Need massive horizontal scale |
| MySQL | SQL | Read-heavy, web apps | Complex JSON operations |
| MongoDB | Document | Flexible schema, rapid dev | Complex transactions |
| Redis | Key-Value | Caching, sessions, queues | Complex queries, large data |
| Cassandra | Wide-Column | Massive writes, time-series | Ad-hoc queries, JOINs |
| DynamoDB | Key-Value | Serverless, auto-scale | Complex queries, cost-sensitive |
| Neo4j | Graph | Relationships, recommendations | Simple CRUD operations |
| Elasticsearch | Search | Full-text search, logs | Primary data store |
5Polyglot Persistence
Polyglot Persistence means using multiple databases in one system, each optimized for specific data types and access patterns.
Example: E-commerce Platform
PostgreSQL
Orders, users, products (ACID)
Redis
Sessions, cart, cache
Elasticsearch
Product search
S3
Product images
Complexity Trade-off
More databases = more operational complexity. Start simple (one SQL database), then add specialized databases as specific needs arise.
6Key Takeaways
1No "best" database—choose based on access patterns and requirements.
2Start with PostgreSQL for most applications—it handles 90% of cases.
3Add Redis when you need caching or fast ephemeral storage.
4Use specialized databases (Elasticsearch, Neo4j) for specific needs.
5Polyglot persistence: multiple databases for different data types.
6In interviews: explain WHY you chose a database, not just which one.