Module 1 - Data Storage
Choosing the Right Database
Decision framework for selecting databases based on your requirements.
1The Tool Analogy
💡 Simple Analogy
You wouldn't use a hammer to cut wood or a saw to drive nails. Different tools for different jobs.
Databases are the same: PostgreSQL for complex queries, Redis for caching,MongoDB for flexible documents, Cassandra for massive writes.
There's no "best" database-only the best for your specific use case.
Databases are the same: PostgreSQL for complex queries, Redis for caching,MongoDB for flexible documents, Cassandra for massive writes.
There's no "best" database-only the best for your specific use case.
2Decision Framework
Do you need ACID transactions?
Yes →SQL databases (PostgreSQL, MySQL)
No →Consider NoSQL for flexibility/scale
Is your schema well-defined and stable?
Yes →SQL databases
No →Document stores (MongoDB) for flexibility
Do you need complex queries with JOINs?
Yes →SQL databases
No →NoSQL can work with simple queries
Is write throughput critical (millions/sec)?
Yes →Wide-column (Cassandra) or Key-value (Redis)
No →Most databases will work
Do you need real-time, sub-millisecond reads?
Yes →In-memory (Redis, Memcached)
No →Disk-based databases are fine
Are you modeling relationships/graphs?
Yes →Graph database (Neo4j)
No →Relational or document stores
3Database by Use Case
E-commerce (orders, inventory)
PostgreSQL / MySQL
ACID for transactions, complex queries for reporting
Social media (posts, feeds)
PostgreSQL + Redis
SQL for relations, Redis for caching feeds
Real-time chat
Redis + PostgreSQL
Redis for presence/messages, SQL for history
IoT sensor data
TimescaleDB / InfluxDB
Time-series optimized, high write throughput
Content management
MongoDB
Flexible schema for varied content types
Session storage
Redis
Fast reads, built-in expiration
Search
Elasticsearch
Full-text search, relevance scoring
Recommendations
Neo4j + Redis
Graph for relationships, Redis for caching
4Quick Reference Table
| Database | Type | Best For | Avoid When |
|---|---|---|---|
| PostgreSQL | SQL | Complex queries, JSON, GIS | Need massive horizontal scale |
| MySQL | SQL | Read-heavy, web apps | Complex JSON operations |
| MongoDB | Document | Flexible schema, rapid dev | Complex transactions |
| Redis | Key-Value | Caching, sessions, queues | Complex queries, large data |
| Cassandra | Wide-Column | Massive writes, time-series | Ad-hoc queries, JOINs |
| DynamoDB | Key-Value | Serverless, auto-scale | Complex queries, cost-sensitive |
| Neo4j | Graph | Relationships, recommendations | Simple CRUD operations |
| Elasticsearch | Search | Full-text search, logs | Primary data store |
5Polyglot Persistence
Polyglot Persistence means using multiple databases in one system, each optimized for specific data types and access patterns.
Example: E-commerce Platform
PostgreSQL
Orders, users, products (ACID)
Redis
Sessions, cart, cache
Elasticsearch
Product search
S3
Product images
Complexity Trade-off
More databases = more operational complexity. Start simple (one SQL database), then add specialized databases as specific needs arise.
6Key Takeaways
1No "best" database-choose based on access patterns and requirements.
2Start with PostgreSQL for most applications-it handles 90% of cases.
3Add Redis when you need caching or fast ephemeral storage.
4Use specialized databases (Elasticsearch, Neo4j) for specific needs.
5Polyglot persistence: multiple databases for different data types.
6In interviews: explain WHY you chose a database, not just which one.