Module 1 - Data Storage

Choosing the Right Database

Decision framework for selecting databases based on your requirements.

1The Tool Analogy

💡 Simple Analogy
You wouldn't use a hammer to cut wood or a saw to drive nails. Different tools for different jobs.

Databases are the same: PostgreSQL for complex queries, Redis for caching,MongoDB for flexible documents, Cassandra for massive writes.

There's no "best" database-only the best for your specific use case.

2Decision Framework

Do you need ACID transactions?
Yes →SQL databases (PostgreSQL, MySQL)
No →Consider NoSQL for flexibility/scale
Is your schema well-defined and stable?
Yes →SQL databases
No →Document stores (MongoDB) for flexibility
Do you need complex queries with JOINs?
Yes →SQL databases
No →NoSQL can work with simple queries
Is write throughput critical (millions/sec)?
Yes →Wide-column (Cassandra) or Key-value (Redis)
No →Most databases will work
Do you need real-time, sub-millisecond reads?
Yes →In-memory (Redis, Memcached)
No →Disk-based databases are fine
Are you modeling relationships/graphs?
Yes →Graph database (Neo4j)
No →Relational or document stores

3Database by Use Case

E-commerce (orders, inventory)
PostgreSQL / MySQL
ACID for transactions, complex queries for reporting
Social media (posts, feeds)
PostgreSQL + Redis
SQL for relations, Redis for caching feeds
Real-time chat
Redis + PostgreSQL
Redis for presence/messages, SQL for history
IoT sensor data
TimescaleDB / InfluxDB
Time-series optimized, high write throughput
Content management
MongoDB
Flexible schema for varied content types
Session storage
Redis
Fast reads, built-in expiration
Search
Elasticsearch
Full-text search, relevance scoring
Recommendations
Neo4j + Redis
Graph for relationships, Redis for caching

4Quick Reference Table

DatabaseTypeBest ForAvoid When
PostgreSQLSQLComplex queries, JSON, GISNeed massive horizontal scale
MySQLSQLRead-heavy, web appsComplex JSON operations
MongoDBDocumentFlexible schema, rapid devComplex transactions
RedisKey-ValueCaching, sessions, queuesComplex queries, large data
CassandraWide-ColumnMassive writes, time-seriesAd-hoc queries, JOINs
DynamoDBKey-ValueServerless, auto-scaleComplex queries, cost-sensitive
Neo4jGraphRelationships, recommendationsSimple CRUD operations
ElasticsearchSearchFull-text search, logsPrimary data store

5Polyglot Persistence

Polyglot Persistence means using multiple databases in one system, each optimized for specific data types and access patterns.

Example: E-commerce Platform

PostgreSQL
Orders, users, products (ACID)
Redis
Sessions, cart, cache
Elasticsearch
Product search
S3
Product images
Complexity Trade-off

More databases = more operational complexity. Start simple (one SQL database), then add specialized databases as specific needs arise.

6Key Takeaways

1No "best" database-choose based on access patterns and requirements.
2Start with PostgreSQL for most applications-it handles 90% of cases.
3Add Redis when you need caching or fast ephemeral storage.
4Use specialized databases (Elasticsearch, Neo4j) for specific needs.
5Polyglot persistence: multiple databases for different data types.
6In interviews: explain WHY you chose a database, not just which one.