Module 1 — Data Storage
Database Indexing
Make your queries 100x faster by telling the database where to look.
1The Library Analogy
Simple Analogy
Imagine a library with 1 million books:
Without Index (Card Catalog)
"Find Harry Potter" → Walk through every shelf, check every book. Could take hours.
With Index
Look up "Harry Potter" in catalog → "Shelf 42, Row 3" → Walk directly there. Takes seconds.
2What is an Index?
A database index is a data structure that improves the speed of data retrieval. It's a sorted copy of selected columns with pointers to the actual rows.
Visual: How an Index Works
users table (unordered)
| row_id | name | |
|---|---|---|
| 1 | Charlie | c@mail.com |
| 2 | Alice | a@mail.com |
| 3 | Bob | b@mail.com |
| 4 | Diana | d@mail.com |
Index on name (sorted)
| name | → row_id |
|---|---|
| Alice | → 2 |
| Bob | → 3 |
| Charlie | → 1 |
| Diana | → 4 |
Binary search on sorted index: O(log n)
3Types of Indexes
B-Tree Index
Most common type. Balanced tree structure. Good for equality and range queries.
Structure
50
25
75
10
30
60
90
Pros
+ Efficient for =, <, >, BETWEEN
+ Self-balancing
+ Good for most use cases
Cons
- Overhead on writes (tree rebalancing)
- Not ideal for full-text search
CREATE INDEX idx_user_name ON users(name);Hash Index
Uses hash function. Extremely fast for exact match, useless for ranges.
Structure
key →
hash()
→
bucket 0
bucket 1 ←
bucket 2
Pros
+ O(1) for equality lookups
+ Very fast for exact match
Cons
- Cannot do range queries
- Cannot do ORDER BY
- Not supported in all DBs
CREATE INDEX idx_session USING HASH ON sessions(session_id);Composite (Multi-column) Index
Index on multiple columns. Order matters! Left-to-right usage.
Structure
| last_name | first_name | → row |
|---|---|---|
| Adams | Alice | → 5 |
| Adams | Bob | → 2 |
| Baker | Carol | → 1 |
Pros
+ Multiple columns in one index
+ Supports queries on leftmost columns
Cons
- Must query leftmost column first
- Larger index size
CREATE INDEX idx_name ON users(last_name, first_name);Full-Text Index
For text search. Tokenizes words, supports stemming and relevance scoring.
Structure
"The quick brown fox" →
the
quick
brown
fox
Pros
+ Search within text content
+ Relevance ranking
+ Stemming (run → running)
Cons
- Large index size
- Complex to tune
CREATE FULLTEXT INDEX idx_content ON articles(title, body);4Index Selection: What to Index
Good Candidates
Primary Key
Always indexed automatically
e.g., user_id
Foreign Keys
JOINs become much faster
e.g., order.user_id → users.id
WHERE clause columns
Filtering is common
e.g., WHERE status = 'active'
ORDER BY columns
Avoid sorting large datasets
e.g., ORDER BY created_at DESC
High cardinality
Many unique values
e.g., email, username
Poor Candidates
Low cardinality
Few unique values, index won't help
e.g., gender (M/F), boolean flags
Frequently updated
Index must update on every write
e.g., view_count, last_active
Small tables
Full scan is fast enough
e.g., < 1000 rows
Never queried
Wasted space and write overhead
e.g., internal_notes
5Index Usage: EXPLAIN
Use EXPLAIN to see if your query uses indexes:
EXPLAIN SELECT * FROM users WHERE email = 'alice@mail.com'; -- Good output (using index): Index Scan using idx_users_email on users Index Cond: (email = 'alice@mail.com'::text) -- Bad output (no index): Seq Scan on users Filter: (email = 'alice@mail.com'::text) Rows Removed by Filter: 999999
Index Scan / Index Only Scan
Using index. Fast.
Seq Scan (Sequential Scan)
Reading every row. Slow for large tables.
Common Pitfall
Indexes won't be used if you apply functions: WHERE LOWER(email) = 'alice' won't use index on email. Create a functional index instead: CREATE INDEX ON users(LOWER(email))
6The Cost of Indexes
Indexes speed up reads but slow down writes. Every INSERT, UPDATE, DELETE must also update all indexes.
Trade-off Visualization
Read Performance
No idx
With idx
Faster ↑
Write Performance
No idx
1 idx
5 idx
Faster ↑
Rule of thumb: For read-heavy workloads (90% reads), add more indexes. For write-heavy workloads (lots of INSERTs), be conservative.
7Covering Index (Index-Only Scan)
A covering index includes all columns needed by a query, avoiding the table lookup entirely:
Non-Covering (2 lookups)
-- Index on (email) SELECT name, email FROM users WHERE email = 'alice@mail.com'; 1. Look up email in index → get row_id 2. Go to table → fetch name
Covering (1 lookup)
-- Index on (email, name) or INCLUDE SELECT name, email FROM users WHERE email = 'alice@mail.com'; 1. Look up email in index → name is there! No table access needed
CREATE INDEX idx_email_name ON users(email) INCLUDE (name);8Key Takeaways
1Index = sorted data structure with pointers. Turns O(n) scans into O(log n) lookups.
2B-Tree is most common. Good for =, <, >, BETWEEN, ORDER BY.
3Composite index order matters. (A, B) helps WHERE A=1 but not WHERE B=2 alone.
4Use EXPLAIN to verify index usage. Seq Scan = bad for large tables.
5Indexes slow down writes. Balance read vs write performance.
6In interviews: know when to add indexes, how to verify they're used, and trade-offs.