Module 1 — Data Storage

Database Indexing

Make your queries 100x faster by telling the database where to look.

1The Library Analogy

Simple Analogy
Imagine a library with 1 million books:
Without Index (Card Catalog)
"Find Harry Potter" → Walk through every shelf, check every book. Could take hours.
With Index
Look up "Harry Potter" in catalog → "Shelf 42, Row 3" → Walk directly there. Takes seconds.

2What is an Index?

A database index is a data structure that improves the speed of data retrieval. It's a sorted copy of selected columns with pointers to the actual rows.

Visual: How an Index Works

users table (unordered)
row_idnameemail
1Charliec@mail.com
2Alicea@mail.com
3Bobb@mail.com
4Dianad@mail.com
Index on name (sorted)
name→ row_id
Alice→ 2
Bob→ 3
Charlie→ 1
Diana→ 4
Binary search on sorted index: O(log n)

3Types of Indexes

B-Tree Index

Most common type. Balanced tree structure. Good for equality and range queries.

Structure
50
25
75
10
30
60
90
Pros
+ Efficient for =, <, >, BETWEEN
+ Self-balancing
+ Good for most use cases
Cons
- Overhead on writes (tree rebalancing)
- Not ideal for full-text search
CREATE INDEX idx_user_name ON users(name);

Hash Index

Uses hash function. Extremely fast for exact match, useless for ranges.

Structure
key →
hash()
bucket 0
bucket 1 ←
bucket 2
Pros
+ O(1) for equality lookups
+ Very fast for exact match
Cons
- Cannot do range queries
- Cannot do ORDER BY
- Not supported in all DBs
CREATE INDEX idx_session USING HASH ON sessions(session_id);

Composite (Multi-column) Index

Index on multiple columns. Order matters! Left-to-right usage.

Structure
last_namefirst_name→ row
AdamsAlice→ 5
AdamsBob→ 2
BakerCarol→ 1
Pros
+ Multiple columns in one index
+ Supports queries on leftmost columns
Cons
- Must query leftmost column first
- Larger index size
CREATE INDEX idx_name ON users(last_name, first_name);

Full-Text Index

For text search. Tokenizes words, supports stemming and relevance scoring.

Structure
"The quick brown fox" →
the
quick
brown
fox
Pros
+ Search within text content
+ Relevance ranking
+ Stemming (run → running)
Cons
- Large index size
- Complex to tune
CREATE FULLTEXT INDEX idx_content ON articles(title, body);

4Index Selection: What to Index

Good Candidates

Primary Key
Always indexed automatically
e.g., user_id
Foreign Keys
JOINs become much faster
e.g., order.user_id → users.id
WHERE clause columns
Filtering is common
e.g., WHERE status = 'active'
ORDER BY columns
Avoid sorting large datasets
e.g., ORDER BY created_at DESC
High cardinality
Many unique values
e.g., email, username

Poor Candidates

Low cardinality
Few unique values, index won't help
e.g., gender (M/F), boolean flags
Frequently updated
Index must update on every write
e.g., view_count, last_active
Small tables
Full scan is fast enough
e.g., < 1000 rows
Never queried
Wasted space and write overhead
e.g., internal_notes

5Index Usage: EXPLAIN

Use EXPLAIN to see if your query uses indexes:

EXPLAIN SELECT * FROM users WHERE email = 'alice@mail.com';

-- Good output (using index):
Index Scan using idx_users_email on users
  Index Cond: (email = 'alice@mail.com'::text)

-- Bad output (no index):
Seq Scan on users
  Filter: (email = 'alice@mail.com'::text)
  Rows Removed by Filter: 999999
Index Scan / Index Only Scan
Using index. Fast.
Seq Scan (Sequential Scan)
Reading every row. Slow for large tables.
Common Pitfall

Indexes won't be used if you apply functions: WHERE LOWER(email) = 'alice' won't use index on email. Create a functional index instead: CREATE INDEX ON users(LOWER(email))

6The Cost of Indexes

Indexes speed up reads but slow down writes. Every INSERT, UPDATE, DELETE must also update all indexes.

Trade-off Visualization

Read Performance
No idx
With idx
Faster ↑
Write Performance
No idx
1 idx
5 idx
Faster ↑
Rule of thumb: For read-heavy workloads (90% reads), add more indexes. For write-heavy workloads (lots of INSERTs), be conservative.

7Covering Index (Index-Only Scan)

A covering index includes all columns needed by a query, avoiding the table lookup entirely:

Non-Covering (2 lookups)
-- Index on (email)
SELECT name, email FROM users 
WHERE email = 'alice@mail.com';

1. Look up email in index → get row_id
2. Go to table → fetch name
Covering (1 lookup)
-- Index on (email, name) or INCLUDE
SELECT name, email FROM users 
WHERE email = 'alice@mail.com';

1. Look up email in index → name is there!
No table access needed
CREATE INDEX idx_email_name ON users(email) INCLUDE (name);

8Key Takeaways

1Index = sorted data structure with pointers. Turns O(n) scans into O(log n) lookups.
2B-Tree is most common. Good for =, <, >, BETWEEN, ORDER BY.
3Composite index order matters. (A, B) helps WHERE A=1 but not WHERE B=2 alone.
4Use EXPLAIN to verify index usage. Seq Scan = bad for large tables.
5Indexes slow down writes. Balance read vs write performance.
6In interviews: know when to add indexes, how to verify they're used, and trade-offs.