Mission Compile - System Design Learning Platform

1The Library Analogy

Simple Analogy

Imagine a library with 1 million books:

Without Index (Card Catalog)

"Find Harry Potter" → Walk through every shelf, check every book. Could take hours.

With Index

Look up "Harry Potter" in catalog → "Shelf 42, Row 3" → Walk directly there. Takes seconds.

2What is an Index?

A database index is a data structure that improves the speed of data retrieval. It's a sorted copy of selected columns with pointers to the actual rows.

Visual: How an Index Works

users table (unordered)

row_id	name	email
1	Charlie	c@mail.com
2	Alice	a@mail.com
3	Bob	b@mail.com
4	Diana	d@mail.com

Index on name (sorted)

name	→ row_id
Alice	→ 2
Bob	→ 3
Charlie	→ 1
Diana	→ 4

Binary search on sorted index: O(log n)

3Types of Indexes

B-Tree Index

Most common type. Balanced tree structure. Good for equality and range queries.

Structure

50

25

75

10

30

60

90

Pros

+ Efficient for =, <, >, BETWEEN

+ Self-balancing

+ Good for most use cases

Cons

- Overhead on writes (tree rebalancing)

- Not ideal for full-text search

CREATE INDEX idx_user_name ON users(name);

Hash Index

Uses hash function. Extremely fast for exact match, useless for ranges.

Structure

key →

hash()

→

bucket 0

bucket 1 ←

bucket 2

Pros

+ O(1) for equality lookups

+ Very fast for exact match

Cons

- Cannot do range queries

- Cannot do ORDER BY

- Not supported in all DBs

CREATE INDEX idx_session USING HASH ON sessions(session_id);

Composite (Multi-column) Index

Index on multiple columns. Order matters! Left-to-right usage.

Structure

last_name	first_name	→ row
Adams	Alice	→ 5
Adams	Bob	→ 2
Baker	Carol	→ 1

Pros

+ Multiple columns in one index

+ Supports queries on leftmost columns

Cons

- Must query leftmost column first

- Larger index size

CREATE INDEX idx_name ON users(last_name, first_name);

Full-Text Index

For text search. Tokenizes words, supports stemming and relevance scoring.

Structure

"The quick brown fox" →

the

quick

brown

fox

Pros

+ Search within text content

+ Relevance ranking

+ Stemming (run → running)

Cons

- Large index size

- Complex to tune

CREATE FULLTEXT INDEX idx_content ON articles(title, body);

4Index Selection: What to Index

Good Candidates

Primary Key

Always indexed automatically

e.g., user_id

Foreign Keys

JOINs become much faster

e.g., order.user_id → users.id

WHERE clause columns

Filtering is common

e.g., WHERE status = 'active'

ORDER BY columns

Avoid sorting large datasets

e.g., ORDER BY created_at DESC

High cardinality

Many unique values

e.g., email, username

Poor Candidates

Low cardinality

Few unique values, index won't help

e.g., gender (M/F), boolean flags

Frequently updated

Index must update on every write

e.g., view_count, last_active

Small tables

Full scan is fast enough

e.g., < 1000 rows

Never queried

Wasted space and write overhead

e.g., internal_notes

5Index Usage: EXPLAIN

Use EXPLAIN to see if your query uses indexes:

EXPLAIN SELECT * FROM users WHERE email = 'alice@mail.com';

-- Good output (using index):
Index Scan using idx_users_email on users
  Index Cond: (email = 'alice@mail.com'::text)

-- Bad output (no index):
Seq Scan on users
  Filter: (email = 'alice@mail.com'::text)
  Rows Removed by Filter: 999999

Index Scan / Index Only Scan

Using index. Fast.

Seq Scan (Sequential Scan)

Reading every row. Slow for large tables.

Common Pitfall

Indexes won't be used if you apply functions: WHERE LOWER(email) = 'alice' won't use index on email. Create a functional index instead: CREATE INDEX ON users(LOWER(email))

6The Cost of Indexes

Indexes speed up reads but slow down writes. Every INSERT, UPDATE, DELETE must also update all indexes.

Trade-off Visualization

Read Performance

No idx

With idx

Faster ↑

Write Performance

No idx

1 idx

5 idx

Faster ↑

Rule of thumb: For read-heavy workloads (90% reads), add more indexes. For write-heavy workloads (lots of INSERTs), be conservative.

7Covering Index (Index-Only Scan)

A covering index includes all columns needed by a query, avoiding the table lookup entirely:

Non-Covering (2 lookups)

-- Index on (email)
SELECT name, email FROM users 
WHERE email = 'alice@mail.com';

1. Look up email in index → get row_id
2. Go to table → fetch name

Covering (1 lookup)

-- Index on (email, name) or INCLUDE
SELECT name, email FROM users 
WHERE email = 'alice@mail.com';

1. Look up email in index → name is there!
No table access needed

CREATE INDEX idx_email_name ON users(email) INCLUDE (name);

8Key Takeaways

1Index = sorted data structure with pointers. Turns O(n) scans into O(log n) lookups.

2B-Tree is most common. Good for =, <, >, BETWEEN, ORDER BY.

3Composite index order matters. (A, B) helps WHERE A=1 but not WHERE B=2 alone.

4Use EXPLAIN to verify index usage. Seq Scan = bad for large tables.

5Indexes slow down writes. Balance read vs write performance.

6In interviews: know when to add indexes, how to verify they're used, and trade-offs.

9Interview Follow-up Questions

Interview Follow-up Questions

Common follow-up questions interviewers ask

10Test Your Understanding

Test Your Understanding

5 questions

1

What is the primary benefit of a database index?

2

You have an index on (city, created_at). Which query will use this index efficiently?

3

What is a major downside of having too many indexes?

4

What does EXPLAIN ANALYZE show that EXPLAIN doesn't?

5

When would a hash index be preferred over a B-tree index?

0 of 5 answered

Database Indexing

1The Library Analogy

2What is an Index?

Visual: How an Index Works

3Types of Indexes

B-Tree Index

Hash Index

Composite (Multi-column) Index

Full-Text Index

4Index Selection: What to Index

Good Candidates

Poor Candidates

5Index Usage: EXPLAIN

6The Cost of Indexes

Trade-off Visualization

7Covering Index (Index-Only Scan)

8Key Takeaways

9Interview Follow-up Questions

Interview Follow-up Questions

10Test Your Understanding

Test Your Understanding