Module 1 - Data Storage

LSM Trees vs B-Trees

Two fundamental data structures that power modern databases.

1The Notebook Analogy

Simple Analogy

B-Tree (Organized Binder): Like a well-organized binder with tabs. Finding anything is fast-just flip to the right section. But inserting in the middle means rearranging pages.

LSM Tree (Append-Only Notebook): Just write notes at the end. Super fast to add new entries. But finding old notes requires checking multiple notebooks and merging them periodically.

2B-Trees: Read-Optimized

B-Tree: A self-balancing tree where each node can have multiple children. Used by most traditional databases (PostgreSQL, MySQL InnoDB, Oracle).

Fast Reads

O(log n) lookups. Data is always organized for quick access.

In-Place Updates

Updates modify data directly on disk.

Predictable Performance

Consistent latency for both reads and writes.

Space Efficient

No duplicate data, minimal storage overhead.

Downside: Random I/O for writes. Each write may touch multiple pages across disk.

3LSM Trees: Write-Optimized

LSM Tree (Log-Structured Merge Tree): Writes go to an in-memory buffer (memtable), then flush to sorted files on disk (SSTables). Used by Cassandra, RocksDB, LevelDB, InfluxDB.

Fast Writes

Sequential I/O-just append to memory and flush to disk.

High Throughput

Write-heavy workloads are significantly faster.

Compression

SSTables compress well, saving disk space.

Time-Series Friendly

Perfect for append-heavy workloads like logs and metrics.

Downside: Reads may check multiple levels. Write amplification from compaction.

4How LSM Works

Write to Memtable

In-memory sorted structure (often a red-black tree)

Flush to SSTable

When memtable is full, write sorted data to disk

Compaction

Background process merges SSTables, removes duplicates

Read Path

Check memtable, then each SSTable level (use bloom filters)

5Comparison

Aspect	B-Tree	LSM Tree
Writes	Random I/O	Sequential I/O (faster)
Reads	Single lookup	May check multiple levels
Space	Fixed overhead	Temp duplicates, better compression
Best For	Read-heavy OLTP	Write-heavy, time-series
Examples	PostgreSQL, MySQL	Cassandra, RocksDB

6Key Takeaways

1B-Trees optimize for reads with in-place updates

2LSM Trees optimize for writes with sequential I/O

3LSM uses compaction to merge sorted files in background

4Bloom filters help LSM avoid checking non-existent keys

5Choose based on your read/write ratio-most OLTP uses B-Tree

?Quiz

1. A logging system writes millions of events per second with rare reads. Best choice?

2. What structure does LSM use to avoid checking all SSTables on reads?