Module 1 - Data Storage

LSM Trees vs B-Trees

Two fundamental data structures that power modern databases.

1The Notebook Analogy

Simple Analogy
B-Tree (Organized Binder): Like a well-organized binder with tabs. Finding anything is fast-just flip to the right section. But inserting in the middle means rearranging pages.

LSM Tree (Append-Only Notebook): Just write notes at the end. Super fast to add new entries. But finding old notes requires checking multiple notebooks and merging them periodically.

2B-Trees: Read-Optimized

B-Tree: A self-balancing tree where each node can have multiple children. Used by most traditional databases (PostgreSQL, MySQL InnoDB, Oracle).

Fast Reads

O(log n) lookups. Data is always organized for quick access.

In-Place Updates

Updates modify data directly on disk.

Predictable Performance

Consistent latency for both reads and writes.

Space Efficient

No duplicate data, minimal storage overhead.

Downside: Random I/O for writes. Each write may touch multiple pages across disk.

3LSM Trees: Write-Optimized

LSM Tree (Log-Structured Merge Tree): Writes go to an in-memory buffer (memtable), then flush to sorted files on disk (SSTables). Used by Cassandra, RocksDB, LevelDB, InfluxDB.

Fast Writes

Sequential I/O-just append to memory and flush to disk.

High Throughput

Write-heavy workloads are significantly faster.

Compression

SSTables compress well, saving disk space.

Time-Series Friendly

Perfect for append-heavy workloads like logs and metrics.

Downside: Reads may check multiple levels. Write amplification from compaction.

4How LSM Works

1
Write to Memtable
In-memory sorted structure (often a red-black tree)
2
Flush to SSTable
When memtable is full, write sorted data to disk
3
Compaction
Background process merges SSTables, removes duplicates
4
Read Path
Check memtable, then each SSTable level (use bloom filters)

5Comparison

AspectB-TreeLSM Tree
WritesRandom I/OSequential I/O (faster)
ReadsSingle lookupMay check multiple levels
SpaceFixed overheadTemp duplicates, better compression
Best ForRead-heavy OLTPWrite-heavy, time-series
ExamplesPostgreSQL, MySQLCassandra, RocksDB

6Key Takeaways

1B-Trees optimize for reads with in-place updates
2LSM Trees optimize for writes with sequential I/O
3LSM uses compaction to merge sorted files in background
4Bloom filters help LSM avoid checking non-existent keys
5Choose based on your read/write ratio-most OLTP uses B-Tree

?Quiz

1. A logging system writes millions of events per second with rare reads. Best choice?

2. What structure does LSM use to avoid checking all SSTables on reads?