Vector Clocks
Tracking causality and detecting conflicts in distributed systems.
1The Email Thread Analogy
A vector clock is an array of counters, one per node. It captures the partial ordering of events and detects concurrent (conflicting) updates.
2Why Not Physical Timestamps?
Clock Skew
Different machines have different times. Up to seconds of drift.
NTP Jumps
Clock synchronization can jump backward. event2 < event1 timestamp.
Precision Limits
Two events in same millisecond: which came first?
Causality Lost
Timestamps don't capture 'happened-before' relationships.
3How Vector Clocks Work
Comparing vectors: VC1 < VC2 if all counters in VC1 <= VC2 and at least one is <. If neither < the other, they're concurrent (conflict!).
4Real-World Dry Run
Scenario: Alice and Bob edit a shared document
| Event | Alice VC | Bob VC |
|---|---|---|
| Initial | [A:0, B:0] | [A:0, B:0] |
| Alice edits | [A:1, B:0] | [A:0, B:0] |
| Bob edits (concurrent) | [A:1, B:0] | [A:0, B:1] |
| Sync: CONFLICT! | [A:1, B:0] vs [A:0, B:1] - neither dominates | |
| Merge + Alice edits | [A:2, B:1] | [A:1, B:1] |
Conflict Detection
[A:1, B:0] and [A:0, B:1] are concurrent. System must resolve: last-write-wins, merge, or ask user.
5In Practice
Amazon Dynamo
Uses vector clocks for conflict detection. Client resolves conflicts.
Riak
Version vectors (optimized vector clocks) for key versioning.
CRDTs
Build on vector clock concepts for automatic merge.
Git
Similar concept: commits form a DAG with merge commits.
6Key Takeaways
?Quiz
1. VC1=[A:2, B:1], VC2=[A:1, B:2]. Relationship?
2. Why not use physical timestamps for ordering?