Module 3 - Async Processing

Apache Kafka Deep Dive

The distributed event streaming platform powering real-time data pipelines.

1. What is Kafka?

Apache Kafka is a distributed event streaming platform. At its core it is a commit log: it stores events durably, lets multiple consumers read the same events independently, and can handle millions of events per second.

Throughput: 1M+ msg/s
Latency: < 10 ms
Retention: Days to forever

2. Core Concepts

Topic: Named stream of events. Like a table in a database.
Partition: A topic is split into partitions for parallelism. Each partition is ordered.
Offset: Position of a message in a partition. Consumers track their offset.
Producer: Publishes messages to topics.
Consumer: Reads messages from topics.
Consumer Group: Group of consumers that share partition assignment. Each partition is read by exactly one consumer in the group.
Broker: Kafka server. A cluster has multiple brokers.
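
The consumer group rule above (each partition is owned by one consumer in the group) can be sketched with a small toy function. This is a simplified round-robin assignment written for illustration only; `assign_partitions` is a hypothetical helper, not part of any Kafka client, and real Kafka offers several assignment strategies.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by two consumers: each consumer owns two.
print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

# More consumers than partitions: the extra consumer gets nothing.
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why adding consumers beyond the partition count does not add parallelism: a partition is never split between two members of the same group.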

3. How It Works

Data Flow

Producer → Topic (Partitions) → Consumer Group
1. Producer sends message
Message goes to a partition based on key (or round-robin)
2. Broker stores message
Appended to partition log, replicated to followers
3. Consumer polls
Consumer fetches messages from its assigned partitions
4. Offset committed
Consumer saves its position to resume after restart
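
The four steps above can be walked through with an in-memory toy model. This is a sketch under stated assumptions, not the Kafka client API: `ToyTopic`, `send`, and `poll` are hypothetical names, CRC32 stands in for the client's real key hash, and replication to followers is left out entirely.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: a list of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = 0  # next partition for messages without a key

    def send(self, value, key=None):
        """Steps 1-2: choose a partition, append, return (partition, offset)."""
        if key is not None:
            # Same key -> same hash -> same partition.
            partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        else:
            partition = self._round_robin
            self._round_robin = (self._round_robin + 1) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def poll(self, partition, offset, max_records=10):
        """Step 3: fetch records starting at the given offset."""
        return self.partitions[partition][offset:offset + max_records]


topic = ToyTopic(num_partitions=3)
committed = {}  # partition -> offset of the next record to read

p1, off1 = topic.send("order created", key="user-42")
p2, off2 = topic.send("order paid", key="user-42")

records = topic.poll(p1, committed.get(p1, 0))
committed[p1] = committed.get(p1, 0) + len(records)  # step 4: commit

topic.send("order shipped", key="user-42")
after_restart = topic.poll(p1, committed[p1])  # resumes where it left off
print(records, after_restart)
```

Both keyed messages land in the same partition with consecutive offsets, which is what preserves per-key ordering. The committed value is the offset of the next record to read, so a consumer that restarts picks up only "order shipped".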

4. Key Features

Durability

Messages persisted to disk, replicated across brokers

Ordering

Guaranteed order within a partition (not across)

Replay

Consumers can rewind and reprocess old messages

Scalability

Add partitions and brokers for horizontal scaling

High Throughput

Sequential disk I/O, zero-copy, batching

Exactly-Once

Idempotent producers plus transactions, with consumers reading only committed messages
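
The idempotent-producer half of the Exactly-Once feature above can be illustrated with a toy partition log that drops retried writes. This is a simplified sketch: `ToyPartitionLog` is a hypothetical class, and it only handles the duplicate case, whereas a real broker also tracks sequence numbers per partition and rejects gaps.

```python
class ToyPartitionLog:
    """Append-only log that ignores duplicate (producer_id, sequence) writes."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        """Return True if the record was appended, False if it was a duplicate."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # already written; this is a retry
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


log = ToyPartitionLog()
results = [
    log.append("producer-1", 0, "a"),
    log.append("producer-1", 1, "b"),
    log.append("producer-1", 1, "b"),  # retry after a lost acknowledgement
]
print(results, log.records)
# [True, True, False] ['a', 'b']
```

The retry is acknowledged as harmless instead of being stored twice, so a producer can safely resend when it is unsure whether a write succeeded.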

5. When to Use Kafka

Event Sourcing

Store all events, derive state from event log

Log Aggregation

Collect logs from many services

Stream Processing

Real-time transformations with Kafka Streams

Data Integration

Connect systems with Kafka Connect
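
For the Event Sourcing use case above, "derive state from event log" amounts to folding over the events in order. The following is a minimal sketch with made-up account events; `apply_event` and the event tuple layout are assumptions for illustration, and a plain Python list stands in for a topic partition.

```python
from functools import reduce


def apply_event(balances, event):
    """Return a new state with one event applied; the input is not mutated."""
    account, kind, amount = event
    delta = amount if kind == "deposit" else -amount
    updated = dict(balances)
    updated[account] = updated.get(account, 0) + delta
    return updated


event_log = [
    ("alice", "deposit", 100),
    ("bob", "deposit", 50),
    ("alice", "withdraw", 30),
]

# Current state is the fold of every event, in log order.
state = reduce(apply_event, event_log, {})
print(state)
# {'alice': 70, 'bob': 50}

# Replaying only a prefix of the log reconstructs an earlier state.
print(reduce(apply_event, event_log[:2], {}))
# {'alice': 100, 'bob': 50}
```

Because the log is the source of truth, state can be rebuilt at any time by replaying from the start, which is why replay and long retention matter for this pattern.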

6. Key Takeaways

1. Kafka is a distributed commit log for event streaming
2. Topics split into partitions for parallelism
3. Ordering guaranteed within a partition, not across
4. Consumer groups enable parallel consumption
5. Messages are persisted and replayable

Quiz

1. How many consumers in a group can read from one partition?

2. Kafka guarantees message ordering: