Module 3 — Asynchronous Processing

Stream Processing Basics

Process continuous data flows in real time as they arrive, instead of storing them for later batch processing.

1. The Assembly Line vs. the Batch Factory

Simple Analogy
Imagine making cars:

Batch Processing: Wait until you have 1000 car orders, then build them all at once. Efficient per unit, but customers wait days.

Stream Processing: Build each car as orders come in, continuously. Customers get cars faster, and you can adapt to changes in real-time.

Stream processing analyzes and transforms data continuously as it arrives, rather than storing it first and processing it in batches. This enables real-time insights and reactions.

2. Batch vs. Stream Processing

Aspect     | Batch Processing          | Stream Processing
Data       | Bounded (finite)          | Unbounded (continuous)
Latency    | Minutes to hours          | Milliseconds to seconds
Processing | All at once               | Record by record
Use Case   | Reports, ETL, ML training | Alerts, dashboards, fraud detection
Example    | Daily sales report        | Real-time stock ticker
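
To make the difference concrete, here is a minimal TypeScript sketch (the Sale shape is made up for illustration): a batch job computes its answer once over a finished dataset, while a stream processor keeps running state and updates it per event.

    interface Sale { amount: number }

    // Batch: the data is bounded, so load it all and compute the answer once.
    function dailyTotal(sales: Sale[]): number {
      return sales.reduce((sum, s) => sum + s.amount, 0);
    }

    // Stream: the data is unbounded, so keep state and update it per event.
    class RunningTotal {
      private total = 0;
      onEvent(sale: Sale): number {
        this.total += sale.amount;
        return this.total; // an up-to-date answer after every event
      }
    }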

3. Stream Processing Demo

Watch events being processed in real time. Events are filtered, transformed, and aggregated as they flow.

[Interactive demo: live counters for Events Processed, Purchases (filtered), and Avg Value (aggregated) update as events arrive.]

4. Core Concepts

Event/Record: A single piece of data in the stream; it has a timestamp and a payload.
Stream: A continuous, unbounded sequence of events over time.
Window: A grouping of events by time (tumbling, sliding, or session) used for aggregations.
State: Data kept between events (counters, aggregations); it must survive failures.
Watermark: A marker indicating that all events up to a certain time have arrived.
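
A rough sketch of how these concepts look in code (the StreamEvent type and the 5-second lateness allowance are assumptions, not taken from any particular framework):

    interface StreamEvent {
      timestamp: number;  // event time, assigned where the event was produced
      payload: unknown;   // the actual data
    }

    // Watermark: assume events arrive at most 5 seconds late, so anything
    // older than (latest event time seen - 5 s) is treated as complete.
    const ALLOWED_LATENESS_MS = 5_000;
    let maxEventTime = 0;

    function updateWatermark(e: StreamEvent): number {
      maxEventTime = Math.max(maxEventTime, e.timestamp);
      return maxEventTime - ALLOWED_LATENESS_MS;
    }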

5. Windowing Strategies

How to group continuous events for aggregations like counts, sums, and averages (a code sketch follows the three strategies below):

Tumbling Window
[Diagram: Window 1 | Window 2 | Window 3, placed back to back]
Fixed-size, non-overlapping. Events belong to exactly one window.

Sliding Window
[Diagram: Win 1, Win 2, Win 3, overlapping]
Fixed-size, overlapping. Events can belong to multiple windows.

Session Window
[Diagram: Session 1, gap, Session 2, gap, Session 3]
Dynamic size based on activity gaps. Good for user sessions.
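
Here is the sketch referenced above (the window sizes are arbitrary examples): a tumbling window assigns each event to exactly one bucket keyed by the window's start time, while a sliding window can assign it to several.

    // Tumbling: exactly one window per event, identified by its start time.
    function tumblingWindowStart(timestamp: number, sizeMs: number): number {
      return Math.floor(timestamp / sizeMs) * sizeMs;
    }

    // Sliding: every window of length sizeMs, advancing by slideMs, that
    // contains the event. Overlap means an event can land in several windows.
    function slidingWindowStarts(timestamp: number, sizeMs: number, slideMs: number): number[] {
      const starts: number[] = [];
      const latest = Math.floor(timestamp / slideMs) * slideMs;
      for (let start = latest; start > timestamp - sizeMs; start -= slideMs) {
        if (start >= 0) starts.push(start);
      }
      return starts;
    }

    // Example: an event at t = 7 s with 10 s tumbling windows lands in [0 s, 10 s);
    // with 10 s windows sliding every 5 s it lands in [0 s, 10 s) and [5 s, 15 s).
    console.log(tumblingWindowStart(7_000, 10_000));        // 0
    console.log(slidingWindowStarts(7_000, 10_000, 5_000)); // [5000, 0]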

6. Common Operations

Filter: Keep only events matching a condition.
    events.filter(e => e.type === 'purchase')

Map: Transform each event.
    events.map(e => ({ ...e, total: e.price * e.qty }))

Aggregate: Combine events in a window.
    window.reduce((sum, e) => sum + e.amount, 0)

Join: Combine two streams by key.
    orders.join(users).on('userId')

Group By: Partition the stream by key.
    events.groupBy(e => e.region)

Deduplicate: Remove duplicate events.
    events.distinctBy(e => e.eventId)
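
The snippets above are pseudocode; real stream APIs vary by framework. As a self-contained sketch, here are the same operations chained over a plain array standing in for one window of events (the OrderEvent fields are made up):

    interface OrderEvent {
      eventId: string;
      type: string;
      region: string;
      price: number;
      qty: number;
    }

    // Deduplicate, filter to purchases, compute order totals, then group and sum by region.
    function revenueByRegion(events: OrderEvent[]): Map<string, number> {
      const seen = new Set<string>();
      const totals = new Map<string, number>();

      events
        .filter(e => {                                              // deduplicate
          if (seen.has(e.eventId)) return false;
          seen.add(e.eventId);
          return true;
        })
        .filter(e => e.type === 'purchase')                         // filter
        .map(e => ({ region: e.region, total: e.price * e.qty }))   // map
        .forEach(e =>                                               // group by + aggregate
          totals.set(e.region, (totals.get(e.region) ?? 0) + e.total));

      return totals;
    }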

7. Tools & Technologies

Tool                   | Type                      | Best For
Apache Kafka           | Event streaming platform  | High-throughput, durable storage
Apache Flink           | Stream processor          | Complex event processing, SQL
Apache Spark Streaming | Micro-batch               | Batch + stream unified API
AWS Kinesis            | Managed streaming         | AWS integration, managed
Kafka Streams          | Library                   | JVM apps, embedded processing
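
For a feel of what consuming a stream looks like in application code, here is a minimal sketch using the kafkajs client library (the broker address, topic name, and group id are placeholders; Flink, Spark, and Kinesis each have their own APIs):

    import { Kafka } from 'kafkajs';

    const kafka = new Kafka({ clientId: 'demo-app', brokers: ['localhost:9092'] });
    const consumer = kafka.consumer({ groupId: 'demo-group' });

    async function run(): Promise<void> {
      await consumer.connect();
      await consumer.subscribe({ topics: ['purchases'], fromBeginning: false });

      // Called once per record as it arrives: the "record by record" model.
      await consumer.run({
        eachMessage: async ({ message }) => {
          const event = JSON.parse(message.value?.toString() ?? '{}');
          console.log('processed', event);
        },
      });
    }

    run().catch(console.error);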

8. Real-World Use Cases

Fraud Detection: Analyze transactions in real time and flag suspicious patterns instantly. Latency: < 100 ms.
Real-Time Analytics: Live dashboards showing current metrics, user activity, and sales. Latency: < 1 s.
IoT Monitoring: Process sensor data continuously and trigger alerts on anomalies. Latency: < 500 ms.
Log Processing: Aggregate logs, detect errors, and alert on-call engineers immediately. Latency: < 5 s.
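
As an illustration of the fraud-detection case (the 1-minute window and 5-transaction threshold are invented numbers): keep per-card state of recent transaction times and flag a card when too many fall inside a short sliding window.

    const WINDOW_MS = 60_000; // look at the last minute of activity
    const MAX_TX = 5;         // more than this many transactions is suspicious

    // Per-card state: timestamps of transactions seen inside the window.
    const recentTx = new Map<string, number[]>();

    function onTransaction(cardId: string, timestamp: number): boolean {
      const times = (recentTx.get(cardId) ?? []).filter(t => t > timestamp - WINDOW_MS);
      times.push(timestamp);
      recentTx.set(cardId, times);
      return times.length > MAX_TX; // true = flag for review
    }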

9. Key Takeaways

1. Stream processing handles data as it arrives: real time, not batch.
2. Unbounded data: streams never end, unlike batch files.
3. Windows group events for aggregation: tumbling, sliding, session.
4. State management is critical: it must handle failures and recovery.
5. Use Kafka as the streaming platform and Flink for complex processing.
6. Use stream processing when latency matters: fraud, alerts, dashboards.