
Batch vs Streaming

Batch Processing

Batch means collect first, process later.

  • Works on large chunks of accumulated data
  • High throughput, and usually cheaper and simpler to operate than streaming
  • Results are not real-time
  • Output typically lags the source data by minutes, hours, or days

Examples:

  • Daily or weekly sales reports
  • End-of-day stock portfolio reconciliation
  • Monthly billing cycles
  • ETL pipelines that refresh a data warehouse

When to use

  • Data does not need to be acted on immediately
  • A few minutes or hours of delay is acceptable
  • You’re cleaning, transforming, or aggregating large datasets
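As a minimal sketch of the "collect first, process later" idea, the snippet below aggregates a full day of sales in one pass. The `SALES` records and the `run_daily_report` function are made up for illustration; a real job would read from files or a warehouse table.

```python
from collections import defaultdict

# Hypothetical sales records, fully accumulated before processing starts.
SALES = [
    {"store": "north", "amount": 120.0},
    {"store": "south", "amount": 75.5},
    {"store": "north", "amount": 30.0},
    {"store": "south", "amount": 24.5},
]


def run_daily_report(records):
    """Aggregate a complete, bounded dataset in a single pass."""
    totals = defaultdict(float)
    for record in records:
        totals[record["store"]] += record["amount"]
    return dict(totals)


# The job runs once, after the day's data is complete.
report = run_daily_report(SALES)
print(report)
```

Nothing is produced until the whole dataset is available, which is exactly why batch results lag the data.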

Stream Processing

Streaming means process events the moment they arrive.

  • Low-latency (milliseconds to seconds)
  • Continuous, event-by-event processing
  • Ideal for real-time analytics and alerting
  • Often stateful: the system keeps running context (counters, windows, recent event history) across events

Examples:

  • Stock price updates
  • Fraud detection for credit cards
  • Real-time gaming leaderboards
  • IoT sensor monitoring

When to use

  • You need instant reactions
  • Delays cause risk, loss, or bad UX
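The sketch below illustrates event-by-event processing with running state, loosely modeled on the IoT sensor example. The event shape, the `process_stream` function, and the alert rule (a reading more than `threshold` above the sensor's running mean) are assumptions chosen for illustration, not a recommended detection method.

```python
def process_stream(events, threshold=10.0):
    """Handle each event as it arrives, keeping a running mean per sensor."""
    state = {}  # sensor id -> (count, mean): the running context
    for event in events:
        sensor, value = event["sensor"], event["value"]
        count, mean = state.get(sensor, (0, 0.0))
        # Compare against the mean of earlier readings before updating it.
        if count > 0 and value - mean > threshold:
            yield (sensor, value, mean)
        count += 1
        mean += (value - mean) / count  # incremental mean update
        state[sensor] = (count, mean)


# Hypothetical readings; in production these would arrive from a broker or socket.
readings = [
    {"sensor": "s1", "value": 20.0},
    {"sensor": "s2", "value": 5.0},
    {"sensor": "s1", "value": 22.0},
    {"sensor": "s1", "value": 40.0},
]

alerts = list(process_stream(readings))
print(alerts)
```

Because `process_stream` is a generator, each alert is emitted as soon as the triggering event is seen, without waiting for the rest of the input.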

Micro-Batch

Micro-batching groups incoming events into tiny batches and processes each batch as a unit, giving near-real-time output without true event-by-event streaming.

It is a hybrid model: neither full streaming nor full batch. Data is processed in very small batches at very short intervals, typically from about 100 ms to a few seconds.

In effect, it is batch processing run so frequently that it feels like streaming.
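A minimal sketch of the grouping step, assuming events carry a millisecond timestamp and arrive in time order. The `micro_batches` helper, the field names, and the 500 ms interval are illustrative choices; a real engine would trigger each batch on a timer rather than derive windows from timestamps.

```python
from itertools import groupby


def micro_batches(events, interval_ms=500):
    """Group time-ordered events into consecutive fixed-length windows."""
    def window_of(event):
        return event["ts_ms"] // interval_ms

    for window, group in groupby(events, key=window_of):
        yield window * interval_ms, list(group)


# Hypothetical payment events; amounts are in cents to avoid float rounding.
payments = [
    {"ts_ms": 100, "cents": 1000},
    {"ts_ms": 450, "cents": 2500},
    {"ts_ms": 600, "cents": 400},
    {"ts_ms": 1700, "cents": 999},
]

# Each tiny batch is processed as a unit: here, one total per window.
posted = [
    (window_start, sum(p["cents"] for p in batch))
    for window_start, batch in micro_batches(payments)
]
print(posted)
```

Windows with no events are simply skipped, so the output has one entry per non-empty window.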

Example: Real-Time vs Micro-Batch

Credit Card Fraud Detection (Real-Time)

Fraud scoring must run event by event, with sub-second latency at worst.

  • The bank must decide immediately: approve or decline
  • Customer is standing at a checkout counter
  • A delay means either a blocked transaction or fraud slipping through
  • Regulatory requirements often demand an immediate response

Credit Card Payment Posting (Micro-Batch)

When a customer makes a payment toward their balance (online, in the app, via ACH, etc.), updating the backend systems does not require millisecond-level consistency.

Even if the balance updates with a 1-minute delay:

  • No fraud risk
  • No UX problem
  • No operational impact

                 +------------------------------+
                 |         STREAMING            |
                 | Event → Process → Output     |
                 | Latency: milliseconds        |
                 +------------------------------+

                 +------------------------------+
                 |        MICRO-BATCH           |
                 | Tiny windows → Process       |
                 | Latency: 0.5–10 seconds      |
                 +------------------------------+

                 +------------------------------+
                 |            BATCH             |
                 | Accumulate → Process         |
                 | Latency: minutes–hours       |
                 +------------------------------+
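The latency figures in the diagram can be sanity-checked with a little arithmetic. The sketch below computes how long a single event waits before processing starts under each model, ignoring compute and network time. The 1-second micro-batch interval and the hourly batch period are assumed values for illustration only.

```python
# Assumed, illustrative schedule lengths; real systems tune these.
MICRO_BATCH_INTERVAL_MS = 1_000
BATCH_PERIOD_MS = 60 * 60 * 1_000  # hourly batch job


def wait_before_processing(arrival_ms, mode):
    """Time an event waits before processing starts, ignoring compute time."""
    if mode == "streaming":
        return 0  # handled on arrival
    if mode == "micro-batch":
        period = MICRO_BATCH_INTERVAL_MS
    elif mode == "batch":
        period = BATCH_PERIOD_MS
    else:
        raise ValueError(f"unknown mode: {mode}")
    # The event waits for the end of the window it arrived in.
    window_end = (arrival_ms // period + 1) * period
    return window_end - arrival_ms


arrival_ms = 250  # event arrives 250 ms after the schedule starts
waits = {
    mode: wait_before_processing(arrival_ms, mode)
    for mode in ("streaming", "micro-batch", "batch")
}
print(waits)
```

For this arrival time the event waits 0 ms under streaming, 750 ms under micro-batch, and just under an hour under batch.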

Why Redis Pub/Sub is NOT real streaming

Redis Pub/Sub is often mistaken for “streaming”, but:

  • Messages are not persisted
  • No replay capability
  • No consumer groups
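  • No delivery guarantees: delivery is at-most-once, with no acknowledgements or retries
  • If a subscriber is offline when a message is published, it never receives that message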
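The difference can be sketched with a toy in-memory model. `ToyPubSub` and `ToyLog` below are hypothetical classes written for this illustration; they are not the Redis client API, and they only mimic the delivery semantics described above.

```python
class ToyPubSub:
    """Fire-and-forget fan-out: only connected subscribers get a message."""

    def __init__(self):
        self.subscribers = {}  # name -> list acting as that subscriber's inbox

    def subscribe(self, name):
        self.subscribers[name] = []
        return self.subscribers[name]

    def unsubscribe(self, name):
        self.subscribers.pop(name, None)

    def publish(self, message):
        for inbox in self.subscribers.values():
            inbox.append(message)
        return len(self.subscribers)  # receiver count; nothing is stored


class ToyLog:
    """Append-only log: entries are kept, so a reader can resume from an offset."""

    def __init__(self):
        self.entries = []

    def append(self, message):
        self.entries.append(message)

    def read_from(self, offset):
        return self.entries[offset:]


bus = ToyPubSub()
first_inbox = bus.subscribe("alerts")
bus.publish("m1")
bus.unsubscribe("alerts")           # subscriber goes offline
receivers = bus.publish("m2")       # nobody is listening, so m2 is dropped
second_inbox = bus.subscribe("alerts")
bus.publish("m3")

log = ToyLog()
for message in ("m1", "m2", "m3"):
    log.append(message)
replayed = log.read_from(1)         # reader resumes after having seen m1

print(first_inbox, second_inbox, receivers, replayed)
```

The reconnecting subscriber never sees `m2`, while the log reader recovers it by resuming from its last offset. That stored, replayable history is what durable streaming systems add on top of plain pub/sub.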

Use cases

  • Lightweight notifications
  • Chat-like message passing
  • Ephemeral real-time signals

Not suitable for

  • Analytics
  • Compliance or auditing
  • Durable event logs
  • Replaying data
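  • Multi-consumer systems that need load-balanced or coordinated consumption (every subscriber receives every message; work cannot be split across a consumer group)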

#batch #streaming #pubsub

Ver 5.5.9

Last change: 2025-12-03