Types of Streaming
Stateless Streaming
- Processes each record independently
- No memory of previous events
- Simple transformations and filtering
- Highly scalable
Examples of Stateless Processing
- Unit conversion (Celsius to Fahrenheit) for each reading
- Data validation (checking if temperature is within realistic range)
- Simple transformations (rounding values)
- Filtering (removing invalid readings)
- Basic alerting (if current temperature exceeds threshold)
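The stateless examples above can be sketched in a few lines of plain Python. This is a minimal illustration, not tied to any streaming framework: the validity bounds, the alert threshold, and the names process_reading and process_stream are all assumptions made up for this sketch. The key point is that each reading is handled using only that reading.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical bounds and threshold, chosen only for illustration.
MIN_VALID_C = -90.0
MAX_VALID_C = 60.0
ALERT_THRESHOLD_F = 100.0


def celsius_to_fahrenheit(celsius: float) -> float:
    """Unit conversion for a single reading."""
    return celsius * 9.0 / 5.0 + 32.0


def process_reading(celsius: float) -> Optional[dict]:
    """Process one reading on its own; nothing is remembered between calls."""
    # Validation and filtering: drop readings outside the realistic range.
    if not (MIN_VALID_C <= celsius <= MAX_VALID_C):
        return None
    # Simple transformation: convert, then round to one decimal place.
    fahrenheit = round(celsius_to_fahrenheit(celsius), 1)
    # Basic alerting: compare only the current value against a threshold.
    return {
        "celsius": celsius,
        "fahrenheit": fahrenheit,
        "alert": fahrenheit > ALERT_THRESHOLD_F,
    }


def process_stream(readings: Iterable[float]) -> Iterator[dict]:
    """Apply the stateless function to each record as it arrives."""
    for celsius in readings:
        result = process_reading(celsius)
        if result is not None:
            yield result


results = list(process_stream([21.5, 999.0, 40.0]))
```

Because process_reading depends on nothing but its input, any number of workers could run it in parallel on different records, which is why stateless processing scales horizontally so easily.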
Use Cases:
- You only need to process current readings
- Simple transformations are sufficient
- Horizontal scaling is important
- Memory resources are limited
Stateful Streaming
- Maintains state across events
- Enables complex processing like windowing and aggregations
- Requires state management strategies
- Good for pattern detection and trend analysis
Examples of Stateful Processing
- Calculating moving averages of temperature
- Detecting temperature trends over time
- Computing daily min/max temperatures
- Identifying temperature patterns
- Calculating rate of temperature change
- Detecting anomalies based on historical patterns
- Flagging suspicious financial activity, such as unusual transaction patterns
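A moving average is a small example of what "maintaining state" means in practice. The sketch below is plain Python rather than a real streaming engine, and the class name MovingAverage and the window size of 3 are assumptions for illustration. The state is the buffer of recent readings, which must survive from one event to the next.

```python
from collections import deque
from typing import Deque


class MovingAverage:
    """Keeps the last `window_size` readings as state between events."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # The deque drops the oldest reading automatically once it is full.
        self._window: Deque[float] = deque(maxlen=window_size)

    def update(self, value: float) -> float:
        """Add one reading and return the average over the current window."""
        self._window.append(value)
        return sum(self._window) / len(self._window)


tracker = MovingAverage(window_size=3)
averages = [tracker.update(t) for t in [20.0, 22.0, 24.0, 30.0]]
```

Unlike the stateless case, the result for a reading depends on the readings before it. In a distributed system this buffer would have to be stored, partitioned, and recovered after failures, which is where state management strategies come in.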
Use Cases:
- You need historical context
- You are analyzing patterns or trends
- You are computing moving averages
- You are detecting anomalies
- You require time-window-based analysis
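Time-window analysis can be sketched as grouping events into fixed, non-overlapping windows and keeping one small piece of state per window. The example below computes a daily min/max in plain Python. The names MinMax and daily_min_max and the sample data are assumptions for illustration, and the sketch keeps every window in memory, whereas a real engine would typically emit and discard windows once they close.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


@dataclass
class MinMax:
    minimum: float
    maximum: float


def daily_min_max(events: Iterable[Tuple[datetime, float]]) -> Dict[date, MinMax]:
    """Track min and max temperature per calendar day (one window per day)."""
    state: Dict[date, MinMax] = {}
    for timestamp, temperature in events:
        day = timestamp.date()  # the window key for this event
        current = state.get(day)
        if current is None:
            # First event for this day: start a new window.
            state[day] = MinMax(temperature, temperature)
        else:
            # Later event: update the existing window's state.
            current.minimum = min(current.minimum, temperature)
            current.maximum = max(current.maximum, temperature)
    return state


# Made-up sample readings spanning two days.
sample_events = [
    (datetime(2024, 1, 1, 6, 0), 18.0),
    (datetime(2024, 1, 1, 14, 0), 25.0),
    (datetime(2024, 1, 1, 22, 0), 20.0),
    (datetime(2024, 1, 2, 9, 0), 15.5),
]
summary = daily_min_max(sample_events)
```

The per-day dictionary is the state here. Its size grows with the number of open windows, which is one reason stateful jobs need more memory than stateless ones.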
Different Ingestion Services
Stream Processing Frameworks:
Structured Streaming (Databricks/Apache Spark)
- A processing framework for handling streaming data
- Part of the Apache Spark ecosystem
Message Brokers/Event Streaming Platforms:
Apache Kafka (Open Source)
- Distributed event streaming platform
- Self-managed
Amazon MSK
- Amazon Managed Streaming for Apache Kafka
- AWS runs and operates the Kafka cluster for you
Amazon Kinesis
- AWS native streaming service
- Uses its own API rather than the Kafka protocol
Azure Event Hubs
- Cloud-native event streaming service
- Azure's counterpart to Kafka, with a Kafka-compatible endpoint