Apache Samza
Apache Samza is a distributed stream processing framework that provides a simple API for building stateful stream processing applications. It integrates with Apache Kafka for messaging and supports both stream and batch processing.
APIs
Samza provides a high-level Streams API and a low-level Task API in Java/Scala for stateful stream processing, along with a REST API for job management and integrations with Kafka, HDFS,...
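The two API styles differ mainly in who drives the control flow. The plain-Java sketch below runs the same filter-and-transform logic both ways. `MessageHandler`, `Pipeline`, and `runLowLevel` are hypothetical names, not Samza classes. In Samza itself, the low-level API centers on the `StreamTask` interface and the high-level API on `StreamApplication`, and both have richer signatures than the ones shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

public class ApiStylesSketch {
    // Hypothetical low-level style: the runtime calls process() once per message,
    // and the task itself decides what, if anything, to emit.
    interface MessageHandler {
        void process(String message, List<String> collector);
    }

    static List<String> runLowLevel(List<String> input, MessageHandler handler) {
        List<String> out = new ArrayList<>();
        for (String message : input) {
            handler.process(message, out);
        }
        return out;
    }

    // Hypothetical high-level style: the job is declared as a chain of operators,
    // and the runtime applies them to every message in order.
    static final class Pipeline {
        private final List<Function<String, Optional<String>>> operators = new ArrayList<>();

        Pipeline filter(Predicate<String> keep) {
            operators.add(m -> keep.test(m) ? Optional.of(m) : Optional.empty());
            return this;
        }

        Pipeline map(Function<String, String> fn) {
            operators.add(m -> Optional.of(fn.apply(m)));
            return this;
        }

        List<String> run(List<String> input) {
            List<String> out = new ArrayList<>();
            for (String message : input) {
                Optional<String> current = Optional.of(message);
                for (Function<String, Optional<String>> op : operators) {
                    current = current.flatMap(op); // a filtered-out message stays empty
                }
                current.ifPresent(out::add);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("click:home", "view:cart", "click:checkout");

        // Fluent style: keep "click" events and strip the prefix.
        List<String> fluent = new Pipeline()
                .filter(m -> m.startsWith("click:"))
                .map(m -> m.substring("click:".length()))
                .run(input);

        // Low-level style: the same logic written inside a single callback.
        List<String> lowLevel = runLowLevel(input, (message, collector) -> {
            if (message.startsWith("click:")) {
                collector.add(message.substring("click:".length()));
            }
        });

        System.out.println(fluent);   // [home, checkout]
        System.out.println(lowLevel); // [home, checkout]
    }
}
```

The fluent style exposes the whole operator chain up front. The low-level style, by contrast, keeps every routing decision inside one callback.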
Capabilities
Features
Native Apache Kafka consumer/producer for stream processing
Apache Hadoop YARN integration for resource management and fault tolerance
Local state stores with RocksDB for low-latency stateful computations
Transactional state stores for exactly-once semantics
Deployment on YARN, Kubernetes, or in standalone mode
Fluent API and SQL support for stream transformations
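Local state is fast because reads and writes go to an embedded store on the same machine. In Samza, such stores are made fault-tolerant by logging every write to a Kafka changelog topic and replaying that log on recovery. The sketch below models the pattern in plain Java, with a `HashMap` standing in for RocksDB and a `List` standing in for the changelog topic. `LoggedStore` and `Change` are hypothetical names. Samza's real store interface is `KeyValueStore`, which this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogStoreSketch {
    // One changelog record: a key and its new value (null marks a delete).
    record Change(String key, Long value) {}

    // Hypothetical store: a HashMap stands in for RocksDB, and the list
    // stands in for a compacted changelog topic.
    static class LoggedStore {
        private final Map<String, Long> local = new HashMap<>();
        private final List<Change> changelog;

        LoggedStore(List<Change> changelog) {
            this.changelog = changelog;
        }

        Long get(String key) {
            return local.get(key); // local read, no network round trip
        }

        void put(String key, long value) {
            local.put(key, value);
            changelog.add(new Change(key, value)); // every write is also logged
        }

        void delete(String key) {
            local.remove(key);
            changelog.add(new Change(key, null));
        }

        // Rebuild local state after a failure by replaying the changelog in order.
        static LoggedStore restore(List<Change> changelog) {
            LoggedStore store = new LoggedStore(changelog);
            for (Change c : changelog) {
                if (c.value() == null) {
                    store.local.remove(c.key());
                } else {
                    store.local.put(c.key(), c.value());
                }
            }
            return store;
        }
    }

    public static void main(String[] args) {
        List<Change> changelog = new ArrayList<>();
        LoggedStore store = new LoggedStore(changelog);

        // Count events per user in local state.
        for (String user : List.of("alice", "bob", "alice")) {
            Long current = store.get(user);
            store.put(user, current == null ? 1 : current + 1);
        }
        store.delete("bob");

        // Simulate losing the local store and recovering it from the changelog.
        LoggedStore recovered = LoggedStore.restore(changelog);
        System.out.println(recovered.get("alice")); // 2
        System.out.println(recovered.get("bob"));   // null
    }
}
```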
Use Cases
Real-time processing of Kafka event streams
Windowed aggregations over streaming data
Join multiple Kafka streams for enrichment
Process CDC events from databases in real time
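Windowed aggregation groups events both by key and by time bucket. Below is a minimal plain-Java sketch of tumbling (fixed, non-overlapping) window counts. The `Event` record and the one-minute window size are illustrative assumptions. The sketch ignores triggers, late arrivals, and state persistence, all of which a production windowing operator must address.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {
    // Hypothetical event: a key plus an event-time timestamp in milliseconds.
    record Event(String key, long timestampMs) {}

    // Count events per (window start, key) using fixed, non-overlapping windows.
    // Assumes non-negative timestamps.
    static Map<Long, Map<String, Integer>> countPerWindow(List<Event> events, long windowMs) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (Event e : events) {
            // Align the timestamp down to the start of its window.
            long windowStart = e.timestampMs() - (e.timestampMs() % windowMs);
            result.computeIfAbsent(windowStart, w -> new TreeMap<>())
                  .merge(e.key(), 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        long windowMs = 60_000; // 1-minute tumbling windows (illustrative)
        List<Event> events = List.of(
                new Event("page-a", 5_000),
                new Event("page-b", 30_000),
                new Event("page-a", 59_999),
                new Event("page-a", 60_000));
        System.out.println(countPerWindow(events, windowMs));
        // {0={page-a=2, page-b=1}, 60000={page-a=1}}
    }
}
```

For simplicity, the sketch aggregates a finite list. A streaming job would instead keep the per-window counts in state and emit each window's result once that window closes.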
Integrations
Primary messaging system for Samza input and output streams
Resource management and job scheduling on Hadoop
HDFS integration for checkpoint storage
Embedded state store for local stateful processing
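Checkpointing lets a restarted job resume where it left off instead of reprocessing everything. The job periodically records, for each input partition, the next offset it has yet to process. The sketch below models this with an in-memory `HashMap` as a hypothetical checkpoint store (not HDFS or any real Samza checkpoint manager) and simulates a crash between commits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointReplaySketch {
    // Hypothetical checkpoint store: partition id -> next offset to read.
    static final Map<Integer, Integer> checkpoints = new HashMap<>();

    // Process one partition starting from its checkpoint. Commits every
    // commitEvery messages and, to simulate a crash, stops early once
    // failAfter messages have been processed (use -1 for no crash).
    static List<String> run(int partition, List<String> log, int commitEvery, int failAfter) {
        List<String> processed = new ArrayList<>();
        int offset = checkpoints.getOrDefault(partition, 0);
        int sinceCommit = 0;
        while (offset < log.size()) {
            if (processed.size() == failAfter) {
                return processed; // crash: progress since the last commit is lost
            }
            processed.add(log.get(offset));
            offset++;
            sinceCommit++;
            if (sinceCommit == commitEvery) {
                checkpoints.put(partition, offset); // commit the next offset to read
                sinceCommit = 0;
            }
        }
        checkpoints.put(partition, offset); // final commit on clean completion
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2", "m3", "m4");
        // First attempt commits after m1, then crashes after processing m2.
        System.out.println(run(0, log, 2, 3));  // [m0, m1, m2]
        // The restart resumes at offset 2, so m2 is processed a second time.
        System.out.println(run(0, log, 2, -1)); // [m2, m3, m4]
    }
}
```

Messages handled after the last commit are replayed after a restart (here `m2` is processed twice). For this reason, checkpointing on its own yields at-least-once rather than exactly-once processing.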