Apache Samza

Apache Samza is a distributed stream processing framework that provides a simple API for building stateful stream processing applications. It integrates with Apache Kafka for messaging and supports both stream and batch processing.

1 API · 1 Capability · 6 Features
Big Data · Hadoop · Kafka · Stream Processing · Streaming · Apache · Open Source

APIs

Apache Samza

Samza provides a high-level Streams API and low-level Task API in Java/Scala for stateful stream processing, with a REST API for job management and integration with Kafka, HDFS,...

Capabilities

Features

Kafka Integration

Native Apache Kafka consumer/producer for stream processing

YARN Execution

Runs on Apache YARN for resource management and fault tolerance

Stateful Processing

Local state stores with RocksDB for low-latency stateful computations

Exactly-Once Processing

Transactional state stores for exactly-once semantics

Flexible Deployment

Run on YARN, Kubernetes, or standalone

High Level API

Fluent API and SQL support for stream transformations
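
As a rough illustration of the Stateful Processing feature above, the sketch below keeps a running count per key in task-local state. It is plain Java and does not use the Samza API: the class name `LocalStateSketch` and its `process` method are hypothetical, and a `HashMap` stands in for a RocksDB-backed local store.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; a HashMap stands in for a RocksDB-backed local store.
final class LocalStateSketch {
    // Task-local state: one running count per message key.
    private final Map<String, Long> store = new HashMap<>();

    // Updates and returns the running count for the key of one incoming message.
    long process(String key) {
        return store.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        LocalStateSketch task = new LocalStateSketch();
        task.process("user-1");
        task.process("user-2");
        System.out.println(task.process("user-1")); // prints 2
    }
}
```

Because the state lives next to the computation, each update is a local lookup rather than a call to a remote database, which is the low-latency property the feature description refers to.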

Use Cases

Event Stream Processing

Real-time processing of Kafka event streams

Stateful Aggregations

Windowed aggregations over streaming data

Stream Joins

Join multiple Kafka streams for enrichment

Change Data Capture

Process CDC events from databases in real time
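
The Stateful Aggregations use case above can be sketched as a tumbling-window count. This is a minimal plain-Java sketch, not Samza's windowing API: the class `TumblingWindowSketch`, the method `countPerWindow`, and the sample timestamps are all hypothetical, and it assumes event timestamps in milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this does not use the Samza API.
final class TumblingWindowSketch {
    // Counts events per fixed-size, non-overlapping window.
    // The map key is the window start time in milliseconds.
    static Map<Long, Integer> countPerWindow(long[] timestampsMs, long windowMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            // Round the timestamp down to the start of its window.
            long windowStart = Math.floorDiv(ts, windowMs) * windowMs;
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestampsMs = {1_000L, 4_000L, 12_000L};
        // Two events fall in [0, 10000), one in [10000, 20000).
        System.out.println(countPerWindow(timestampsMs, 10_000L)); // prints {0=2, 10000=1}
    }
}
```

A streaming framework would emit each window's result as the window closes instead of processing a finished array; the window-assignment arithmetic is the part this sketch is meant to show.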

Integrations

Apache Kafka

Primary messaging system for Samza input and output streams

Apache YARN

Resource management and job scheduling on Hadoop

Apache Hadoop

HDFS integration for checkpoint storage

RocksDB

Embedded state store for local stateful processing

Semantic Vocabularies

Apache Samza Context

8 classes · 17 properties

JSON-LD

API Governance Rules

Apache Samza API Rules

6 rules · 4 errors · 2 warnings

SPECTRAL

Resources

GitHub Organization
Documentation
Spectral Rules
Vocabulary
Naftiko Capability
JSON-LD