Apache Storm
Apache Storm is a free and open-source distributed real-time computation system that makes it easy to reliably process unbounded streams of data at scale. It provides a simple programming model (topologies with spouts and bolts), guaranteed message processing, horizontal scalability, and fault tolerance. Storm integrates with queuing and database technologies including Apache Kafka and Apache Cassandra and is governed by the Apache Software Foundation.
APIs
Apache Storm REST API
The Storm UI REST API provides HTTP endpoints for monitoring and managing Storm clusters, topologies, and components. It exposes cluster summary, topology listing, topology deta...
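As an illustration of how a client might read the monitoring endpoints, here is a minimal sketch using only the Python standard library. It assumes the Storm UI is reachable at a base URL such as http://localhost:8080 and that the endpoints live under /api/v1/ (for example cluster/summary and topology/summary). The helper names and the exact response fields used here (topologies, name, status) are assumptions to check against the Storm version in use.

```python
import json
from urllib.request import urlopen

API_PREFIX = "/api/v1/"  # assumed path prefix of the Storm UI REST API


def endpoint(base_url, path):
    """Build a full endpoint URL from a UI base URL and a relative path."""
    return base_url.rstrip("/") + API_PREFIX + path.lstrip("/")


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (performs a real network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def active_topology_names(topology_summary):
    """Return names of topologies whose status is ACTIVE.

    Expects a dict shaped like the topology/summary response, i.e. a
    "topologies" list of objects with "name" and "status" keys (assumed).
    """
    return [
        t["name"]
        for t in topology_summary.get("topologies", [])
        if t.get("status") == "ACTIVE"
    ]


if __name__ == "__main__":
    # Hypothetical local UI address; adjust for a real cluster.
    url = endpoint("http://localhost:8080", "topology/summary")
    print(active_topology_names(fetch_json(url)))
```

Keeping URL construction and response parsing separate from the network call makes the parsing logic easy to exercise against saved responses.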
Apache Storm Topology API
The Storm Topology API provides Java and other language bindings for building real-time processing topologies composed of spouts (data sources) and bolts (processing units). It ...
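For components written outside the JVM, Storm exchanges JSON messages with a subprocess over stdin/stdout, ending each message with a line containing only "end". The sketch below shows just that message framing, plus emit and ack commands, in Python. The function names are hypothetical, the initial handshake is omitted, and a real component would normally use the adapter library shipped with Storm rather than hand-rolled framing.

```python
import io
import json


def write_message(out, message):
    """Serialize one message as JSON followed by the 'end' terminator line."""
    out.write(json.dumps(message))
    out.write("\nend\n")
    out.flush()


def read_message(inp):
    """Read lines until the 'end' terminator and decode them as one JSON message."""
    lines = []
    while True:
        line = inp.readline()
        if line == "":
            raise EOFError("stream closed before 'end' marker")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def emit(out, values, anchor_id=None):
    """Send an emit command, optionally anchored to an input tuple id."""
    message = {"command": "emit", "tuple": values}
    if anchor_id is not None:
        message["anchors"] = [anchor_id]
    write_message(out, message)


def ack(out, tuple_id):
    """Send an ack command for a fully processed input tuple."""
    write_message(out, {"command": "ack", "id": tuple_id})


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of a real stdin/stdout pipe.
    buf = io.StringIO()
    emit(buf, ["hello"], anchor_id="42")
    ack(buf, "42")
    buf.seek(0)
    print(read_message(buf))
    print(read_message(buf))
```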
Features
At-least-once processing guarantees through an ack/fail tuple-tracking mechanism.
Horizontally scalable stream processing topologies with configurable parallelism.
Trident, a high-level micro-batch abstraction with stateful stream processing and exactly-once semantics.
Distributed Remote Procedure Calls (DRPC) for synchronous distributed computation.
Tumbling and sliding windows defined by time duration or tuple count.
Topology components written in Java, Python, Ruby, and other languages via the multilang protocol.
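The ack/fail tracking behind the at-least-once guarantee can be pictured with a small model. Storm's acker keeps one running XOR value per spout tuple: every tuple id in the tree is XORed in once when the tuple is created and once when it is acked, so the value returns to zero exactly when the whole tree has been processed. The class below is a simplified, single-process sketch of that idea with hypothetical names. It is not Storm's implementation, and it leaves out timeouts and the spout-side replay of failed tuples.

```python
import secrets


def new_tuple_id():
    """Random 64-bit id, standing in for the ids assigned to emitted tuples."""
    return secrets.randbits(64)


class AckerSketch:
    """Tracks tuple trees with one XOR accumulator per root (spout) tuple."""

    def __init__(self):
        self._ack_val = {}    # root id -> running XOR of outstanding tuple ids
        self.completed = []   # roots whose whole tree was acked
        self.failed = []      # roots explicitly failed

    def init(self, root_id, tuple_id):
        """Register a new tree when the spout emits its first tuple."""
        self._ack_val[root_id] = tuple_id

    def ack(self, root_id, acked_id, child_ids=()):
        """Ack one tuple and register the child tuples anchored to it."""
        if root_id not in self._ack_val:
            return  # tree already completed or failed
        delta = acked_id
        for child in child_ids:
            delta ^= child
        self._ack_val[root_id] ^= delta
        if self._ack_val[root_id] == 0:
            del self._ack_val[root_id]
            self.completed.append(root_id)

    def fail(self, root_id):
        """Mark the tree as failed so the source can replay the root tuple."""
        if self._ack_val.pop(root_id, None) is not None:
            self.failed.append(root_id)


if __name__ == "__main__":
    acker = AckerSketch()
    root, a, b = "msg-1", new_tuple_id(), new_tuple_id()
    first = new_tuple_id()
    acker.init(root, first)
    acker.ack(root, first, child_ids=[a, b])  # a bolt emits two anchored tuples
    acker.ack(root, a)
    acker.ack(root, b)
    print(acker.completed)
```

The appeal of the XOR approach is that the tracking state per spout tuple stays constant in size no matter how large the tuple tree grows.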
Use Cases
Continuous computation over live event streams for operational dashboards.
Real-time data transformation and enrichment pipelines.
Online scoring of ML models against streaming feature data.
Low-latency fraud detection rules applied to transaction streams.
Integrations
Kafka Spout for consuming messages from Kafka topics as Storm data sources.
CassandraBolt for writing processed stream data to Cassandra.
HiveBolt for streaming inserts into Apache Hive tables.
Redis integration for stateful lookups and caching in Storm bolts.
ElasticsearchBolt for indexing stream data into Elasticsearch.