Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log and event data. It provides a simple and flexible architecture based on streaming data flows with pluggable sources, channels, and sinks, plus a REST monitoring API for agent metrics.
APIs
Apache Flume Monitoring API
REST API for monitoring Apache Flume agents, retrieving component metrics for sources, channels, and sinks, and accessing agent health information.
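When started with `-Dflume.monitoring.type=http -Dflume.monitoring.port=<port>`, a Flume agent serves component metrics as a single JSON document at `/metrics`, keyed by component type and name (e.g. `CHANNEL.c1`), with every metric value reported as a string. A minimal sketch of consuming that document — the payload below is illustrative, not real agent output:

```python
import json

# Illustrative payload in the shape returned by Flume's HTTP metrics
# endpoint (GET http://<agent-host>:<port>/metrics). Component names
# and counter values here are made up for the example.
sample = json.loads("""
{
  "SOURCE.r1":  {"EventReceivedCount": "5021", "EventAcceptedCount": "5021", "Type": "SOURCE"},
  "CHANNEL.c1": {"ChannelFillPercentage": "12.5", "EventPutSuccessCount": "5021",
                 "EventTakeSuccessCount": "4890", "Type": "CHANNEL"},
  "SINK.k1":    {"EventDrainSuccessCount": "4890", "Type": "SINK"}
}
""")

def channel_fill(metrics):
    """Return {channel_name: fill_percentage} from a metrics document.
    Flume serializes every metric value as a string, so convert to float."""
    return {
        name.split(".", 1)[1]: float(stats["ChannelFillPercentage"])
        for name, stats in metrics.items()
        if name.startswith("CHANNEL.")
    }

print(channel_fill(sample))  # {'c1': 12.5}
```

Channel fill percentage is the usual first thing to alert on: a channel that stays near 100% means the sink is draining slower than the source is producing.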
Apache Flume Java API
Java API for building custom Flume sources, channels, sinks, and interceptors. Provides interfaces for developing pluggable data ingestion components.
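Custom components implement Flume's Java interfaces (`Source`, `Channel`, `Sink`, `Interceptor`) and are wired into an agent by fully qualified class name in the agent configuration; for interceptors, the configured type is the nested `Builder` class. A sketch, where `com.example.flume.JsonInterceptor` stands in for a user-written class on the agent's plugin classpath:

```properties
# Hypothetical custom interceptor: com.example.flume.JsonInterceptor
# implements org.apache.flume.interceptor.Interceptor and exposes a
# Builder inner class, which is what the config references.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.flume.JsonInterceptor$Builder
```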
Capabilities
Apache Flume Log Collection
Capability for monitoring Apache Flume log collection agents — tracking source throughput, channel fill levels, and sink drain rates. Designed for data engineers and platform operators.
Features
Extensible source architecture supporting Avro, Thrift, Exec, Taildir, Kafka, HTTP, Syslog, and custom sources.
Multiple channel implementations including memory, file-backed, and Kafka-backed channels for different durability requirements.
Sinks for writing events to HDFS, HBase, Solr, Elasticsearch, Kafka, and custom destinations.
Fan-in aggregation of events from multiple agent sources into a single destination for centralized log collection.
Fan-out routing of events from a single source to multiple channel/sink pairs for parallel processing.
Event transformation interceptors for filtering, enrichment, and routing based on event content.
TLS encryption support across Avro, Thrift, Kafka, HTTP, and Syslog components.
HTTP monitoring endpoint exposing source, channel, and sink metrics for agent health monitoring.
Multi-agent chaining via Avro/Thrift RPC for tiered log aggregation architectures.
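The source-channel-sink model above comes together in a single properties file per agent. A minimal end-to-end sketch — agent name, component names, filesystem paths, and the HDFS URL are placeholders to adapt:

```properties
# Single-agent pipeline: tail application logs into HDFS.
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Taildir source: follows files matching the group pattern and
# checkpoints read offsets so a restart does not re-ship events.
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = apps
a1.sources.r1.filegroups.apps = /var/log/app/.*log
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.channels = c1

# File channel: durable on-disk buffering between source and sink.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data

# HDFS sink: date-partitioned path, files rolled by time or size.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/logs/%Y/%m/%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.channel = c1
```

The file channel trades throughput for durability; a memory channel is faster but loses buffered events if the agent dies.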
Use Cases
Collect application logs from hundreds of servers and aggregate them into HDFS, Kafka, or Elasticsearch.
Tail application log files in real time using Taildir source for immediate event processing.
Ingest RFC-3164 and RFC-5424 syslog events from network devices into centralized storage.
Bridge Kafka topics to HDFS or other storage for batch analytics on streaming event data.
Build tiered data collection with edge collectors forwarding to aggregation agents and final destinations.
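The tiered pattern in the last use case pairs an Avro sink on each edge agent with an Avro source on the collector. A sketch with two agents; hostnames and ports are placeholders:

```properties
# Edge agent "edge": ships locally collected events to the
# collector over Avro RPC.
edge.sinks.k1.type = avro
edge.sinks.k1.hostname = collector.example.com
edge.sinks.k1.port = 4141
edge.sinks.k1.channel = c1

# Collector agent "coll": accepts Avro RPC on a well-known port,
# so any number of edge agents fan in to this aggregation tier.
coll.sources.r1.type = avro
coll.sources.r1.bind = 0.0.0.0
coll.sources.r1.port = 4141
coll.sources.r1.channels = c1
```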
Integrations
Kafka source and channel for consuming events, and Kafka sink for writing events to topics.
Primary sink for writing log data to the Hadoop Distributed File System (HDFS) for batch analytics.
HBase sink for writing events directly to HBase tables for random-access analytics.
Solr sink for indexing log events for full-text search capabilities.
Elasticsearch sink for indexing and searching aggregated log data.
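The Kafka-to-HDFS bridge mentioned above is a Kafka source feeding an HDFS sink through an ordinary channel. A sketch, assuming broker addresses, the topic name, and the consumer group are placeholders (property names per Flume 1.x Kafka source):

```properties
# Kafka source: consumes a topic as a member of a consumer group.
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092,broker2:9092
a1.sources.r1.kafka.topics = app-events
a1.sources.r1.kafka.consumer.group.id = flume-hdfs-bridge
a1.sources.r1.channels = c1

# Memory channel: Kafka itself already provides durability upstream,
# so an in-memory buffer is often an acceptable trade-off here.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# HDFS sink: lands the stream as raw event bodies for batch analytics.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/kafka/app-events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
```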