
Apache Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log and event data. It provides a simple and flexible architecture based on streaming data flows with pluggable sources, channels, and sinks, plus a REST monitoring API for agent metrics.
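That source → channel → sink pipeline can be sketched as a minimal single-agent configuration; agent, component, and port names here are illustrative:

```properties
# Minimal Flume agent: netcat source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: accepts newline-separated events on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: logs events at INFO level (useful for smoke testing)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```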

2 APIs · 1 Capability · 9 Features
Apache · Data Collection · ETL · Log Aggregation · Open Source · Streaming

APIs

Apache Flume Monitoring API

REST API for monitoring Apache Flume agents, retrieving component metrics for sources, channels, and sinks, and accessing agent health information.
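As a sketch of how the monitoring endpoint is typically exposed and queried, assuming a local agent named `a1` and the port commonly used in Flume's documentation:

```shell
# Start an agent with JSON metrics reporting over HTTP
flume-ng agent --conf conf --conf-file flume.conf --name a1 \
  -Dflume.monitoring.type=http \
  -Dflume.monitoring.port=34545

# Fetch source, channel, and sink metrics as JSON
curl http://localhost:34545/metrics
```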

Apache Flume Java API

Java API for building custom Flume sources, channels, sinks, and interceptors. Provides interfaces for developing pluggable data ingestion components.
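Custom components built against the Java API are wired into an agent by fully qualified class name; a sketch, where the `com.example.*` class names are hypothetical:

```properties
# Custom source extending org.apache.flume.source.AbstractSource
a1.sources.r1.type = com.example.flume.JsonLinesSource

# Custom interceptor, registered via its Builder inner class
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.flume.MaskPiiInterceptor$Builder

# Custom sink extending org.apache.flume.sink.AbstractSink
a1.sinks.k1.type = com.example.flume.AuditSink
```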

Capabilities

Apache Flume Log Collection

Capability for monitoring Apache Flume log collection agents: tracking source throughput, channel fill levels, and sink drain rates. Designed for data engineers and platform operators.


Features

Pluggable Sources

Extensible source architecture supporting Avro, Thrift, Exec, Taildir, Kafka, HTTP, Syslog, and custom sources.
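For example, a Taildir source that follows rotating application logs might be configured like this; the file paths are illustrative:

```properties
a1.sources.r1.type = TAILDIR
# Persist read offsets so lines are neither re-read nor lost across restarts
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = applogs
a1.sources.r1.filegroups.applogs = /var/log/app/.*\.log
a1.sources.r1.channels = c1
```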

Durable Channels

Multiple channel implementations including memory, file-backed, and Kafka-backed channels for different durability requirements.
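A file channel trades some throughput for durability; a sketch, with illustrative directories:

```properties
# File channel: buffered events survive agent restarts
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data
a1.channels.c1.capacity = 1000000
```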

Multi-Destination Sinks

Write events to HDFS, HBase, Solr, Elasticsearch, Kafka, and custom sink destinations.
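An HDFS sink that partitions output by event date might look like this (namenode address and roll thresholds are illustrative):

```properties
a1.sinks.k1.type = hdfs
# %Y-%m-%d is resolved from the event's timestamp header
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent's clock if events carry no timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Roll files by time or size, never by event count
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.channel = c1
```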

Fan-In Consolidation

Aggregate events from multiple agent sources into a single destination for centralized log collection.
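Fan-in is usually built from an Avro source on the collector paired with Avro sinks on the edge agents; a sketch, with hostnames and ports illustrative (each agent normally lives in its own configuration file):

```properties
# Collector agent: one Avro source receives from many edge agents
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141
collector.sources.avroIn.channels = c1

# Each edge agent points its Avro sink at the collector
edge.sinks.toCollector.type = avro
edge.sinks.toCollector.hostname = collector.example.com
edge.sinks.toCollector.port = 4141
edge.sinks.toCollector.channel = c1
```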

Fan-Out Distribution

Route events from a single source to multiple channel/sink combinations for parallel processing.
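Fan-out is controlled by the source's channel selector; a sketch showing the replicating selector, with the multiplexing alternative commented out:

```properties
# Replicating selector: every event is copied to both channels
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Alternative: multiplexing selector routes by a header value
# a1.sources.r1.selector.type = multiplexing
# a1.sources.r1.selector.header = env
# a1.sources.r1.selector.mapping.prod = c1
# a1.sources.r1.selector.default = c2

# c1 and c2 drain to independent sinks in parallel
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
```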

Interceptors

Event transformation interceptors for filtering, enrichment, and routing based on event content.
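Interceptors are declared as an ordered chain on the source; a sketch combining enrichment and filtering (the regex is illustrative):

```properties
a1.sources.r1.interceptors = ts host filter
# Stamp each event with ingest time (used by HDFS path escapes)
a1.sources.r1.interceptors.ts.type = timestamp
# Tag events with the originating agent's hostname
a1.sources.r1.interceptors.host.type = host
# Drop events matching a pattern
a1.sources.r1.interceptors.filter.type = regex_filter
a1.sources.r1.interceptors.filter.regex = .*DEBUG.*
a1.sources.r1.interceptors.filter.excludeEvents = true
```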

SSL/TLS Security

TLS encryption support across Avro, Thrift, Kafka, HTTP, and Syslog components.
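A sketch of per-component TLS on an Avro source/sink pair, using the keystore-style properties from Flume's component documentation; paths and passwords are placeholders:

```properties
# Avro source accepting TLS connections
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.ssl = true
a1.sources.r1.keystore = /etc/flume/keystore.jks
a1.sources.r1.keystore-password = changeit
a1.sources.r1.keystore-type = JKS

# Matching Avro sink on the sending agent
a1.sinks.k1.ssl = true
a1.sinks.k1.truststore = /etc/flume/truststore.jks
a1.sinks.k1.truststore-password = changeit
```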

Monitoring REST API

HTTP monitoring endpoint exposing source, channel, and sink metrics for agent health monitoring.

Multi-Hop Flows

Chain multiple Flume agents via Avro/Thrift RPC for tiered log aggregation architectures.
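A two-tier chain uses the same Avro source/sink pairing end to end; a compact sketch with illustrative hostnames (each tier would normally be a separate configuration file):

```properties
# Tier 1 (edge): tail local logs, forward over Avro RPC
edge.sources.tail.type = TAILDIR
edge.sinks.fwd.type = avro
edge.sinks.fwd.hostname = aggregator.example.com
edge.sinks.fwd.port = 4141

# Tier 2 (aggregator): receive from edges, write to final storage
agg.sources.in.type = avro
agg.sources.in.bind = 0.0.0.0
agg.sources.in.port = 4141
agg.sinks.out.type = hdfs
agg.sinks.out.hdfs.path = hdfs://namenode:8020/flume/aggregated
```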

Use Cases

Centralized Log Aggregation

Collect application logs from hundreds of servers and aggregate them into HDFS, Kafka, or Elasticsearch.

Real-Time Log Tailing

Tail application log files in real time using Taildir source for immediate event processing.

Syslog Ingestion

Ingest RFC-3164 and RFC-5424 syslog events from network devices into centralized storage.
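A syslog TCP source for this use case might be configured as follows; the listen port is illustrative:

```properties
# Syslog TCP source (parses RFC 3164 / RFC 5424 framing)
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# UDP variant for devices that only emit UDP syslog
# a1.sources.r1.type = syslogudp
```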

Kafka Event Ingestion

Bridge Kafka topics to HDFS or other storage for batch analytics on streaming event data.
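The Kafka-to-storage bridge starts from a Kafka source; a sketch, with broker addresses, topic, and group id illustrative:

```properties
# Kafka source: consume a topic into the Flume pipeline
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092,broker2:9092
a1.sources.r1.kafka.topics = app-events
a1.sources.r1.kafka.consumer.group.id = flume-ingest
a1.sources.r1.channels = c1
```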

Multi-Tier Architectures

Build tiered data collection with edge collectors forwarding to aggregation agents and final destinations.

Integrations

Apache Kafka

Kafka source and channel for consuming events, and Kafka sink for writing events to topics.

Apache HDFS

Primary sink for writing log data to Hadoop Distributed File System for batch analytics.

Apache HBase

HBase sink for writing events directly to HBase tables for random-access analytics.
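A minimal HBase sink sketch, assuming Flume 1.9+ (which ships the `hbase2` sink type); table and column family names are illustrative:

```properties
a1.sinks.k1.type = hbase2
a1.sinks.k1.table = flume_events
a1.sinks.k1.columnFamily = cf
a1.sinks.k1.channel = c1
```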

Apache Solr

Solr sink for indexing log events for full-text search capabilities.

Elasticsearch

Elasticsearch sink for indexing and searching aggregated log data.

Semantic Vocabularies

Apache Flume Monitoring Context

2 classes · 18 properties

JSON-LD

API Governance Rules

Apache Flume API Rules

7 rules · 4 errors · 2 warnings · 1 info

SPECTRAL

Resources

🔗 Documentation
🚀 Getting Started
👥 GitHub Organization
👥 GitHub Repository
🔗 Spectral Rules
🔗 Vocabulary
🔗 Naftiko Capability