
Apache Flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log and event data. It provides a simple and flexible architecture based on streaming data flows with pluggable sources, channels, and sinks, plus a REST monitoring API for agent metrics.

2 APIs · 1 Capability · 9 Features · 46.1 / 100 (developing)
Tags: Apache · Data Collection · ETL · Log Aggregation · Open Source · Streaming

API Rating

46.1 / 100 (developing)
Scored 2026-05-20 · rubric v0.3
Discoverability: 80.0
Contract Quality: 63.2
Governance: 47.4
Operational Transparency: 36.8
Developer Ergonomics: 19.6
Commercial Clarity: 39.5

APIs

Apache Flume Monitoring API

REST API for monitoring Apache Flume agents, retrieving component metrics for sources, channels, and sinks, and accessing agent health information.
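A minimal sketch of reading these metrics from Python. It assumes the agent was started with `-Dflume.monitoring.type=http` on the default port 41414 (the baseURL in the listing below), where Flume serves a JSON document at `/metrics` keyed by component (for example `CHANNEL.c1`) with string-valued counters. The `channel_fill` and `saturated_channels` helpers and the 0.8 threshold are illustrative choices, not part of Flume.

```python
import json
import urllib.request

# Assumption: HTTP JSON reporting enabled on the default port 41414,
# with metrics served at the /metrics path.
METRICS_URL = "http://localhost:41414/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> dict:
    """Download the agent's metrics document and parse it into a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def channel_fill(metrics: dict) -> dict:
    """Map each channel name to ChannelSize / ChannelCapacity (0.0 if capacity is 0).

    Flume serializes metric values as strings, so they are converted to float.
    """
    fill = {}
    for key, values in metrics.items():
        # Component keys look like "CHANNEL.c1", "SOURCE.r1", "SINK.k1".
        kind, _, name = key.partition(".")
        if kind != "CHANNEL":
            continue
        capacity = float(values.get("ChannelCapacity", 0))
        size = float(values.get("ChannelSize", 0))
        fill[name] = size / capacity if capacity else 0.0
    return fill


def saturated_channels(metrics: dict, threshold: float = 0.8) -> list:
    """Return the sorted names of channels filled to at least `threshold`."""
    return sorted(n for n, ratio in channel_fill(metrics).items() if ratio >= threshold)
```

Against a live agent, `saturated_channels(fetch_metrics())` lists channels close to capacity, which typically means the sink draining that channel is falling behind its source.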

Apache Flume Java API

Java API for building custom Flume sources, channels, sinks, and interceptors. Provides interfaces for developing pluggable data ingestion components.

Capabilities

Apache Flume Monitoring API — Monitoring

Apache Flume Monitoring API — Monitoring. 1 operation. Lead operation: Apache Flume Get Agent Metrics. Self-contained Naftiko capability covering one Apache Flume business surf...


Features

Pluggable Sources

Extensible source architecture supporting Avro, Thrift, Exec, Taildir, Kafka, HTTP, Syslog, and custom sources.

Durable Channels

Multiple channel implementations including memory, file-backed, and Kafka-backed channels for different durability requirements.
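Flume channels are transactional: a source commits a batch of puts, and a sink takes a batch and commits only after delivery, rolling back on failure so no event is lost. The toy model below illustrates that contract in plain Python. `MemoryChannelModel` and `ChannelFullError` are hypothetical names; real channels add concurrency control, and the file and Kafka variants also persist events across restarts.

```python
from collections import deque


class ChannelFullError(Exception):
    """Raised when committing a batch would exceed the channel capacity."""


class MemoryChannelModel:
    """Toy, single-threaded model of a bounded channel with transactional batches."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._queue = deque()

    def put_batch(self, events):
        """Commit a batch atomically: either every event is added or none is."""
        staged = list(events)
        if len(self._queue) + len(staged) > self.capacity:
            # Rollback: nothing was added, so the source can retry the batch later.
            raise ChannelFullError(f"capacity {self.capacity} exceeded")
        self._queue.extend(staged)

    def take_batch(self, max_events, deliver):
        """Take up to max_events and pass them to deliver(); return the count taken.

        If deliver() raises, the events go back to the head of the queue in
        their original order (rollback) and the error is re-raised.
        """
        taken = []
        while self._queue and len(taken) < max_events:
            taken.append(self._queue.popleft())
        try:
            deliver(taken)
        except Exception:
            # extendleft inserts one by one, so reverse first to keep the order.
            self._queue.extendleft(reversed(taken))
            raise
        return len(taken)

    def __len__(self):
        return len(self._queue)
```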

Multi-Destination Sinks

Write events to HDFS, HBase, Solr, Elasticsearch, Kafka, and custom sink destinations.

Fan-In Consolidation

Aggregate events from multiple agent sources into a single destination for centralized log collection.

Fan-Out Distribution

Route events from a single source to multiple channel/sink combinations for parallel processing.
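Fan-out is controlled by a channel selector. This sketch models the two built-in behaviors: a replicating selector copies each event to every channel, and a multiplexing selector routes on the value of one header. The function names and the dict-based event shape are assumptions for illustration; in an agent configuration the same choices are expressed with `selector.type = replicating` or `selector.type = multiplexing` plus `selector.header`, `selector.mapping.<value>`, and `selector.default`.

```python
def replicating(event, channels):
    """Replicating selector: every configured channel receives the event."""
    return list(channels)


def multiplexing(event, header, mapping, default):
    """Multiplexing selector: choose channels by the value of one event header.

    header  -> name of the header to inspect (like selector.header)
    mapping -> {header_value: [channel, ...]} (like selector.mapping.<value>)
    default -> channels used when the header is missing or its value is unmapped
    """
    value = event["headers"].get(header)
    return list(mapping.get(value, default))
```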

Interceptors

Event transformation interceptors for filtering, enrichment, and routing based on event content.
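Interceptors run as a chain: each receives the batch produced by the previous one and may modify headers or drop events. The sketch below mimics two of Flume's built-in interceptors (static header and regex filtering) with plain Python functions over dict events. It is a conceptual model, not the Java `org.apache.flume.interceptor.Interceptor` interface, and all names here are hypothetical.

```python
import re


def static_header(key, value):
    """Interceptor that adds a fixed header, keeping any value already present."""
    def intercept(events):
        for event in events:
            event["headers"].setdefault(key, value)
        return events
    return intercept


def regex_filter(pattern, exclude=False):
    """Interceptor that keeps events whose body matches `pattern`.

    With exclude=True the logic flips and matching events are dropped.
    """
    rx = re.compile(pattern)

    def intercept(events):
        kept = []
        for event in events:
            matched = rx.search(event["body"].decode("utf-8", "replace")) is not None
            if matched != exclude:
                kept.append(event)
        return kept
    return intercept


def run_chain(events, interceptors):
    """Apply interceptors in order; each one sees the previous one's output."""
    for intercept in interceptors:
        events = intercept(events)
    return events
```

For instance, `run_chain(batch, [regex_filter(r"DEBUG", exclude=True), static_header("datacenter", "dc1")])` drops debug lines and tags the remaining events.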

SSL/TLS Security

TLS encryption support across Avro, Thrift, Kafka, HTTP, and Syslog components.

Monitoring REST API

HTTP monitoring endpoint exposing source, channel, and sink metrics for agent health monitoring.

Multi-Hop Flows

Chain multiple Flume agents via Avro/Thrift RPC for tiered log aggregation architectures.
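As a sketch of a two-tier flow, the generator below renders Flume agent properties for an edge agent (Taildir source, memory channel, Avro sink) and a collector (Avro source, file channel, Kafka sink). Pointing several edge agents at the same collector port gives fan-in, and the optional `ssl` flag shows where the Avro sink's truststore and the Avro source's keystore settings go. Agent names, hosts, ports, file paths, and the `CHANGE_ME` passwords are placeholders.

```python
def render(agent, settings):
    """Render {key: value} pairs as Flume properties prefixed with the agent name."""
    return "\n".join(f"{agent}.{key} = {value}" for key, value in settings.items())


def edge_agent(agent, log_regex, collector_host, collector_port, ssl=False):
    """Tier 1: tail local log files and forward events to a collector over Avro RPC."""
    s = {
        "sources": "r1",
        "channels": "c1",
        "sinks": "k1",
        "sources.r1.type": "TAILDIR",
        "sources.r1.positionFile": f"/var/lib/flume/{agent}/taildir_position.json",
        "sources.r1.filegroups": "f1",
        "sources.r1.filegroups.f1": log_regex,
        "sources.r1.channels": "c1",
        "channels.c1.type": "memory",
        "sinks.k1.type": "avro",
        "sinks.k1.hostname": collector_host,
        "sinks.k1.port": collector_port,
        "sinks.k1.channel": "c1",
    }
    if ssl:
        s["sinks.k1.ssl"] = "true"
        s["sinks.k1.truststore"] = "/etc/flume/truststore.jks"  # placeholder path
        s["sinks.k1.truststore-password"] = "CHANGE_ME"  # placeholder secret
    return render(agent, s)


def collector_agent(agent, port, kafka_servers, topic, ssl=False):
    """Tier 2: accept Avro RPC from any number of edge agents (fan-in), publish to Kafka."""
    s = {
        "sources": "r1",
        "channels": "c1",
        "sinks": "k1",
        "sources.r1.type": "avro",
        "sources.r1.bind": "0.0.0.0",
        "sources.r1.port": port,
        "sources.r1.channels": "c1",
        "channels.c1.type": "file",
        "sinks.k1.type": "org.apache.flume.sink.kafka.KafkaSink",
        "sinks.k1.kafka.bootstrap.servers": kafka_servers,
        "sinks.k1.kafka.topic": topic,
        "sinks.k1.channel": "c1",
    }
    if ssl:
        s["sources.r1.ssl"] = "true"
        s["sources.r1.keystore"] = "/etc/flume/keystore.jks"  # placeholder path
        s["sources.r1.keystore-password"] = "CHANGE_ME"  # placeholder secret
    return render(agent, s)
```

Each string can be saved to a file and passed to `flume-ng agent --conf conf --conf-file <file> --name <agent>`; the `--name` value must match the prefix used in the properties, and the edge sink port must match the collector source port.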

Use Cases

Centralized Log Aggregation

Collect application logs from hundreds of servers and aggregate them into HDFS, Kafka, or Elasticsearch.

Real-Time Log Tailing

Tail application log files in real time using the Taildir source for immediate event processing.
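Taildir survives restarts by recording, for each file, its inode and the byte offset read so far in a JSON position file. The function below is a simplified single-file model of that idea: the position-file entries are modeled on Flume's (`inode`, `pos`, `file`), but `read_new_lines` itself is hypothetical and skips Taildir's file groups, header options, and rotation handling.

```python
import json
import os


def read_new_lines(path, position_file):
    """Return complete lines appended to `path` since the last call.

    The offset is resumed from `position_file`, a JSON list of
    {"inode", "pos", "file"} entries, and updated after each read.
    """
    positions = []
    if os.path.exists(position_file):
        with open(position_file) as f:
            positions = json.load(f)
    inode = os.stat(path).st_ino
    # Resume only if the saved entry refers to the same file (same inode).
    pos = next((p["pos"] for p in positions
                if p["file"] == path and p["inode"] == inode), 0)
    if pos > os.path.getsize(path):
        pos = 0  # file was truncated; start over from the beginning
    with open(path, "rb") as f:
        f.seek(pos)
        data = f.read()
    # Consume only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    others = [p for p in positions if p["file"] != path]
    with open(position_file, "w") as f:
        json.dump(others + [{"inode": inode, "pos": pos + end, "file": path}], f)
    return lines
```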

Syslog Ingestion

Ingest RFC 3164 and RFC 5424 syslog events from network devices into centralized storage.
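Both RFC 3164 and RFC 5424 messages begin with a `<PRI>` field that encodes facility and severity as `facility * 8 + severity`, with valid values from 0 to 191. A small decoding sketch; `parse_pri` is a hypothetical helper, since Flume's syslog sources perform their own parsing.

```python
import re

# RFC 5424 severity names, indexed by numeric severity (0-7).
SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Informational", "Debug"]

_PRI = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str):
    """Return (facility, severity) decoded from the leading <PRI> field."""
    m = _PRI.match(message)
    if not m:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(m.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return facility, severity
```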

Kafka Event Ingestion

Bridge Kafka topics to HDFS or other storage for batch analytics on streaming event data.
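A minimal single-agent sketch of this bridge, rendered as Flume properties: a Kafka source reads the topics, a file channel buffers events durably, and an HDFS sink writes one directory per day. Broker addresses, topic names, the consumer group, and the HDFS path are placeholders, and tuning settings such as roll size, batch size, and file-channel directories are left at their defaults.

```python
def kafka_to_hdfs(agent, servers, topics, group_id, hdfs_root):
    """Render a Kafka source -> file channel -> HDFS sink agent as Flume properties."""
    settings = {
        "sources": "r1",
        "channels": "c1",
        "sinks": "k1",
        "sources.r1.type": "org.apache.flume.source.kafka.KafkaSource",
        "sources.r1.kafka.bootstrap.servers": servers,
        "sources.r1.kafka.topics": ",".join(topics),  # comma-separated topic list
        "sources.r1.kafka.consumer.group.id": group_id,
        "sources.r1.channels": "c1",
        # File channel: buffered events survive an agent restart.
        "channels.c1.type": "file",
        "sinks.k1.type": "hdfs",
        # One directory per day; useLocalTimeStamp resolves %Y-%m-%d from agent time.
        "sinks.k1.hdfs.path": f"{hdfs_root}/%Y-%m-%d",
        "sinks.k1.hdfs.useLocalTimeStamp": "true",
        # DataStream writes raw event bodies instead of SequenceFiles.
        "sinks.k1.hdfs.fileType": "DataStream",
        "sinks.k1.channel": "c1",
    }
    return "\n".join(f"{agent}.{k} = {v}" for k, v in settings.items())
```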

Multi-Tier Architectures

Build tiered data collection with edge collectors forwarding to aggregation agents and final destinations.

Integrations

Apache Kafka

Kafka source and channel for consuming events, and Kafka sink for writing events to topics.

Apache HDFS

Primary sink for writing log data to the Hadoop Distributed File System (HDFS) for batch analytics.

Apache HBase

HBase sink for writing events directly to HBase tables for random-access analytics.

Apache Solr

Solr sink for indexing log events to support full-text search.

Elasticsearch

Elasticsearch sink for indexing and searching aggregated log data.

Semantic Vocabularies

Apache Flume Monitoring Context

2 classes · 18 properties

JSON-LD

API Governance Rules

Apache Flume API Rules

7 rules · 4 errors · 2 warnings · 1 info

SPECTRAL

Resources

Documentation
Getting Started
GitHub Organization
GitHub Repository
Spectral Rules
Vocabulary

Sources

aid: apache-flume
name: Apache Flume
description: Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving
  large amounts of log and event data. It provides a simple and flexible architecture based on streaming data flows with pluggable
  sources, channels, and sinks, plus a REST monitoring API for agent metrics.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Apache
- Data Collection
- ETL
- Log Aggregation
- Open Source
- Streaming
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-flume/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-flume:apache-flume-monitoring-api
  name: Apache Flume Monitoring API
  description: REST API for monitoring Apache Flume agents, retrieving component metrics for sources, channels, and sinks,
    and accessing agent health information.
  humanURL: https://flume.apache.org/FlumeUserGuide.html
  baseURL: http://localhost:41414
  tags:
  - Monitoring
  - Metrics
  - REST API
  properties:
  - type: Documentation
    url: https://flume.apache.org/FlumeUserGuide.html
  - type: OpenAPI
    url: openapi/apache-flume-monitoring-openapi.yml
  - type: JSONSchema
    url: json-schema/flume-monitoring-agent-metrics-schema.json
  - type: JSONLD
    url: json-ld/apache-flume-monitoring-context.jsonld
  - type: NaftikoCapability
    url: capabilities/monitoring-monitoring.yaml
- aid: apache-flume:apache-flume-java-api
  name: Apache Flume Java API
  description: Java API for building custom Flume sources, channels, sinks, and interceptors. Provides interfaces for developing
    pluggable data ingestion components.
  humanURL: https://flume.apache.org/FlumeDeveloperGuide.html
  tags:
  - Java
  - SDK
  - Extension
  properties:
  - type: Documentation
    url: https://flume.apache.org/FlumeDeveloperGuide.html
  - type: SDK
    url: https://search.maven.org/artifact/org.apache.flume/flume-ng-core
    title: Java SDK (Maven Central)
common:
- type: Documentation
  url: https://flume.apache.org/documentation.html
- type: GettingStarted
  url: https://flume.apache.org/FlumeUserGuide.html
- type: GitHubOrganization
  url: https://github.com/apache
- type: GitHubRepository
  url: https://github.com/apache/flume
- type: SpectralRules
  url: rules/apache-flume-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-flume-vocabulary.yaml
- type: Features
  data:
  - name: Pluggable Sources
    description: Extensible source architecture supporting Avro, Thrift, Exec, Taildir, Kafka, HTTP, Syslog, and custom sources.
  - name: Durable Channels
    description: Multiple channel implementations including memory, file-backed, and Kafka-backed channels for different durability
      requirements.
  - name: Multi-Destination Sinks
    description: Write events to HDFS, HBase, Solr, Elasticsearch, Kafka, and custom sink destinations.
  - name: Fan-In Consolidation
    description: Aggregate events from multiple agent sources into a single destination for centralized log collection.
  - name: Fan-Out Distribution
    description: Route events from a single source to multiple channel/sink combinations for parallel processing.
  - name: Interceptors
    description: Event transformation interceptors for filtering, enrichment, and routing based on event content.
  - name: SSL/TLS Security
    description: TLS encryption support across Avro, Thrift, Kafka, HTTP, and Syslog components.
  - name: Monitoring REST API
    description: HTTP monitoring endpoint exposing source, channel, and sink metrics for agent health monitoring.
  - name: Multi-Hop Flows
    description: Chain multiple Flume agents via Avro/Thrift RPC for tiered log aggregation architectures.
- type: UseCases
  data:
  - name: Centralized Log Aggregation
    description: Collect application logs from hundreds of servers and aggregate them into HDFS, Kafka, or Elasticsearch.
  - name: Real-Time Log Tailing
    description: Tail application log files in real time using Taildir source for immediate event processing.
  - name: Syslog Ingestion
    description: Ingest RFC-3164 and RFC-5424 syslog events from network devices into centralized storage.
  - name: Kafka Event Ingestion
    description: Bridge Kafka topics to HDFS or other storage for batch analytics on streaming event data.
  - name: Multi-Tier Architectures
    description: Build tiered data collection with edge collectors forwarding to aggregation agents and final destinations.
- type: Integrations
  data:
  - name: Apache Kafka
    description: Kafka source and channel for consuming events, and Kafka sink for writing events to topics.
  - name: Apache HDFS
    description: Primary sink for writing log data to Hadoop Distributed File System for batch analytics.
  - name: Apache HBase
    description: HBase sink for writing events directly to HBase tables for random-access analytics.
  - name: Apache Solr
    description: Solr sink for indexing log events for full-text search capabilities.
  - name: Elasticsearch
    description: Elasticsearch sink for indexing and searching aggregated log data.
maintainers:
- FN: Kin Lane
  email: [email protected]