Apache BookKeeper

Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for real-time workloads, developed under the Apache Software Foundation. It provides a simple log-oriented storage abstraction, the ledger, for reliable, replicated storage of sequential data. BookKeeper is used as the durable log storage layer in Apache Pulsar and other distributed messaging and stream processing systems. It offers a Java client API for working with ledgers and an HTTP Admin REST API for cluster management, bookie monitoring, and auto-recovery operations.

2 APIs · 1 Capability · 8 Features
Tags: Apache, Distributed Systems, Log Storage, Open Source, Storage, Streaming

APIs

Apache BookKeeper Admin API

The Apache BookKeeper HTTP Admin API provides REST endpoints for managing and monitoring BookKeeper clusters, bookies, ledgers, and auto-recovery operations. It enables programm...

Apache BookKeeper Java Client API

The BookKeeper Java client API provides programmatic access for creating, writing, reading, and managing ledgers. It supports both the legacy LedgerHandle API and the newer Ledg...

Capabilities

BookKeeper Cluster Management

Workflow capability for managing and monitoring Apache BookKeeper clusters, including bookie health checks, ledger inspection, and auto-recovery operations.

Features

Ledger Storage

Append-only log segments called ledgers provide the foundational storage primitive for reliable sequential data storage.

Ensemble Replication

Data is written to a configurable ensemble of bookies with write quorum and ack quorum parameters for fault tolerance.
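To make the relationship between the three parameters concrete, here is a minimal sketch in plain Java. It assumes the usual constraint (ensemble size ≥ write quorum ≥ ack quorum) and round-robin striping of entries across the ensemble, which is how the default distribution is generally described. `QuorumSketch`, `isValid`, and `writeSet` are hypothetical names for illustration, not part of the BookKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these names are hypothetical, not BookKeeper APIs.
class QuorumSketch {

    // Checks the assumed constraint: ensemble size >= write quorum >= ack quorum >= 1.
    static boolean isValid(int ensembleSize, int writeQuorum, int ackQuorum) {
        return ackQuorum >= 1 && writeQuorum >= ackQuorum && ensembleSize >= writeQuorum;
    }

    // Returns the ensemble indexes that would store an entry under round-robin
    // striping: writeQuorum consecutive bookies starting at entryId mod ensembleSize.
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            indexes.add((int) ((entryId + i) % ensembleSize));
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Ensemble of 3 bookies, each entry written to 2 of them.
        for (long entryId = 0; entryId < 4; entryId++) {
            System.out.println("entry " + entryId + " -> bookies " + writeSet(entryId, 3, 2));
        }
    }
}
```

With an ensemble of 3 and a write quorum of 2, consecutive entries land on overlapping pairs of bookies, so losing a single bookie still leaves one copy of every entry.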

Auto-Recovery

Built-in under-replication detection and automatic ledger re-replication when bookie nodes fail.
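The detection step can be pictured as a comparison between each ledger's ensemble and the set of bookies currently alive. The sketch below is a simplified model under that assumption, not the actual auditor implementation; `UnderReplicationSketch`, its method, and the bookie names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model of under-replication detection; not BookKeeper's auditor code.
class UnderReplicationSketch {

    // Returns ledger ids, in ascending order, whose ensemble contains
    // at least one bookie that is not in the live set.
    static List<Long> underReplicated(Map<Long, List<String>> ensembles, Set<String> liveBookies) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, List<String>> entry : new TreeMap<>(ensembles).entrySet()) {
            if (!liveBookies.containsAll(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ledger-to-ensemble mapping and live bookie set.
        Map<Long, List<String>> ensembles = Map.of(
                1L, List.of("bookie-a", "bookie-b"),
                2L, List.of("bookie-b", "bookie-c"));
        Set<String> live = Set.of("bookie-a", "bookie-b");
        System.out.println("ledgers needing re-replication: " + underReplicated(ensembles, live));
    }
}
```

In this model, a ledger flagged by the check would then be handed to a replication step that copies its entries to a healthy bookie.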

HTTP Admin API

RESTful HTTP Admin API for managing ledgers, bookies, cluster configuration, and triggering recovery operations.
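As a rough illustration of calling the Admin API, the sketch below uses the JDK's built-in `java.net.http` client. The host, the port `8000`, and the `/api/v1/bookie/list_bookies?type=rw` path are assumptions based on commonly documented defaults; check them against the documentation for your BookKeeper version. `AdminApiSketch` and its methods are hypothetical helper names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative sketch: endpoint path and port are assumptions, not guaranteed values.
class AdminApiSketch {

    // Builds the URI for listing bookies of a given type (for example "rw" or "ro").
    static URI listBookiesUri(String host, int port, String type) {
        return URI.create("http://" + host + ":" + port + "/api/v1/bookie/list_bookies?type=" + type);
    }

    // Sends a GET request and returns the status code followed by the response body.
    static String get(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires a bookie with its HTTP server enabled on the assumed host and port.
        System.out.println(get(listBookiesUri("localhost", 8000, "rw")));
    }
}
```

Keeping URI construction separate from the network call makes it easy to point the same helper at other endpoints once their paths are confirmed.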

Metrics Export

Prometheus-format metrics endpoint for monitoring bookie performance and storage utilization.
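Prometheus-format output is plain text with one sample per line and `#` comment lines for help and type metadata. The sketch below parses such text into a map, assuming samples carry no trailing timestamp. It is a simplified reader for illustration, and the metric name in `main` is made up rather than a real BookKeeper metric.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified reader for Prometheus text output; assumes no trailing timestamps.
class MetricsSketch {

    // Maps each sample's name (including any label block) to its numeric value.
    // Blank lines, '#' comment lines, and values Java cannot parse are skipped.
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String rawLine : body.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue;
            }
            try {
                samples.put(line.substring(0, split).trim(), Double.parseDouble(line.substring(split + 1)));
            } catch (NumberFormatException e) {
                // Skip values such as "+Inf" that Double.parseDouble does not accept.
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical scrape output; the metric name is a placeholder.
        String body = "# HELP example_metric An example gauge\n"
                + "# TYPE example_metric gauge\n"
                + "example_metric{instance=\"bookie-1\"} 42.0\n";
        System.out.println(parse(body));
    }
}
```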

Auditor Election

ZooKeeper-based leader election for the auditor role responsible for detecting under-replicated ledgers.

Garbage Collection

Configurable garbage collection for reclaiming storage from deleted or expired ledger data.

Journal and Ledger Storage

Separate journal and ledger storage paths optimized for sequential write throughput and random read performance.

Use Cases

Durable Log Storage

Serve as the replicated, durable write-ahead log for Apache Pulsar topics and distributed streaming systems.

Distributed Transaction Logs

Store distributed transaction log segments for systems requiring exactly-once semantics and durable commit records.

Metadata Store

Persist metadata and configuration data for distributed systems requiring consistent, replicated storage.

Stream Processing Storage

Provide low-latency, high-throughput sequential storage for real-time stream processing pipelines.

Cluster Administration

Monitor and manage BookKeeper clusters using the HTTP Admin API for operational visibility and recovery.

Integrations

Apache Pulsar

BookKeeper serves as the durable log storage layer for Apache Pulsar messaging topics.

Apache ZooKeeper

ZooKeeper is used for bookie coordination, auditor election, and cluster metadata management.

Apache Hadoop

BookKeeper can be used with Hadoop ecosystem tools for reliable log storage alongside HDFS.

Prometheus

BookKeeper exports Prometheus-format metrics for cluster monitoring and alerting.

Grafana

Grafana dashboards consume BookKeeper Prometheus metrics for operational visibility.

Semantic Vocabularies

Apache BookKeeper Context

9 classes · 23 properties

JSON-LD

API Governance Rules

Apache BookKeeper API Rules

12 rules · 3 errors · 8 warnings · 1 info

SPECTRAL

Resources

GitHub Organization
GitHub Repository
Documentation
Getting Started
Support
Terms of Service
Change Log
Spectral Rules
Vocabulary
Naftiko Capability