Apache BookKeeper
Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service developed by the Apache Software Foundation and optimized for real-time workloads. It provides a simple log-oriented storage abstraction, the ledger, for reliable, replicated storage of sequential data. BookKeeper is used as the durable log storage layer in Apache Pulsar and in other distributed messaging and stream processing systems. It offers a Java client API and an HTTP Admin REST API for cluster management, bookie monitoring, and auto-recovery operations.
APIs
Apache BookKeeper Admin API
The Apache BookKeeper HTTP Admin API provides REST endpoints for managing and monitoring BookKeeper clusters, bookies, ledgers, and auto-recovery operations. It enables programm...
Apache BookKeeper Java Client API
The BookKeeper Java client API provides programmatic access for creating, writing, reading, and managing ledgers. It supports both the legacy LedgerHandle API and the newer Ledg...
Capabilities
BookKeeper Cluster Management
Workflow capability for managing and monitoring Apache BookKeeper clusters, including bookie health checks, ledger inspection, and auto-recovery operations.
Features
Append-only log segments called ledgers provide the foundational storage primitive for reliable sequential data storage.
Data is written to a configurable ensemble of bookies with write quorum and ack quorum parameters for fault tolerance.
Built-in under-replication detection and automatic ledger re-replication when bookie nodes fail.
RESTful HTTP Admin API for managing ledgers, bookies, cluster configuration, and triggering recovery operations.
Prometheus-format metrics endpoint for monitoring bookie performance and storage utilization.
ZooKeeper-based leader election for the auditor role responsible for detecting under-replicated ledgers.
Configurable garbage collection for reclaiming storage from deleted or expired ledger data.
Separate journal and ledger storage paths, with the journal optimized for sequential write throughput and ledger storage for random read performance.
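The ensemble, write quorum, and ack quorum parameters mentioned above can be illustrated with a small sketch. It assumes round-robin striping of entries across the ensemble, which is how the BookKeeper client distributes entries by default as far as the project documentation describes it. The function names `write_set` and `is_acknowledged` are hypothetical and exist only for this illustration; they are not part of any BookKeeper API.

```python
from typing import List


def write_set(entry_id: int, ensemble_size: int, write_quorum: int) -> List[int]:
    """Return the ensemble indices of the bookies that store this entry.

    Assumes round-robin striping: entry N starts at bookie N mod E and is
    written to the next write_quorum bookies, wrapping around the ensemble.
    """
    if not 1 <= write_quorum <= ensemble_size:
        raise ValueError("need 1 <= write_quorum <= ensemble_size")
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]


def is_acknowledged(responses: int, ack_quorum: int) -> bool:
    """An add is confirmed to the writer once ack_quorum bookies have responded."""
    return responses >= ack_quorum


if __name__ == "__main__":
    # Hypothetical settings: ensemble of 3 bookies, write quorum 2, ack quorum 2.
    for entry_id in range(4):
        print(entry_id, write_set(entry_id, ensemble_size=3, write_quorum=2))
```

With an ensemble larger than the write quorum, consecutive entries land on different subsets of bookies, which spreads write load while each entry still has `write_quorum` replicas.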
Use Cases
Serve as the replicated, durable write-ahead log for Apache Pulsar topics and distributed streaming systems.
Store distributed transaction log segments for systems requiring exactly-once semantics and durable commit records.
Persist metadata and configuration data for distributed systems requiring consistent, replicated storage.
Provide low-latency, high-throughput sequential storage for real-time stream processing pipelines.
Monitor and manage BookKeeper clusters using the HTTP Admin API for operational visibility and recovery.
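As a rough sketch of the last use case, the snippet below builds a few Admin API URLs and provides a helper to fetch them. It assumes the bookie's HTTP server is enabled and listening on port 8000, and that the endpoint paths shown (`/heartbeat`, `/api/v1/bookie/list_bookies`, `/api/v1/autorecovery/list_under_replicated_ledger`) match the deployed BookKeeper version; check them against the Admin API reference for your release. The helper names `admin_url` and `fetch_text` are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def admin_url(host: str, port: int, path: str, **params: str) -> str:
    """Build an Admin API URL; query parameters are appended if given."""
    url = f"http://{host}:{port}{path}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_text(url: str, timeout: float = 5.0) -> str:
    """GET a URL and return the response body as text (needs a reachable bookie)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Assumed address; replace with a real bookie before calling fetch_text().
    host, port = "localhost", 8000
    print(admin_url(host, port, "/heartbeat"))
    print(admin_url(host, port, "/api/v1/bookie/list_bookies",
                    type="rw", print_hostnames="true"))
    print(admin_url(host, port, "/api/v1/autorecovery/list_under_replicated_ledger"))
```

The main block only prints the URLs, so it runs without a cluster; passing any of them to `fetch_text` performs the actual request.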
Integrations
BookKeeper serves as the durable log storage layer for Apache Pulsar messaging topics.
ZooKeeper is used for bookie coordination, auditor election, and cluster metadata management.
BookKeeper can be used with Hadoop ecosystem tools for reliable log storage alongside HDFS.
BookKeeper exports Prometheus-format metrics for cluster monitoring and alerting.
Grafana dashboards consume BookKeeper Prometheus metrics for operational visibility.
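To make the Prometheus integration concrete, here is a minimal sketch of reading Prometheus text-format output into a dictionary. It assumes samples carry no trailing timestamp, and the metric names in `SAMPLE` are invented for illustration; they are not real BookKeeper metric names. In practice a Prometheus server scrapes the endpoint directly, so a parser like this is mainly useful for ad hoc checks.

```python
from typing import Dict

# Hypothetical sample in Prometheus text exposition format.
SAMPLE = """\
# HELP demo_ledger_count Number of ledgers (illustrative only)
# TYPE demo_ledger_count gauge
demo_ledger_count 12
demo_write_latency_ms{quantile="0.99"} 4.5
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumes no trailing timestamp on sample lines.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the token after the last space on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```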